- Mar 01, 2021
-
lalice123 authored
* Removed unused/least-used frequencies: not only do they cause latency spikes, they can also increase jitter unnecessarily. Signed-off-by: Alex Finhart <alexfinhart@gmail.com>
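The commit text doesn't show which files were touched, but trimming cpufreq steps is typically done by invalidating entries in the driver's frequency table. A minimal sketch, assuming a driver-owned table (the table name and OPP values below are hypothetical, not taken from the commit):

    #include <linux/cpufreq.h>

    /*
     * Hypothetical frequency table: the middle OPPs are marked
     * CPUFREQ_ENTRY_INVALID so cpufreq skips them, leaving fewer,
     * more widely spaced steps for the governor to hop between.
     */
    static struct cpufreq_frequency_table example_freq_table[] = {
        { .driver_data = 0, .frequency = 300000 },                /* 300 MHz */
        { .driver_data = 1, .frequency = 768000 },                /* 768 MHz */
        { .driver_data = 2, .frequency = CPUFREQ_ENTRY_INVALID }, /* dropped */
        { .driver_data = 3, .frequency = CPUFREQ_ENTRY_INVALID }, /* dropped */
        { .driver_data = 4, .frequency = 1804800 },               /* 1.8 GHz */
        { .frequency = CPUFREQ_TABLE_END },
    };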
-
eight authored
* We tried the 30/3 configuration, but it doesn't give satisfying all-around performance. Considering our RAM size (6 GB), we can use this configuration without issues.
-
eight authored
With this commit we aim to reduce jitter by modifying the frequency steps and thermal zones. Signed-off-by: Alex Finhart <alexfinhart@gmail.com>
-
lalice123 authored
* We don't really need read_ahead and nr_requests. 1. We run on F2FS, which is already fast enough. 2. We don't want async requests running in the background; we are not running a server anyway. Signed-off-by: Alex Finhart <alexfinhart@gmail.com>
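For context, read_ahead_kb and nr_requests are per-queue block-layer sysfs tunables, so "not needing them" just means no longer writing custom values from a tweak script. A small user-space sketch that prints the current values (the sda device name is an assumption, not taken from the commit):

    #include <stdio.h>

    int main(void)
    {
        /* Block-queue tunables; "sda" is only an example device name. */
        const char *knobs[] = {
            "/sys/block/sda/queue/read_ahead_kb",
            "/sys/block/sda/queue/nr_requests",
        };
        char buf[32];

        for (int i = 0; i < 2; i++) {
            FILE *f = fopen(knobs[i], "r");
            if (!f) {
                perror(knobs[i]);
                continue;
            }
            if (fgets(buf, sizeof(buf), f))
                printf("%s = %s", knobs[i], buf);
            fclose(f);
        }
        return 0;
    }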
-
lalice123 authored
* I'm too lazy to explain this; maybe I'll revert it later. Signed-off-by: Alex Finhart <alexfinhart@gmail.com>
-
lalice123 authored
* Based on ktweak by tytydraco. Signed-off-by: Alex Finhart <alexfinhart@gmail.com>
-
It's possible for a user fault to be triggered during task exit that results in swap readahead, which is not useful. Skip swap readahead if the current process is exiting.

Change-Id: I5fad20ebdcc616af732254705726d395eb118cbe
Signed-off-by: Tim Murray <timmurray@google.com>
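The patch body isn't reproduced here; as a sketch of the check it describes (the helper name is hypothetical, while PF_EXITING is the standard flag for an exiting task):

    #include <linux/sched.h>

    /*
     * Hypothetical helper: if the faulting task is already exiting,
     * reading ahead neighbouring swap slots is wasted work, so the
     * readahead path would fall back to the single requested page.
     */
    static inline bool swap_readahead_useful(void)
    {
        return !(current->flags & PF_EXITING);
    }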
-
SD_BALANCE_{FORK,EXEC} and SD_WAKE_AFFINE are stripped in sd_init() for any sched domains with a NUMA distance greater than 2 hops (RECLAIM_DISTANCE). The idea being that it's expensive to balance across domains that far apart.

However, as is rather unfortunately explained in:

  commit 32e45ff ("mm: increase RECLAIM_DISTANCE to 30")

the value for RECLAIM_DISTANCE is based on node distance tables from 2011-era hardware.

Current AMD EPYC machines have the following NUMA node distances:

  node distances:
  node   0   1   2   3   4   5   6   7
    0:  10  16  16  16  32  32  32  32
    1:  16  10  16  16  32  32  32  32
    2:  16  16  10  16  32  32  32  32
    3:  16  16  16  10  32  32  32  32
    4:  32  32  32  32  10  16  16  16
    5:  32  32  32  32  16  10  16  16
    6:  32  32  32  32  16  16  10  16
    7:  32  32  32  32  16  16  16  10

where 2 hops is 32.

The result is that the scheduler fails to load balance properly across NUMA nodes on different sockets -- 2 hops apart.

For example, pinning 16 busy threads to NUMA nodes 0 (CPUs 0-7) and 4 (CPUs 32-39) like so,

  $ numactl -C 0-7,32-39 ./spinner 16

causes all threads to fork and remain on node 0 until the active balancer kicks in after a few seconds and forcibly moves some threads to node 4.

Override node_reclaim_distance for AMD Zen.

Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Suravee.Suthikulpanit@amd.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas.Lendacky@amd.com
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190808195301.13222-3-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
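The override itself is small; a sketch of the kind of quirk the commit describes, assuming the node_reclaim_distance knob introduced alongside this change and an AMD CPU setup hook (the function name here is illustrative, not the actual hook):

    #include <linux/topology.h>

    /*
     * Sketch: treat the 32 "hops" between EPYC sockets (see the table
     * above) as still close enough for reclaim and load balancing by
     * raising node_reclaim_distance accordingly on Zen parts.
     */
    static void example_zen_numa_quirk(void)
    {
    #ifdef CONFIG_NUMA
        node_reclaim_distance = 32;
    #endif
    }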
-
Compaction of running memory is a good thing. It's also required to properly allocate memory for large applications, like new processes in Chrome, or to make room for a large virtual machine.

Unfortunately, the default configuration of Linux allows all memory to be compacted. This is a good thing for servers. An application running server side can tolerate micro stalls since the latency impact is almost not measurable (depending on the application, of course). But on a desktop configuration with X, Wayland, Gnome, KDE, etc., the dropped frames and lost input are very obvious.

Let's prevent these applications from having their memory moved during compaction. Although compaction will take longer and new processes will take longer to spawn under high memory pressure / external memory fragmentation, the actual experience of the system will feel more responsive and consistent under these adverse conditions.

This commit was adapted from zen-kernel/zen-kernel@394ae0c

Signed-off-by: Rapherion Rollerscaperers <rapherion@raphielgang.org>
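A sketch of the policy described above, not the exact zen-kernel diff: during the compaction migrate scan, pages that are currently mapped into a process would simply be skipped (the helper name is hypothetical):

    #include <linux/mm.h>

    /*
     * Hypothetical predicate for the migrate scanner: leave pages that
     * are mapped by a running application where they are, trading
     * slower compaction for fewer visible stalls on desktop workloads.
     */
    static inline bool example_compaction_should_skip(struct page *page)
    {
        return page_mapped(page);
    }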
-
- Feb 28, 2021
-
The most frequent user of fenced GMU writes, adreno_ringbuffer_submit(), performs a fenced GMU write under a spin lock, and since fenced GMU writes use udelay(), a lot of CPU cycles are burned here. Not only is the spin lock held for longer than necessary (because the write doesn't need to be inside the spin lock), but also a lot of CPU time is wasted in udelay() for tens of microseconds when usleep_range() can be used instead.

Move the locked fenced GMU writes to outside their spin locks and make adreno_gmu_fenced_write() use usleep_range() when not in atomic/IRQ context, to save power and improve performance. Fenced GMU writes are found to take an average of 28 microseconds on the Snapdragon 855, so a usleep range of 10 to 30 microseconds is optimal.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
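The description boils down to picking the right delay primitive for the context; a sketch of that choice (the helper is illustrative, not the driver's exact code):

    #include <linux/delay.h>
    #include <linux/irqflags.h>
    #include <linux/preempt.h>

    /*
     * Busy-wait only when sleeping is not allowed; otherwise yield the
     * CPU for roughly the 10-30 us a fenced GMU write is expected to
     * take on the Snapdragon 855.
     */
    static void example_fence_poll_delay(void)
    {
        if (in_atomic() || irqs_disabled())
            udelay(10);
        else
            usleep_range(10, 30);
    }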
-
lalice123 authored
Let users, or some post-init script, enable it. Signed-off-by: lalice <alexfinhart@gmail.com>
-
It isn't guaranteed a CPU will idle upon calling lpm_cpuidle_enter(), since it could abort early at the need_resched() check. In this case, it's possible for an IPI to be sent to this "idle" CPU needlessly, thus wasting power. For the same reason, it's also wasteful to keep a CPU marked idle even after it's woken up.

Reduce the window that CPUs are marked idle to as small as it can be in order to improve power consumption.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
(cherry picked from commit ed535b9624177b76f1eea6f0671386c08711c6a5)
Signed-off-by: Panchajanya1999 <panchajanya@azure-dev.live>
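A sketch of the narrowed window the commit describes, with mark_cpu_idle()/clear_cpu_idle() standing in as hypothetical placeholders for the driver's idle-cpumask bookkeeping (they are not real API names):

    #include <linux/errno.h>
    #include <linux/sched.h>

    static void mark_cpu_idle(int cpu)  { (void)cpu; /* placeholder */ }
    static void clear_cpu_idle(int cpu) { (void)cpu; /* placeholder */ }

    static int example_cpuidle_enter(int cpu)
    {
        if (need_resched())
            return -EBUSY;       /* bailed out: never advertised as idle */

        mark_cpu_idle(cpu);      /* just before the low-power entry */
        /* ... enter the low-power mode here ... */
        clear_cpu_idle(cpu);     /* immediately after wakeup */

        return 0;
    }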
-
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: Yasir-Siddiqui <www.mohammad.yasir.s@gmail.com>
Signed-off-by: Pulkit077 <pulkitagarwal2k1@gmail.com>
Signed-off-by: ashwatthama <sai404142@gmail.com>
-
We don't need accurate math in the GPU driver.

Signed-off-by: Raphiel Rollerscaperers <raphielscape@outlook.com>
Signed-off-by: Pulkit077 <pulkitagarwal2k1@gmail.com>
Signed-off-by: ashwatthama <sai404142@gmail.com>
Signed-off-by: karthik558 <karthik.lal558@gmail.com>
-
HAVE_MOVE_PMD enables remapping pages at the PMD level if both the source and destination addresses are PMD-aligned.

HAVE_MOVE_PMD is already enabled on x86. The original patch [1] that introduced this config did not enable it on arm64 at the time because of performance issues with flushing the TLB on every PMD move. These issues have since been addressed in more recent releases with improvements to the arm64 TLB invalidation and core mmu_gather code as Will Deacon mentioned in [2].

From the data below, it can be inferred that there is approximately 8x improvement in performance when HAVE_MOVE_PMD is enabled on arm64.

--------- Test Results ----------

The following results were obtained on an arm64 device running a 5.4 kernel, by remapping a PMD-aligned, 1GB sized region to a PMD-aligned destination. The results from 10 iterations of the test are given below. All times are in nanoseconds.

  Control        HAVE_MOVE_PMD
  9220833        1247761
  9002552        1219896
  9254115        1094792
  8725885        1227760
  9308646        1043698
  9001667        1101771
  8793385        1159896
  8774636        1143594
  9553125        1025833
  9374010        1078125

  9100885.4      1134312.6    <-- Mean Time in nanoseconds

Total mremap time for a 1GB sized PMD-aligned region drops from ~9.1 milliseconds to ~1.1 milliseconds. (~8x speedup).

[1] https://lore.kernel.org/r/20181108181201.88826-3-joelaf@google.com
[2] https://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg140837.html

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/r/20201014005320.2233162-3-kaleshsingh@google.com
Link: https://lore.kernel.org/kvmarm/20181029102840.GC13965@arm.com/
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
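Below is a rough user-space sketch of the style of benchmark described (not the original test program): time a single mremap() of a 1 GB anonymous mapping. Note the real test ensured both source and destination were PMD (2 MB) aligned, which this simplified version does not force.

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <time.h>

    #define REGION (1UL << 30)  /* 1 GB */

    int main(void)
    {
        struct timespec a, b;
        void *src, *dst_hint, *dst;

        src = mmap(NULL, REGION, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (src == MAP_FAILED) { perror("mmap src"); return 1; }
        memset(src, 1, REGION);         /* fault the pages in */

        /* Reserve a destination range so we have a known free address. */
        dst_hint = mmap(NULL, REGION, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (dst_hint == MAP_FAILED) { perror("mmap dst"); return 1; }

        clock_gettime(CLOCK_MONOTONIC, &a);
        dst = mremap(src, REGION, REGION,
                     MREMAP_MAYMOVE | MREMAP_FIXED, dst_hint);
        clock_gettime(CLOCK_MONOTONIC, &b);
        if (dst == MAP_FAILED) { perror("mremap"); return 1; }

        printf("mremap took %ld ns\n",
               (b.tv_sec - a.tv_sec) * 1000000000L +
               (b.tv_nsec - a.tv_nsec));
        return 0;
    }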
-
The schedutil driver sets sg_policy->next_freq to UINT_MAX on certain occasions to discard the cached value of next freq:
- In sugov_start(), when the schedutil governor is started for a group of CPUs.
- And whenever we need to force a freq update before the rate-limit duration, which happens when:
  - there is an update in cpufreq policy limits,
  - or when the utilization of the DL scheduling class increases.

In return, get_next_freq() doesn't return a cached next_freq value but recalculates the next frequency instead.

But having special meaning for a particular value of frequency makes the code less readable and error prone. We recently fixed a bug where the UINT_MAX value was considered as a valid frequency in sugov_update_single().

All we need is a flag which can be used to discard the value of sg_policy->next_freq, and we already have need_freq_update for that. Let's reuse it instead of setting next_freq to UINT_MAX.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
- backported to 4.4
Signed-off-by: Udit Karode <udit.karode@gmail.com>
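A sketch of the pattern being described (field names follow the schedutil code, but this is a simplified stand-in rather than the verbatim diff): callers set need_freq_update instead of poisoning next_freq with UINT_MAX, and the frequency calculation consumes the flag.

    #include <linux/types.h>

    struct example_sugov_policy {
        unsigned int next_freq;
        unsigned int cached_raw_freq;
        bool         need_freq_update;
    };

    static unsigned int example_get_next_freq(struct example_sugov_policy *sg,
                                              unsigned int raw_freq)
    {
        /* Reuse the cached result only when no forced update is pending. */
        if (raw_freq == sg->cached_raw_freq && !sg->need_freq_update)
            return sg->next_freq;

        sg->need_freq_update = false;   /* consume the flag */
        sg->cached_raw_freq = raw_freq;
        /* ... resolve raw_freq against the policy limits here ... */
        return raw_freq;
    }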
-
This reduces the size of the final kernel image by 350 KB.

Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: Panchajanya1999 <panchajanya@azure-dev.live>
Signed-off-by: CryllicBuster273 <cryllicbuster273@pixelexperience.org>
Signed-off-by: Nishant Singh <nishant.maintainers@pixysos.com>
Signed-off-by: depth118 <safanmuhammed1@gmail.com>
Signed-off-by: Depth118 <safanmuhammed1@gmail.com>
Signed-off-by: karthik558 <karthik.lal558@gmail.com>
-