Comments (44)
Hi @hamadmarri @Salekin-1169 @owl4ce
I just checked the latest xanmod-cacule setting. It doesn't include the starvation fix, and SCHED_AUTOGROUP (and therefore FAIR_GROUP_SCHED) are enabled. It also includes the following scheduler tweaks and uses the non-standard 500Hz timer (there is no such option in mainline):
sysctl_sched_nr_migrate = 256
sysctl_sched_rt_runtime = 980000
I just opened an issue at xanmod/linux#112 and mentioned the starvation fix and disabling FAIR_GROUP_SCHED.
from cacule-cpu-scheduler.
I'm not sure which Linux distro you are using. Generally speaking, you may try:
1. Download the latest kernel source from https://www.kernel.org/
2. Copy the patch file to the kernel source folder, and run patch -p1 -i interactivity_score_fix.patch
3. Get your current kernel config file. Usually it is located at /proc/config.gz. If not, try running modprobe configs first.
4. In the kernel source folder, run zcat /proc/config.gz > .config
5. Run make menuconfig and make sure you disable the following:
   General Setup-->Automatic process group scheduling
   General Setup-->Control Group support-->CPU controller-->Group scheduling for SCHED_OTHER
   You may also want to distinguish it from the existing kernel by appending a suffix at General Setup-->Local version - append to kernel release.
6. Run make -jx, where x is the number of CPU cores. For example, if you have a 4-core CPU, you may run make -j4.
7. Run make install and make modules_install.
8. Create the initramfs and update your bootloader configuration. The instructions differ depending on your distro.
9. If you are using any out-of-tree modules, e.g. the nvidia proprietary drivers, you also need to recompile or reinstall them. Again, the instructions depend on your distro.
For the initramfs, bootloader, and out-of-tree modules, you should look up the instructions for your specific Linux distro. Some distros may also have a guide on how to build custom kernels.
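As a rough illustration, steps 2 to 7 above could be wrapped in a small shell function. This is only a sketch: the function names make_jobs and build_kernel are made up for this example, the use of sudo is an assumption about your setup, and modules_install is run before install here only because that is the more common order. Steps 8 and 9 are distro specific and are left out.

```shell
#!/bin/sh
# Sketch of steps 2-7 above. Defining the functions runs nothing;
# call build_kernel yourself from a shell.

# One make job per online CPU; fall back to 1 if nproc is missing.
make_jobs() {
    nproc 2>/dev/null || echo 1
}

# build_kernel SRC_DIR PATCH_FILE  (both arguments are placeholders you supply;
# PATCH_FILE must be absolute or relative to SRC_DIR).
# The body runs in a subshell, so cd and set -e do not leak to the caller.
build_kernel() (
    set -e
    cd "$1"
    patch -p1 -i "$2"
    # /proc/config.gz may be missing until the configs module is loaded.
    [ -r /proc/config.gz ] || sudo modprobe configs
    zcat /proc/config.gz > .config
    make menuconfig              # disable the two options from step 5 here
    make -j"$(make_jobs)"
    sudo make modules_install
    sudo make install
)
```

For example, build_kernel ~/src/linux interactivity_score_fix.patch would run the whole sequence, stopping at the first failing step (the source path is only an example).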
from cacule-cpu-scheduler.
@hamadmarri @raykzhao sorry about the delayed reply. I tested the latest patch, and it fixed my issue completely. Also, I tested on a freshly reinstalled system, so all the values are set to default.
I only faced some minor audio lags (very minimal) during recording, but other than that, everything was butter smooth.
Thank you so much for your support, the issue is completely fixed for me
Wish you all a very Happy New Year
from cacule-cpu-scheduler.
This solution was suggested by Alexandre Frade. Thanks to him.
Hamad, the freezes reported by users started when executing the nvidia-dkms and mkinitramfs triggers, even with the "remove exec_start = 0" fix. The system returned to normal with autogroup disabled:
echo 0 | sudo tee /proc/sys/kernel/sched_autogroup_enabled
All users with this problem, please try this solution.
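To check whether the workaround above is in effect, the same file can simply be read back. The following is a minimal sketch; the function name autogroup_status and the AUTOGROUP_FILE override are made up for this example, and the override exists only so the function can be tried against a scratch file.

```shell
#!/bin/sh
# Path taken from the comment above; overridable for experiments.
AUTOGROUP_FILE="${AUTOGROUP_FILE:-/proc/sys/kernel/sched_autogroup_enabled}"

# Print "enabled", "disabled", or "unavailable".
autogroup_status() {
    if [ ! -r "$AUTOGROUP_FILE" ]; then
        # The file is absent when the kernel is built without SCHED_AUTOGROUP.
        echo "unavailable"
    elif [ "$(cat "$AUTOGROUP_FILE")" = "1" ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}
```

Note that writing 0 to the file lasts only until the next reboot; disabling SCHED_AUTOGROUP in the kernel config, as discussed elsewhere in this thread, makes the change permanent.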
from cacule-cpu-scheduler.
@hamadmarri I have a comparatively weak cpu (i3 2 physical + 2 logical cores), so I think the default 32768 isn't suitable for my pc, while other people didn't face issue with this.
I believe the problem is in this line. It seems that a task with a high run time will get a lower value! I am not sure if the original ULE used it this way. I will try one more fix. I will also add reset_life_time, the same as Cachy has.
When a task flips to non-interactive, its score should stay somewhat close to its previous score, but from the math I can see that the score jumps to the lowest value and then gradually starts to gain again.
from cacule-cpu-scheduler.
Hi @hamadmarri
Although the new fix seems better than the vanilla CacULE on my machine, unfortunately it doesn't feel better than the original Cachy or the CacULE with either the smoother or the starvation fix. Somehow it's slightly more glitchy on my machine under heavy load.
I still think both Cachy and CacULE should be kept in a single patch (#19), since it's difficult to find a scheduling policy that is suitable for everyone. On my machine the original Cachy is definitely the winner, but it may not be the case for others.
from cacule-cpu-scheduler.
Same here, but on my machine Cachy is more responsive only when not under heavy load; CacULE is much more responsive when all cores are under 100% load. In conclusion, on my machine CacULE copes better when all cores are at 100%; I mean that the system can still move freely.
from cacule-cpu-scheduler.
@hamadmarri thank you so much for your explanation. I'm still learning about it. CFS has about a decade of optimizations behind it, so I guess that is why its throughput is higher?
Just curious, how different is Cacule from the FreeBSD implementation of ULE? Can I read up about it somewhere?
CFS is a result of previous crazy approaches. One of them was the scheduler made by a genius called Con Kolivas (the author of BFS and MuQSS); his scheduler back then was the Staircase scheduler. Ingo Molnár made CFS, which was inspired by Kolivas's work. FreeBSD didn't have a crazy guy like Kolivas.
Just curious, how different is Cacule from the FreeBSD implementation of ULE? Can I read up about it somewhere?
https://github.com/hamadmarri/cacule-cpu-scheduler#the-cacule-interactivity-score
I have only implemented the interactivity (IS) score (see Figure 1 https://github.com/hamadmarri/cacule-cpu-scheduler/blob/master/helper%20docs%20for%20kernel%20dev/FreeBSD/ULE.pdf)
My implementation of IS is not similar to ULE, it is more like the Cachy/HRRN way.
ULE uses 2 runqueues, and the IS math decides which runqueue a task is placed on. It also uses multi-level queues for priority, whereas I just use CFS's vruntime calculations, in which the task priority affects the vruntime value. That in turn affects the total run time of the process used when calculating HRRN or IS.
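For readers unfamiliar with HRRN (Highest Response Ratio Next), the textbook definition is sketched below. The symbols W (time spent waiting) and S (expected or accumulated service time) are labels chosen here for illustration; this is the generic formula and not necessarily the exact expression used in the Cachy or CacULE patches.

```latex
R = \frac{W + S}{S} = 1 + \frac{W}{S}
```

The task with the highest ratio R runs next, so a task that has waited long relative to its run time moves ahead, which is why the run time fed into the calculation matters.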
I used a shortcut approach to adapt IS to CFS. ULE is totally different from CFS. I remember that ULE is only about 3k LOC, whereas CFS is definitely more than 25k LOC (I counted only 4 files in CFS, not all of them).
Here are the LOC of just the 4 files in Linux that I deal with daily:
❯ cat kernel/sched/fair.c | wc -l
11872
~/dev/linux/linux rdb
❯ cat kernel/sched/core.c | wc -l
8498
~/dev/linux/linux rdb
❯ cat kernel/sched/sched.h | wc -l
2643
~/dev/linux/linux rdb
❯ cat include/linux/sched.h | wc -l
2075
Thank you
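As a quick check of the "more than 25k LOC" figure, the four line counts quoted above can be added up. The numbers are copied from the output above; nothing else is assumed.

```shell
#!/bin/sh
# Line counts copied from the wc -l output quoted above.
total=0
for n in 11872 8498 2643 2075; do
    total=$((total + n))
done
echo "$total"   # prints 25088
```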
from cacule-cpu-scheduler.
Hi @owl4ce
Fortunately, the 5.9 patch works on 5.10.
I am trying to fix the freezing problem and release a version with fixes. Could you please try the patch in #20 (comment) and let me know if it is smoother and has no mini-freezes under heavy load?
Thank you
from cacule-cpu-scheduler.
I can confirm this, happens during high CPU intensive tasks (xanmod-cacule 5.10 kernel, i3 2 core, 2 thread).
Example, shader precompile of steam games hangs my system (I saw 4 process fossilize_replay, but all of them were using more than 25% of the cpu each, thus pushing the system to hang)
This doesn't occur for memory intensive tasks for me. Also, this wasn't an issue when it was still Cachy :)
from cacule-cpu-scheduler.
I can confirm this, happens during high CPU intensive tasks (xanmod-cacule 5.10 kernel, i3 2 core, 2 thread).
Example, shader precompile of steam games hangs my system (I saw 4 process fossilize_replay, but all of them were using more than 25% of the cpu each, thus pushing the system to hang)
This doesn't occur for memory intensive tasks for me. Also, this wasn't an issue when it was still Cachy :)
I also experienced same thing.
from cacule-cpu-scheduler.
I can confirm this, happens during high CPU intensive tasks (xanmod-cacule 5.10 kernel, i3 2 core, 2 thread).
Example, shader precompile of steam games hangs my system (I saw 4 process fossilize_replay, but all of them were using more than 25% of the cpu each, thus pushing the system to hang)
This doesn't occur for memory intensive tasks for me. Also, this wasn't an issue when it was still Cachy :)
This was fixed (hopefully) with the last commit. If you compile from source, can you try applying this patch: #15 (comment)
Thank you
from cacule-cpu-scheduler.
I can confirm this, happens during high CPU intensive tasks (xanmod-cacule 5.10 kernel, i3 2 core, 2 thread).
Example, shader precompile of steam games hangs my system (I saw 4 process fossilize_replay, but all of them were using more than 25% of the cpu each, thus pushing the system to hang)
This doesn't occur for memory intensive tasks for me. Also, this wasn't an issue when it was still Cachy :)
I also experienced same thing.
Hi @owl4ce
Could you please double check that this patch #15 (comment) is applied?
You can confirm by
cat kernel/sched/fair.c | grep -B2 -A2 "p->se.exec_start = 0;"
Output
#if !defined(CONFIG_CACULE_SCHED)
/* We have migrated, no longer consider this task hot */
p->se.exec_start = 0;
#endif
Thank you
Yes, I patched it. I am not sure about the problem, when the cpu usage is high and I play the song also sometimes it comes back a few seconds then comes back again. Feels very heavy when all cores are used at 100% usage.
Can you please try the latest Cacule patch on mainline kernel v5.9 (without xanmod)?
Sorry for asking for so much compiling; the problem might not be related to cacule.
Thank you
from cacule-cpu-scheduler.
Hi everyone.
I can confirm that, as said here, this issue can be ended by disabling the following in the kernel configuration (menuconfig):
- SCHED_AUTOGROUP
- FAIR_GROUP_SCHED
Yes, it worked for me. Everything is now smooth, even when compiling programs with all cores at 100% usage.
xanmod 5.10.3 cacule
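To verify that a kernel was really built without the two options named above, its config file can be inspected. This is a minimal sketch; the function name check_group_sched is made up for this example, and it assumes you have a plain-text copy of the config (for instance produced with zcat /proc/config.gz, as in the build steps in this thread).

```shell
#!/bin/sh
# check_group_sched CONFIG_FILE
# Reports each option and returns non-zero if either one is still enabled.
check_group_sched() {
    local cfg="$1"
    local status=0
    local opt
    for opt in CONFIG_SCHED_AUTOGROUP CONFIG_FAIR_GROUP_SCHED; do
        # Enabled options appear as "NAME=y"; disabled ones as "# NAME is not set".
        if grep -q "^${opt}=y" "$cfg"; then
            echo "$opt is still enabled"
            status=1
        else
            echo "$opt is disabled"
        fi
    done
    return "$status"
}
```

Typical use would be zcat /proc/config.gz > running.config followed by check_group_sched running.config (the file name is only an example).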
from cacule-cpu-scheduler.
Can you please try this patch. I fixed the interactivity score equation. It is now very similar to Cachy/HRRN
from cacule-cpu-scheduler.
Hi @hamadmarri
I think it would be better if the cacule_max_lifetime in the new patch were tunable via sysctl. Also it doesn't seem to include the starvation fix.
Hi @raykzhao
The starvation fix is actually not a good solution, since a task can keep migrating even if it is cache hot. I think the fix in the interactivity score is a better approach. Also, I was mistaken about vruntime getting reset; only exec_start gets reset on migration, and it is updated right away in the set_next_task function.
Thank you
from cacule-cpu-scheduler.
@hamadmarri @raykzhao sorry about the delayed reply. I tested the latest patch, and it fixed my issue completely. Also, I tested on a freshly reinstalled system, so all the values are set to default.
I only faced some minor audio lags (very minimal) during recording, but other than that, everything was butter smooth.
Thank you so much for your support, the issue is completely fixed for me
Wish you all a very Happy New Year
Glad to hear it is working well now.
Happy new year @Salekin-1169 .
from cacule-cpu-scheduler.
I learned so much about how ULE works from this article and slide (Can't find an appropriate place to share, so posting here temporarily).
ULE by design will cause starvation, but will yield better throughput it seems? I had completely opposite idea about it.
The research paper: https://www.usenix.org/system/files/conference/atc18/atc18-bouron.pdf
Section: 3 Porting ULE to the Linux kernel
Unfortunately this study is not fair: it is an implementation of ULE on top of CFS. They replaced some functions in CFS with their own implementation of the ULE way. I don't call this a fair comparison. Their conclusion that ULE could lead to starvation is based on the stats and results from this unfair implementation.
Yes, I think CFS is more advanced than ULE, but I am not really convinced that ULE leads to starvation based on a single unfair study. I have tried FreeBSD; it is smoother, and I haven't faced any kind of starvation while stressing the system with many kinds of tests. ULE code is cleaner, 10x smaller than CFS, and probably faster; however, magically, CFS provides slightly higher throughput than ULE.
Thank you
EDIT:
You can see the implemented functions in Table 1 of the paper. I don't think they implemented the ULE balancer either.
from cacule-cpu-scheduler.
I can confirm that setting kernel.sched_interactivity_factor = 50 fixed this issue.
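For anyone who wants to try the same value, a runtime change could look like the sketch below. The sysctl name is the one quoted in the comment above; the helper names is_valid_factor and set_interactivity_factor are made up for this example, and the sysctl only exists on kernels carrying the CacULE patch.

```shell
#!/bin/sh
# Sysctl name as quoted in the comment above (CacULE kernels only).
SYSCTL_KEY="kernel.sched_interactivity_factor"

# Accept only non-empty strings of digits.
is_valid_factor() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# set_interactivity_factor VALUE: apply the value until the next reboot.
set_interactivity_factor() {
    is_valid_factor "$1" || { echo "not a number: $1" >&2; return 1; }
    sudo sysctl -w "$SYSCTL_KEY=$1"
}
```

Calling set_interactivity_factor 50 applies the value reported to work here. To keep it across reboots, the usual approach is a line such as kernel.sched_interactivity_factor = 50 in a file under /etc/sysctl.d/ (the exact file name is up to you).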
from cacule-cpu-scheduler.
Hi @Pilleo
Can i do smth to figure out was it cacule problem or not?
Please try sudo journalctl -b -1. The -1 is the number of the previous boot. Please figure out in which boot the problem happened. You can try -2, -3, -4 ... until you find the logs from when the problem happened (you can check the time at the top).
If you find the boot, then please redirect the journal output to a file and upload it here:
sudo journalctl -b -1 > log.txt
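Instead of trying -1, -2, -3 by hand, the journals of the last few boots can be dumped in one go and searched afterwards. This is a sketch only; the function name dump_recent_boots and the log-boot-N.txt file names are made up for this example. Running journalctl --list-boots first shows which boot offsets actually exist.

```shell
#!/bin/sh
# dump_recent_boots [COUNT]
# Write the journal of each of the last COUNT previous boots (default 4)
# to log-boot-1.txt, log-boot-2.txt, ... in the current directory.
dump_recent_boots() {
    local count="${1:-4}"
    local i=1
    while [ "$i" -le "$count" ]; do
        # -b -1 is the previous boot, -b -2 the one before it, and so on.
        # A missing boot leaves an empty file behind.
        sudo journalctl -b "-$i" > "log-boot-$i.txt"
        i=$((i + 1))
    done
}
```

After dump_recent_boots 3, the timestamps at the top of each file can be compared with the time the freeze happened.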
Could you please confirm that this issue doesn't happen with mainline kernel or with xanmod without cacule?
If this problem is only happening with cacule, please try this fix patch on top of 5.10.1 xanmod/cacule source.
patch2.zip
Please let me know if it fixes the issue.
Thanks
from cacule-cpu-scheduler.
Another thing: please double check your swap size. Your RAM might have filled up without enough swap space.
Also check vm.swappiness; if you haven't changed it then don't worry about it (the default value is ok).
from cacule-cpu-scheduler.
Let me go off topic for a moment.
I want to ask, when will the latest version of CacULE or Cachy for kernel v5.10 be released?
EDIT: I mean patchfile
My Distro: Gentoo/Linux (compile from source code)
from cacule-cpu-scheduler.
I cannot confirm or refute that it was only with cacule. I have 16 GB of RAM and 1 GB of swap with swappiness 1. And when I am developing in Java, I am always short on memory. Earlier I had frequent freezes, so I installed EarlyOOM. It has worked just fine so far. Maybe this time it could not work for some reason, maybe it is something else. But I did not have that particular problem without cacule with EarlyOOM present.
log.txt
I was trying to use the Magic SysRq key, but I think the kernel could not respond to it.
Hope it helps.
from cacule-cpu-scheduler.
Hi @Pilleo
Isn't there a kernel bug at the end of the log? It seems to have something to do with SLUB/usercopy.
from cacule-cpu-scheduler.
@hamadmarri , @raykzhao
Does it mean it is not a cacule problem? Should we report somewhere else?
from cacule-cpu-scheduler.
@hamadmarri , @raykzhao
Does it mean it is not a cacule problem? Should we report somewhere else?
Hi @Pilleo
I don't think it's a cacule issue. I think it is related to the swap since SLUB is a memory allocator. It could be SLUB is not able to allocate more memory. I am sure that cacule has nothing to do with slub or memory issues.
You may want to report to earlyoom, or your distro forum?
Thank you
from cacule-cpu-scheduler.
Is it possible that the problem is in kernel 5.10? In my understanding it should have at least reacted to the magic key, but it was a complete crash
from cacule-cpu-scheduler.
Is it possible that the problem is in kernel 5.10? In my understanding it should have at least reacted to the magic key, but it was a complete crash
Yes, it could be.
from cacule-cpu-scheduler.
I can confirm this, happens during high CPU intensive tasks (xanmod-cacule 5.10 kernel, i3 2 core, 2 thread).
Example, shader precompile of steam games hangs my system (I saw 4 process fossilize_replay, but all of them were using more than 25% of the cpu each, thus pushing the system to hang)
This doesn't occur for memory intensive tasks for me. Also, this wasn't an issue when it was still Cachy :)
I also experienced same thing.
Hi @owl4ce
Could you please double check that this patch #15 (comment) is applied?
You can confirm by
cat kernel/sched/fair.c | grep -B2 -A2 "p->se.exec_start = 0;"
Output
#if !defined(CONFIG_CACULE_SCHED)
/* We have migrated, no longer consider this task hot */
p->se.exec_start = 0;
#endif
Thank you
from cacule-cpu-scheduler.
I can confirm this, happens during high CPU intensive tasks (xanmod-cacule 5.10 kernel, i3 2 core, 2 thread).
Example, shader precompile of steam games hangs my system (I saw 4 process fossilize_replay, but all of them were using more than 25% of the cpu each, thus pushing the system to hang)
This doesn't occur for memory intensive tasks for me. Also, this wasn't an issue when it was still Cachy :)
I also experienced same thing.
Hi @owl4ce
Could you please double check that this patch #15 (comment) is applied?
You can confirm by
cat kernel/sched/fair.c | grep -B2 -A2 "p->se.exec_start = 0;"
Output
#if !defined(CONFIG_CACULE_SCHED)
/* We have migrated, no longer consider this task hot */
p->se.exec_start = 0;
#endif
Thank you
Yes, I patched it. I am not sure about the problem, when the cpu usage is high and I play the song also sometimes it comes back a few seconds then comes back again. Feels very heavy when all cores are used at 100% usage.
from cacule-cpu-scheduler.
I believe there is an issue in v5.10 please see xanmod/linux#111
It could be a bug in mainline kernel v5.10, since the person who posted the issue in xanmod is not using cacule. I don't think the issue is in xanmod, since the xanmod patches did not change much from v5.9 to v5.10.
Can anyone confirm that the freezes also occur in v5.9 using the latest cacule patch?
Thank you
from cacule-cpu-scheduler.
@hamadmarri @raykzhao @owl4ce sorry for the late reply. I updated the latest xanmod-cacule package and tested.
Running steam shader precompile with youtube running in background.
My system didn't come to a complete hang like before, but it was still unusable (audio skips in youtube, screen lags).
After that, I changed the value of kernel.sched_interactivity_factor to 50 and the system immediately became responsive.
from cacule-cpu-scheduler.
I can even record what's happening currently on my pc now, which is impossible with the default value of kernel.sched_interactivity_factor
from cacule-cpu-scheduler.
@hamadmarri @raykzhao @owl4ce sorry for the late reply. I updated the latest xanmod-cacule package and tested.
Running steam shader precompile with youtube running in background.
My system didn't come to a complete hang like before, but it was still unusable (audio skips in youtube, screen lags).
After that, I changed the value of kernel.sched_interactivity_factor to 50 and the system immediately became responsive.
What was the default value? 32768? Or 10?
from cacule-cpu-scheduler.
@hamadmarri 32768
from cacule-cpu-scheduler.
@hamadmarri I have a comparatively weak cpu (i3 2 physical + 2 logical cores), so I think the default 32768 isn't suitable for my pc, while other people didn't face issue with this.
from cacule-cpu-scheduler.
Hi @hamadmarri
I think it would be better if the cacule_max_lifetime in the new patch were tunable via sysctl. Also it doesn't seem to include the starvation fix.
from cacule-cpu-scheduler.
Can you please try this patch. I fixed the interactivity score equation. It is now very similar to Cachy/HRRN
@hamadmarri Thank you for your quick fix
I apologize, because I'm still relatively new to linux and don't know how to compile custom kernel yet. I'll study about it this weekend and check if the latest patch fixes the issue, then let you know
from cacule-cpu-scheduler.
Hi @hamadmarri
Although the new fix seems better than the vanilla CacULE on my machine, unfortunately it doesn't feel better than the original Cachy or the CacULE with either the smoother or the starvation fix. Somehow it's slightly more glitchy on my machine under heavy load.
I still think both Cachy and CacULE should be kept in a single patch (#19), since it's difficult to find a scheduling policy that is suitable for everyone. On my machine the original Cachy is definitely the winner, but it may not be the case for others.
Hi @raykzhao
What I am worried about is that the problem is in neither cachy nor cacule. Maybe it's in v5.9 and v5.10. Could you please confirm that cachy has no issues on both v5.9 and v5.10?
Thank you
from cacule-cpu-scheduler.
Hi @hamadmarri
I can confirm that there is no difference on my machine with Cachy scheduler between 5.9 and 5.10 kernels.
from cacule-cpu-scheduler.
I just found that the interactivity score fix should already be merged to xanmod/linux@16e99a8 and therefore 5.10.4-xanmod1-cacule should include the fix. You don't need to build your own custom kernel now.
from cacule-cpu-scheduler.
I learned so much about how ULE works from this article and slide (Can't find an appropriate place to share, so posting here temporarily).
ULE by design will cause starvation, but will yield better throughput it seems? I had completely opposite idea about it.
from cacule-cpu-scheduler.
@hamadmarri thank you so much for your explanation. I'm still learning about it. CFS has about a decade of optimizations behind it, so I guess that is why its throughput is higher?
Just curious, how different is Cacule from the FreeBSD implementation of ULE? Can I read up about it somewhere?
from cacule-cpu-scheduler.
Please let me know if the new cachy-r9 (with rdb balancer) is better than both cachy-r8/cacule
https://github.com/hamadmarri/cacule-cpu-scheduler/blob/master/patches/Cachy/v5.9/cachy-5.9-r9.patch
from cacule-cpu-scheduler.
Related Issues (20)
- Recommended config settings for Cachy HOT 6
- Sound interrupts during background operations HOT 36
- patch 5.9-r8 HOT 6
- interactivity_factor is not tunable via sysctl HOT 1
- Compiling error when CONFIG_SCHED_DEBUG is enabled HOT 3
- Feature Request: Use kconfig menu to select between Cachy/CacULE HOT 3
- Public Chat HOT 88
- GRQ - Error when CONFIG_PREEMPT = y
- Hi, if running this program can it damage my CPU? HOT 2
- Error when compiling HOT 15
- Warning at kernel/sched/core.c:4637 after booting HOT 7
- sched_yield tweaks HOT 31
- RDB schedutil support HOT 31
- possible for sysctl variables calculated dynamically? HOT 2
- compile error, updated patches
- 5.14-rc support for testing HOT 2
- rdb patch HOT 1
- Experiencing some random hangs under heavy workload HOT 67
- New issue with rdb.patch + 5.13.13-xanmod1-cacule HOT 1