
Comments (44)

raykzhao avatar raykzhao commented on July 20, 2024 3

Hi @hamadmarri @Salekin-1169 @owl4ce

I just checked the latest xanmod-cacule settings. They don't include the starvation fix, and SCHED_AUTOGROUP (and therefore FAIR_GROUP_SCHED) is enabled. The build also includes the following scheduler tweaks and uses a non-standard 500 Hz timer (mainline has no such option):

sysctl_sched_nr_migrate = 256
sysctl_sched_rt_runtime = 980000

I just opened an issue at xanmod/linux#112 and mentioned the starvation fix and disabling FAIR_GROUP_SCHED.
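
For anyone who wants to check their own kernel for these two options, here is a minimal sketch. The helper name config_opt_state is made up for illustration, and it assumes the config has been extracted to a plain text file first.

```shell
# Report whether a kernel option is enabled in a given config file.
# Usage: config_opt_state <config-file> <OPTION_NAME>
# (config_opt_state is a hypothetical helper, not part of any tool.)
config_opt_state() {
    cfg=$1
    opt=$2
    if grep -q "^CONFIG_${opt}=y" "$cfg"; then
        echo "$opt=y"
    else
        echo "$opt is not set"
    fi
}

# Example against the running kernel (requires /proc/config.gz):
#   zcat /proc/config.gz > /tmp/running.config
#   config_opt_state /tmp/running.config SCHED_AUTOGROUP
#   config_opt_state /tmp/running.config FAIR_GROUP_SCHED
```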

from cacule-cpu-scheduler.

raykzhao avatar raykzhao commented on July 20, 2024 3

Hi @Salekin-1169

I'm not sure which Linux distro you are using. Generally speaking, you can try the following:

  1. Download the latest kernel source from https://www.kernel.org/
  2. Copy the patch file to the kernel source folder, and run patch -p1 -i interactivity_score_fix.patch
  3. Get your current kernel config file. It is usually located at /proc/config.gz. If it is not there, try running modprobe configs first.
  4. In the kernel source folder, run zcat /proc/config.gz > .config
  5. Run make menuconfig and make sure you disable the following:
     General Setup-->Automatic process group scheduling
     General Setup-->Control Group support-->CPU controller-->Group scheduling for SCHED_OTHER
     You may also want to distinguish the new kernel from the existing one by appending a suffix at General Setup-->Local version - append to kernel release.
  6. Run make -jx, where x is the number of CPU cores. For example, if you have a 4-core CPU, you may run make -j4.
  7. Run make install and make modules_install.
  8. Create the initramfs and update your bootloader configuration. The instructions differ depending on your distro.
  9. If you are using any out-of-tree modules, e.g. the NVIDIA proprietary drivers, you also need to recompile or reinstall them. The instructions differ depending on your distro.

For the initramfs, bootloader, and out-of-tree modules, you should look up the instructions for your specific Linux distro. Some distros also have a guide on how to build custom kernels.
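
The distro-independent steps above (2 to 7) can be sketched as a small dry-run script. This is only an illustration: the function names run and build_patched_kernel and the example paths are made up, and installing modules before the kernel image is a common ordering that I assume here, not something the steps require. By default it only prints the commands; set RUN=1 to actually execute them.

```shell
# Dry-run helper: print each command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Usage: build_patched_kernel <source-dir> <absolute-patch-path> <jobs>
# (hypothetical helper; the patch path must be absolute because of patch -d)
build_patched_kernel() {
    src=$1
    patch_file=$2
    jobs=$3
    run patch -d "$src" -p1 -i "$patch_file"            # step 2
    run sh -c "zcat /proc/config.gz > '$src/.config'"   # steps 3-4
    run make -C "$src" menuconfig                       # step 5 (interactive)
    run make -C "$src" "-j$jobs"                        # step 6
    run make -C "$src" modules_install                  # step 7 (as root)
    run make -C "$src" install                          # step 7 (as root)
}

# Example (prints the commands only):
#   build_patched_kernel "$HOME/linux" "$HOME/interactivity_score_fix.patch" 4
```

The initramfs, bootloader, and out-of-tree module steps are left out on purpose since they differ per distro.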


Salekin-1169 avatar Salekin-1169 commented on July 20, 2024 3

@hamadmarri @raykzhao sorry about the delayed reply. I tested the latest patch, and it fixed my issue completely. Also, I tested on a freshly reinstalled system, so all the values are set to default.
sched_interactivity

I only faced some minor audio lags (very minimal) during recording, but other than that, everything was butter smooth.
Peek 2021-01-01 22-53

Thank you so much for your support, the issue is completely fixed for me 🙇
Wish you all a very Happy New Year 🍻


hamadmarri avatar hamadmarri commented on July 20, 2024 2

This solution was suggested by Alexandre Frade.
Thanks to him.

Hamad, when executing the nvidia-dkms and mkinitramfs triggers, the freezes reported by users started, even with the fix "remove start_exec = 0". The system returned to normal with autogroup disabled:
echo 0 |sudo tee /proc/sys/kernel/sched_autogroup_enabled
All users with this problem, please try this solution.
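
To check whether autogroup is currently on before (or after) applying this, something like the following sketch can be used. The function name autogroup_state is hypothetical; the path argument exists only so the function can be tried against a test file.

```shell
# Print the autogroup state from a sysctl file (path overridable for testing).
autogroup_state() {
    f=${1:-/proc/sys/kernel/sched_autogroup_enabled}
    if [ ! -r "$f" ]; then
        echo "unavailable"   # e.g. kernel built without SCHED_AUTOGROUP
        return 0
    fi
    if [ "$(cat "$f")" = 0 ]; then
        echo "disabled"
    else
        echo "enabled"
    fi
}

# To keep the setting across reboots, a sysctl.d drop-in can hold the line
#   kernel.sched_autogroup_enabled = 0
# e.g. in /etc/sysctl.d/90-autogroup.conf (file name chosen for illustration).
```

Note that the echo 0 | sudo tee command only lasts until the next reboot, which is why the drop-in file is mentioned.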


hamadmarri avatar hamadmarri commented on July 20, 2024 2

@hamadmarri I have a comparatively weak cpu (i3 2 physical + 2 logical cores), so I think the default 32768 isn't suitable for my pc, while other people didn't face issue with this.

Hi @Salekin-1169

+ score_se = (u64_factor / (vr_se / sleep_se)) + u64_factor;

I believe the problem is in this line. It seems that a task with a high run time gets a lower value! I am not sure whether the original ULE did it this way. I will try one more fix. I will also add reset_life_time, the same as Cachy has.

When a task flips to non-interactive, it should get a score reasonably close to its previous score, but from the math I can see that the score jumps to the lowest value and then gradually starts to gain.
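
To see the effect with numbers, here is a small sketch that evaluates the line above with integer division, as u64 arithmetic in C would. The factor value 32768 and the requirement vr >= sleep > 0 are my assumptions for illustration; the thread does not say what u64_factor is or which inputs reach this branch.

```shell
# score = factor / (vr / sleep) + factor, using integer division throughout.
# Assumes vr >= sleep > 0, otherwise the inner ratio is 0 and the division fails.
# (noninteractive_score is a hypothetical name for this illustration.)
noninteractive_score() {
    factor=$1
    vr=$2
    sleep_t=$3
    ratio=$(( vr / sleep_t ))
    echo $(( factor / ratio + factor ))
}

noninteractive_score 32768 1000 1000     # vr == sleep      -> 65536
noninteractive_score 32768 2000 1000     # vr is 2x sleep   -> 49152
noninteractive_score 32768 100000 1000   # vr is 100x sleep -> 33095
```

The result shrinks toward the factor as run time grows relative to sleep time, which matches the observation that a task with a high run time gets a lower value.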


raykzhao avatar raykzhao commented on July 20, 2024 2

Hi @hamadmarri

Although the new fix seems better than the vanilla CacULE on my machine, unfortunately it doesn't feel better than the original Cachy or the CacULE with either the smoother or the starvation fix. Somehow it's slightly more glitchy on my machine under heavy load.

I still think both Cachy and CacULE should be kept in a single patch (#19), since it's difficult to find a scheduling policy that is suitable for everyone. On my machine the original Cachy is definitely the winner, but it may not be the case for others.


owl4ce avatar owl4ce commented on July 20, 2024 2

Same here, but on my machine Cachy is more responsive only when the load is light. CacULE is much more responsive when my machine is under heavy load with all cores at 100%. In short, on my machine CacULE copes better when all cores are at 100%: the system can still be used freely.


hamadmarri avatar hamadmarri commented on July 20, 2024 2

@hamadmarri thank you so much for your explanation. I'm still learning about it. CFS has about a decade of optimizations behind it, so that is why its throughput is higher, I guess?

Just curious, how different is Cacule from the FreeBSD implementation of ULE? Can I read up about it somewhere?

CFS is a result of previous crazy approaches. One of them was the scheduler made by the genius guy called Con Kolivas (the author of BFS and MuQSS); his scheduler at the time was the Staircase scheduler. Ingo Molnár made CFS, which was inspired by Kolivas's work. FreeBSD didn't have a crazy guy like Kolivas.

Just curious, how different is Cacule from the FreeBSD implementation of ULE? Can I read up about it somewhere?

https://github.com/hamadmarri/cacule-cpu-scheduler#the-cacule-interactivity-score

I have only implemented the interactivity (IS) score (see Figure 1 https://github.com/hamadmarri/cacule-cpu-scheduler/blob/master/helper%20docs%20for%20kernel%20dev/FreeBSD/ULE.pdf)

My implementation of IS is not the same as ULE's; it is more like the Cachy/HRRN way.
ULE uses 2 runqueues, and the IS math decides which runqueue a task is placed in. It also uses multi-level queues for priority, whereas I just use CFS's vruntime calculations, in which the task priority affects the vruntime value. That in turn affects the total run time of the process used when calculating HRRN or IS.

I used a shortcut approach to adapt IS to CFS. ULE is totally different from CFS. I remember that ULE is only about 3k LOC, whereas CFS is definitely more than 25k LOC (I counted only 4 files in CFS, not all of them).

Here is the LOC of just the 4 files in Linux that I deal with daily:

❯ cat kernel/sched/fair.c | wc -l
11872

❯ cat kernel/sched/core.c | wc -l
8498

❯ cat kernel/sched/sched.h | wc -l
2643

❯ cat include/linux/sched.h | wc -l
2075
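
The four counts add up to 11872 + 8498 + 2643 + 2075 = 25088 lines. A small sketch to get the total in one go (the helper name total_loc is made up for illustration):

```shell
# Sum the line counts of the given files.
total_loc() {
    cat "$@" | wc -l | tr -d ' '
}

# In a kernel tree:
#   total_loc kernel/sched/fair.c kernel/sched/core.c \
#             kernel/sched/sched.h include/linux/sched.h
```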

Thank you


hamadmarri avatar hamadmarri commented on July 20, 2024 1

Hi @owl4ce

Fortunately, the 5.9 patch also works on 5.10.

I am trying to fix the freezing problem and release a version with fixes. Could you please try the patch in #20 (comment)
and let me know whether it is smoother, without mini-freezes under heavy load?

Thank you


Salekin-1169 avatar Salekin-1169 commented on July 20, 2024 1

I can confirm this. It happens during CPU-intensive tasks (xanmod-cacule 5.10 kernel, i3 2 core, 2 thread).
For example, the shader precompile of Steam games hangs my system (I saw 4 fossilize_replay processes, each using more than 25% of the CPU, thus pushing the system to hang).

This doesn't occur for memory intensive tasks for me. Also, this wasn't an issue when it was still Cachy :)


owl4ce avatar owl4ce commented on July 20, 2024 1

I can confirm this, happens during high CPU intensive tasks (xanmod-cacule 5.10 kernel, i3 2 core, 2 thread).
Example, shader precompile of steam games hangs my system (I saw 4 process fossilize_replay, but all of them were using more than 25% of the cpu each, thus pushing the system to hang)

This doesn't occur for memory intensive tasks for me. Also, this wasn't an issue when it was still Cachy :)

I also experienced the same thing.


hamadmarri avatar hamadmarri commented on July 20, 2024 1

I can confirm this, happens during high CPU intensive tasks (xanmod-cacule 5.10 kernel, i3 2 core, 2 thread).
Example, shader precompile of steam games hangs my system (I saw 4 process fossilize_replay, but all of them were using more than 25% of the cpu each, thus pushing the system to hang)

This doesn't occur for memory intensive tasks for me. Also, this wasn't an issue when it was still Cachy :)

Hi @Salekin-1169

This was fixed (hopefully) with the last commit. If you compile from source can you try to patch it with this patch #15 (comment)

Thank you


hamadmarri avatar hamadmarri commented on July 20, 2024 1

I can confirm this, happens during high CPU intensive tasks (xanmod-cacule 5.10 kernel, i3 2 core, 2 thread).
Example, shader precompile of steam games hangs my system (I saw 4 process fossilize_replay, but all of them were using more than 25% of the cpu each, thus pushing the system to hang)
This doesn't occur for memory intensive tasks for me. Also, this wasn't an issue when it was still Cachy :)

I also experienced same thing.

Hi @owl4ce
Could you please double check that this patch #15 (comment) is applied?
You can confirm by

cat kernel/sched/fair.c | grep -B2 -A2 "p->se.exec_start = 0;"

Output

#if !defined(CONFIG_CACULE_SCHED)
        /* We have migrated, no longer consider this task hot */
        p->se.exec_start = 0;
#endif

Thank you

Yes, I patched it. I am not sure about the problem, when the cpu usage is high and I play the song also sometimes it comes back a few seconds then comes back again. Feels very heavy when all cores are used at 100% usage.
2020-12-28-200229_609x351_scrot

Can you please try the latest CacULE patch on mainline kernel v5.9 (without xanmod)?
Sorry for asking you to compile so much; the problem might not be related to CacULE.

Thank you


owl4ce avatar owl4ce commented on July 20, 2024 1

Hi everyone.
I can confirm that the suggestion in this comment resolves the issue.

It is done by disabling the following in the kernel configuration (menuconfig):

  • SCHED_AUTOGROUP
  • FAIR_GROUP_SCHED

Yes, it worked for me. Everything is now smooth, even when compiling programs with all cores at 100% usage.
xanmod 5.10.3 cacule


hamadmarri avatar hamadmarri commented on July 20, 2024 1

interactivity_score_fix.zip

Can you please try this patch? I fixed the interactivity score equation. It is now very similar to Cachy/HRRN.


hamadmarri avatar hamadmarri commented on July 20, 2024 1

Hi @hamadmarri

I think the cacule_max_lifetime in the new patch should be better tunable via sysctl. Also it doesn't seem to include the starvation fix.

Hi @raykzhao

The starvation fix is actually not a good solution, since a task can keep migrating even if it is cache hot. I think the fix to the interactivity score is a better approach. Also, I was mistaken about vruntime being reset: only exec_start is reset on migration, and it gets updated right away in the set_next_task function.

Thank you


hamadmarri avatar hamadmarri commented on July 20, 2024 1

@hamadmarri @raykzhao sorry about the delayed reply. I tested the latest patch, and it fixed my issue completely. Also, I tested on a freshly reinstalled system, so all the values are set to default.
sched_interactivity

I only faced some minor audio lags (very minimal) during recording, but other than that, everything was butter smooth.
Peek 2021-01-01 22-53

Thank you so much for your support, the issue is completely fixed for me bow
Wish you all a very Happy New Year beers

Glad to hear it is working well now.
Happy new year @Salekin-1169 .


hamadmarri avatar hamadmarri commented on July 20, 2024 1

I learned so much about how ULE works from this article and slide (Can't find an appropriate place to share, so posting here temporarily).

ULE by design will cause starvation, but will yield better throughput it seems? I had completely opposite idea about it.

Hi @Salekin-1169

The research paper: https://www.usenix.org/system/files/conference/atc18/atc18-bouron.pdf
Section: 3 Porting ULE to the Linux kernel

This study is unfortunately not fair: it is an implementation of ULE on top of CFS. They replaced some functions in CFS with their own ULE-style implementations. I don't call this a fair comparison. Their conclusion that ULE could lead to starvation is based on the stats and results from this unfair implementation.

Yes, I think CFS is more advanced than ULE, but I am not really convinced that ULE leads to starvation based on a single unfair study. I have tried FreeBSD; it is smoother, and I haven't faced any kind of starvation while stressing the system with many kinds of tests. ULE's code is cleaner, about 10x smaller than CFS, and probably faster; however, magically, CFS provides slightly higher throughput than ULE.

Thank you

EDIT:
You can see the implemented functions in Table 1 of the paper. I don't think they implemented the ULE balancer either.


Vistaus avatar Vistaus commented on July 20, 2024 1

I can confirm that setting kernel.sched_interactivity_factor = 50 fixed this issue.


hamadmarri avatar hamadmarri commented on July 20, 2024

Hi @Pilleo

Can I do something to figure out whether it was a cacule problem or not?

Please try sudo journalctl -b -1. The -1 selects the previous boot. Please figure out in which boot the problem happened: you can try -2, -3, -4 ... until you find the logs from when the problem happened (you can check the time at the top).

If you find the boot, please redirect the journal output to a file and upload it here:
sudo journalctl -b -1 > log.txt
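
Instead of trying -1, -2, -3 by hand, journalctl --list-boots shows the available boots with their time ranges. The sketch below saves the logs of the last few boots to separate files so they can be searched. The function name save_boot_logs and the JOURNALCTL override variable are made up for this example.

```shell
# Save the journal of each of the last <n> boots to <outdir>/boot-<i>.log.
# JOURNALCTL can be overridden (e.g. "sudo journalctl", or a stub for testing).
save_boot_logs() {
    n=$1
    outdir=$2
    i=1
    while [ "$i" -le "$n" ]; do
        ${JOURNALCTL:-journalctl} -b "-$i" > "$outdir/boot-$i.log"
        i=$((i + 1))
    done
}

# Example:
#   mkdir -p logs && JOURNALCTL="sudo journalctl" save_boot_logs 4 logs
```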

Could you please confirm that this issue doesn't happen with mainline kernel or with xanmod without cacule?

If this problem is only happening with cacule, please try this fix patch on top of 5.10.1 xanmod/cacule source.
patch2.zip

Please let me know if it fixes the issue.

Thanks


hamadmarri avatar hamadmarri commented on July 20, 2024

Another thing: please double-check your swap size. Your RAM might have filled up without enough swap space.
Also check vm.swappiness <--- if you haven't changed it, then don't worry about it (the default value is OK).
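
A quick way to see RAM and swap totals side by side is to read them from /proc/meminfo, as in this sketch. The function name swap_summary is hypothetical, and the file argument exists only so it can be tried on a sample file.

```shell
# Print RAM and swap totals (in KiB) from a meminfo-style file.
swap_summary() {
    meminfo=${1:-/proc/meminfo}
    awk '/^MemTotal:/  { ram = $2 }
         /^SwapTotal:/ { swap = $2 }
         END { printf "ram_kib=%d swap_kib=%d\n", ram, swap }' "$meminfo"
}

# The swappiness value can be read with: sysctl vm.swappiness
```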


owl4ce avatar owl4ce commented on July 20, 2024

Let me go off topic for a moment.
I want to ask, when will the latest version of CacULE or Cachy for kernel v5.10 be released?
EDIT: I mean the patch file

My distro: Gentoo Linux (compiled from source code)


Pilleo avatar Pilleo commented on July 20, 2024

I cannot confirm or refute that it happens only with cacule. I have 16 GB of RAM and 1 GB of swap with swappiness 1. When I am developing in Java, I am always short on memory. Earlier I had frequent freezes, so I installed EarlyOOM, and it has worked fine so far. Maybe this time it could not work for some reason, or maybe it is something else. But I did not have this particular problem without cacule with EarlyOOM present.
log.txt
I tried to use the Magic SysRq key, but I think the kernel could not respond to it.
Hope it helps.


raykzhao avatar raykzhao commented on July 20, 2024

Hi @Pilleo

Isn't there a kernel bug at the end of the log? It seems to have something to do with SLUB/usercopy.


Pilleo avatar Pilleo commented on July 20, 2024

@hamadmarri , @raykzhao
Does it mean it is not a cacule problem? Should we report somewhere else?


hamadmarri avatar hamadmarri commented on July 20, 2024

@hamadmarri , @raykzhao
Does it mean it is not a cacule problem? Should we report somewhere else?

Hi @Pilleo

I don't think it's a cacule issue. I think it is related to the swap since SLUB is a memory allocator. It could be SLUB is not able to allocate more memory. I am sure that cacule has nothing to do with slub or memory issues.

You may want to report to earlyoom, or your distro forum?

Thank you


Pilleo avatar Pilleo commented on July 20, 2024

Is it possible that the problem is in kernel 5.10? In my understanding, it should at least have reacted to the magic key, but it was a complete crash.


hamadmarri avatar hamadmarri commented on July 20, 2024

Is it possible that the problem is in kernel 5.10? In my understanding, it should at least have reacted to the magic key, but it was a complete crash.

Yes, it could be.


hamadmarri avatar hamadmarri commented on July 20, 2024

I can confirm this, happens during high CPU intensive tasks (xanmod-cacule 5.10 kernel, i3 2 core, 2 thread).
Example, shader precompile of steam games hangs my system (I saw 4 process fossilize_replay, but all of them were using more than 25% of the cpu each, thus pushing the system to hang)
This doesn't occur for memory intensive tasks for me. Also, this wasn't an issue when it was still Cachy :)

I also experienced same thing.

Hi @owl4ce

Could you please double check that this patch #15 (comment) is applied?

You can confirm by

cat kernel/sched/fair.c | grep -B2 -A2 "p->se.exec_start = 0;"

Output

#if !defined(CONFIG_CACULE_SCHED)
        /* We have migrated, no longer consider this task hot */
        p->se.exec_start = 0;
#endif

Thank you


owl4ce avatar owl4ce commented on July 20, 2024

I can confirm this, happens during high CPU intensive tasks (xanmod-cacule 5.10 kernel, i3 2 core, 2 thread).
Example, shader precompile of steam games hangs my system (I saw 4 process fossilize_replay, but all of them were using more than 25% of the cpu each, thus pushing the system to hang)
This doesn't occur for memory intensive tasks for me. Also, this wasn't an issue when it was still Cachy :)

I also experienced same thing.

Hi @owl4ce

Could you please double check that this patch #15 (comment) is applied?

You can confirm by

cat kernel/sched/fair.c | grep -B2 -A2 "p->se.exec_start = 0;"

Output

#if !defined(CONFIG_CACULE_SCHED)
        /* We have migrated, no longer consider this task hot */
        p->se.exec_start = 0;
#endif

Thank you

Yes, I patched it. I am not sure about the problem, when the cpu usage is high and I play the song also sometimes it comes back a few seconds then comes back again. Feels very heavy when all cores are used at 100% usage.
2020-12-28-200229_609x351_scrot


hamadmarri avatar hamadmarri commented on July 20, 2024

I believe there is an issue in v5.10 please see xanmod/linux#111

It could be a bug in mainline kernel v5.10, since the guy who posted the issue in xanmod is not using cacule. I don't think the issue is in xanmod, since the xanmod patches did not change much from v5.9 to v5.10.

Can anyone confirm that the freezes also occur in v5.9 with the latest cacule patch?

Thank you


Salekin-1169 avatar Salekin-1169 commented on July 20, 2024

@hamadmarri @raykzhao @owl4ce sorry for the late reply. I updated to the latest xanmod-cacule package and tested it.
I ran the Steam shader precompile with YouTube playing in the background.
My system didn't come to a complete hang like before, but it was still unusable (audio skips in YouTube, screen lags).

After that, I changed the value of kernel.sched_interactivity_factor to 50 and the system immediately became responsive.
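
For reference, a sketch of how the value can be inspected and changed through procfs. The helper name set_tunable is made up, the /proc/sys/kernel/sched_interactivity_factor path follows from the usual sysctl-to-procfs mapping and only exists on a CacULE-patched kernel, and the change does not survive a reboot.

```shell
# Write <value> to a sysctl file only if it differs from the current value.
set_tunable() {
    file=$1
    value=$2
    current=$(cat "$file")
    if [ "$current" = "$value" ]; then
        echo "unchanged ($value)"
    else
        echo "$value" > "$file"
        echo "changed $current -> $value"
    fi
}

# On a CacULE kernel, as root:
#   set_tunable /proc/sys/kernel/sched_interactivity_factor 50
```

Printing the old value makes it easy to revert if the new setting turns out worse on a given machine.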


Salekin-1169 avatar Salekin-1169 commented on July 20, 2024

I can even record what's happening currently on my pc now, which is impossible with the default value of kernel.sched_interactivity_factor
Peek 2020-12-29 22-13


hamadmarri avatar hamadmarri commented on July 20, 2024

@hamadmarri @raykzhao @owl4ce sorry for the late reply. I updated the latest xanmod-cacule package and tested.
Running steam shader precompile with youtube running in background.
my system didn't come to a complete hang like before, but it was still unusable (audio skips in youtube, screen lags)

After that, I changed the value of kernel.sched_interactivity_factor to 50 and the system immediately became responsive.

What was the default value? 32768? Or 10?


Salekin-1169 avatar Salekin-1169 commented on July 20, 2024

@hamadmarri 32768


Salekin-1169 avatar Salekin-1169 commented on July 20, 2024

@hamadmarri I have a comparatively weak CPU (i3, 2 physical + 2 logical cores), so I think the default of 32768 isn't suitable for my PC, while other people didn't face issues with it.


raykzhao avatar raykzhao commented on July 20, 2024

Hi @hamadmarri

I think it would be better if cacule_max_lifetime in the new patch were tunable via sysctl. Also, the patch doesn't seem to include the starvation fix.


Salekin-1169 avatar Salekin-1169 commented on July 20, 2024

interactivity_score_fix.zip

Can you please try this patch. I fixed the interactivity score equation. It is now very similar to Cachy/HRRN

@hamadmarri Thank you for your quick fix 🙇

I apologize, because I'm still relatively new to Linux and don't know how to compile a custom kernel yet. I'll study it this weekend, check whether the latest patch fixes the issue, and then let you know 🖖


hamadmarri avatar hamadmarri commented on July 20, 2024

Hi @hamadmarri

Although the new fix seems better than the vanilla CacULE on my machine, unfortunately it doesn't feel better than the original Cachy or the CacULE with either the smoother or the starvation fix. Somehow it's slightly more glitchy on my machine under heavy load.

I still think both Cachy and CacULE should be kept in a single patch (#19), since it's difficult to find a scheduling policy that is suitable for everyone. On my machine the original Cachy is definitely the winner, but it may not be the case for others.

Hi @raykzhao

What I am worried about is that the problem is in neither cachy nor cacule. Maybe it's in v5.9 and v5.10. Could you please confirm that cachy has no issues on both v5.9 and v5.10?

Thank you


raykzhao avatar raykzhao commented on July 20, 2024

Hi @hamadmarri

I can confirm that there is no difference on my machine with Cachy scheduler between 5.9 and 5.10 kernels.


raykzhao avatar raykzhao commented on July 20, 2024

Hi @Salekin-1169

I just found that the interactivity score fix should already be merged to xanmod/linux@16e99a8 and therefore 5.10.4-xanmod1-cacule should include the fix. You don't need to build your own custom kernel now.


Salekin-1169 avatar Salekin-1169 commented on July 20, 2024

I learned so much about how ULE works from this article and slide (Can't find an appropriate place to share, so posting here temporarily).

ULE by design will cause starvation, but will yield better throughput it seems? I had completely opposite idea about it.


Salekin-1169 avatar Salekin-1169 commented on July 20, 2024

@hamadmarri thank you so much for your explanation. I'm still learning about it. CFS has about a decade of optimizations behind it, so that is why its throughput is higher, I guess?

Just curious, how different is Cacule from the FreeBSD implementation of ULE? Can I read up about it somewhere?


hamadmarri avatar hamadmarri commented on July 20, 2024

Please let me know if the new cachy-r9 (with rdb balancer) is better than both cachy-r8/cacule

https://github.com/hamadmarri/cacule-cpu-scheduler/blob/master/patches/Cachy/v5.9/cachy-5.9-r9.patch
