
lklfuse hang with stress-ng (CLOSED)

lkl commented on September 28, 2024
lklfuse hang with stress-ng

from linux.

Comments (7)

pscollins commented on September 28, 2024

@tavip @opurdila How exactly did you trigger this? I suspect it comes from the same virtio issue as #64, which I've been trying to track down for the past few days, so it would be nice to have an additional test case. Is this supposed to be run under lkl-hijack.sh? I get:

Unknown class: 'filesystem', available classes: cpu cpu-cache io interrupt memory network os scheduler vm

when I try to run it like that. I'm not sure if I've got the wrong version installed, etc.


 avatar commented on September 28, 2024

IIRC you need a newer version of stress-ng. I compiled the latest one instead of using the one installed on the system.


Rondom commented on September 28, 2024

This seems to occur when there is a moderate IO load. I can reproduce it with both stress-ng and filebench. For example, running the filebench webserver workload without any changes triggers it. Reducing the number of threads to 3 makes it disappear; with 4 threads I can reproduce it reliably.

It can even be triggered by running the snippet below in a shell while the folder is open in Nautilus (GNOME file manager). It seems that only Nautilus creates enough IO to trigger it.

 while true; do mkdir -p "$MOUNT_POINT/foo"; echo hi > "$MOUNT_POINT/foo/bar"; rm -rf "$MOUNT_POINT/foo"; done

What I see in the logs is that the number of allocated IRQs skyrockets until all 64 IRQs are exhausted (lkl: syscall_thread: failed to allocate irq: -16). In between, we get a few "syscall would block" messages.

So I am wondering what causes this and what is actually happening. What is the expected behaviour when running out of IRQs (block, fail)? Why do we get those "syscall would block" messages?
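
For readers unfamiliar with the failure mode, the exhaustion described above can be modeled in a few lines of Python. This is a toy sketch, not LKL code: `IrqPool`, `NR_IRQS`, and `simulate` are hypothetical names invented for illustration. The limit of 64 and the `-16` return value are taken from the log message quoted above; 16 is the value of `EBUSY` on Linux.

```python
import errno

NR_IRQS = 64  # limit quoted in the log above; the name is illustrative


class IrqPool:
    """Toy model of a fixed-size IRQ table (not LKL's implementation)."""

    def __init__(self, size=NR_IRQS):
        self.free = list(range(size))

    def alloc(self):
        # Hand out a free IRQ number, or -EBUSY when the table is full.
        if not self.free:
            return -errno.EBUSY
        return self.free.pop()


def simulate(host_threads):
    """Each modeled host thread takes one IRQ and never gives it back."""
    pool = IrqPool()
    return [pool.alloc() for _ in range(host_threads)]


results = simulate(65)
failures = [r for r in results if r < 0]
print(len(failures), failures[0])
```

In this model the first 64 allocations succeed and only the 65th fails, which matches the pattern of a growing IRQ count followed by the allocation error.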


liuyuan10 commented on September 28, 2024

I don't use lklfuse, so I can't comment on your application. The logs you showed are helpful:
"syscall would block" only happens when multiple host threads ask the default syscall thread to create a new syscall thread. It does not indicate an error.

The "failed to allocate" error is probably the cause. As described in issue #147, the error occurs once you have ~64 host threads calling lkl syscalls.

I can think of several ways to mitigate this:

  1. Call lkl_stop_syscall_thread() to free the IRQ. However, this won't help if you have 64 host threads calling lkl syscalls concurrently.
  2. Try PR #160. It no longer requires an IRQ for syscalls, so this issue is solved there. The PR may not be merged any time soon, but it is pretty stable with network applications.
  3. Set auto_syscall_threads to false in arch/lkl/kernel/syscalls.c. I don't know if this still works, and it can cause severe performance problems: one blocking syscall blocks all other threads, which easily leads to deadlock.
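
The limits of the first option can be illustrated with a small Python model. It is only a sketch under stated assumptions: `IrqPool`, `run_sequential`, and `run_concurrent` are hypothetical names, and `release()` merely stands in for the effect of calling lkl_stop_syscall_thread(); none of this is LKL's actual code.

```python
import errno


class IrqPool:
    """Toy model of a 64-entry IRQ table (not LKL's implementation)."""

    def __init__(self, size=64):
        self.free = list(range(size))

    def alloc(self):
        # Hand out a free IRQ number, or -EBUSY when none are left.
        return self.free.pop() if self.free else -errno.EBUSY

    def release(self, irq):
        # Stand-in for the effect of lkl_stop_syscall_thread().
        self.free.append(irq)


def run_sequential(pool, n):
    """n short-lived threads; each frees its IRQ before the next starts."""
    errors = 0
    for _ in range(n):
        irq = pool.alloc()
        if irq < 0:
            errors += 1
        else:
            pool.release(irq)
    return errors


def run_concurrent(pool, n):
    """n threads that all hold their IRQ at the same time."""
    held = [pool.alloc() for _ in range(n)]
    errors = sum(1 for irq in held if irq < 0)
    for irq in held:
        if irq >= 0:
            pool.release(irq)
    return errors


print(run_sequential(IrqPool(), 1000), run_concurrent(IrqPool(), 65))
```

In the model, releasing the IRQ lets any number of short-lived threads succeed one after another, while 65 threads holding an IRQ at once still produce one failure. This is the caveat attached to option 1.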


Rondom commented on September 28, 2024

Thanks for your answer @liuyuan10!
I had already been following your pull request and it was the first thing I wanted to try the next morning. Indeed it fixes the issue.

The problem is that libfuse creates new threads as needed and calls pthread_detach once it has more than 10 of them. Right now there is no easy way to hook into that code without patching libfuse, as far as I can see. One could always write one's own FUSE loop function, though.

So indeed this is caused by #147, but we might still leave this open until we find a workaround in lklfuse.
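
The idea behind a custom loop, as mentioned above, is to bound the number of distinct threads that ever issue LKL syscalls. The Python sketch below models only that idea with a fixed-size worker pool. It does not use libfuse or LKL, and `MAX_WORKERS` and `handle_request` are hypothetical names chosen for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 8  # hypothetical bound, well below the 64-IRQ limit

seen_threads = set()
seen_lock = threading.Lock()


def handle_request(n):
    # Stand-in for a request handler that would issue LKL syscalls.
    # Record which worker thread ran it.
    with seen_lock:
        seen_threads.add(threading.get_ident())
    return n


# A fixed pool reuses the same workers instead of spawning and
# detaching new threads on demand.
with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    results = list(pool.map(handle_request, range(500)))

print(len(seen_threads) <= MAX_WORKERS)
```

However many requests arrive, at most `MAX_WORKERS` threads handle them, so in the IRQ model discussed in this thread the number of syscall threads would stay bounded as well. The cost is that requests queue up once all workers are busy.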


liuyuan10 commented on September 28, 2024

You're welcome. I would like to know whether the patch works in your case; I need more test cases for it.


Rondom commented on September 28, 2024

See #192 for a possible automatic cleanup that would make calling lkl_stop_syscall_thread() unnecessary.

