Comments (6)

rigelbm avatar rigelbm commented on August 12, 2024 1

I encountered this problem and this is my working theory/understanding of what is happening. Thanks @rfjakob for the pointer.

Background: Go File IO

On Linux, Go does file IO through epoll (or the equivalent on non-Linux targets). Go opens files in non-blocking mode and registers them with the runtime poller. The poller periodically collects IO events from the kernel for the registered files and reports any goroutine whose IO is ready to the scheduler, so that it can be woken up.

The problem: FUSE + Test == Deadlock

The problem arises because the FUSE server and the test run in the same process and therefore share the same runtime poller. When the test performs file IO that triggers a poll on a file in the file-system backed by the FUSE server, the test goroutine sleeps until the file is ready. Notice that at this point the FUSE server is also sleeping, waiting for messages from the Kernel.

When the Kernel receives the poll, it passes it to FUSE (kernel), which forwards it to the FUSE server by writing it to the connection file. The Kernel then waits for a reply on that file before returning the poll call to the poller. Unfortunately, the daemon never gets a chance to read the message from the Kernel, because at this point the runtime poller is blocked waiting for the Kernel to answer the original poll call. The poller simply can't wake up the FUSE server goroutine, which means the original poll call can never be answered. Deadlock.

Option 1: Don't poll from tests. (<--- This is what I'm doing)

The os package doesn't offer a straightforward way to do File IO operations in blocking mode (that I know of). It's possible to implement it yourself. For example:

func OpenNoPoll(name string, mode int, perm uint32) (*os.File, error) {
	fd, err := syscall.Open(name, mode|syscall.O_CLOEXEC, perm)
	if err != nil {
		return nil, err
	}
	return os.NewFile(uintptr(fd), name), nil
}

func ReadDirNoPoll(name string) ([]os.DirEntry, error) {
	f, err := OpenNoPoll(name, syscall.O_RDONLY, 0)
	if err != nil {
		return nil, err
	}
	defer func() { _ = f.Close() }()

	dirs, err := f.ReadDir(-1)
	if err != nil {
		return nil, err
	}

	sort.Slice(dirs, func(i, j int) bool { return dirs[i].Name() < dirs[j].Name() })
	return dirs, nil
}

It's messy, but it works.

Option 2: Run FUSE server and test in different processes.

Simple enough to do, but I find it not as convenient to debug.

Option 3: @rfjakob hack

The workaround posted by @rfjakob works as follows: upon initialization, create a special file in the backed file-system, force a poll on that file, and then immediately return ENOSYS. This tells FUSE (kernel) that the backed file-system doesn't support poll, so subsequent poll calls to the file-system return immediately from the Kernel, without going to the FUSE server.

Although this hack is very effective, I believe it to be a bit of overkill. This deadlock, as I understand it, only happens when the poll (to the backed file-system) originates from the same process as the FUSE server, which seems to be a very sketchy thing to do (except in a test).

from fuse.

jacobsa avatar jacobsa commented on August 12, 2024

I have no specific ideas or tips, but I will say (very generically, sorry) that with past issues like this I was able to get a lot of mileage out of digging around in /proc/$pid. For example, I think looking at the kernel stacks for the stuck threads would probably tell you something useful.

rfjakob avatar rfjakob commented on August 12, 2024

gocryptfs developer here. https://github.com/hanwen/go-fuse had a similar problem.

Does the hang reproduce every time you run with GOMAXPROCS=1?

The workaround in go-fuse is this: https://github.com/hanwen/go-fuse/blob/master/fuse/poll.go

stapelberg avatar stapelberg commented on August 12, 2024

Thanks very much for the analysis!

I think I would actually prefer option 2 (server and test in separate processes), as it most closely mirrors how jacobsa/fuse will be run in practice.

kahing avatar kahing commented on August 12, 2024

The downside of option 2 is that the tests won't be able to mock out parts of the filesystem or do whitebox testing that examines the filesystem's state.

rigelbm avatar rigelbm commented on August 12, 2024

Update: I have since switched my project to run the file system in a separate process during tests. The reason is that I could never really remove all sources of poll, and changes to the system or the tests would randomly cause the tests to start freezing again. The downside is that debugging is more cumbersome, as noted above. Debugging from the IDE doesn't "simply work" anymore; instead, I have to "connect to remote process", which is not as convenient. Nevertheless, it works. Below is a snippet of code that shows how to run the file system in a separate process in a test.

// SetupTest starts the file system daemon in a separate process.
// daemon is assumed to be a package-level *exec.Cmd.
func SetupTest() {
	daemon = exec.Command("path/to/daemon/binary")
	daemon.Stdout = os.Stdout
	// This assumes you have set the FUSE DebugLogger to STDERR. Change accordingly.
	stderr := newMatchingWriter(os.Stderr)
	daemon.Stderr = stderr
	if err := daemon.Start(); err != nil {
		panic(err)
	}
	// We have to wait for the file system to actually mount before proceeding with the test, to avoid race conditions.
	stderr.Wait()
}


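// matchNext holds the bytes to match, in order. Index 0 ('*') is a placeholder that is never
// compared: state n means that matchNext[1..n] has been matched so far.
var matchNext = []byte("*initOK ()")

// matchingWriter is an io.Writer that checks whether a string matching the regular expression
// "init.*OK \(\)" has been written to it. All writes are forwarded to delegate.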
type matchingWriter struct {
	delegate io.Writer
	state    int
	cond     *sync.Cond
	mu       *sync.Mutex
}

// newMatchingWriter returns a new matchingWriter that forwards writes to the given delegate.
func newMatchingWriter(delegate io.Writer) *matchingWriter {
	mu := &sync.Mutex{}
	return &matchingWriter{
		delegate: delegate,
		state:    0,
		cond:     sync.NewCond(mu),
		mu:       mu,
	}
}

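// Write advances the match state over p and then forwards p to the delegate.
// Note: a mismatching byte is not re-examined after the state is reset, so this
// simplified matcher misses some inputs (for example "iinit").
func (writer *matchingWriter) Write(p []byte) (n int, err error) {
	writer.mu.Lock()
	for b := 0; b < len(p) && writer.state < len(matchNext)-1; b++ {
		if p[b] == matchNext[writer.state+1] {
			writer.state++
		} else if writer.state < 4 {
			// Mismatch inside "init": start over.
			writer.state = 0
		} else {
			// Mismatch after "init": ".*" absorbs the byte, so fall back to just after "init".
			writer.state = 4
		}
	}
	if writer.state == len(matchNext)-1 {
		// Full pattern seen: wake up everyone blocked in Wait.
		writer.cond.Broadcast()
	}
	writer.mu.Unlock()
	return writer.delegate.Write(p)
}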

// Wait until a string matching the regular expression "init.*OK \(\)" has been written to this
// matchingWriter.
func (writer *matchingWriter) Wait() {
	writer.mu.Lock()
	defer writer.mu.Unlock()
	for writer.state != len(matchNext)-1 {
		writer.cond.Wait()
	}
}
