
Comments (17)

thomasjm commented on June 26, 2024

Oh I see, I didn't notice the total time for the dirstream one. Makes sense, although without knowing more about why it's behaving that way I don't think we can rule out a performant pipes implementation being possible.

Anyway, it would be good to see your measurements on the hfsnotify functions but here's the way I'm thinking about this -- please let me know how it matches up with your priorities.

I think we could invest effort in optimizing the crucial section here, which seems to be the findDirs function. Figuring out which existence check functions are most performant seems prudent. But it seems the thing that would make the biggest change would be to change to using RawFilePath or similar. Which I think is a great idea, but it would be a bit of work to do it without changing the API of this library. (At some point RawFilePaths would need to be decoded into FilePaths and any errors or decoding discrepancies dealt with.)

On the other hand, it sounds like what you really need is a way to avoid running over gigantic directories like node_modules in the first place, right? It's always better to not create watches you don't need, since each watch comes with a kernel memory cost and brings you closer to hitting your inotify watch limit. The fact that you're getting perf problems from this directory traversal suggests that you're creating too many watches :P. Doing a filtered version of watchTree like @Janiczek mentioned would not be difficult.

thomasjm commented on June 26, 2024

Thanks! Interesting that the biggest chunk of time is actually spent in the getDirectoryContentsPath.exists function. I just pushed a commit that might help when newer versions of System.Directory are available: 4bb7d27.

Hmm, this directory listing function will still realize all the directories in memory at once, right? If we're optimizing that then I'd rather do it in a streaming fashion so the peak memory usage is low.
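
For illustration, here's a minimal sketch of the streaming shape I have in mind (a hypothetical helper, not hfsnotify's actual internals): hand each directory to an action as soon as it's discovered instead of accumulating the whole list, so peak memory stays proportional to a single directory listing per level of depth.

import Control.Monad (filterM, forM_)
import System.Directory (doesDirectoryExist, listDirectory)
import System.FilePath ((</>))

-- Walk the tree and run the callback on every subdirectory as it is found;
-- no list of all directories is ever built. Error handling (permissions,
-- symlink loops) is omitted for brevity.
traverseDirs :: FilePath -> (FilePath -> IO ()) -> IO ()
traverseDirs root onDir = do
  entries <- listDirectory root
  dirs <- filterM doesDirectoryExist (map (root </>) entries)
  forM_ dirs $ \d -> onDir d >> traverseDirs d onDir

The callback would be where the watch gets added, so watches are created as the traversal proceeds rather than after it finishes.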

thomasjm commented on June 26, 2024

Okay, I made a stab at an efficient implementation. There are a few TODOs before it's ready to ship but the tests are passing. Want to try it? I looked at your lib for reference on the dirstream stuff :)

https://github.com/haskell-fswatch/hfsnotify/blob/master/src/System/FSNotify/Linux.hs#L193

For the filtering stuff, I think we want to just add a new watchTree function that takes an extensible options object:

watchTree' :: WatchManager -> RecursiveListeningOptions -> FilePath -> ActionPredicate -> Action -> IO StopListening

data RecursiveListeningOptions = RecursiveListeningOptions {
  recursiveListeningOptionsDirectoryFilter :: FilePath -> Bool -- IO Bool ?
  }

defaultRecursiveListeningOptions = RecursiveListeningOptions {
  recursiveListeningOptionsDirectoryFilter = const True
  }
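
Just to illustrate the intent, usage might look something like this (purely a sketch against the proposed API above -- watchTree' and RecursiveListeningOptions don't exist in any release yet):

import Control.Concurrent (threadDelay)
import System.FilePath (takeFileName)
import System.FSNotify (withManager)

main :: IO ()
main = withManager $ \mgr -> do
  -- Skip node_modules entirely; watchTree' and the options type are the
  -- proposed additions above, not current fsnotify API.
  let opts = defaultRecursiveListeningOptions
        { recursiveListeningOptionsDirectoryFilter = \dir -> takeFileName dir /= "node_modules" }
  stop <- watchTree' mgr opts "/path/to/project" (const True) print
  threadDelay 10000000  -- keep watching for a bit, then clean up
  stop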

I haven't had time to do the filtering stuff yet (it will need to be added to each platform) but a PR would be welcome if you want it soon.

thomasjm commented on June 26, 2024

Hmm, I don't see anything that looks O(N^2) but a couple things jump out at me in the recursive watch code

  1. The recursive watch code first gathers all subdirectories into a list before adding any watches, which is probably not great for memory usage and may result in higher CPU usage due to garbage collection (?)
  2. The code then loops over these subdirectories and adds a watch to each of them. Each of these operations requires waiting on a global MVar, and also creating one new MVar per watch. Either of these things might be causing high CPU.

We could improve 1) by gathering directories and adding watches in a streaming fashion, perhaps with pipes or conduit. We could perhaps improve 2) by restructuring our use of MVars, but it would have to be done carefully.

Any chance you could create a flamegraph with https://hackage.haskell.org/package/ghc-prof-flamegraph ?

Janiczek commented on June 26, 2024

I will try to find time to generate the flamegraph and get back to you.

By the way, you should be able to reproduce this behaviour solely based on the number of directories being watched -- tens of thousands of directories should provoke it.
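
For example, a throwaway tree generated roughly like this (a hypothetical generator, not something from my actual setup) should be enough to trigger it:

import Control.Monad (forM_)
import System.Directory (createDirectoryIfMissing)
import System.FilePath ((</>))

-- Create 50,000 empty directories for watchTree to walk over; adjust to taste.
main :: IO ()
main = forM_ [1 .. 50000 :: Int] $ \i ->
  createDirectoryIfMissing True ("stress" </> ("dir" ++ show i))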

By the way 2, my friend @turboMaCk has tried a continuation-passing-style approach to listing contents of a directory recursively that seems to be pretty performant even on higher file/dir counts:

https://github.com/turboMaCk/unix-recursive

EDIT: given the profile info in the original post, I think your point 2 won't be the culprit, since it didn't show up in the profile info. It all seems to be in directory manipulation, canonicalization etc.

Janiczek commented on June 26, 2024

GitHub didn't let me upload a SVG, so here is a gist:
https://gist.github.com/Janiczek/f3bb8a1be8841b163aa233164bc67856
At the bottom there's a flamegraph attached.

Janiczek commented on June 26, 2024

@thomasjm There's also the possibility (which I'm currently doing in my app) of filtering out certain folders and not recursing deeper into them. I don't know how that fits into the FSNotify API / responsibilities, but perhaps it could be a configuration parameter: FilePath -> Bool for whether you should care about a folder, eg. folder /= "node_modules" or not $ Glob.match ignoredDirPattern folder.

That could cut a large chunk of unneeded work in certain usecases (like mine).
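
To make that concrete, the predicate I have in mind could look roughly like this (using the Glob package; the pattern and names are just illustrative):

import System.FilePath (takeFileName)
import qualified System.FilePath.Glob as Glob

-- Example ignore pattern; in practice this would come from configuration.
ignoredDirPattern :: Glob.Pattern
ignoredDirPattern = Glob.compile "**/node_modules"

-- True = recurse into / watch this folder, False = skip it entirely.
shouldWatchDir :: FilePath -> Bool
shouldWatchDir dir =
  takeFileName dir /= "node_modules"
    && not (Glob.match ignoredDirPattern dir)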

Janiczek commented on June 26, 2024

@thomasjm Is there a way I could try out the mentioned commit without a proper release? Does Stack support specifying the version with a commit hash perhaps?

thomasjm commented on June 26, 2024

filtering certain folder IDs and not recursing deeper into them

Well, you could use the non-recursive watch function to set the watches on exactly the folders you want.
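
Something along these lines, where you collect (and filter) the directories yourself and only set non-recursive watches on the ones you care about (a sketch; how you build the list is up to you):

import Control.Concurrent (threadDelay)
import System.FSNotify (Event, watchDir, withManager)

-- Watch an explicit list of directories non-recursively; node_modules simply
-- never makes it into the list.
watchSelected :: [FilePath] -> (Event -> IO ()) -> IO ()
watchSelected dirs action = withManager $ \mgr -> do
  mapM_ (\d -> watchDir mgr d (const True) action) dirs
  threadDelay maxBound  -- keep the manager alive; stop however suits your app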

Does Stack support specifying the version with a commit hash perhaps?

Yes, like this in stack.yaml:

extra-deps:
- git: https://github.com/haskell-fswatch/hfsnotify.git
  commit: 4bb7d27e3217207537f544e6e6602bfe55faf930

Hopefully the work in progress I have going on in the master branch doesn't cause problems; I was working on some breaking changes for the next release. If it does, you may need to cherry-pick -- you can point to a local checkout in stack.yaml like this:

packages:
- .
- /path/to/your/hfsnotify/checkout

turboMaCk commented on June 26, 2024

Sorry to jump in. I'm sort of in the middle of identifying different pieces which could be optimized to work better on large FS trees. I've yet to identify exactly where the wins are, but I'll let you know if I find something, either by opening a PR if there are some quick wins or by opening an issue where we can discuss more specific details.

turboMaCk commented on June 26, 2024

I had a deeper look into this issue, both by analyzing isolated pieces and by directly analyzing the performance of the code @Janiczek observed this behavior on. For the record, it's this commit: Janiczek/elmid@8cf9636, though I also needed to change a few hard-coded paths to be able to use it.

I'm able to reproduce higher-than-optimal CPU loads, but as far as I can tell the code isn't O(n^2). It scales close to O(n) in the number of directories and files -- that being said, I didn't put much more effort into analyzing the big O once I discovered the scaling is acceptable.

For this case and similar ones, the biggest problem is the inability of watchTree to filter which directories to recurse into. The primary cost is simply recursing into directories and files which are then ignored by the predicate anyway.

I also have a few points on @thomasjm's first reply:

Hmm, I don't see anything that looks O(N^2) but a couple things jump out at me in the recursive watch code

1. The recursive watch code first gathers all subdirectories into a list before adding any watches, which is probably not great for memory usage and may result in higher CPU usage due to garbage collection (?)
2. The code then loops over these subdirectories and adds a watch to each of them. Each of these operations requires waiting on a global MVar, and also creating one new MVar per watch. Either of these things might be causing high CPU.

We could improve 1) by gathering directories and adding watches in a streaming fashion, perhaps with pipes or conduit. We could perhaps improve 2) by restructuring our use of MVars, but it would have to be done carefully.

  1. I spent quite some time looking for the optimal way to implement the listing. See the results of my measurements: https://github.com/turboMaCk/unix-recursive#comparision-with-other-libraries

    • TLDR is that something like pipes/conduit would actually make performance worse, but they do make it easy to keep lazy IO and memory under control
    • The memory profile of fsnotify is actually great considering the work it's doing
  2. This is likely one of the things that slow fsnotify down, but there will likely be a tradeoff with different implementations.

thomasjm commented on June 26, 2024

Thanks for the information @turboMaCk, interesting stuff. A few thoughts/questions:

  1. It looks like a large amount of the performance improvement in your lib is explained by just the switch from FilePath to RawFilePath, although it's a little hard to compare. Are you just recording overall CPU usage samples on your machine? It might help to smooth it out. Or, to be really fancy, you could consider perf.
  2. Any chance you could run this same test on the System.FSNotify.Path.findDirs function of hfsnotify? I'd be especially curious if the new commit I mentioned above makes a difference, since it's looking like functions such as doesFileExist are a lot more expensive than I would have expected.
  3. Could you explain this statement a bit more?

pipes/conduit would actually make performance worse, but they do make it easy to keep lazy IO and memory under control

I'm confused because your results for dirstream, the one which uses pipes, seem to show the opposite (relatively good CPU usage but leaking memory).

Janiczek commented on June 26, 2024

@thomasjm I know the answer to 1: it's the psrecord tool that only records one process ID (and optionally its children). So the graphs in @turboMaCk's repo don't show the whole system but just the measured apps.

turboMaCk commented on June 26, 2024

  1. Yes, there is a huge difference between using a packed ByteString and [Char]/String. Once you get close to an optimal implementation, the overhead of the linked list is quite a killer in both CPU time and memory. Also, for these measurements I'm essentially creating a list of all paths in a good portion of my $HOME. I'm not recording overall CPU, just the process, as @Janiczek said. psrecord is like perf. Here is the script I used to measure.

  2. I'll try to measure two versions before and after the commit.

I'm confused because your results for dirstream, the one which uses pipes, seem to show the opposite (relatively good CPU usage but leaking memory).

Yes, but it took the same program 70s, which is 10x longer to finish than the RawFilePath implementation and more than twice as long as the FilePath one. That's still quite a good result imo, but it's also the hardest implementation to use and it has the most dependencies, including the deprecated system-filepath.

turboMaCk commented on June 26, 2024

So I have a result for the implementation from master/HEAD. I've run all the other tests as well, and the performance of the other implementations is basically the same as in the link I posted, so I won't spam this issue further.

[psrecord graph: 1614619389-fsnotify-style]

So CPU time is on par with dirstream, however the current implementation consumes nearly 10 GB of RAM, which is consistent with forcing the whole list too early, judging from my other measurements.

I also agree with everything you said in your last post. I think in the ideal case there would be two separate issues and PRs coming out of this one:

a) provide a new recursive API that can help folks avoid the trap of doing more work than is obvious/needed
b) optimize the System.FSNotify.Path functions to produce results lazily

I can help you with tackling these if you want. I think b) especially should be doable without changes to the API, even of this internal module; a) will involve some API design.
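
For b), one possible shape is lazy IO (a sketch with made-up names, not the current System.FSNotify.Path code), deferring each subtree until a consumer actually demands it:

import Control.Monad (filterM)
import System.Directory (doesDirectoryExist, listDirectory)
import System.FilePath ((</>))
import System.IO.Unsafe (unsafeInterleaveIO)

-- Each subtree's listing is deferred until that part of the result list is
-- demanded, so a consumer that processes directories one by one never forces
-- the whole tree into memory at once.
lazyFindDirs :: FilePath -> IO [FilePath]
lazyFindDirs root = do
  entries <- listDirectory root
  dirs <- filterM doesDirectoryExist (map (root </>) entries)
  subtrees <- mapM (unsafeInterleaveIO . lazyFindDirs) dirs
  pure (dirs ++ concat subtrees)

Whether lazy IO or an explicit callback is the better fit is exactly the kind of tradeoff I mentioned above.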

Makes sense, although without knowing more about why it's behaving that way I don't think we can rule out a performant pipes implementation being possible.

While I think using conduit or pipes is a good way to develop applications, I personally don't like the idea of using them for library internals unless really necessary. Libraries with fewer dependencies are easier to maintain and are often preferred by developers because they don't impose as many changes on them.

Reproducing my results:

All the code, including the fsnotify implementation, can be found here: https://github.com/turboMaCk/unix-recursive/tree/test-fsnotify-functions (make sure you're using the test-fsnotify-functions branch).

Just be aware:

  • This won't work on Windows because of POSIX-specific functionality in my implementation.
  • The bin flag needs to be passed during compilation to produce binaries; with stack this is --flag unix-recursive:bin.
  • I have nix integration for stack enabled in stack.yaml. If you want to use stack but don't use nix, pass the --no-nix flag.
  • measure.sh [filepath] can be used to compare all implementations.
    • The script might need some small path adjustments depending on your platform (unless you use NixOS).
  • It's a good idea to run it at least twice, because the kernel's FS cache can affect the performance of this type of IO.

EDIT:

Sorry, one more thing I forgot to mention. I think the fact that the recursive directory walking is not as optimal as it could be is really not that big a deal, because with the current watchTree implementation fsnotify attaches a watcher to all these files, so there will always be resources allocated for all the inodes it walks through.

georgefst commented on June 26, 2024

But it seems the thing that would make the biggest change would be to change to using RawFilePath or similar.

If this happens, could we please also export the underlying RawFilePath versions of everything? Then the existing API can just wrap all those functions in pack and unpack.

That's what I've always wanted from this library - I end up converting all the FilePaths to/from RawFilePath in my own code anyway.
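
Roughly what I mean (a sketch; watchTreeRaw is hypothetical, and a real implementation should respect the filesystem encoding rather than blindly using Char8):

import qualified Data.ByteString.Char8 as B8
import System.Posix.ByteString.FilePath (RawFilePath)

-- Naive conversions; proper code would handle GHC's file-system encoding
-- (surrogate escapes) instead of assuming Char8 round-trips.
toRaw :: FilePath -> RawFilePath
toRaw = B8.pack

fromRaw :: RawFilePath -> FilePath
fromRaw = B8.unpack

-- The existing String-based API could then stay as thin wrappers, e.g.
-- (hypothetical): watchTree mgr path p act = watchTreeRaw mgr (toRaw path) p act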

EDIT: The time of an abstract OsPath is finally almost upon us, so probably best to wait for that.

Ericson2314 commented on June 26, 2024

https://man7.org/linux/man-pages/man7/fanotify.7.html might be able to help with this?
