conduit-find

conduit-find is essentially a souped-up version of GNU find for Haskell, using a DSL to provide both ease of use and extensive flexibility.

To start, let's compare some uses of GNU find with their find-conduit equivalents. Bear in mind that the result of the find function is a conduit, so you are expected to either sink it to a list or operate on the file paths as they are yielded.

Basic comparison with GNU find

A typical find command:

find src -name '*.hs' -type f -print

Would in find-conduit be:

find "src" (glob "*.hs" <> regular) $$ mapM_C (liftIO . print)

The glob predicate matches the file basename against the globbing pattern, while the regular predicate matches plain files.
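
To make "matches the file basename against the globbing pattern" concrete, here is a minimal sketch of basename globbing using only the Prelude. The names globMatch, baseName and matchesBaseName are hypothetical helpers invented for this illustration, and the matcher handles only '*' and '?'; the library's own glob may well support more syntax and is implemented differently.

```haskell
-- Toy glob matcher (illustration only, not the library's implementation).
-- '*' matches any run of characters, '?' matches exactly one character.
globMatch :: String -> String -> Bool
globMatch [] [] = True
globMatch ('*':ps) s =
    -- Either the star matches nothing, or it swallows one more character.
    globMatch ps s || case s of
        []     -> False
        (_:cs) -> globMatch ('*':ps) cs
globMatch ('?':ps) (_:cs) = globMatch ps cs
globMatch (p:ps) (c:cs)   = p == c && globMatch ps cs
globMatch _ _             = False

-- Everything after the last '/' (assumes POSIX-style separators).
baseName :: FilePath -> FilePath
baseName = reverse . takeWhile (/= '/') . reverse

-- Apply the pattern to the basename only, ignoring the directory part.
matchesBaseName :: String -> FilePath -> Bool
matchesBaseName pat = globMatch pat . baseName
```

With these definitions, matchesBaseName "*.hs" accepts "src/Data/Cond.hs" but rejects "src.hs/notes.txt", because only the final path component is examined.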

A more complicated example:

find . -size +100M -perm 644 -mtime 1

Now in find-conduit:

let megs = 1024 * 1024
    days = 86400
now <- liftIO getCurrentTime
find "." ( fileSize (> 100*megs)
        <> hasMode 0o644
        <> lastModified (> addUTCTime (-(1*days)) now)
         )

Appending predicates like this expresses an "and" relationship. Use <|> to express "or". You can also negate any predicate:

find "." (not_ (hasMode 0o644))
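
As a rough mental model of these combinators, the sketch below defines a toy predicate type whose Monoid instance means "and". Everything in it (Pred, Entry, orElse, negateP, nameEndsWith, largerThan) is hypothetical and exists only for this illustration; the library's real predicate type is richer, and its "or" is spelled <|> rather than orElse.

```haskell
import Data.List (isSuffixOf)

-- Hypothetical stand-in for a file entry: just a name and a size in bytes.
data Entry = Entry { entryName :: String, entrySize :: Integer }

-- Toy predicate type (not the library's): a plain boolean test.
newtype Pred a = Pred { runPred :: a -> Bool }

-- Appending two predicates requires both to hold ("and").
instance Semigroup (Pred a) where
    Pred f <> Pred g = Pred (\x -> f x && g x)

-- The empty predicate accepts everything, so it is neutral for "and".
instance Monoid (Pred a) where
    mempty = Pred (const True)

-- "or": either predicate may hold.
orElse :: Pred a -> Pred a -> Pred a
orElse (Pred f) (Pred g) = Pred (\x -> f x || g x)

-- Negation flips the verdict.
negateP :: Pred a -> Pred a
negateP (Pred f) = Pred (not . f)

nameEndsWith :: String -> Pred Entry
nameEndsWith suffix = Pred ((suffix `isSuffixOf`) . entryName)

largerThan :: Integer -> Pred Entry
largerThan n = Pred ((> n) . entrySize)

-- Matches Haskell sources larger than 100 bytes.
bigHaskell :: Pred Entry
bigHaskell = nameEndsWith ".hs" <> largerThan 100
```

Here runPred bigHaskell holds only for entries that pass both tests, while negateP bigHaskell holds for every entry that fails at least one of them.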

By default, predicates allow recursion into directories whether or not they match. To express that a matching predicate should stop recursion, use prune:

find "." (prune (depth (> 2)))

This is the same as using -maxdepth 2 in find.

find "." (prune (filename_ (== "dist")))

This is the same as:

find . \( -name dist -prune \) -o -print
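
To illustrate what pruning means for a traversal, here is a toy model over an in-memory tree rather than the real file system. The types Tree and Verdict and the functions walk and pruneNamed are hypothetical names made up for this sketch; it mirrors the behaviour of the GNU find command above (the pruned directory is neither printed nor entered) and makes no claim about how the library implements prune internally.

```haskell
-- A tiny in-memory directory tree (illustration only).
data Tree = File String | Dir String [Tree]

-- What to do with one entry: report it, and/or descend into it.
data Verdict = Verdict { emit :: Bool, descend :: Bool }

nodeName :: Tree -> String
nodeName (File n)  = n
nodeName (Dir n _) = n

-- Depth-first walk that consults the verdict for each entry name.
-- A directory whose verdict says "do not descend" is never traversed.
walk :: (String -> Verdict) -> Tree -> [FilePath]
walk decide = go ""
  where
    go prefix node =
        let name  = nodeName node
            path  = prefix ++ name
            v     = decide name
            here  = [path | emit v]
            below = case node of
                Dir _ kids | descend v -> concatMap (go (path ++ "/")) kids
                _                      -> []
        in here ++ below

-- Entries with the given name are skipped and not entered;
-- everything else is reported and traversed.
pruneNamed :: String -> String -> Verdict
pruneNamed target name
    | name == target = Verdict False False
    | otherwise      = Verdict True True

sample :: Tree
sample = Dir "."
    [ File "Setup.hs"
    , Dir "dist" [File "build.log"]
    , Dir "src"  [File "Main.hs"]
    ]
```

Walking sample with pruneNamed "dist" reports ".", "./Setup.hs", "./src" and "./src/Main.hs"; nothing under dist is ever visited.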

Performance

find-conduit strives to make file-finding a well-performing operation. To this end, a composed Predicate will only call stat once per entry being considered; and if you prune a directory, it is not traversed at all.

By default, find calls stat for every file before it applies the predicate, in order to ensure that only one such call is needed. Sometimes, however, you know just from the FilePath that you don't want to consider a certain file, or you want to prune a directory tree.

To support these types of optimized queries, a variant of find is provided called findWithPreFilter. This takes two predicates: one that is applied to only the FilePath, before stat (or lstat) is called; and one that is applied to the full file information after the stat.
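
The two-stage idea can be sketched in pure code by counting how many simulated stat calls are made. This is only a model of the technique described above, not the library's findWithPreFilter: the names findTwoStage, Info, fakeStat and sampleSizes are hypothetical, and the "stat" here is a table lookup rather than a system call.

```haskell
import Data.List (isPrefixOf)
import Data.Maybe (fromMaybe)

-- Stand-in for the result of a stat call; only the size is modelled.
newtype Info = Info { infoSize :: Integer }

-- Run the cheap path-only check first, and "stat" only the survivors.
-- Returns the matching paths together with the number of stat calls made.
findTwoStage
    :: (FilePath -> Bool)   -- pre-filter: looks at the path alone
    -> (FilePath -> Info)   -- simulated stat
    -> (Info -> Bool)       -- post-filter: needs the stat result
    -> [FilePath]
    -> ([FilePath], Int)
findTwoStage pre stat post paths = (matches, length survivors)
  where
    survivors = filter pre paths
    matches   = [p | p <- survivors, post (stat p)]

-- Made-up sizes for the demonstration.
sampleSizes :: [(FilePath, Integer)]
sampleSizes =
    [ ("src/Main.hs", 250)
    , ("src/Tiny.hs", 10)
    , (".git/objects/pack", 9000)
    ]

-- Simulated stat: unknown paths are treated as empty files.
fakeStat :: FilePath -> Info
fakeStat p = Info (fromMaybe 0 (lookup p sampleSizes))
```

For example, filtering out paths under ".git/" before the size check means only two of the three sample paths are ever passed to fakeStat.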

Final notes

Predicates form a Category and an Arrow, so you can use Arrow-style composition rather than Monoids if you wish. They also form an Applicative, a Monad and a MonadPlus.

In the Monad, the value bound over is whatever the predicate chooses to return (most Predicates return the same FilePath they examined, however, making the Monad less valuable). Here's an example using the Monad:

start <- liftIO getCurrentTime
find "." $ do
    -- The Predicate Monad is a short-circuiting monad, meaning we stop as
    -- soon as it can be determined that the user is not interested in a
    -- given file.  To access the current file, simply bind the result
    -- value from any Predicate.  To change the file being matched against,
    -- for whatever reason, use 'consider'.
    glob "*.hs"

    -- If the find takes longer than 5 minutes, abort.  We could have
    -- used 'timeout', but this is for illustration.
    end <- liftIO getCurrentTime
    if diffUTCTime end start > 300
        then ignoreAll
        else matchAll                -- matchAll is "id" in this Category
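
One way to picture a short-circuiting predicate monad in which matchAll acts as the identity is to model predicates as functions into Maybe. This is a deliberately simplified sketch: the type synonym Pred and the helpers endsWith, shorterThan and hsAndShort are hypothetical, the names matchAll and ignoreAll are reused only to echo the example above, and the library's actual Predicate type also carries effects and recursion control that this model omits.

```haskell
import Control.Monad ((>=>), guard)
import Data.List (isSuffixOf)

-- Toy predicate: Nothing means "not interested", Just passes a value on.
type Pred a b = a -> Maybe b

-- Accepts everything unchanged; the identity for (>=>) composition.
matchAll :: Pred a a
matchAll = Just

-- Rejects everything.
ignoreAll :: Pred a b
ignoreAll = const Nothing

-- Each predicate returns the path it examined when it matches.
endsWith :: String -> Pred FilePath FilePath
endsWith suffix path = path <$ guard (suffix `isSuffixOf` path)

shorterThan :: Int -> Pred FilePath FilePath
shorterThan n path = path <$ guard (length path < n)

-- Monadic style: the first failing step short-circuits the rest.
hsAndShort :: Pred FilePath FilePath
hsAndShort path = do
    p <- endsWith ".hs" path
    shorterThan 20 p
```

In this model, endsWith ".hs" >=> matchAll behaves exactly like endsWith ".hs", and composing anything with ignoreAll always yields Nothing.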

conduit-find's People

Contributors

erikd, junjihashimoto, jwiegley

conduit-find's Issues

Cannot build on Ubuntu HP2013.2.0.0

$ cabal --version 
cabal-install version 1.20.0.1
using version 1.20.0.0 of the Cabal library 
$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.6.3
$ cabal install find-conduit
Resolving dependencies...
Configuring find-conduit-0.4.1...
Building find-conduit-0.4.1...
Preprocessing library find-conduit-0.4.1...
[1 of 2] Compiling Data.Cond        ( Data/Cond.hs, dist/dist-sandbox-a6a7b1ac/build/Data/Cond.o )

Data/Cond.hs:247:5:
    `mask' is not a (visible) method of class `MonadCatch'

Data/Cond.hs:249:5:
    `uninterruptibleMask' is not a (visible) method of class `MonadCatch'
Failed to install find-conduit-0.4.1
cabal: Error: some packages failed to install:
find-conduit-0.4.1 failed during the building phase. The exception was:
ExitFailure 1

I have the following:

$ ghc-pkg list | grep transformers
    transformers-0.3.0.0
$ ghc-pkg list | grep mtl
    MonadCatchIO-mtl-0.3.0.5
    mtl-2.1.2

Cannot build on Windows

I am hoping to implement a find functionality which ignores globs specified in a .gitignore-like file. Unfortunately, on Windows I am getting an error since find-conduit depends on unix-2.7.0.1, which I cannot build.

cabal.exe: Package unix-2.7.0.1 can't be built on this system.
Failed to install unix-2.7.0.1
cabal.exe: Error: some packages failed to install:
find-conduit-0.4.1 depends on unix-2.7.0.1 which failed to install.
unix-2.7.0.1 failed during the building phase. The exception was:
ExitFailure 1

Is there a way to make the package cross-platform the same way as fsnotify?

EDIT: I see that only the executable depends on the unix package. I changed that to unix-compat but unfortunately, I am still getting the following (I'm on GHC 7.6.3 and HaskellPlatform 2013.2.0.0):

cabal.exe: Error: some packages failed to install:
chunked-data-0.1.0.1 depends on vector-instances-0.0.2.1 which failed to
install.
conduit-combinators-0.2.5.2 depends on vector-instances-0.0.2.1 which failed
to install.
find-conduit-0.4.1 depends on vector-instances-0.0.2.1 which failed to
install.
mono-traversable-0.6.0 depends on vector-instances-0.0.2.1 which failed to
install.
vector-instances-0.0.2.1 failed during the building phase. The exception was:
ExitFailure 1

Deprecated?

Hi John,

I notice that this has been marked as deprecated in Hackage?

https://hackage.haskell.org/package/find-conduit

Do you mind if I take this over and maintain it?

Failed to install find-conduit-0.4.3 (Manjaro)

...
Downloading find-conduit-0.4.3...
Configuring find-conduit-0.4.3...
Building find-conduit-0.4.3...
Failed to install find-conduit-0.4.3
Build log ( /home/xged/.cabal/logs/find-conduit-0.4.3.log ):
Configuring find-conduit-0.4.3...
Building find-conduit-0.4.3...
Preprocessing library find-conduit-0.4.3...
[1 of 2] Compiling Data.Cond        ( Data/Cond.hs, dist/build/Data/Cond.o )
[2 of 2] Compiling Data.Conduit.Find ( Data/Conduit/Find.hs, dist/build/Data/Conduit/Find.o )
In-place registering find-conduit-0.4.3...
Preprocessing executable 'find-hs' for find-conduit-0.4.3...
[1 of 1] Compiling Main             ( test/find-hs.hs, dist/build/find-hs/find-hs-tmp/Main.o )

test/find-hs.hs:19:44:
    Couldn't match type ‘Filesystem.Path.CurrentOS.FilePath’
                with ‘[Char]’
    Expected type: Prelude.FilePath
    Actual type: Filesystem.Path.CurrentOS.FilePath
    In the second argument of ‘sourceDirectoryDeep’, namely
    ‘(decodeString dir)’
    In the first argument of ‘(=$)’, namely
    ‘sourceDirectoryDeep False (decodeString dir)’

test/find-hs.hs:20:56:
    Couldn't match type ‘[Char]’
                with ‘Filesystem.Path.CurrentOS.FilePath’
    Expected type: Prelude.FilePath -> [Char]
    Actual type: Filesystem.Path.CurrentOS.FilePath -> String
    In the second argument of ‘(.)’, namely ‘encodeString’
    In the first argument of ‘filterC’, namely
    ‘((".hs" `isSuffixOf`) . encodeString)’

test/find-hs.hs:21:52:
    Couldn't match type ‘[Char]’
                with ‘Filesystem.Path.CurrentOS.FilePath’
    Expected type: Prelude.FilePath -> String
    Actual type: Filesystem.Path.CurrentOS.FilePath -> String
    In the second argument of ‘(.)’, namely ‘encodeString’
    In the second argument of ‘(.)’, namely ‘putStrLn . encodeString’
cabal: Error: some packages failed to install:
find-conduit-0.4.3 failed during the building phase. The exception was:
ExitFailure 1

Unbuildable now

/tmp/stack-90745e800a2d2fef/conduit-find-0.1.0.3/Data/Cond.hs:41:1: error:
        Could not find module ‘Control.Monad.Trans.Either’
        Perhaps you meant
          Control.Monad.Trans.Writer (from transformers-0.5.6.2)
          Control.Monad.Trans.Error (from transformers-0.5.6.2)
          Control.Monad.Trans.Reader (from transformers-0.5.6.2)
        Use -v to see a list of the files searched for.
       |
    41 | import Control.Monad.Trans.Either
       | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Out of date performance notes

Hi,

The docs refer to findWithPreFilter as a way to avoid unnecessary stat calls. This function is no longer there, but I could not figure out whether the new implementation (just using prune to filter by name) had similar performance characteristics. Does it?

Thanks John and Erik for writing/maintaining this library!
