Comments (16)
Just in case anyone comes back to this ticket: This all happened before parallel search was implemented (#41). fd has become much faster since then and is typically also faster than find -iname.
from fd.
@sharkdp Well, I came to the conclusion that it actually performs pretty darn well.
If you disable coloring and .gitignore handling and run a very basic search (fd vs find -name '*.cpp'), fd is a few tens of milliseconds slower, but in the rest of the cases it is faster (again, without coloring, which would be the fair way to compare).
So it is fast, except with coloring (actually it is fast with coloring too, you don't wait long, but when comparing milliseconds/seconds against find...) and except for a very basic search where you can use -name.
from fd.
The -iregex flag is now mentioned in the README. I'm going to close this and open a new ticket for reproducible benchmarks.
from fd.
I think @lilianmoraru and @sharkdp need to focus on finding a common set of files that they can each benchmark and verify. There are too many variables at play to immediately blame the regex engine. @lilianmoraru Even in your own benchmark find with -iregex is faster than fd, so clearly, there is more to the story.
from fd.
Unfortunately, neither your comment nor the README provide a way to easily run and confirm the benchmark for yourself, so it's pretty hard to make any kind of progress.
I'd encourage you to experiment with disabling gitignore support, which I believe is the -I flag.
from fd.
Writing to stdout seems slow:
$ time fd '.*\.cpp$' | wc -l
10863
real 0m0.462s
user 0m0.268s
sys 0m0.180s
--
$ time find -iregex '.*\.cpp$' | wc -l
10863
real 0m0.335s
user 0m0.136s
sys 0m0.192s
from fd.
Also, the README's benchmark is clearly running with different flags than what you've provided. :-)
from fd.
I agree, but it implies that the search in general is faster (it also mentions that it is fair).
I think the arguments are valid for regex search only, which would be OK if mentioned.
from fd.
@lilianmoraru Did you experiment with the -I flag? What did you discover?
from fd.
No difference (this source code does not have a .gitignore):
$ time fd -I '.*\.cpp$' | wc -l
10863
real 0m0.477s
user 0m0.112s
sys 0m0.384s
from fd.
If I use "-n" it has almost the same performance characteristics as find -iregex.
from fd.
"I think that the README should specifically mention that the regex search is faster (that's because the regex is slow in find)."
I think you are right, it seems like the -iregex search in find is at least part of the reason why find was slower in this particular benchmark that I did in my home folder.
"The actual search is slower (it seems to imply that the find in general is faster)."
I think this will really depend on the specific situation, as @BurntSushi mentioned:
"I think @lilianmoraru and @sharkdp need to focus on finding a common set of files that they can each benchmark and verify."
Absolutely. I honestly did not expect this to become this popular that fast, so the current benchmark was really just a first shot in order for me to get a feeling for the performance.
"Writing to stdout seems slow"
Yes, please pipe the output to /dev/null like in the README or at least turn on -n/--no-color for fd; otherwise fd might be slowed down by the terminal rendering.
"[..] it implies that the search in general is faster (also mentions that it is fair)."
The README says: "The given options for fd are needed for a fair comparison". The options are --hidden (search through hidden folders), --no-ignore (do not respect ignore files) and --full-path (search the whole path, not just file and directory names). I turned these options on in order for a 'fair' comparison because find does all these things by default (full path search only for -iregex). Without these options, fd is much faster:
> time fd '.*[0-9]\.jpg$' > /dev/null
fd '.*[0-9]\.jpg$' > /dev/null 0,33s user 0,22s system 99% cpu 0,555 total
> time find -iregex '.*[0-9]\.jpg$' > /dev/null
find -iregex '.*[0-9]\.jpg$' > /dev/null 4,38s user 0,90s system 99% cpu 5,298 total
Coming back to your original point, you are right in that find seems to be much faster when using -iname instead of -iregex:
> time find -iname '*[0-9].jpg' > /dev/null
find -iname '*[0-9].jpg' > /dev/null 1,78s user 0,93s system 99% cpu 2,715 total
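Before comparing timings, it can help to confirm that the -iname glob and the -iregex pattern really select the same files. A small sketch on a throwaway directory (the file names are made up for illustration, and -iregex assumes GNU or BSD find):

```shell
# Scratch directory with a few made-up file names.
tmp=$(mktemp -d)
touch "$tmp/img1.jpg" "$tmp/IMG2.JPG" "$tmp/notes.txt" "$tmp/imgX.jpg"

# Glob on the file name vs. regex on the whole path.
by_name=$(find "$tmp" -type f -iname '*[0-9].jpg' | sort)
by_regex=$(find "$tmp" -type f -iregex '.*[0-9]\.jpg$' | sort)

if [ "$by_name" = "$by_regex" ]; then
    echo "same result set"
else
    echo "result sets differ"
fi
rm -rf "$tmp"
```

If the two sets differ, any timing difference between the commands is not a like-for-like comparison.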
"I think the arguments are valid for regex search only, which would be ok if mentioned."
Agreed.
I suggest the following:
- Specifically mention the -iregex option in the README
- Work on (several) reproducible benchmarks. Also, do statistics (I've started using bench)
- Keep improving fd's performance 😃
As a first version of a reproducible benchmark (suggesting that find -iname is slightly faster than fd), clone https://github.com/rust-lang/rust and run:
> bench "fd -HI '\.py$'" "find -iname '*.py'" "find -iregex '.*\.py$'"
benchmarking bench/fd -HI '\.py$'
time 22.98 ms (22.63 ms .. 23.21 ms)
0.999 R² (0.997 R² .. 1.000 R²)
mean 23.40 ms (23.16 ms .. 23.76 ms)
std dev 655.0 μs (496.3 μs .. 867.7 μs)
benchmarking bench/find -iname '*.py'
time 17.78 ms (17.39 ms .. 18.12 ms)
0.996 R² (0.991 R² .. 0.999 R²)
mean 18.10 ms (17.87 ms .. 18.46 ms)
std dev 730.5 μs (460.3 μs .. 1.151 ms)
variance introduced by outliers: 12% (moderately inflated)
benchmarking bench/find -iregex '.*\.py$'
time 29.63 ms (29.19 ms .. 30.04 ms)
0.999 R² (0.999 R² .. 1.000 R²)
mean 29.52 ms (29.33 ms .. 29.77 ms)
std dev 448.1 μs (315.7 μs .. 678.3 μs)
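If bench is not available, a crude stand-in is to repeat each command several times and average the wall-clock time. The sketch below is only an illustration: the helper name mean_ms is hypothetical, it relies on GNU date's %N (nanoseconds) format, and unlike bench it reports no variance or outlier analysis.

```shell
# mean_ms RUNS CMD...: run CMD RUNS times with output discarded and
# print the mean wall-clock time in whole milliseconds.
# Hypothetical helper; requires GNU date for the %N format.
mean_ms() {
    runs=$1
    shift
    start=$(date +%s%N)
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@" > /dev/null 2>&1
        i=$((i + 1))
    done
    end=$(date +%s%N)
    # Nanoseconds to milliseconds, averaged over all runs.
    echo $(( (end - start) / runs / 1000000 ))
}

# Example: average ten runs of a find command in the current directory.
mean_ms 10 find . -iname '*.py'
```

The same helper can be pointed at fd -HI '\.py$' in the same checkout. Because the result is truncated to whole milliseconds and the first run may hit a cold cache, treat the numbers as a rough indication only.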
from fd.
Btw, if you want to bench the Rust code (and use the nightly bench; for example, rayon puts the benches in a separate workspace project), you also have this option: https://github.com/BurntSushi/cargo-benchcmp.
Side-note:
I like how you can do this:
For a file StuffAndStuff.txt, you can just write "fd and" and it will find it, while doing something like find -iregex "And" of course doesn't work...
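The difference comes from matching semantics: fd looks for the pattern anywhere in the file name (and ignores case when the pattern is all lowercase), whereas find's -regex/-iregex has to match the entire path. A minimal sketch with find alone, using a throwaway directory created just for illustration:

```shell
# Scratch directory containing the file from the example above.
tmp=$(mktemp -d)
touch "$tmp/StuffAndStuff.txt"

# -iregex has to match the whole path, so a bare "And" finds nothing.
bare=$(find "$tmp" -type f -iregex 'And' | wc -l)

# Wrapping the pattern in .* turns it into a substring match.
wrapped=$(find "$tmp" -type f -iregex '.*and.*' | wc -l)

echo "bare=$bare wrapped=$wrapped"
rm -rf "$tmp"
```

So the find equivalent needs the explicit .* on both sides, which fd spares you.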
from fd.
Also, consider adding a larger repository to your benchmark. :-) A couple dozen milliseconds is frighteningly fast---probably in "process overhead" territory. (Of course, that is also important to benchmark!)
from fd.
It seems that making the regex a bit more complicated is enough for find to become slower:
time fd --hidden --no-ignore --full-path -n hello | wc -l
79945
fd --hidden --no-ignore --full-path -n hello 1,32s user 0,65s system 109% cpu 1,810 total
wc -l 0,16s user 0,05s system 11% cpu 1,810 total
--------
time find -iregex ".*[Hh][Ee][Ll][Ll][Oo].*" | wc -l
79945
find -iregex ".*[Hh][Ee][Ll][Ll][Oo].*" 2,37s user 0,53s system 108% cpu 2,664 total
wc -l 0,01s user 0,00s system 0% cpu 2,664 total
from fd.
"Also, consider adding a larger repository to your benchmark. :-) A couple dozen milliseconds is frighteningly fast---probably in 'process overhead' territory. (Of course, that is also important to benchmark!)"
Yes, thanks. It looks like those results are similar for larger folders, though (fd being 30%-50% slower than find -iname) for this particular search pattern.
from fd.