
Performance improvement? (python-pathspec, open issue)

cpburnz avatar cpburnz commented on July 18, 2024
Performance improvement?

from python-pathspec.

Comments (10)

bkarstens avatar bkarstens commented on July 18, 2024 2

It would be great if there was a way to combine multiple patterns from different lines into larger regexes automatically.

👀 It is possible, from my experimentation:

  • for multiple normal lines, I can just or them together: pattern1|pattern2
  • for negation lines, I can do this (?!negation_regex)(?:previous_regex).

then you end up with one long pattern like (?!negation5)(?:(?!negation3)(?:pattern1|pattern2)|pattern4)
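To make the folding rule above concrete, here is a minimal Python sketch. It is not pathspec's code and not bkarstens's implementation; `combine_rules`, `last_match_wins`, and the sample regexes are hypothetical names chosen for illustration. It assumes every per-line regex is already end-anchored (here with `\Z`) and is applied from the start of the path.

```python
import re


def combine_rules(rules):
    # rules: (regex, is_negation) pairs in file order. Every regex is assumed
    # to be end-anchored and meant to be applied from the start of the path.
    combined = None
    for regex, is_negation in rules:
        if is_negation:
            if combined is not None:
                # A later negation vetoes everything accumulated so far.
                combined = f"(?!{regex})(?:{combined})"
        elif combined is None:
            combined = f"(?:{regex})"
        else:
            # A later normal line is simply one more alternative.
            combined = f"{combined}|(?:{regex})"
    # "(?!)" never matches, which covers input with no normal lines.
    return re.compile(combined if combined is not None else "(?!)")


def last_match_wins(rules, path):
    # Reference behaviour: the last rule that matches decides the outcome.
    ignored = False
    for regex, is_negation in rules:
        if re.match(regex, path):
            ignored = not is_negation
    return ignored


# Hypothetical rules in the order pattern1, pattern2, negation3, pattern4, negation5.
RULES = [
    (r".*\.log\Z", False),
    (r"build/.*\Z", False),
    (r"important\.log\Z", True),
    (r"tmp/.*\Z", False),
    (r"tmp/keep\.txt\Z", True),
]
COMBINED = combine_rules(RULES)
```

For these five rules the result has the same shape as the long pattern quoted above, and `COMBINED.match(path)` should agree with the rule-by-rule loop in `last_match_wins`.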

But I have a completely different implementation so idk how hard that would be for this project.

I actually have 2 patterns: one that's used if the path is a directory, and one that's used if the path doesn't exist or is a file. That lets me flatten all the patterns into one. But since checking if it's a dir is comparatively slow, I also have a setting to not check and assume everything passed in is a file such that foo/ matches foo/bar but not foo even when foo is a folder.
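A rough Python sketch of that two-regex idea, for the single ignore line foo/. The regexes, the `is_ignored` helper, and its `check_dir` and `isdir` parameters are assumptions made up for illustration; they are not taken from either project.

```python
import os
import re

# Hypothetical regexes for the single ignore line "foo/".
# File variant: only paths *inside* a foo directory match.
FILE_RE = re.compile(r"(?:^|.+/)foo/.+\Z")
# Directory variant: foo itself also matches when the path is a directory.
DIR_RE = re.compile(r"(?:^|.+/)foo(?:/.+)?\Z")


def is_ignored(path, check_dir=True, isdir=os.path.isdir):
    # With check_dir=False the filesystem lookup is skipped and every path
    # is treated as a file.
    if check_dir and isdir(path):
        return DIR_RE.match(path) is not None
    return FILE_RE.match(path) is not None
```

With `check_dir=False`, `foo/bar` is reported as ignored but `foo` is not, even if `foo` is a directory on disk, which is the trade-off described above.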

I'm still working on fixing #74

from python-pathspec.

karajan1001 avatar karajan1001 commented on July 18, 2024 1


For example, matching takes about 20 μs per file, which adds up to 2 seconds for 100k files. If we used one big regex, plus an if check to skip path normalization on UNIX systems, that could drop to around 100 ms (maybe several hundred ms for Windows users). This would greatly improve the user experience of interactive tools that rely on path specifications.
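One way to check a claim like this on your own patterns is a small timeit comparison between matching regexes one by one and matching a single joined regex. The sketch below uses only the standard library; the 50 generated regexes and the path list are made-up stand-ins, not real ignore rules, and the timings it prints will depend on the patterns and the machine.

```python
import re
import timeit

# Hypothetical stand-ins for translated ignore lines: 50 end-anchored regexes.
REGEXES = [rf"(?:^|.+/)name{i}\.tmp\Z" for i in range(50)]
SEPARATE = [re.compile(r) for r in REGEXES]
COMBINED = re.compile("|".join(f"(?:{r})" for r in REGEXES))

PATHS = [f"src/pkg{i % 7}/module{i}.py" for i in range(1000)] + ["src/name3.tmp"]


def match_separately(path):
    # One Python-level loop iteration per pattern.
    return any(regex.match(path) for regex in SEPARATE)


def match_combined(path):
    # A single call into the regex engine.
    return COMBINED.match(path) is not None


def seconds_per_pass(matcher, repeat=3):
    # Best of `repeat` full passes over PATHS.
    return min(timeit.repeat(lambda: [matcher(p) for p in PATHS],
                             number=1, repeat=repeat))


if __name__ == "__main__":
    for matcher in (match_separately, match_combined):
        print(matcher.__name__, f"{seconds_per_pass(matcher):.4f}s")
```

Both matchers should return the same answers; only the time per pass is expected to differ.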


cpburnz avatar cpburnz commented on July 18, 2024

Can you provide an example of how you're specifically performing the matches? About how long is a long time? Is it on the order of minutes, hours, or days? This will help me look into the performance issue.


excitoon avatar excitoon commented on July 18, 2024

I checked pathspec against gitignorefile on this branch: https://github.com/excitoon/3/tree/pathspec . On a big project (16188 directories, 204718 files) it is still faster:

real	0m43.853s

vs

real	0m25.885s

I'll check if I can fix it.


excitoon avatar excitoon commented on July 18, 2024

I made it to:

real	0m28.939s

so far. The thing is, gitignorefile's results are more precise, and if I could afford wrong results, it would be much faster.


excitoon avatar excitoon commented on July 18, 2024

I got a slightly better RE for the start of a pattern: (?:^|.+/) instead of ^(?:.+/)?. @cpburnz check that out
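A quick way to compare the two prefixes in Python is sketched below. The `name\.txt\Z` body and the sample paths are hypothetical; the snippet only checks that both forms accept the same paths and prints timings, without assuming which one wins.

```python
import re
import timeit

# Only the prefix differs; the pattern body "name\.txt\Z" is a made-up example.
OLD = re.compile(r"^(?:.+/)?name\.txt\Z")
NEW = re.compile(r"(?:^|.+/)name\.txt\Z")

PATHS = [
    "name.txt",
    "a/name.txt",
    "a/b/name.txt",
    "a/name.txt.bak",
    "xname.txt",
    "/name.txt",
    "a/b/c/other.py",
]


def agree(paths):
    # True when both prefixes accept and reject exactly the same paths.
    return all(bool(OLD.match(p)) == bool(NEW.match(p)) for p in paths)


if __name__ == "__main__":
    print("same results:", agree(PATHS))
    for label, regex in (("old", OLD), ("new", NEW)):
        elapsed = timeit.timeit(lambda: [regex.match(p) for p in PATHS],
                                number=5000)
        print(label, f"{elapsed:.3f}s")
```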


bollwyvl avatar bollwyvl commented on July 18, 2024

Is it worth adding an actual benchmark with e.g. pytest-benchmark or asv?


excitoon avatar excitoon commented on July 18, 2024


Dobatymo avatar Dobatymo commented on July 18, 2024

It would be great if there was a way to combine multiple patterns from different lines into larger regexes automatically.


karajan1001 avatar karajan1001 commented on July 18, 2024

for multiple normal lines, I can just or them together: pattern1|pattern2
for negation lines, I can do this (?!negation_regex)(?:previous_regex).

I only used method 1 in another project and got a significant performance improvement.

Method 2 is something I didn't think of. In my case, I split the patterns into several groups, and only patterns of the same type can be joined together.
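A possible reading of that grouping approach, sketched in Python: consecutive lines of the same kind are OR-ed into one regex, and the groups are then checked from last to first. This assumes "type" means normal versus negation, which the comment does not spell out; `build_groups`, `is_ignored`, and the sample rules are hypothetical.

```python
import re
from itertools import groupby


def build_groups(rules):
    # rules: (regex, is_negation) pairs in file order.
    # Each run of consecutive rules with the same flag becomes one regex.
    groups = []
    for is_negation, run in groupby(rules, key=lambda rule: rule[1]):
        joined = "|".join(f"(?:{regex})" for regex, _ in run)
        groups.append((re.compile(joined), is_negation))
    return groups


def is_ignored(groups, path):
    # Walk backwards so that the last matching group decides.
    for regex, is_negation in reversed(groups):
        if regex.match(path):
            return not is_negation
    return False


# Hypothetical end-anchored rules: two normal lines, a negation,
# another normal line, and a final negation.
RULES = [
    (r".*\.log\Z", False),
    (r"build/.*\Z", False),
    (r"important\.log\Z", True),
    (r"tmp/.*\Z", False),
    (r"tmp/keep\.txt\Z", True),
]
GROUPS = build_groups(RULES)
```

Here the five rules collapse into four groups, so the saving grows with the length of each run of same-type lines.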
