Comments (10)
It would be great if there was a way to combine multiple patterns from different lines into larger regexes automatically.
👀 It is possible, from my experimentation:
- for multiple normal lines, I can just or them together: pattern1|pattern2
- for negation lines, I can do this: (?!negation_regex)(?:previous_regex)
Then you end up with one long pattern like (?!negation5)(?:(?!negation3)(?:pattern1|pattern2)|pattern4)
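The folding scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from pathspec or from the commenter's project: the `combine` helper and the sample regexes are hypothetical, each line is assumed to be already translated into a regex, and matching is assumed to use `re.match` so that every lookahead is evaluated at the start of the path.

```python
import re


def combine(lines):
    """Fold (regex, is_negation) pairs, in file order, into one pattern.

    Hypothetical helper: normal lines are OR-ed onto the pattern built so
    far; a negation line wraps everything before it in a negative lookahead.
    """
    combined = None
    for regex, is_negation in lines:
        if is_negation:
            if combined is not None:  # a leading negation has nothing to undo
                combined = f"(?!{regex})(?:{combined})"
        elif combined is None:
            combined = regex
        else:
            combined = f"{combined}|{regex}"
    return combined


# Sample lines (assumed regex translations of five ignore-file lines).
lines = [
    (r".*\.log\Z", False),
    (r".*\.tmp\Z", False),
    (r"important\.log\Z", True),
    (r"build/.*\Z", False),
    (r"build/keep\.txt\Z", True),
]
matcher = re.compile(combine(lines))


def is_ignored(path):
    return matcher.match(path) is not None
```

Because alternation binds more loosely than the lookahead group, a later normal line is not affected by an earlier negation, which is what gives "last matching line wins" behaviour in this sketch.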
But I have a completely different implementation so idk how hard that would be for this project.
I actually have 2 patterns: one that's used if the path is a directory, and one that's used if the path doesn't exist or is a file. That lets me flatten all the patterns into one. But since checking if it's a dir is comparatively slow, I also have a setting to not check and assume everything passed in is a file, such that foo/ matches foo/bar but not foo, even when foo is a folder.
I'm still working on fixing #74
from python-pathspec.
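The file-versus-directory split from the comment above might look roughly like the following. This is a guess at the idea, not the commenter's actual implementation: the two regexes for the single line foo/ and the `check_dirs` setting are hypothetical names chosen for illustration.

```python
import os
import re

# Hypothetical regexes for the single ignore line "foo/".
FILE_RE = re.compile(r"(?:.+/)?foo/.+\Z")      # path is a file or does not exist
DIR_RE = re.compile(r"(?:.+/)?foo(?:/.*)?\Z")  # path is known to be a directory


def is_ignored(path, check_dirs=True, isdir=os.path.isdir):
    """Pick the directory or file regex; skip the slow isdir call on request."""
    if check_dirs and isdir(path):
        return DIR_RE.match(path) is not None
    # With check_dirs=False every path is treated as a file.
    return FILE_RE.match(path) is not None
```

With `check_dirs=False`, foo/bar is ignored but foo itself is not, even when foo is a folder on disk, which mirrors the trade-off described in the comment.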
For example, it takes about 20 μs per file, which adds up to 2 seconds for 100k files. If we use one big regex and an if expression to skip the normalization on UNIX systems, it could be around 100 ms (maybe several hundred for Windows users). This would greatly improve the user experience of interactive tools that rely on path specifications.
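A minimal sketch of that idea, assuming the patterns are already available as regexes: build one combined regex and decide once, up front, whether separator normalization is needed. `make_matcher` is a hypothetical helper, not part of the pathspec API.

```python
import os
import re


def make_matcher(regexes, sep=os.sep):
    """Return a path -> bool function backed by one combined regex."""
    combined = re.compile("|".join(f"(?:{r})" for r in regexes))
    if sep == "/":
        # POSIX-style separators: no normalization work per path.
        return lambda path: combined.match(path) is not None
    # Other separators (e.g. Windows): normalize to "/" before matching.
    return lambda path: combined.match(path.replace(sep, "/")) is not None


is_ignored = make_matcher([r".*\.pyc\Z", r"(?:.+/)?__pycache__/.*\Z"])
```

The `sep` parameter exists only so that both branches can be exercised on any platform; the timing figures quoted in the comment are the commenter's estimates and are not reproduced by this sketch.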
Can you provide an example of how you're specifically performing the matches? About how long is a long time? Is it on the order of minutes, hours, or days? This will help me look into the performance issue.
I checked pathspec against gitignorefile on this branch https://github.com/excitoon/3/tree/pathspec . On a big project (16188 directories, 204718 files) it is still faster:
real 0m43.853s
vs
real 0m25.885s
I'll check if I can fix it.
I made it to:
real 0m28.939s
so far. Thing is, gitignorefile's results are more precise, and if I could afford wrong results, it would be much faster.
I got a slightly better RE for the start of a pattern: (?:^|.+/) instead of ^(?:.+/)? . @cpburnz check that out
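A quick way to sanity-check that the two prefixes accept the same paths is to attach both to the same pattern body and compare results. The body `foo\Z` and the sample paths below are assumptions made for illustration; the sketch checks equivalence only and says nothing about which form is faster.

```python
import re

OLD_PREFIX = r"^(?:.+/)?"
NEW_PREFIX = r"(?:^|.+/)"
BODY = r"foo\Z"  # hypothetical pattern body: a path component named "foo"

old = re.compile(OLD_PREFIX + BODY)
new = re.compile(NEW_PREFIX + BODY)

paths = ["foo", "a/foo", "a/b/foo", "afoo", "foo/bar", "/foo"]


def results(regex):
    """Match each sample path from its start and report True/False."""
    return [regex.match(p) is not None for p in paths]


# Both prefixes should agree on every sample path.
same = results(old) == results(new)
```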
Is it worth adding an actual benchmark with e.g. pytest-benchmark or asv?
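As a starting point before committing to pytest-benchmark or asv, a rough comparison can be done with the standard library's `timeit`. The sketch below uses synthetic patterns and paths (all names and data are made up), and compares matching each regex in turn against matching one OR-ed regex; it is not a substitute for a benchmark on a real repository.

```python
import re
import timeit

# Synthetic data: 50 extension patterns, 1000 paths (half of them match).
regexes = [rf".*\.ext{i}\Z" for i in range(50)]
paths = [f"dir{i % 10}/file{i}.ext{i % 100}" for i in range(1000)]

compiled_each = [re.compile(r) for r in regexes]
compiled_all = re.compile("|".join(f"(?:{r})" for r in regexes))


def match_each(path):
    """Try every pattern separately, stopping at the first hit."""
    return any(p.match(path) for p in compiled_each)


def match_combined(path):
    """Try all patterns at once through a single alternation."""
    return compiled_all.match(path) is not None


def bench(fn):
    """Best of three timings, in seconds, for one pass over all paths."""
    return min(timeit.repeat(lambda: [fn(p) for p in paths], number=1, repeat=3))


if __name__ == "__main__":
    print(f"per-pattern: {bench(match_each):.4f}s")
    print(f"combined:    {bench(match_combined):.4f}s")
```

The numbers depend heavily on the pattern shapes and the regex engine, so results from this toy data should not be read as evidence for either approach.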
It would be great if there was a way to combine multiple patterns from different lines into larger regexes automatically.
for multiple normal lines, I can just or them together: pattern1|pattern2
for negation lines, I can do this (?!negation_regex)(?:previous_regex).
I only used method 1 in another project and got a significant performance improvement.
Method 2 is something I didn't think of. In my case, I split the patterns into several groups, and only patterns of the same type can be joined together.