Comments (13)
There will be a way to handle this scenario in the v0.10.0 release. The new class, GitIgnoreSpec
, will need to be used. It will prioritize patterns that directly match the file over an ancestor directory. It can be used as:
# The "gitwildmatch" pattern factory is assumed by GitIgnoreSpec.
p = pathspec.GitIgnoreSpec.from_lines(['*.yaml', '!*.yaml/'])
p.match_file('dir/file.yaml') # True
p.match_file('dir.yaml/file.yaml') # True
from python-pathspec.
Hi @adrienverge,
Due to the current implementation it makes sense to me why dir.yaml/*
is matching the pattern *.yaml
. A dot in a folder name resembling a file extension is valid. There is not currently a way to tag a pattern to only match against files. This is the expected behavior. The most straight forward way to exclude directories with the .yaml
extension is to add a negative pattern for it. E.g.,
p = pathspec.PathSpec.from_lines('gitwildmatch', [
'*.yaml',
'!*.yaml/'
])
p.match_file('file.yaml')
# True
p.match_file('dir.yaml/file.sql')
# False
Please let me know whether this works for you. I realize this may or may not work depending on your use case.
from python-pathspec.
Thanks for your answer @cpburnz! To complete it, I notice that Git + .gitignore
behaves the same: if it contains a line *.yaml
, it will also ignore files like dir.yaml/file.sql
.
The ['*.yaml', '!*.yaml/']
pattern looks the right way to go 👍
However, while it works with Git and .gitignore
, it doesn't seem to work with pathspec. More precisely, with the following tree, .gitignore
successfully matches dir/file.yaml
and dir.yaml/file.yaml
, but pathspec doesn't match dir.yaml/file.yaml
.
dir
├── index.txt ← already tracked by git
├── file.sql
└── file.yaml
dir.yaml
├── index.txt ← already tracked by git
├── file.sql
└── file.yaml
Said differently:
p = pathspec.PathSpec.from_lines('gitwildmatch', ['*.yaml', '!*.yaml/'])
p.match_file('dir/file.yaml') # True
p.match_file('dir.yaml/file.yaml') # False ← I would expect True (like with .gitignore)
from python-pathspec.
@adrienverge Oops, I overlooked that. This can still be accomplished be adding an addition pattern to reinclude .yaml
files under a .yaml
directory:
p = pathspec.PathSpec.from_lines('gitwildmatch', ['*.yaml', '!*.yaml/', '*.yaml/**/*.yaml'])
p.match_file('dir/file.yaml') # True
p.match_file('dir.yaml/index.txt') # False
p.match_file('dir.yaml/file.yaml') # True
The *.yaml/**/*.yaml
pattern is redundant for Git and does not affect its results. The cause for this difference is pathspec strictly implements Git's wildmatch patterns while Git does additional processing for .gitignore
such as as tracking included/excluded directories.
from python-pathspec.
Thanks again @cpburnz!
Unfortunately this recursive pattern including/excluding seems to bring new problems:
-
While the first two patterns match paths (e.g.
dir/dir/dir/file.yaml
), the third pattern doesn't work. It seems that I need to add**/
at the beginning, so it would be:'*.yaml', '!*.yaml/', '**/*.yaml/**/*.yaml',
-
With these new patterns, now
dir.yaml/dir.yaml/index.txt
will be matched. We would then need to add two new patterns:... '!**/*.yaml/**/*.yaml/', '**/*.yaml/**/*.yaml/**/*.yaml',
... but we can recurse for
dir.yaml/dir.yaml/dir.yaml/index.txt
, so we would need an infinite list of patterns. -
The problems is actually even worse, because we need to match different extensions/patterns. By default it's
*.yaml
,*.yml
and.yamllint
but users can set any pattern using configuration.Which means we would need to add (for the simple use case with just 3 extensions):
'**/*.yaml/**/*.yaml', '!**/*.yaml/**/*.yaml/', '**/*.yml/**/*.yaml', '!**/*.yml/**/*.yaml/', '**/.yamllint/**/*.yaml', '!**/.yamllint/**/*.yaml/', '**/*.yaml/**/*.yml', '!**/*.yaml/**/*.yml/', '**/*.yml/**/*.yml', '!**/*.yml/**/*.yml/', '**/.yamllint/**/*.yml', '!**/.yamllint/**/*.yml/', '**/*.yaml/**/.yamllint', '!**/*.yaml/**/.yamllint/', '**/*.yml/**/.yamllint', '!**/*.yml/**/.yamllint/', '**/.yamllint/**/.yamllint', '!**/.yamllint/**/.yamllint/', ... + infinite recursive patterns from bullet 2.
The cause for this difference is pathspec strictly implements Git's wildmatch patterns while Git does additional processing for
.gitignore
such as as tracking included/excluded directories.
So I understand that pathspec implements the standard, while Git adds some user-friendly, non-standard processing.
Given the bullets above, and the fact that the need "match all paths whose filename end with *.yaml
or *.yml
" cannot be achieved without an infinite pattern list, do you think it would be doable to add an option to pathspec? Would it be easy? Can we help in any way?
If the answer is no, are you aware of another library that could achieve this goal?
(PS: I don't know how others use your useful library pathspec, but my guess is that 99% are looking for something that behaves exactly like .gitignore
. If I'm wrong I would be curious to know the other use-cases!)
from python-pathspec.
@adrienverge That is indeed problematic without a good solution for current feature set.
So I understand that pathspec implements the standard, while Git adds some user-friendly, non-standard processing.
Precisely
Given the bullets above, and the fact that the need "match all paths whose filename end with *.yaml or *.yml" cannot be achieved without an infinite pattern list, do you think it would be doable to add an option to pathspec? Would it be easy? Can we help in any way?
It's certainly doable. I think the ideal implementation would be a class implementing the .gitignore
specific behavior, possibly a subclass of PathSpec
. I welcome a pull request implementing it.
If the answer is no, are you aware of another library that could achieve this goal?
The answer isn't no, but I am aware of another library that tries to specifically implement .gitignore
behavior: igittigitt. I have no experience with it myself.
(PS: I don't know how others use your useful library pathspec, but my guess is that 99% are looking for something that behaves exactly like .gitignore. If I'm wrong I would be curious to know the other use-cases!)
That seems to be the case. My original use case was I wanted a library with glob-like file matching similar to git or rsync.
from python-pathspec.
I welcome a pull request implementing it.
Thanks again for following this up @cpburnz 👍
@liopic @dafyddj @jsok you are the ones affected by this (via adrienverge/yamllint#279 and adrienverge/yamllint#334), would you like to contribute to python-pathspec and implement a .gitignore
-like subclass of PathSpec
?
If not, I can try to find time myself but I don't know precisely when it could be. I already went forward and found this relevant code for do_match_pathspec()
in Git source: https://github.com/git/git/blob/d4a3924/dir.c#L432-L501
@cpburnz if you have any pointers or documentation on this "extra processing" that Git does, it would be more than welcome :)
from python-pathspec.
@adrienverge Interesting to read, but I am not a developer so I don't think I can help with this - but good luck!
from python-pathspec.
Guys this is not a bug. Git works exactly as the spec says:
If there is a separator at the end of the pattern then the pattern will only match directories, otherwise the pattern can match both files and directories.
https://git-scm.com/docs/gitignore
from python-pathspec.
@excitoon I think it's an edge case not adequately described in the gitignore specification. Git behaves exactly as described in this issue, and at the moment pathspec behaves differently. I'm working fixing the discrepancy.
from python-pathspec.
@cpburnz You can check how I solved it in my module.
First, making a group for extra path parts:
excitoon/gitignorefile@6763d47#diff-21e52cb6c7e0acafac258d280cac806bcf75715e4207b85d451e0c9e88d9f2b5R305-R306
Second, analyzing if anything matched. If it did not, we shall query filesystem to check if it is a directory:
excitoon/gitignorefile@6763d47#diff-21e52cb6c7e0acafac258d280cac806bcf75715e4207b85d451e0c9e88d9f2b5R234
from python-pathspec.
One more example:
core
!core/
from python-pathspec.
Thanks @cpburnz!
from python-pathspec.
Related Issues (20)
- `match_files()` is not a pure generator function, and it impacts `tree_*()` gravely HOT 1
- Symlink pathspec_meta.py breaks Windows HOT 1
- test_util.py uses os.symlink which can fail on Windows HOT 1
- Backslashes at start of pattern not handled correctly HOT 1
- `!` doesn't exclude files in directories if the pattern doesn't have a trailing slash HOT 1
- Dist failure for Fedora, CentOS, EPEL HOT 11
- Since version 0.10.0 pure wildcard does not work in some cases HOT 4
- The pattern_to_regex method does not seem to work correctly on windows. HOT 4
- IndexError with my .gitignore file when trying to build a Python package HOT 4
- Checking directories via match_file() does not work on Path objects HOT 5
- Package not marked as `py.typed` HOT 1
- Exports are considered private HOT 1
- `'Self'` string literal type is `Unknown` in pyright HOT 1
- Please consider switching the build-system to flit_core to ease setuptools bootstrap HOT 5
- Include directory should override exclude file HOT 3
- On bracket expression negation HOT 2
- match_files with negated path spec HOT 5
- `GitIgnoreSpec` behaviors differ from git HOT 2
- PathSpec.match_file() returns None since 0.12.0 HOT 3
- Exclusions not working HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from python-pathspec.