Coder Social home page Coder Social logo

Comments (13)

cpburnz avatar cpburnz commented on July 18, 2024 1

There will be a way to handle this scenario in the v0.10.0 release. The new class, GitIgnoreSpec, will need to be used. It will prioritize patterns that directly match the file over an ancestor directory. It can be used as:

# The "gitwildmatch" pattern factory is assumed by GitIgnoreSpec.
p = pathspec.GitIgnoreSpec.from_lines(['*.yaml', '!*.yaml/'])
p.match_file('dir/file.yaml')       # True
p.match_file('dir.yaml/file.yaml')  # True

from python-pathspec.

cpburnz avatar cpburnz commented on July 18, 2024

Hi @adrienverge,

Due to the current implementation it makes sense to me why dir.yaml/* is matching the pattern *.yaml. A dot in a folder name resembling a file extension is valid. There is not currently a way to tag a pattern to only match against files. This is the expected behavior. The most straight forward way to exclude directories with the .yaml extension is to add a negative pattern for it. E.g.,

p = pathspec.PathSpec.from_lines('gitwildmatch', [
    '*.yaml',
    '!*.yaml/'
])
p.match_file('file.yaml')
# True
p.match_file('dir.yaml/file.sql')
# False

Please let me know whether this works for you. I realize this may or may not work depending on your use case.

from python-pathspec.

adrienverge avatar adrienverge commented on July 18, 2024

Thanks for your answer @cpburnz! To complete it, I notice that Git + .gitignore behaves the same: if it contains a line *.yaml, it will also ignore files like dir.yaml/file.sql.

The ['*.yaml', '!*.yaml/'] pattern looks the right way to go 👍

However, while it works with Git and .gitignore, it doesn't seem to work with pathspec. More precisely, with the following tree, .gitignore successfully matches dir/file.yaml and dir.yaml/file.yaml, but pathspec doesn't match dir.yaml/file.yaml.

dir
├── index.txt     ← already tracked by git
├── file.sql
└── file.yaml
dir.yaml
├── index.txt     ← already tracked by git
├── file.sql
└── file.yaml

Said differently:

p = pathspec.PathSpec.from_lines('gitwildmatch', ['*.yaml', '!*.yaml/'])
p.match_file('dir/file.yaml')       # True
p.match_file('dir.yaml/file.yaml')  # False ← I would expect True (like with .gitignore)

from python-pathspec.

cpburnz avatar cpburnz commented on July 18, 2024

@adrienverge Oops, I overlooked that. This can still be accomplished be adding an addition pattern to reinclude .yaml files under a .yaml directory:

p = pathspec.PathSpec.from_lines('gitwildmatch', ['*.yaml', '!*.yaml/', '*.yaml/**/*.yaml'])
p.match_file('dir/file.yaml')      # True
p.match_file('dir.yaml/index.txt') # False
p.match_file('dir.yaml/file.yaml') # True

The *.yaml/**/*.yaml pattern is redundant for Git and does not affect its results. The cause for this difference is pathspec strictly implements Git's wildmatch patterns while Git does additional processing for .gitignore such as as tracking included/excluded directories.

from python-pathspec.

adrienverge avatar adrienverge commented on July 18, 2024

Thanks again @cpburnz!

Unfortunately this recursive pattern including/excluding seems to bring new problems:

  1. While the first two patterns match paths (e.g. dir/dir/dir/file.yaml), the third pattern doesn't work. It seems that I need to add **/ at the beginning, so it would be:

    '*.yaml',
    '!*.yaml/',
    '**/*.yaml/**/*.yaml',
  2. With these new patterns, now dir.yaml/dir.yaml/index.txt will be matched. We would then need to add two new patterns:

    ...
    '!**/*.yaml/**/*.yaml/',
    '**/*.yaml/**/*.yaml/**/*.yaml',

    ... but we can recurse for dir.yaml/dir.yaml/dir.yaml/index.txt, so we would need an infinite list of patterns.

  3. The problems is actually even worse, because we need to match different extensions/patterns. By default it's *.yaml, *.yml and .yamllint but users can set any pattern using configuration.

    Which means we would need to add (for the simple use case with just 3 extensions):

    '**/*.yaml/**/*.yaml',
    '!**/*.yaml/**/*.yaml/',
    '**/*.yml/**/*.yaml',
    '!**/*.yml/**/*.yaml/',
    '**/.yamllint/**/*.yaml',
    '!**/.yamllint/**/*.yaml/',
    '**/*.yaml/**/*.yml',
    '!**/*.yaml/**/*.yml/',
    '**/*.yml/**/*.yml',
    '!**/*.yml/**/*.yml/',
    '**/.yamllint/**/*.yml',
    '!**/.yamllint/**/*.yml/',
    '**/*.yaml/**/.yamllint',
    '!**/*.yaml/**/.yamllint/',
    '**/*.yml/**/.yamllint',
    '!**/*.yml/**/.yamllint/',
    '**/.yamllint/**/.yamllint',
    '!**/.yamllint/**/.yamllint/',
    ... + infinite recursive patterns from bullet 2.

The cause for this difference is pathspec strictly implements Git's wildmatch patterns while Git does additional processing for .gitignore such as as tracking included/excluded directories.

So I understand that pathspec implements the standard, while Git adds some user-friendly, non-standard processing.
Given the bullets above, and the fact that the need "match all paths whose filename end with *.yaml or *.yml" cannot be achieved without an infinite pattern list, do you think it would be doable to add an option to pathspec? Would it be easy? Can we help in any way?
If the answer is no, are you aware of another library that could achieve this goal?

(PS: I don't know how others use your useful library pathspec, but my guess is that 99% are looking for something that behaves exactly like .gitignore. If I'm wrong I would be curious to know the other use-cases!)

from python-pathspec.

cpburnz avatar cpburnz commented on July 18, 2024

@adrienverge That is indeed problematic without a good solution for current feature set.

So I understand that pathspec implements the standard, while Git adds some user-friendly, non-standard processing.

Precisely

Given the bullets above, and the fact that the need "match all paths whose filename end with *.yaml or *.yml" cannot be achieved without an infinite pattern list, do you think it would be doable to add an option to pathspec? Would it be easy? Can we help in any way?

It's certainly doable. I think the ideal implementation would be a class implementing the .gitignore specific behavior, possibly a subclass of PathSpec. I welcome a pull request implementing it.

If the answer is no, are you aware of another library that could achieve this goal?

The answer isn't no, but I am aware of another library that tries to specifically implement .gitignore behavior: igittigitt. I have no experience with it myself.

(PS: I don't know how others use your useful library pathspec, but my guess is that 99% are looking for something that behaves exactly like .gitignore. If I'm wrong I would be curious to know the other use-cases!)

That seems to be the case. My original use case was I wanted a library with glob-like file matching similar to git or rsync.

from python-pathspec.

adrienverge avatar adrienverge commented on July 18, 2024

I welcome a pull request implementing it.

Thanks again for following this up @cpburnz 👍

@liopic @dafyddj @jsok you are the ones affected by this (via adrienverge/yamllint#279 and adrienverge/yamllint#334), would you like to contribute to python-pathspec and implement a .gitignore-like subclass of PathSpec?

If not, I can try to find time myself but I don't know precisely when it could be. I already went forward and found this relevant code for do_match_pathspec() in Git source: https://github.com/git/git/blob/d4a3924/dir.c#L432-L501
@cpburnz if you have any pointers or documentation on this "extra processing" that Git does, it would be more than welcome :)

from python-pathspec.

dafyddj avatar dafyddj commented on July 18, 2024

@adrienverge Interesting to read, but I am not a developer so I don't think I can help with this - but good luck!

from python-pathspec.

excitoon avatar excitoon commented on July 18, 2024

Guys this is not a bug. Git works exactly as the spec says:

If there is a separator at the end of the pattern then the pattern will only match directories, otherwise the pattern can match both files and directories.

https://git-scm.com/docs/gitignore

from python-pathspec.

cpburnz avatar cpburnz commented on July 18, 2024

@excitoon I think it's an edge case not adequately described in the gitignore specification. Git behaves exactly as described in this issue, and at the moment pathspec behaves differently. I'm working fixing the discrepancy.

from python-pathspec.

excitoon avatar excitoon commented on July 18, 2024

@cpburnz You can check how I solved it in my module.

First, making a group for extra path parts:
excitoon/gitignorefile@6763d47#diff-21e52cb6c7e0acafac258d280cac806bcf75715e4207b85d451e0c9e88d9f2b5R305-R306

Second, analyzing if anything matched. If it did not, we shall query filesystem to check if it is a directory:
excitoon/gitignorefile@6763d47#diff-21e52cb6c7e0acafac258d280cac806bcf75715e4207b85d451e0c9e88d9f2b5R234

from python-pathspec.

excitoon avatar excitoon commented on July 18, 2024

One more example:

core
!core/

from python-pathspec.

adrienverge avatar adrienverge commented on July 18, 2024

Thanks @cpburnz!

from python-pathspec.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.