
Comments (7)

technoweenie commented on May 18, 2024

Thanks for bringing this up. I'm really interested in changes that we should make ASAP to make things feel like Git as much as possible. Other spec or tooling issues will become more important as we get closer to the launch. It's great to start thinking about this now though.

  • Can git media path add and git media path remove handle fully general .gitattributes files? Can they handle lines that have other (perhaps unrelated) attributes on them?
  • What if .gitattributes files are present at lower levels in the tree? Can git media path add and git media path remove still handle the situation somehow?
  • What .gitattributes files are used? Are they the ones in the working copy, or in the index, or HEAD, or what? I believe that this is determined by Git and not by git-media, but the answer could affect usability.

git media path is designed for Git Media. It may "accidentally" work with other attributes though. It currently only writes to the .gitattributes in the root directory, but can read other .gitattributes files.

  • How does git media init work? I imagine it adds configuration to the user's global Git config, because the docs show it being used before git init or git clone. Is this polite? E.g., will there be cases where the user wants different git-media configurations for different repositories?

It just adds the filter configuration, which should be the same for every repository. The actual git media endpoint can be configured locally or per repo, but by default it is derived from the Git remote URL. For instance, cloning "https://github.com/github/git-media" will use "https://github.com/github/git-media.git/info/media" as the root endpoint.
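A rough sketch of that default URL transformation, based only on the example above. The function name is made up for illustration, and the real client may handle more cases (SSH remotes, for instance):

```python
def default_media_endpoint(remote_url: str) -> str:
    """Derive a default media endpoint from a Git remote URL.

    Hypothetical helper: it mirrors the single example in this thread,
    adding ".git" when missing and then appending "/info/media".
    """
    url = remote_url.rstrip("/")
    if not url.endswith(".git"):
        url += ".git"
    return url + "/info/media"


print(default_media_endpoint("https://github.com/github/git-media"))
```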

  • In docs/spec.md, section "The Server", the precedence of the three places where the Git Media server is sought should be specified explicitly (I think the order is currently incorrect).

Yup, more docs are 👍 What's incorrect? Are our docs incorrect, or do you have a different opinion on the precedence?

  • In the same section, the sample Git config file has a key media.endpoint. Should that be media.url? (Maybe that is old nomenclature that has since been changed?) The term "endpoint" also appears in the text.

I suppose that's a term I'm used to from working on APIs. We could use media.url.

  • It seems to me that the layout of git-media-related data in the local repository (i.e., the data under $GIT_DIR) needs to be documented to enable other Git implementations and tools to work with the data correctly.

👍

  • Is it always guaranteed that the media files are pushed to the server before commits that use them? Are media files only pushed if they are used in the specific commits being pushed, or are all media blobs pushed even if they are used in other commits that haven't been pushed yet? How is it determined whether a media blob is already present on the server? (Is this verified every time a commit referencing the asset is pushed to the server?) What happens if the server URI is changed? Is there a way to migrate the media assets to a different media server?

Repositories get a pre-push hook that will run git media push before pushing to Git. Currently all media files are pushed. Nothing really removes them, so there are obvious issues with that now: #99, #82.

Our Git Media server verifies each blob, even if it already exists on the server. Though the spec doesn't require this, it is important for catching clients that lie about what they are uploading.

There's no way to migrate assets, but that would be cool.

  • Is it possible to un-media-ize a file in a later commit without confusing earlier history?

Only if we can leave the pointer file unchanged.

  • Is it possible to discard media files from the local repo to save space (e.g., I only want the media assets associated with the current revision)? Is there a tool to help with this?

Technically it is possible. You'd have to replace a file with its pointer, and of course remove it from .git/media/objects. It should automatically re-download too. No tools yet though.

I'd also like to explore something like a Dropbox Selective Sync config or .gitmediaignore for git media clients.

  • Is it possible to GC media files from the remote media server if, for example, the commits referencing them have been discarded? Would other clients handle this situation gracefully?

I imagine this is up to the server implementation. Clients should handle the case that a remote object doesn't exist gracefully though.

  • Is there a tool analogous to git cat-file blob to access the contents of a media-ized file (downloading it if necessary) without having to check out the commit containing it? (I don't think that git cat-file respects smudge filters but I might be wrong.)

git media smudge can, but you have to pass the OID through STDIN. I imagine we'll eventually add a --file option or add a git media cat command.

  • Is there any provision for sharding the local media asset cache directory? (If the number of assets becomes large, the performance might suffer from too many files in a single directory.)

Yes. A file with an OID of "17ab91cebd2972fa7f01b6e77b28c59d341534d1d8d6fffacb6f3ea9d3aabff4" is stored in ".git/media/17/ab/17ab91cebd2972fa7f01b6e77b28c59d341534d1d8d6fffacb6f3ea9d3aabff4"
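To illustrate the sharding scheme in the example above, here is a minimal sketch. The function name and the default directory argument are assumptions taken from the one path shown, not the client's actual code:

```python
from pathlib import PurePosixPath


def media_object_path(oid: str, media_dir: str = ".git/media") -> str:
    """Return the sharded cache path for a media object.

    Hypothetical helper: the first two and next two hex characters of the
    OID become nested directories, as in the example path in this thread.
    """
    if len(oid) < 4:
        raise ValueError("OID is too short to shard")
    return str(PurePosixPath(media_dir) / oid[0:2] / oid[2:4] / oid)


oid = "17ab91cebd2972fa7f01b6e77b28c59d341534d1d8d6fffacb6f3ea9d3aabff4"
print(media_object_path(oid))
```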

  • Is there a reference implementation of a git media server?

Nope, but I want one.

  • Is there provision to compress the media files somehow (e.g., on disk and on the wire), or are they assumed to be incompressible? Can they be stored as deltas somehow? (Some large file formats delta well.)

Nope. Keeping it simple right now.

Regarding the API: Is it standard practice to use the Accept: header to distinguish between fetching the blob and the JSON metadata? (I would have expected that distinction to be encoded in the URI.)

For APIs, yes. The Git Media API is not designed for browsers.

from git-lfs.

rubyist commented on May 18, 2024

To add a point about the .gitattributes file, git media will ignore and preserve any lines in the file that aren't related to git media when adding/removing paths. It currently looks for "filter=media" in the line, but I think we can tighten that up a little because that would also match "filter=mediafoo".
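One way to tighten the match might be to compare whole attribute tokens instead of searching for a substring. This is only a sketch: the function name is invented, and it ignores quoted patterns containing spaces, which real .gitattributes files allow:

```python
def has_media_filter(line: str) -> bool:
    """Report whether a .gitattributes line sets exactly filter=media.

    Hypothetical helper: splits on whitespace, skips the leading pattern,
    and compares each attribute token as a whole.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return False  # blank lines and comments carry no attributes
    _pattern, *attrs = stripped.split()
    return "filter=media" in attrs


print(has_media_filter("*.mp3 filter=media"))     # True
print(has_media_filter("*.mp3 filter=mediafoo"))  # False
```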


mhagger commented on May 18, 2024

  • Can git media path add and git media path remove handle fully general .gitattributes files? Can they handle lines that have other (perhaps unrelated) attributes on them?
  • What if .gitattributes files are present at lower levels in the tree? Can git media path add and git media path remove still handle the situation somehow?
  • What .gitattributes files are used? Are they the ones in the working copy, or in the index, or HEAD, or what? I believe that this is determined by Git and not by git-media, but the answer could affect usability.

git media path is designed for Git Media. It may "accidentally" work with other attributes though. It currently only writes to the .gitattributes in the root directory, but can read other .gitattributes files.

I think that means that if I have a .gitattributes file in a subdirectory with *.mp3 filter=media, then git media path remove '*.mp3' won't affect files under that subdirectory.

Similarly, if I have *.mp3 -filter in a subdirectory, then git media path add '*.mp3' won't affect that subdirectory.

These are probably reasonable limitations of the tool; I just wanted to point them out.

  • In docs/spec.md, section "The Server", the precedence of the three places where the Git Media server is sought should be specified explicitly (I think the order is currently incorrect).

Yup, more docs are 👍 What's incorrect? Are our docs incorrect, or do you have a different opinion on the precedence?

I would expect remote.{name}.media to have the highest precedence, followed by media.url, and then the default rule of appending /info/media to the URL. I don't know whether your list is supposed to be in order of increasing or decreasing precedence, but either way it disagrees with my expectation.
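To make the precedence I have in mind concrete, here is a small sketch. The config is modelled as a plain dict and the function name is invented; the fallback follows the ".git/info/media" example given earlier in this thread:

```python
from typing import Dict


def resolve_media_endpoint(config: Dict[str, str], remote: str, remote_url: str) -> str:
    """Pick the media endpoint in the proposed order of precedence.

    Hypothetical helper: `config` stands in for the merged Git config.
    """
    # 1. A per-remote setting wins.
    per_remote = config.get("remote.%s.media" % remote)
    if per_remote:
        return per_remote
    # 2. Then the repository-wide media.url.
    media_url = config.get("media.url")
    if media_url:
        return media_url
    # 3. Otherwise derive the endpoint from the remote URL.
    url = remote_url.rstrip("/")
    if not url.endswith(".git"):
        url += ".git"
    return url + "/info/media"
```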

  • In the same section, the sample Git config file has a key media.endpoint. Should that be media.url? (Maybe that is old nomenclature that has since been changed?) The term "endpoint" also appears in the text.

I suppose that's a term I'm used to from working on APIs. We could use media.url.

I don't have an opinion about which name is better; I am more concerned that this example seems to be inconsistent with the list a few lines earlier (which mentions media.url).

  • Is it always guaranteed that the media files are pushed to the server before commits that use them? Are media files only pushed if they are used in the specific commits being pushed, or are all media blobs pushed even if they are used in other commits that haven't been pushed yet? How is it determined whether a media blob is already present on the server? (Is this verified every time a commit referencing the asset is pushed to the server?) What happens if the server URI is changed? Is there a way to migrate the media assets to a different media server?

Repositories get a pre-push hook that will run git media push before pushing to Git. Currently all media files are pushed. Nothing really removes them, so there are obvious issues with that now: #99, #82.

OK, so if I understand correctly, as soon as a file is git added, it is queued to be uploaded to the media server, and the next time git media push is run, it is uploaded. The file will be uploaded even if it was never actually committed, or if no commit referencing it is ever pushed, or if the commit referencing it is deleted (e.g., via git rebase) before it is pushed. Once the file is on the media server, it is retained forever. Correct?

It sounds pretty easy to accumulate cruft on the git media server. I hope we plan to charge users by the GB :trollface: And it is a shame that making local commits is no longer a "cost-free" action. OTOH I guess that uploading big files will be somewhat painful, so users will probably take care not to add files before they are sure that they want them in the permanent record.

I suppose the alternative would be to make the queue smarter. For example, when a file is added, it could be put in a "pending" bucket along with the SHA-1 of the pointer file that refers to it. Upon push, the pre-push hook would check specifically what objects are about to be pushed. If any of the objects to be pushed are pointer files recorded in the "pending" bucket, then the corresponding media files would be pushed to the media server. Other files would stay in the "pending" bucket and not be pushed at that time.
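Roughly, the "pending" bucket idea could look like the sketch below. All names are invented for illustration, and how the set of objects about to be pushed gets computed is left out:

```python
from typing import Dict, List, Set, Tuple


def select_media_to_push(
    pending: Dict[str, str], pushed_objects: Set[str]
) -> Tuple[List[str], Dict[str, str]]:
    """Split the pending bucket into media to upload now and media to keep.

    Hypothetical helper: `pending` maps a pointer file's SHA-1 to the OID
    of the media file it refers to; `pushed_objects` holds the SHA-1s of
    the Git objects about to be pushed.
    """
    to_push = sorted({oid for sha, oid in pending.items() if sha in pushed_objects})
    remaining = {sha: oid for sha, oid in pending.items() if sha not in pushed_objects}
    return to_push, remaining
```

Anything left in the second return value stays queued until a later push actually includes its pointer file.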

It would also be possible to GC local media files that never got committed, or whose commits were dropped from the Git history. To do so, one would record, for each media file in the cache, the SHA-1(s) of the pointer file(s) that referenced it. Then a git media gc command could check whether the pointer file object is still present in the Git repository. If not, then the corresponding media file can be deleted.

(I say SHA-1(s) because it is possible for the pointer file that refers to a given media file to have different forms; for example, v1 and v2 form. So the mapping from pointer file SHA-1 to media file is many-to-one.)
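A sketch of how such a gc pass might decide what to delete, given the many-to-one mapping. The names are hypothetical; the existence check is passed in as a function, which in practice could be backed by something like git cat-file -e:

```python
from typing import Callable, Dict, List, Set


def unreferenced_media(
    referrers: Dict[str, Set[str]], object_exists: Callable[[str], bool]
) -> List[str]:
    """Return the media OIDs whose pointer files are all gone from the repo.

    Hypothetical helper: `referrers` maps a media OID to the SHA-1s of
    every pointer file (v1, v2, ...) that has referred to it. A media file
    is only collectable when none of those pointers still exists.
    """
    return sorted(
        oid
        for oid, pointers in referrers.items()
        if not any(object_exists(sha) for sha in pointers)
    )
```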

  • Is it possible to discard media files from the local repo to save space (e.g., I only want the media assets associated with the current revision)? Is there a tool to help with this?

Technically it is possible. You'd have to replace a file with its pointer, and of course remove it from .git/media/objects. It should automatically re-download too. No tools yet though.

I'd also like to explore something like a Dropbox Selective Sync config or .gitmediaignore for git media clients.

Instead of replacing the file with its pointer then redownloading, one might want to make the "cleanup-cache" tool refuse to purge files corresponding to the currently-checked-out revision.

  • Is it possible to GC media files from the remote media server if, for example, the commits referencing them have been discarded? Would other clients handle this situation gracefully?

I imagine this is up to the server implementation. Clients should handle the case that a remote object doesn't exist gracefully though.

I am not sure that this would work, at least not without a lot of overhead.

Suppose I

  1. Create a branch, add a media file to it, then push the branch to the server.

     -> My local repo thinks the file is at the server, so it removes the file from my queue.

  2. Delete the branch from the server while retaining it locally:

     git push origin --delete branch

  3. Run the hypothetical media-gc process on the server. The history no longer has a link to the media file, so it is deleted.

  4. Push the branch to the server again.

     -> This time my client thinks the server already has a copy of the media file, so it doesn't send it again. The server probably doesn't look into the objects, so it doesn't know that a media file should have been uploaded for the branch. So already my colleagues cannot fetch my branch.

  5. Delete my clone.

     -> Now the public git history has a reference to the media file, but no copy of the file exists any more.

To avoid this, the client would have to verify, every time it pushes, that the server has a copy of all media files referred to by the objects being pushed. This is probably too much overhead.

  • Is there provision to compress the media files somehow (e.g., on disk and on the wire), or are they assumed to be incompressible? Can they be stored as deltas somehow? (Some large file formats delta well.)

Nope. Keeping it simple right now.

You might want to plan someplace in the local object cache to record this information, if it might someday be desired.

To add a point about the .gitattributes file, git media will ignore and preserve any lines in the file that aren't related to git media when adding/removing paths. It currently looks for "filter=media" in the line, but I think we can tighten that up a little because that would also match "filter=mediafoo".

Also, it looks like if you find a line with filter=media, you delete the whole line. But it might be that the line contains other, unrelated attributes, like

*.mp3 -crlf filter=media -text -diff

It would be rude to delete this whole line; instead it should be changed to

*.mp3 -crlf -text -diff

(i.e., only the one attribute should be removed).
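Something along these lines is what I mean. It is only a sketch with an invented function name; it normalizes whitespace on the lines it rewrites and does not handle quoted patterns with spaces:

```python
from typing import Optional


def remove_media_filter(line: str) -> Optional[str]:
    """Drop only the filter=media attribute from a .gitattributes line.

    Hypothetical helper: returns the line unchanged if it has no
    filter=media token, the rewritten line if other attributes remain,
    or None if nothing but the pattern is left and the line can be dropped.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return line  # leave blank lines and comments alone
    pattern, *attrs = stripped.split()
    if "filter=media" not in attrs:
        return line
    kept = [attr for attr in attrs if attr != "filter=media"]
    if not kept:
        return None
    return " ".join([pattern] + kept)


print(remove_media_filter("*.mp3 -crlf filter=media -text -diff"))
```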


technoweenie commented on May 18, 2024

Awesome feedback! I'm creating issues (either here or in the server implementation) as necessary.

I think that means that if I have a .gitattributes file in a subdirectory with *.mp3 filter=media, then git media path remove '*.mp3' won't affect files under that subdirectory.

Similarly, if I have *.mp3 -filter in a subdirectory, then git media path add '*.mp3' won't affect that subdirectory.

It'd be nice if git media path was better. I think if people are working with multiple/nested .gitattributes files, they don't necessarily need it though.

I would expect remote.{name}.media to have the highest precedence, followed by media.url, and then the default rule of appending /info/media to the URL. I don't know whether your list is supposed to be in order of increasing or decreasing precedence, but either way it disagrees with my expectation.

Yeah, that sounds legit.


technoweenie commented on May 18, 2024

@mhagger: Had an idea about teaching git media push more about git:

(via #104 (comment))

Change push so it requires the remote and branch (which should be provided by the pre-push hook). The pre-push hook gets this in STDIN:

{local ref}       {sha} {remote ref}       {remote sha}
refs/heads/master 67890 refs/heads/foreign 12345

We can get this easily with:

$ git ls-remote -h origin master
b336a0f59a945b53259856e94cfb2440bfd5ca4e    refs/heads/master
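For what it's worth, reading the pre-push input shown above could be sketched like this. The type and function names are made up, and the sketch only covers the four-field lines Git writes to the hook:

```python
from typing import List, NamedTuple


class PushedRef(NamedTuple):
    local_ref: str
    local_sha: str
    remote_ref: str
    remote_sha: str


def parse_pre_push(stdin_text: str) -> List[PushedRef]:
    """Parse the lines a pre-push hook receives on STDIN.

    Hypothetical helper: each line holds four space-separated fields;
    anything with a different field count is skipped.
    """
    refs = []
    for line in stdin_text.splitlines():
        fields = line.split()
        if len(fields) == 4:
            refs.append(PushedRef(*fields))
    return refs


for ref in parse_pre_push("refs/heads/master 67890 refs/heads/foreign 12345\n"):
    print(ref.local_ref, "->", ref.remote_ref)
```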


gitfoxi commented on May 18, 2024

There's no way to migrate assets, but that would be cool.

What happens when you fork a repository? Do the forks share a git-media path? Or does each fork need to maintain a copy of all media files separately on the server? Can that work with pull requests?


technoweenie commented on May 18, 2024

@gitfoxi Git LFS itself doesn't really know about forks or how GitHub (or whatever Git host you're using) works. However, GitHub's implementation of the Git LFS API shares objects across a GitHub repository network. Forking and pull requesting should work great :)

Also, this is a really old discussion about an ancient version of Git LFS. I'm closing this issue, but you're very welcome to open a new one if you want to discuss further.

