Comments (21)
Reproducing seems somewhat dependent on size. In limited testing, bigger files are more likely to hit. I added LFS_DEBUG_HTTP=1 and GIT_CURL_VERBOSE=1 and now can see details about the failure from the client side. File attached. Lots of 500 Internal Server Errors, so this looks like an lfs-test-server issue the more I dig.
lfs-test-server_failure.txt
from lfs-test-server.
Yeah, as you've noticed, the "Fatal error: Server error:" response indicates a 500 error. I'm not sure why the test server is producing that in this case, but it does seem to be an issue there. I'm afraid that the test server doesn't get a huge amount of attention, but that could be either because it generally works or nobody's using it, I'm not sure which.
I'm going to transfer this issue over to that repository.
Great, thanks for moving it for me and the info. I'll be digging into lfs-test-server. I think there is probably significant underestimated demand for better versioning of large files, and the lfs-test-server running locally is an attractive and obvious answer. Granted, if it doesn't actually work, I guess it's not such a great answer ;-)
I think the most important thing to look into is what error message you're getting back from the server. That will probably tell us a lot about what problem is occurring server side.
It may be helpful to try GIT_TRANSFER_TRACE=1 if you haven't already. That's turned on automatically by GIT_CURL_VERBOSE in newer versions (I think 2.6.0 and newer), but not in older versions, which don't know about GIT_CURL_VERBOSE.
So in the lfs-test-server_failure.txt, the gist is that the PUT for the giant file is processed. Progress is shown and the full file size is reached. But then the server responds with a 500 instead of the usual 200. The client then retries repeatedly, and each retry fails immediately until the client gives up.
Time to learn the minimum Go required to debug this and dig into the lfs-test-server. Any tips for getting any debug output from the test server?
I believe you'd probably want to instrument https://github.com/git-lfs/lfs-test-server/blob/master/server.go#L357 and have it print the error message it's producing to standard error. That would look like so (you'll need to import os at the top):
fmt.Fprintf(os.Stderr, `{"message":"%s"}`, err)
If you can determine what that error message is, that would tell you where the problem is occurring.
Great, thanks. An interesting tidbit on the server side now is that it prints out each request and status. It thinks all of them are succeeding. There are a lot of bunched object /objects/batch requests after the PUT. But the logs show status=200 on the server side. Will see what I can get with the instrumentation you suggested.
Also, I tried both lfs.concurrenttransfers=1 and lfs.basictransfersonly=1 just on some wild hunches, but no change in results.
So the PutHandler is exiting normally. So the reason for the failed connection is somewhere else. I'll keep digging. I'm also on a super old version of go (18.04LTS by default gets you 1.6.2) so I'll probably update that to the latest for good measure first.
I don't know if you intended to write "18.04" or "16.04", but 18.04 does have the golang-1.10 package, which should be sufficient to build and test the LFS server. It should also be present in xenial-backports, if you're using 16.04.
Oops, sorry about that. 16.04. I'll grab the backport.
Getting closer. When an LFS push is normal and succeeds on the first time, the sequence of events is:
Client: Batch Operation Request to get URL for uploading file
Server: Batch Operation Response with upload URL
Client: PUT Request to upload URL
Server: PUT Response
When the LFS push is for a big file that fails, the sequence is:
Client: Batch Operation Request to get URL for uploading file
Server: Batch Operation Response with upload URL
Client: PUT Request to upload URL
Client: Batch Operation Request to get URL for uploading file
Server: Batch Operation Response with upload URL
Client: PUT Request to upload URL (which fails)
So for some reason, git-lfs issues multiple BatchOperation requests to upload the file if the upload of the file takes "too long", and repeatedly tries to upload it again and fails. When the second push succeeds, the client/server messaging is the same until the second batch operation request to get URL for uploading file; at that point, the response actually contains a download URL and no upload URL, so a second PUT does not occur.
I'll keep digging, but even if this is an lfs-test-server issue, it seems bizarre that the git-lfs client issues all of these BatchOperation requests to try to upload the file again BEFORE the response for the initial PUT is even received.
When an upload fails, it can be for a number of reasons, one of which is that the authentication credentials expired. When we retry a request, we issue a new batch request, which will provide us with a new set of credentials if required. (Even if the reason wasn't that the credentials expired, they might expire soon, and we'd want fresh ones to maximize our success potential.)
We do this asynchronously (in a goroutine), so there's probably some reason that we're finding the initial PUT request is failing, perhaps a timeout of some sort.
Got it, thanks. So turning the retries down to 1 with lfs.transfer.maxretries=1 eliminated a lot of the noise and makes things a bit clearer. Just like you said, the PUT request is failing for some reason. The lfs-test-server thinks it has succeeded from its side of logging, but the client shows no response from the initial put request (the 500 errors are for the subsequent put attempts).
Whatever the failure is, it is at the very end. When the second push succeeds, it is quite wonky. There is no response listed on the client side to the PUT, but the subsequent batch operation doesn't result in another put attempt, so there is no error message... I think I'm going to put wireshark between the two and try to get a clearer picture of who is in the wrong.
Got wireshark going and understand the problem now.
At about 90 seconds, the last bytes of the file are sent from the client. At 120 seconds, the BatchRequest is sent, and at 150 seconds, the response to the PUT is sent, but the client is no longer listening for it.
There is a delay between when the last file bytes are sent and when the lfs-server has fully processed them and placed the file on disk and a response is sent. For our lfs-test-server, the lfs storage is on a spinning-disk NAS. When all is said and done, this delay is about 60 seconds for our 9GB file.
This is longer than the default lfs.activitytimeout of 30 seconds. Bumping it to 120 resolves the problem for the 9GB file. Setting to 0 (unlimited) would resolve for any size.
On Monday I'll look into whether there is anything simple that could be done in the lfs-test-server to avoid the long delay where no network traffic is present.
Yeah, that makes a lot of sense. We're probably timing out due to the close, which we need to be sure to do so that it's on disk properly. I don't know of anything that we could do differently, but if you think of anything, I'm definitely open to suggestions.
Was typing up my findings when you responded with what I would find. :-)
Spent a bit of time investigating this, and I'm not sure there is a better solution than turning the timeout up or to 0. The big hang with no network activity between the lfs client and server is indeed on the file close. The close has to happen, and there's no easy way to get around it. I could artificially throttle the file write (knowing that it is to a NAS) which would decrease the time the close takes, but there's not a generic way to do that that would make sense for lfs-test-server users. Maybe there are things on my NAS that I could tweak to have it slow uploads to it down to a rate it can keep up with better.
I think there's enough here to close as not a bug. For our use case, I'll think about whether a NAS for a content store makes sense.
Thanks @bk2204 for all the help troubleshooting and debugging this. Really appreciated. :-)
You're very welcome. I'm glad we got to the bottom of your problem, even if it wasn't as easily solved as I'd hoped.
I'm sure a lot of that had to do with my newness to both go and the lfs-test-server more than anything else. Now I'm better prepared. :-)