Comments (20)

SezBulent commented on May 11, 2024

@fgimenez I think I finally figured it out.

Did a bit more digging, and saw that the current Dockerfile gets built on top of this one:

https://github.com/rpardini/nginx-proxy-connect-stable-alpine/blob/nginx-1.18.0-alpine-3.12.1/Dockerfile

Analyzing its contents a bit, and also checking the current nginx build, I saw that it was compiled with the http_proxy_connect_module, which is where the "issue" appears.

This module has three more timeout directives; apparently, the read and send timeouts default to 60 seconds:

     proxy_connect_connect_timeout
     proxy_connect_read_timeout
     proxy_connect_send_timeout

Here's the documentation:
https://github.com/chobits/ngx_http_proxy_connect_module#proxy_connect_connect_timeout

Increasing these to higher values seems to resolve the problem. I managed to push a 2.2 GB image without any timeout occurring.
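
For illustration, this is roughly the kind of override that worked here, placed wherever proxy_connect is enabled in the proxy's nginx config (the values below are examples, not the project's defaults):

    # connection establishment can stay short; read/send need to cover slow uploads
    proxy_connect_connect_timeout 60s;
    proxy_connect_read_timeout    1800s;
    proxy_connect_send_timeout    1800s;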

@rpardini I think it would be a good idea if we could also specify some of the nginx settings, such as these timeouts, as environment variables. If you agree, I can put up a pull request in a few days.
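
One possible shape for that, sketched here only as an idea (the variable name, default and file path are made up for illustration; the real entrypoint may do this differently): the entrypoint could render an environment variable into an nginx snippet before starting nginx:

    # hypothetical entrypoint fragment: turn an env var into an nginx directive
    PROXY_CONNECT_READ_TIMEOUT="${PROXY_CONNECT_READ_TIMEOUT:-60s}"
    echo "proxy_connect_read_timeout ${PROXY_CONNECT_READ_TIMEOUT};" > /etc/nginx/conf.d/timeouts.conf
    # nginx.conf would then include this file inside the relevant server block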

fgimenez commented on May 11, 2024

I'm still getting the same timeouts with 0.6.2 trying to push big layers

Setting higher values for the timeouts through the env vars, though, everything is working fine 👍
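
Purely as a usage sketch (the timeout variable names below are illustrative; check the project README for the actual names, and note that other required settings such as auth and CA volumes are omitted):

    docker run -d --name docker_registry_proxy -p 3128:3128 \
      -e ALLOW_PUSH=true \
      -e PROXY_CONNECT_READ_TIMEOUT=3600s \
      -e PROXY_CONNECT_SEND_TIMEOUT=3600s \
      rpardini/docker-registry-proxy:0.6.2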

rpardini commented on May 11, 2024

Thanks for the detailed report. ALLOW_PUSH=true is new in #59, by @fgimenez -- maybe you can work together on this.

SezBulent commented on May 11, 2024

Yep, I checked the commits before using it, since it was not present in the readme, and saw that it is indeed a new feature.
I wanted to add that, by checking the docker logs while running in debug mode, I also found the following, which is related to this issue:

.......
Dec 04 10:20:05 test-instance dockerd[30991]: time="2020-12-04T10:20:04.970405366Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=151
Dec 04 10:20:05 test-instance dockerd[30991]: time="2020-12-04T10:20:04.970428398Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=152
Dec 04 10:20:05 test-instance dockerd[30991]: time="2020-12-04T10:20:04.970454513Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=153
Dec 04 10:20:05 test-instance dockerd[30991]: time="2020-12-04T10:20:04.970478241Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=154
Dec 04 10:20:05 test-instance dockerd[30991]: time="2020-12-04T10:20:04.970504699Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=155
Dec 04 10:20:05 test-instance dockerd[30991]: time="2020-12-04T10:20:04.970546552Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=156
Dec 04 10:20:05 test-instance dockerd[30991]: time="2020-12-04T10:20:04.970581228Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=157
Dec 04 10:20:05 test-instance dockerd[30991]: time="2020-12-04T10:20:05.113662779Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=0
Dec 04 10:20:06 test-instance dockerd[30991]: time="2020-12-04T10:20:06.945688409Z" level=warning msg="got error while decoding json" error="unexpected EOF" retries=0

fgimenez commented on May 11, 2024

@SezBulent thanks for raising this issue. It is very interesting that you get the 401 on a GET request, which is the only cached method when ALLOW_PUSH is set to true. I personally haven't hit this issue so far pushing to quay.io; I will try to reproduce locally and report back here.

SezBulent commented on May 11, 2024

Hello again,

I tried different configurations on nginx, increasing the timeouts and also raising worker_connections and worker_rlimit_nofile to 65535 (I also increased ulimit -Sn and ulimit -Hn on the host), and it didn't seem to help at all.

Since the builds and pushes are run inside docker (using gitlab runners), I also added "max-concurrent-uploads": 1 to the docker daemon.json, thinking it might be a network issue, but again with no results.
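
For reference, that daemon.json tweak looks roughly like this (illustrative; merge it with any existing daemon settings and restart dockerd):

    {
      "max-concurrent-uploads": 1
    }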

What I did not mention, and it is pretty important, is that this "EOF" issue occurs when running multiple builds/pushes (more than 5) through the proxy.

fgimenez commented on May 11, 2024

I've been able to reproduce by trying to push an image with several big layers (above 500 MB); it starts retrying them and I finally get an EOF.

I get these messages in the proxy log before each retry:

2020/12/14 11:15:08 [error] 71#71: *108 proxy_connect: upstream read timed out (peer:127.0.0.1:443) while connecting to upstream, client: 172.17.0.1, server: _, request: "CONNECT quay.io:443 HTTP/1.1", host: "quay.io:443"

Eventually I also get:

2020/12/14 11:03:02 [warn] 73#73: *673 upstream server temporarily disabled while reading response header from upstream, client: 127.0.0.1, server: _, request: "PATCH /v2/kubevirt/kubekins-e2e/blobs/uploads/98a50373-1fb3-4677-bbd8-8639e63c3f38 HTTP/1.1", upstream: "https://52.22.99.116:443/v2/kubevirt/kubekins-e2e/blobs/uploads/98a50373-1fb3-4677-bbd8-8639e63c3f38", host: "quay.io"

I've tried to bump proxy_read_timeout, proxy_connect_timeout and proxy_send_timeout, but it doesn't seem to affect when the retry is triggered; I will keep trying.

@SezBulent can you confirm if you get these errors pushing big layers?

SezBulent commented on May 11, 2024

Hm, indeed the problem is more frequent with images that have bigger layers, but I did not find any errors similar to the ones you found.

fgimenez commented on May 11, 2024

Looks like in my case the problem is related to this location directive: https://github.com/rpardini/docker-registry-proxy/blob/master/nginx.conf#L279-L283. With it in place, the proxy_read_timeout setting in the server block is completely ignored and I always get a timeout exactly after 60s:

2020/12/14 14:47:13 [warn] 71#71: *148 a client request body is buffered to a temporary file /var/cache/nginx/client_temp/0000000005, client: 127.0.0.1, server: _, request: "PATCH /v2/kubevirt/kubekins-e2e/blobs/uploads/3f3b1e8e-1893-404d-933d-0d080c132927 HTTP/1.1", host: "quay.io"
2020/12/14 14:48:13 [error] 71#71: *146 proxy_connect: upstream read timed out (peer:127.0.0.1:443) while connecting to upstream, client: 172.17.0.1, server: _, request: "CONNECT quay.io:443 HTTP/1.1", host: "quay.io:443"

If I comment out https://github.com/rpardini/docker-registry-proxy/blob/master/nginx.conf#L279-L283, all goes fine with proxy_read_timeout 900s; in the server block. Looks like this block is resetting some of the general settings; not sure if this also applies to your authentication problem. @SezBulent can you please give it a try?

SezBulent commented on May 11, 2024

Thanks for the details @fgimenez.

I commented out the specific block you mentioned, and also added proxy_read_timeout 900s; in each location block of the caching "server" block, but that didn't resolve the timeout. I measured as well, and I get the timeout after 60s.

I did the test by pushing an image myself, not through the gitlab runner, and found the same logs as you did:

2020/12/16 00:05:29 [warn] 60#60: *181 a client request body is buffered to a temporary file /var/cache/nginx/client_temp/0000000010, client: 127.0.0.1, server: _, request: "PATCH /v2/test/bulent-test/ubuntu-layer-test/blobs/uploads/04759132-8c45-4dc9-b9d7-10b79bc373d1?_state=rioY6lcS0gvIT8UgLsxhQx8PzIMmJc03Qm5aKgfyAER7Ik5hbWUiOiJvbWZlL2J1bGVudC10ZXN0L3VidW50dS1iaWctbGF5ZXItdGVzdCIsIlVVSUQiOiIwNDc1OTEzMi04YzQ1LTRkYzktYjlkNy0xMGI3OWJjMzczZDEiLCJPZmZzZXQiOjAsIlN0YXJ0ZWRBdCI6IjIwMjAtMTItMTZUMDA6MDU6MjkuODQ2MjIzNDM5WiJ9 HTTP/1.1", host: "test-registry.test.com"


2020/12/16 00:06:29 [error] 59#59: *179 proxy_connect: upstream read timed out (peer:127.0.0.1:443) while connecting to upstream, client: 10.3.249.100, server: _, request: "CONNECT test-registry.test.com:443 HTTP/1.1", host: "test-registry.test.com:443"

It seems that there is indeed a default timeout of 60s; I will check the nginx docs to see what seems to be missing, and also check the network.

fgimenez commented on May 11, 2024

@SezBulent you are right, commenting out https://github.com/rpardini/docker-registry-proxy/blob/master/nginx.conf#L279-L283 in fact didn't help; the successful executions I got were just because the uploads happened to last less than 60s. I will keep trying to understand where this timeout comes from.

SezBulent commented on May 11, 2024

Something weird is going on; most certainly it is something that I am missing. I added the following timeouts in the nginx config, and I still get my connection cut after 60s (to be sure that nothing is replaced at docker startup, I modified the entrypoint and removed most of the nginx config from there):

    keepalive_timeout 900s;
    proxy_read_timeout 1800s;
    proxy_connect_timeout 1800s;
    proxy_send_timeout 1800s;
    send_timeout 1800s;
    client_body_timeout 1800s;

and this one in the http block:

    client_header_timeout 1800s;

I even included them in every location block, to be sure nothing is overwriting my settings (except the "client_header_timeout" directive, as that one can only be specified in the http or server block).

Also, by renaming the server_name of each block based on the comments near each server block (I changed the server name of the director layer to _director, and the one of the caching layer to _caching), it seems that the error is generated by the _director server:

2020/12/17 01:14:14 [error] 58#58: *86 proxy_connect: upstream read timed out (peer:127.0.0.1:443) while connecting to upstream, client: 10.3.249.100, server: _directory, request: "CONNECT test-registry.test.com:443 HTTP/1.1", host: "test-registry.test.com:443"

2020/12/17 01:14:20 [info] 60#60: *137 client prematurely closed connection, client: 127.0.0.1, server: _caching, request: "PATCH /v2/test/bulent-test/ubuntu-layer-test/blobs/uploads/08ffc7f3-48bc-4a8c-b9f5-f2d363d0bfd0?_state=0pAPdqLcDu2_cgAJc0lS0xBK9CCsfgMn1m8DZ-lzcgF7Ik5hbWUiOiJvbWZlL2J1bGVudC10ZXN0L3VidW50dS1iaWctbGF5ZXItdGVzdCIsIlVVSUQiOiIwOGZmYzdmMy00OGJjLTRhOGMtYjlmNS1mMmQzNjNkMGJmZDAiLCJPZmZzZXQiOjAsIlN0YXJ0ZWRBdCI6IjIwMjAtMTItMTdUMDE6MTQ6MjAuNjMwNjExMjMxWiJ9 HTTP/1.1", host: "test-registry.test.com"

And as a side note, the client request body buffering warning can be resolved by adding "client_body_buffer_size 1000M;" (or whatever value seems appropriate).

fgimenez commented on May 11, 2024

Awesome, thanks a lot. For the record, I also tried increasing open_file_cache_valid, the only remaining setting whose default value is 60s, with no luck. And it is very interesting to know that the error comes from the _director server.

@rpardini wdyt, any idea about what might be causing the error?

fgimenez commented on May 11, 2024

@SezBulent awesome! Looking forward to your PR. I've tried adding

proxy_connect_connect_timeout 1800s;
proxy_connect_read_timeout 1800s;
proxy_connect_send_timeout 1800s;

to the _director server block, and I get a consistent 95s connect timeout when pushing two 1 GB layers.

SezBulent commented on May 11, 2024

Hm, I think you must also add them to the caching server layer; I have personally added them to the http block (it works, even though the docs specify "server" as their context). Also, alongside those, I think it's a good idea to specify these as well, with any value you find appropriate (see the sketch after this list):

    proxy_read_timeout 1800s;
    proxy_connect_timeout 1800s;
    proxy_send_timeout 1800s;
    send_timeout 1800s;
    client_body_timeout 1800s;
    client_header_timeout 1800s;
    keepalive_timeout 900s;
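
Putting both groups together, a rough sketch of an http-level override, assuming (as described above) that the proxy_connect_* directives are accepted at http level even though the docs list "server" as their context; all values are examples only:

    http {
        # standard nginx timeouts (example values)
        proxy_read_timeout    1800s;
        proxy_connect_timeout 1800s;
        proxy_send_timeout    1800s;
        send_timeout          1800s;
        client_body_timeout   1800s;
        client_header_timeout 1800s;
        keepalive_timeout     900s;

        # ngx_http_proxy_connect_module timeouts (example values)
        proxy_connect_connect_timeout 1800s;
        proxy_connect_read_timeout    1800s;
        proxy_connect_send_timeout    1800s;

        # ... existing configuration continues here ...
    }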

What I have noticed, though, is that when the progress bar is full and the layers seem to be pushed, docker still hangs for a few minutes (between 5 and 10), after which it outputs the message that the image was pushed. It's like it's still doing something in the background, but the docker debug log didn't generate anything during this time, and neither did the proxy logs.

Also, as a side note, I'm still getting the random 401 status error, yet at first glance it doesn't seem to affect anything.

rpardini commented on May 11, 2024

Hmm, interesting investigations. Do go ahead with the PR; just keep in mind that push support is new and experimental, and as such any changes to those timeouts should be hidden behind ENV vars (maybe the same ALLOW_PUSH we already have).

The timeouts protect clients (and the proxy itself) from misbehaving servers/clients, and disabling them for all users would be bad.

Also I think the 401 issue has to be addressed as well, since that happens early and can change the behaviour of following requests...

SezBulent commented on May 11, 2024

Hello,

Sorry for the delay; with the holidays and then work popping up, I couldn't find the time to add the changes.
For some reason I cannot seem to link the pull request to this issue, so here it is:

#73

fgimenez commented on May 11, 2024

I'm still getting the same timeouts with 0.6.2 when trying to push big layers, for instance with an image like this:

FROM alpine

RUN head -c 1073741824 </dev/urandom >file.txt

RUN head -c 1073741824 </dev/urandom >file2.txt
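
To reproduce, something along these lines should exercise the same path through the proxy (registry, repository and tag are placeholders):

    docker build -t quay.io/example/big-layers:test .
    docker push quay.io/example/big-layers:test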

I'll try to dig deeper to understand what's going on and maybe report a different issue for tracking.

saces commented on May 11, 2024

I also had trouble pushing... long story short, request buffering was the cause here.

#125 fixed it for me.
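
For context, request buffering is controlled by nginx's proxy_request_buffering directive; presumably the fix is along these lines (sketch only, see the PR for the actual change):

    # stream the client request body straight to the upstream instead of buffering it to disk
    proxy_request_buffering off;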

scottcarey commented on May 11, 2024

Try as I might, I cannot get push to work. I have tried setting all timeout values to 900s and disabling buffering using the fix in #125, with rpardini/docker-registry-proxy@sha256:9c3df0f61f4fe377328a2111ee72f291513f7723ceb6fa943e534a62ba1d0d7f (latest at this time, since there is no recent tag or release). I always get a chain of retries and then a failure, as described here.

Another way to fix this would be if docker itself made it easy to bypass the proxy for push, but its proxy support is honestly rather minimal and awkward, and I don't expect them to enhance anything in this area.
