
Avoid "Service unavailable" (dandi-hub, open)

yarikoptic commented on July 17, 2024
Avoid "Service unavailable"

from dandi-hub.

Comments (4)

asmacdo commented on July 17, 2024

meanwhile -- is there an easy way to provide some default page or similar which would instead say "Service unavailable due to X. Please try again in Y minutes. If still unavailable, check whether it is a known issue at https://github.com/con/upptime/issues?q=is%3Aissue+hub+is%3Aopen . If not, file a new one" or the like?
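For illustration only, the kind of default page asked about here could look roughly like the sketch below. It is a minimal standard-library example, not something the hub currently runs: `unavailable_response`, `FallbackHandler`, the reason text "a hub restart", and the 5-minute retry value are all hypothetical placeholders, and where such a handler would be hosted is not settled in this thread.

```python
from http.server import BaseHTTPRequestHandler
from typing import Dict, Tuple

# Issue tracker link taken from the message proposed above.
ISSUES_URL = "https://github.com/con/upptime/issues?q=is%3Aissue+hub+is%3Aopen"


def unavailable_response(reason: str, retry_minutes: int) -> Tuple[int, Dict[str, str], bytes]:
    """Build status, headers, and body for a friendlier 503 page (hypothetical helper)."""
    body = (
        f"Service unavailable due to {reason}. "
        f"Please try again in {retry_minutes} minutes. "
        f"If it is still unavailable, check for a known issue at {ISSUES_URL} ; "
        "if there is none, file a new one.\n"
    ).encode("utf-8")
    headers = {
        "Content-Type": "text/plain; charset=utf-8",
        "Content-Length": str(len(body)),
        # Retry-After is a standard HTTP header; the value here is in seconds.
        "Retry-After": str(retry_minutes * 60),
    }
    return 503, headers, body


class FallbackHandler(BaseHTTPRequestHandler):
    """Answers every GET with the fallback message (assumed reason and delay)."""

    def do_GET(self) -> None:
        status, headers, body = unavailable_response("a hub restart", 5)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, one could pass `FallbackHandler` to `http.server.HTTPServer(("", 8080), FallbackHandler)` and call `serve_forever()`; the port is arbitrary.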

When this flake is happening (IIRC) the hub pod is getting restarted. Typically, for a highly available application, we'd be running 2 hub pods, so that as soon as hub 1 goes down, hub 2 picks up the traffic. However, since the hub pod is controlled by the JupyterHub Helm chart (and a second pod would cost extra), I don't think that's how we want to handle it.

There might be something we can do on the Amazon side: maybe we could set up a health check and a default page during an outage? From a super quick search and GPT, I think it would work like this: the health check fails, Route 53 (R53) changes the routing to a default page, DNS has to propagate, and then the user gets the helpful message. But when the hub comes back up, I guess it would do the same thing in reverse, also with a propagation delay, so it could possibly extend the downtime?
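A rough back-of-the-envelope sketch of that concern, under a deliberately simplified model: a switch in either direction takes about (check interval × threshold) to detect, plus up to one DNS TTL for cached answers to expire. The class and function names are hypothetical, and the numbers in the usage note are illustrative assumptions, not measured values from this deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverTiming:
    check_interval_s: int   # seconds between health checks (assumed value)
    failure_threshold: int  # consecutive results needed to flip state (assumed value)
    dns_ttl_s: int          # TTL of the DNS record, i.e. how long clients may cache it

    def detection_s(self) -> int:
        """Time for the health check to notice a state change."""
        return self.check_interval_s * self.failure_threshold

    def switch_delay_s(self) -> int:
        """Rough upper bound for one switch: detection plus cached DNS expiry."""
        return self.detection_s() + self.dns_ttl_s


def worst_case_extra_downtime_s(timing: FailoverTiming, pod_restart_s: int) -> int:
    """Seconds users might still see the fallback page after the hub is back.

    If the restart finishes before the health check flips, no failover happens
    and nothing is added. Otherwise switching back costs another full delay.
    """
    if pod_restart_s < timing.detection_s():
        return 0
    return timing.switch_delay_s()
```

For example, with `FailoverTiming(check_interval_s=30, failure_threshold=3, dns_ttl_s=60)`, a 120-second pod restart could leave users on the fallback page for up to 150 more seconds under this model, while a 60-second restart would never trigger the failover at all.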


yarikoptic commented on July 17, 2024

But can someone remind me why this 503 is happening at all (i.e. what stops that pod), or is that by design (non-persistent, and if so, why)?
If we do restart upon hitting 503, could we partially mitigate (i.e. shorten the duration and thus potentially make it less likely to be hit) by hitting the service with a "heartbeat", e.g. every second?


asmacdo commented on July 17, 2024

But can someone remind me why this 503 is happening at all (i.e. what stops that pod) or is that by design (non persistent, then why so)? If we do restart upon hitting 503,

I haven't looked into it since the problem will "just go away" soon, but I think satra is aware of the reason?

could we partially mitigate (i.e. shorten the duration and thus potentially make it less likely to be hit) by hitting the service with a "heartbeat", e.g. every second?

I don't understand what you mean. As soon as the pod fails, the k8s deployment restarts the pod. (I'm assuming the 503 is the result of that pod being deleted, and not of whatever the underlying error is.) So I don't see how hitting the service every second could help?
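To make the distinction concrete: a per-second heartbeat from outside can only observe the outage, it does not trigger or speed up the restart. A minimal sketch of such a passive probe follows; `probe_once` and `outage_windows` are hypothetical helpers, and any URL passed in would be a placeholder, since the thread does not name an endpoint to poll.

```python
import urllib.error
import urllib.request
from typing import Iterable, List, Optional, Tuple

Sample = Tuple[float, Optional[int]]  # (timestamp in seconds, HTTP status or None)


def probe_once(url: str, timeout: float = 2.0) -> Optional[int]:
    """Return the HTTP status of one GET, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 503 while the hub pod is being replaced
    except OSError:
        return None  # connection refused, DNS failure, timeout, ...


def outage_windows(samples: Iterable[Sample]) -> List[Tuple[float, float]]:
    """Collapse consecutive non-200 samples into (first_bad, first_good_again) windows.

    An outage that is still ongoing at the end of the samples is not reported.
    """
    windows: List[Tuple[float, float]] = []
    start: Optional[float] = None
    for ts, status in samples:
        healthy = status == 200
        if not healthy and start is None:
            start = ts
        elif healthy and start is not None:
            windows.append((start, ts))
            start = None
    return windows
```

Calling `probe_once` once per second and feeding the results to `outage_windows` would show how long each 503 period lasts, which may help when deciding whether any mitigation is worth the effort.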


yarikoptic commented on July 17, 2024

I probably just misread your

When this flake is happening (IIRC) the hub pod is getting restarted.

as meaning that probing the service and getting a 503 is the event which triggers a new pod to start, rather than that the pod was already being restarted before that event.
