Comments (4)
Meanwhile, is there an easy way to provide a default page that would instead say "Service unavailable due to X. Please try again in Y minutes. If still unavailable, check whether it is a known issue at https://github.com/con/upptime/issues?q=is%3Aissue+hub+is%3Aopen . If not, file a new one" or similar?
When this flake happens (IIRC), the hub pod is getting restarted. Typically, for a highly available application we'd run 2 hub pods, so that as soon as hub 1 goes down, hub 2 picks up the traffic. However, the hub pod is controlled by the jupyterhub helm chart (and a second pod would cost extra), so I don't think that's how we want to handle it.
There might be something we can do on the Amazon side: maybe we could set up a health check and a default page during an outage? From a super quick search and gpt, I think it would work like this: the health check fails, R53 changes the routing to a default page, DNS has to propagate, and then the user gets the helpful message. But when the hub comes back up, I guess it would do the same thing in reverse, also with a propagation delay, so it could possibly extend the downtime?
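To make the "could extend the downtime" concern concrete, here is a rough back-of-the-envelope sketch in Python. It is a simplified model, not a description of how R53 actually behaves in our setup: the function name and the default values (30 s check interval, 3 consecutive failures, 60 s DNS TTL) are illustrative assumptions, and it assumes the health check needs the same number of consecutive results to flip in either direction.

```python
def worst_case_unreachable_seconds(outage_s, check_interval_s=30,
                                   failure_threshold=3, dns_ttl_s=60):
    """Estimate how long the hub may be unreachable for a user (simplified).

    All parameters are hypothetical knobs for illustration:
      outage_s          -- how long the hub pod is actually down
      check_interval_s  -- seconds between health check requests
      failure_threshold -- consecutive results needed to flip the check
      dns_ttl_s         -- TTL of the DNS record that gets switched
    """
    # Time the health check needs before it changes state.
    detect_s = check_interval_s * failure_threshold

    # Short outage: the check never flips, so DNS is never switched
    # and the user simply sees the 503 for the length of the outage.
    if outage_s < detect_s:
        return outage_s

    # Longer outage: after the hub is back, the check must flip to
    # healthy again, and cached DNS answers must expire before users
    # are routed back to the hub.
    return outage_s + detect_s + dns_ttl_s


if __name__ == "__main__":
    for outage in (60, 300):
        print(outage, "->", worst_case_unreachable_seconds(outage))
```

Under these assumed numbers, a brief restart would never trigger the failover at all, while a longer outage would be stretched by the detection time plus the TTL on the way back.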
from dandi-hub.
But can someone remind me why this 503 is happening at all (i.e. what stops that pod), or is that by design (non-persistent, and if so, why)?
If we do restart upon hitting 503, could we partially mitigate (i.e. shorten the duration and thus potentially make it less likely to be hit) by hitting the service with a "heartbeat", e.g. every second?
> But can someone remind me why this 503 is happening at all (i.e. what stops that pod) or is that by design (non persistent, then why so)?
I haven't looked into it since the problem will "just go away" soon, but I think satra is aware of the reason?
> If we do restart upon hitting 503, could we partially mitigate (i.e. shorten duration and thus potentially make less likely to be hit) but hitting the service with "hearbeat" e.g. every second?
I don't understand what you mean. As soon as the pod fails, the k8s deployment restarts the pod. (I'm assuming the 503 is the result of that pod being deleted, and not of whatever the underlying error is.) So I don't see how hitting the service every second could help?
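If the restart is triggered by the pod failing rather than by incoming requests, a heartbeat would not shorten the outage, but it could at least measure how long each 503 window lasts. Below is a minimal Python sketch under that assumption. The names `probe` and `outage_windows` are hypothetical helpers, and treating "no response" or any status >= 500 as a failure is an illustrative choice, not something established in this thread.

```python
import urllib.error
import urllib.request


def probe(url, timeout_s=5.0):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Raised for 4xx/5xx responses, including 503.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, etc.
        return None


def outage_windows(samples):
    """Group consecutive failing samples into (start, end) windows.

    samples: iterable of (timestamp_seconds, status), sorted by time.
    A sample counts as failing when status is None or >= 500.
    """
    windows = []
    start = last = None
    for ts, status in samples:
        failing = status is None or status >= 500
        if failing:
            if start is None:
                start = ts  # first failing sample opens a window
            last = ts
        elif start is not None:
            windows.append((start, last))  # a healthy sample closes it
            start = None
    if start is not None:
        windows.append((start, last))  # still failing at the end
    return windows
```

A loop that calls `probe` once per second and appends `(time.time(), status)` to a list would produce the input for `outage_windows`; the resulting windows show how long the 503s last without affecting when the pod comes back.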
I probably just misread your
> When this flake is happening (IIRC) the hub pod is getting restarted.
as meaning that sensing the service and getting a 503 is the event which triggers a new pod to start, rather than that the pod was already "getting started" before that event.