jupyterhub / zero-to-jupyterhub-k8s

Helm Chart & Documentation for deploying JupyterHub on Kubernetes

Home Page: https://zero-to-jupyterhub.readthedocs.io

License: Other

Python 46.50% Shell 6.39% Go 5.26% Smarty 36.59% Dockerfile 5.25%
jupyterhub jupyter-notebook jupyter kubernetes kubernetes-cluster kubespawner jupyterhub-deployment

zero-to-jupyterhub-k8s's Introduction

Zero to JupyterHub with Kubernetes

This repo contains a Helm chart for JupyterHub and a guide to use it. Together they allow you to make a JupyterHub available to a very large group of users such as the staff and students of a university.

The guide

The Zero to JupyterHub with Kubernetes guide provides user-friendly steps to deploy JupyterHub on a cloud using Kubernetes and Helm.

The guide is complemented well by the documentation for JupyterHub.

The Helm chart

The JupyterHub Helm chart lets a user create a reproducible and maintainable deployment of JupyterHub on a Kubernetes cluster in a cloud environment. The released charts are made available in our Helm chart repository.

Notice of Participation in Study

Please note that this repository is participating in a study into sustainability of open source projects. Data will be gathered about this repository for approximately the next 12 months, starting from 2021-06-11.

Data collected will include number of contributors, number of PRs, time taken to close/merge these PRs, and issues closed.

For more information, please visit the informational page or download the participant information sheet.

History

Much of the initial groundwork for this documentation is information learned from the successful use of JupyterHub and Kubernetes at UC Berkeley in their Data 8 program.

Acknowledgements

Thank you to the following contributors:

  • Aaron Culich
  • Carol Willing
  • Chris Holdgraf
  • Erik Sundell
  • Ryan Lovett
  • Yuvi Panda
  • Laurent Goderre

Future contributors are encouraged to add themselves to this README file too.

Licensing

This repository is dual licensed under the Apache2 (to match the upstream Kubernetes charts repository) and 3-clause BSD (to match the rest of Project Jupyter repositories) licenses. See the LICENSE file for more information!

zero-to-jupyterhub-k8s's People

Contributors

alexmorreale, allanlwu, arokem, betatim, bitnik, cam72cam, carreau, choldgraf, clkao, consideratio, dependabot[bot], derrickmar, georgianaelena, gunjanbaid, jupyterhub-bot, manics, manycoding, minrk, pre-commit-ci[bot], ryanlovett, saladraider, samlau95, sgibson91, sieboldianus, summerswallow, tmshn, tonyyanga, web-flow, willingc, yuvipanda

zero-to-jupyterhub-k8s's Issues

Accessing the JupyterHub api

Can you provide an example of how to access the JupyterHub API, and in particular which credential is required (i.e., which of the two in the config.yaml file)? I've tried many variations (using httpie) and get something like this:

$ http http://jupyterhub.odewahn.com/hub/api/users 'Authorization: token 1dfa277014fac9f3d054e203325e8e44c80432c8c8f8bfafd8f2ac1603c3458d'

HTTP/1.1 403 Forbidden
Connection: keep-alive
Content-Length: 39
Content-Security-Policy: frame-ancestors 'self'; report-uri /hub/security/csp-report
Content-Type: application/json
Date: Wed, 07 Jun 2017 19:53:46 GMT
Server: nginx/1.10.0 (Ubuntu)

{
    "message": "Forbidden", 
    "status": 403
}
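
For reference, the same request can be sketched in Python. This is only an illustration, assuming the Hub accepts API tokens in an `Authorization: token <value>` header as in the httpie call above; the host name, the token value and the helper name `build_hub_api_request` are placeholders, not values from this issue. A 403 like the one shown may mean the Hub does not recognize the token or that the token lacks permission to list users.

```python
import urllib.request


def build_hub_api_request(hub_url: str, api_token: str, path: str = "users") -> urllib.request.Request:
    """Build (but do not send) a GET request against the JupyterHub REST API."""
    # The REST API is served under /hub/api/ relative to the Hub's base URL.
    url = f"{hub_url.rstrip('/')}/hub/api/{path.lstrip('/')}"
    # The token goes in an "Authorization: token <value>" header.
    return urllib.request.Request(url, headers={"Authorization": f"token {api_token}"})


# Hypothetical host and token, for illustration only.
request = build_hub_api_request("http://hub.example.org", "PLACEHOLDER_TOKEN")
print(request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be inspected without a running Hub.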

Document common hardware extensions

We should have sample Helm chart configurations for common deployments so that users know the parameters for controlling their nodes. For example, right now I can't find any place that tells users how to turn off persistent storage, even though we reference the existence of persistent storage in the docs.

Complete section on customizing environment of users & resources available to them.

This is currently the extending-jupyterhub.rst, which we should split up and re-format.

Following ToC (adapted from #67):

  • User environment
    • Using an existing docker image
    • Creating a new image (repo2docker)
    • Setting environment variables
  • User resources
    • Memory limits & guarantees
    • CPU limits & guarantees
    • Storage
      • Enabling & disabling
      • Changing size available
      • SSD vs HDD

Advise users to create/delete a gcloud project

When using Google Cloud, the associated network rules, disks, etc. are still around even after the cluster has been deleted.

I would suggest creating a dedicated project and deleting it when done, to guarantee a clean slate every time.

General Restructuring

Thoughts on what the overall structure of the document should be.

Step 1: Creating a Kubernetes cluster

This is the first step, and should contain different pages for common ways to do this in the cloud (Google, Azure, maybe AWS?) and links to more detailed guides elsewhere (there are tons).

All the cloud specific stuff (setting up machines, resizing, etc) should be contained in this.

Step 2: Installing Helm

Should be fairly simple: instructions for installing Helm, plus links for more info.

Step 3: Installing JupyterHub

Should set up a simple Hub, with a config.yaml.

Step 4: Extending JupyterHub

This should contain info on doing helm upgrades, and specific sections on the various things you can do (custom image, prebuilt image, authenticators, memory / CPU limits, etc).

How to keep same numeric IP?

Please document how to reinstall the software from scratch and keep the current numeric
IP address for the entry point. This is important for programmers like me who are not allowed
to touch the DNS records but may need to restart the system with extreme prejudice.

Decide where reference documentation about helm-chart lives

This document is mostly narrative, but it is also the only reference documentation we have. We should ideally have both - reference talking about each of the individual options you can tweak, and also more generic narrative docs that guide people.

There might also be differences in who is best positioned to write these docs. Reference docs should ideally be extracted from the code it is documenting, so it is as up to date & accurate as possible, while that doesn't work for narrative docs.

We probably should have reference docs for the helm-chart that's different from the narrative docs, and the narrative docs should link liberally to it - this helps us keep the narrative docs not too long and laborious.

We should figure out:

  1. How to write and extract reference docs
  2. Where reference docs should live
  3. How to link narrative docs to reference docs.

Add a different logo?

Should we change the logo? I just threw that one together quickly because I needed to put something there...is there any "official" jupyterhub logo? And if not, we should have one! :)

Support multiple Docker builds?

It would be nice to be able to have a choice of several docker images to launch as is done in Docker.
This is probably in the plan since Binder-esque functionality is a goal...
At the moment I'm collecting a fair number of repositories, including some largish data sets
into a single image. This will eventually break as things get larger. I would rather not have a separate hub installation for each image, although that is a feasible option.

culling doesn't work for me yet

The culling mechanism is not culling servers for me.
I see this for "get pod":

cull-deployment-1750684654-bqlf1 0/1 CrashLoopBackOff 1104 3d

And in the logs:

kubectl --namespace=test logs cull-deployment-1750684654-bqlf1
Traceback (most recent call last):
  File "/srv/cull/cull_idle_servers.py", line 93, in <module>
    loop.run_sync(cull)
  File "/usr/local/lib/python3.6/site-packages/tornado/ioloop.py", line 457, in run_sync
    return future_cell[0].result()
  File "/usr/local/lib/python3.6/site-packages/tornado/concurrent.py", line 237, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
  File "/usr/local/lib/python3.6/site-packages/tornado/gen.py", line 1021, in run
    yielded = self.gen.throw(*exc_info)
  File "/srv/cull/cull_idle_servers.py", line 42, in cull_idle
    resp = yield client.fetch(req)
  File "/usr/local/lib/python3.6/site-packages/tornado/gen.py", line 1015, in run
    value = future.result()
  File "/usr/local/lib/python3.6/site-packages/tornado/concurrent.py", line 237, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
tornado.httpclient.HTTPError: HTTP 403: Forbidden
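
The traceback shows the culler failing while fetching from the Hub API (HTTP 403), before any culling decision is made. For context, the decision an idle culler makes once that fetch succeeds can be sketched roughly as below. This is not the code in cull_idle_servers.py; the function name `users_to_cull` and the record fields `name`, `server` and `last_activity` are assumptions modelled on the user records the Hub API returns.

```python
from datetime import datetime, timedelta, timezone


def users_to_cull(users, timeout_seconds, now=None):
    """Return names of users whose running server has been idle longer than the timeout."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(seconds=timeout_seconds)
    idle = []
    for user in users:
        if not user.get("server"):
            continue  # no running server, so there is nothing to cull
        # Timestamps are assumed to be ISO 8601 strings with an explicit UTC offset.
        last_activity = datetime.fromisoformat(user["last_activity"])
        if last_activity < cutoff:
            idle.append(user["name"])
    return idle


# Hypothetical records, for illustration only.
now = datetime(2017, 6, 7, 20, 0, tzinfo=timezone.utc)
users = [
    {"name": "alice", "server": "/user/alice", "last_activity": "2017-06-07T18:00:00+00:00"},
    {"name": "bob", "server": "/user/bob", "last_activity": "2017-06-07T19:55:00+00:00"},
    {"name": "carol", "server": None, "last_activity": "2017-06-01T00:00:00+00:00"},
]
print(users_to_cull(users, timeout_seconds=3600, now=now))  # ['alice']
```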

Complete section on resource management + costs

This section would focus on how to manage costs, think about costs and control costs.

1.0 version of guide ToC

@willingc, @minrk and @yuvipanda hashed out the contents of the ToC for a v1 of the guide (and the helm-chart). Raw Dump here. Let's split these into issues soon.

  • Document the target Audience

    • Released version of helm chart

    • Command line, Docker, Kubernetes, GitHub familiarity

    • Test, Staging, Production workflow recommended

    • Default user database: sqlite (maybe have offline migrations - stop, dump, reload; destroying user servers and recreating db)

    • Expectation that someone familiar with or excited to learn Kubernetes administration and monitoring is needed at times (GUI or CLI)

    • Backups (default backup scheme - on for hub db - if using specific cloud providers - snapshot once a day of last 3 days, last week, last month)

    • Out of document scope

    • Using a developer preview version

    • Alternative databases PostgreSQL, mysql

  • Add a Flowchart of process

  • Prerequisites and Setting up a Deployment platform (suggestions on better wording - welcomed)

    • Creating a Kubernetes Cluster

    • Setting up Kubernetes on Google Cloud

    • Setting up Kubernetes on Microsoft Azure Container Service (ACS) (TODO: Add a note that Azure support is alpha)

    • OpenShift? (TODO)

    • Setting up Helm

    • Installation

    • Initialization

    • Next Step

***** From here down should be cloud provider agnostic (other than persistent and paid storage) *****

  • Hub - Getting started (Installation Guide for a test hub and/or a one-three day workshop)

    • Setting up JupyterHub

    • Prepare configuration file

    • Install JupyterHub

    • Setting up public access

    • Set up DNS (talk about A record, don't link to any particular registrar)

    • Setting up HTTPS

    • Automatic (with Let's Encrypt + kube-lego)

    • Manual (specifying CA)

  • Customizing your JupyterHub setup (Customization Guide)

    • User environment (and spawning done by admin)

    • Using an existing image -> dockerstacks right now data8 image

    • Extending your software stack with s2i -> jupyter-repo2docker

    • Pre-populating $HOME directory with notebooks when using Persistent Volumes

    • Persistent storage for each user (size, performance, turning it off) (TODO)

    • Authentication

      • Authenticating with OAuth2

      • Full Example of Google OAuth2

      • Example of using GitHub Auth (TODO)

      • Example of using Google Sheets auth ?! (TODO)

  • Managing the Hub (Administrative Guide for Staging and Production)

    • Resource management (TODO)

    • Culler (timeout + every)

    • Setting memory and CPU guarantees / limits for your users (moved)

    • How to think about Costs (TODO)

    • Growing and shrinking capacity (TODO) - autoscaler

    • Backups

    • Upgrading helm chart (when new versions/configurations come out).

    • Upgrading JupyterHub and Helm Chart (Major releases)

    • Security considerations

    • Troubleshooting

    • Monitoring resource usage of users (TODO)

    • Looking at logs (kubectl logs and kubectl describe) (TODO)

    • GUI management of resources by student admins, TAs, or instructor (Kubernetes documentation reference)

    • looking at logs

    • a couple of common examples

    • Turning it all off -> save from being charged money when not in use

  • Reference

    • Index

    • Glossary

      • JupyterHub
      • Kubernetes
    • References/Links

Out of Scope

Using PostgreSQL and mysql - Use your cloud provider's implementation

Update this doc for v0.4 of helm-chart

O'Reilly is going to publish something about the helm-chart and this guide soon, so we should make sure we update this guide for 0.4. We should also make sure people who are still on 0.3 have a way to access the older code.

Easiest and simplest way would be to tag a 0.3.x version of this repo and make 0.4 changes to master. We can then keep making changes to master to reflect 0.4.

Long term, we should figure out #100 and #101.

For Documentation: auto-cutoff? and consequences?

In the documentation, it would be helpful to have information about whether one can put an automatic limit on how much their cloud account can be billed. Also, if that feature is available, what would the consequences be? What would happen to the system when the shutoff is triggered?

Make it easy for people to access the version of guide they need

When we make releases of the helm-chart repo, it makes documentation here out of date and no longer accurate for the newest version. However, it is accurate for the people who do not upgrade yet.

So we need a way to make it clear to people which version of the helm-chart this guide applies to. This is especially important for the parts of the guide that cover administrative tasks. Accidentally upgrading from 0.3 to 0.4 when all you wanted to do was change the user image can have catastrophic effects (since you could temporarily lose users' home directories).

How do we make this happen?

RTD allows tagging releases, so we could just tag them here synchronously with releases in the helm-chart repo. Moving these docs into the helm-chart repo is also an option (and might make it easier, so we only have one release process).

We should also make it very obvious to users how to find which version they are running so they can use the docs for the appropriate version.

mention sudo for s2i

On Ubuntu you need to run s2i with sudo; otherwise it will tell you

FATAL: Cannot connect to the Docker daemon. Is the docker daemon running on this host?

which is not a very helpful error message for this situation.

Move this documentation to the `helm chart` repo

@yuvipanda and I were discussing that it might make sense for this documentation to live in the helm-chart repository.

The public-facing links should all stay the same, since z2jh / zero to jupyterhub is much better than "helm chart" as a user-facing name. However, the helm chart is tightly-linked to the guide in this repo, and it would be much simpler if we could just update both the docs and the helm chart in the same PR instead of needing to update two separate repositories. The other benefit would be that we can link z2jh doc releases against the helm chart releases more easily.

Since 0.4 of the helm chart just came out, what if we merged this documentation into the helm-chart repo and updated it for 0.4, then deprecated this repository and directed people to the new location documentation? Maybe @willingc and @minrk have thoughts?

Release v1.0 of this guide

Tracking ticket for all the things we need to do before we can mark this release as v1

  • Add some info on setting up DNS #15
  • Add info about Google, Tmp and Dummy Authenticators
  • Capacity planning guidelines (how to figure out how many nodes you need)
  • Discuss memory and CPU limits, and how to determine what's right for your use case
  • Proper setup and tear down instructions for Google Cloud that are fully 100% tested
  • Instructions about monitoring resource usage (kube dashboard, probably)
  • Test the workflow with several people who aren't sysadmins to see what parts trip them up
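
As a starting point for the capacity planning item above, the arithmetic can be sketched as follows. This is a rough illustration, not guidance from the chart: it assumes every concurrent user is scheduled with a fixed memory and CPU guarantee, and that some fraction of each node is reserved for system pods. The function name, parameters and example numbers are all hypothetical.

```python
import math


def nodes_needed(concurrent_users, mem_guarantee_gb, cpu_guarantee,
                 node_mem_gb, node_cpus, reserved_fraction=0.25):
    """Estimate how many nodes are needed to fit all users' guarantees."""
    # Capacity left on each node after reserving a share for system components.
    usable_mem = node_mem_gb * (1 - reserved_fraction)
    usable_cpu = node_cpus * (1 - reserved_fraction)
    # Whichever resource runs out first limits the users per node.
    users_per_node = min(
        math.floor(usable_mem / mem_guarantee_gb),
        math.floor(usable_cpu / cpu_guarantee),
    )
    if users_per_node < 1:
        raise ValueError("a single user's guarantee does not fit on one node")
    return math.ceil(concurrent_users / users_per_node)


# 100 users, 1 GB and 0.5 CPU each, on nodes with 16 GB and 4 CPUs.
print(nodes_needed(100, 1.0, 0.5, node_mem_gb=16, node_cpus=4))  # 17
```

In this example CPU is the binding constraint (6 users per node by CPU versus 12 by memory), which is the kind of trade-off the guidelines would need to explain.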

What else?

Add support for additional cloud providers

If you're interested in support for this software on AWS, Jetstream, or other cloud providers, please let us know here... or even better, send us a Pull Request with your contributions to getting the code working on your desired cloud provider!

So far we have heard interest in supporting Jetstream using the OpenStack Magnum API, as well as using kubeadm.

We have also heard interest in supporting AWS. Here are some links provided to us by our AWS reps:

https://kubernetes.io/docs/getting-started-guides/aws/
https://aws.amazon.com/quickstart/architecture/heptio-kubernetes/

https entry point?

Please document how to set up an https entry point for the hub server. Currently the server is accessed by default using http.

Be explicit about hardware vs. computational resources

There are a few places where we casually use the word "hardware" when we really mean to say "computational resources", e.g. "you can give your users expanded hardware by ..." .

Since increasing resources to users is technically different from actually using different hardware, we should be more clear on this.

cc @yuvipanda

Helm question

Unclear to me:

Should helm be installed once, or per cluster?

While I can figure it out, it could be made clearer in the docs.
You could also gather everything that has to be installed on the local machine into a first section, e.g. "all you need to install on your local machine".

Google cloud user / project changing

We should document how to change users / projects within google cloud, as a part of setting up kubernetes with gcloud. Probably many users will have both a gmail and a .edu account. We should add explanations for:

gcloud config set account <ACCOUNT>
and
gcloud config set project <PROJECT-ID>

Importantly, the value passed must be the project ID, not just the project name.
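
One way to catch a project name passed by mistake is to check its shape before calling gcloud. The sketch below assumes the project ID format as I understand Google Cloud documents it (6 to 30 characters; lowercase letters, digits and hyphens; starts with a letter; does not end with a hyphen). The helper name `looks_like_project_id` is hypothetical, and passing this check does not prove the project exists.

```python
import re

# Assumed project ID format: a letter, then 4-28 letters/digits/hyphens, then a letter or digit.
_PROJECT_ID_RE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def looks_like_project_id(value: str) -> bool:
    """Return True if the value has the shape of a project ID rather than a display name."""
    return _PROJECT_ID_RE.fullmatch(value) is not None


print(looks_like_project_id("my-hub-123456"))   # True
print(looks_like_project_id("My Hub Project"))  # False
```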

Extending guide with DNS and letsencrypt

I think it is essential to extend this Zero to JupyterHub guide with a section on setting up DNS and Let's Encrypt.

Once DNS and Let's Encrypt are set up, it makes much more sense to detail the different authenticators, since we really want SSL encryption in place before working with passwords etc.

Maybe include this in the default set-up, since the default currently uses no SSL, which is not recommended.

IFRAME, EMBED: embedding a jupyterhub into an iframe

I'm trying to embed my jupyterhub server (http://jupyterhub.odewahn.com) into an iframe, but am getting this in the browser:

Refused to display 'http://jupyterhub.odewahn.com/hub/login' in a frame because an ancestor violates the following Content Security Policy directive: "frame-ancestors 'self'".

Is there a way to enable this in the config.yaml file?
