
jupyterhub / the-littlest-jupyterhub

Simple JupyterHub distribution for 1-100 users on a single server

Home Page: https://tljh.jupyter.org

License: BSD 3-Clause "New" or "Revised" License

Python 98.01% Dockerfile 0.57% Smarty 1.42%
jupyterhub education

the-littlest-jupyterhub's Introduction

The Littlest JupyterHub

The Littlest JupyterHub (TLJH) distribution helps you provide Jupyter Notebooks to 1-100 users on a single server.

The primary audience is people who do not consider themselves 'system administrators' but would like to provide hosted Jupyter Notebooks for their students or users. All users are provided with the same environment, and administrators can easily install libraries into this environment without any specialized knowledge.

See the latest documentation for more information on using The Littlest JupyterHub.

For support questions please search or post to our forum.

See the contributing guide for information on the different ways of contributing to The Littlest JupyterHub.

See this blog post for more information.

Development Status

This project is currently in beta state. Folks have been using installations of TLJH for more than a year now to great success. While we try hard not to, we might still make breaking changes that have no clear upgrade pathway.

Installation

The Littlest JupyterHub (TLJH) can run on any server that is running at least Ubuntu 20.04. Earlier versions of Ubuntu are not supported. We have several tutorials to get you started.

Documentation

Our latest documentation is at: https://the-littlest-jupyterhub.readthedocs.io

We place a high importance on consistency, readability and completeness of documentation. If a feature is not documented, it does not exist. If a behavior is not documented, it is a bug! We try to treat our documentation like we treat our code: we aim to improve it as often as possible.

If something is confusing to you in the documentation, it is a bug. We would be happy if you could file an issue about it - or even better, contribute a documentation fix!

the-littlest-jupyterhub's People

Contributors

betatim, budgester, carreau, choldgraf, consideratio, dependabot[bot], fm75, fomightez, georgianaelena, jeanmarcalkazzi, jochym, jrdnbradford, jtpio, kafonek, laxdog, leportella, manics, minrk, nextkaufmann, nsurleraux-railnova, pnasrat, pre-commit-ci[bot], raybellwaves, rprimet, staeiou, stevejpurves, trallard, willirath, wrightaprilm, yuvipanda

the-littlest-jupyterhub's Issues

network timeouts on GCE

When playing with TLJH on a GCE VM, I ran into network timeouts a lot. These caused the bootstrap.py installation method to fail.

Timeouts that caused the curl | sudo python3 one-liner to fail were:

  • the initial curl .../bootstrap.py stalling
  • pip install TLJH from GitHub stalling
  • traefik download stalling

Depending on the region the VM was in, bootstrapping either never completed or only after several retries. The startup-script method never made it.
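One way to make the download steps more tolerant of stalls is to wrap each fetch in a retry loop with a timeout and exponential backoff. The sketch below is only an illustration of that idea, not what bootstrap.py does; the function name fetch_with_retries and its default values are assumptions.

```python
import time
import urllib.request


def fetch_with_retries(url, attempts=5, timeout=30, base_delay=2.0,
                       opener=urllib.request.urlopen, sleep=time.sleep):
    """Download `url`, retrying on network errors with exponential backoff.

    `opener` and `sleep` are injectable so the retry logic can be exercised
    without touching the network. All defaults are illustrative assumptions.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            with opener(url, timeout=timeout) as response:
                return response.read()
        except OSError as error:
            # URLError, socket timeouts and connection resets are all OSError
            # subclasses, so a stalled or refused connection lands here.
            last_error = error
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(
        "giving up on %s after %d attempts" % (url, attempts)
    ) from last_error
```

With the defaults, a stalled download is retried after 2, 4, 8 and 16 seconds before the function gives up.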

Add integration test for HTTPS

While setting up let's encrypt based integration testing is probably too hard, we can use manual TLS for integration testing.

  1. Generate a certificate pair
  2. Set up TLJH to use it
  3. Verify that we have it
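The verification step could be sketched as follows, assuming the test already has the certificate file it configured TLJH with. The helper names same_certificate and hub_serves_certificate are hypothetical; the ssl functions they call are from the Python standard library.

```python
import ssl


def same_certificate(pem_a, pem_b):
    """Compare two PEM certificates by their DER bytes, ignoring outer whitespace."""
    der_a = ssl.PEM_cert_to_DER_cert(pem_a.strip())
    der_b = ssl.PEM_cert_to_DER_cert(pem_b.strip())
    return der_a == der_b


def hub_serves_certificate(host, port, cert_path):
    """Fetch the certificate served at host:port and compare it with cert_path.

    Assumes cert_path holds a single PEM certificate (no chain).
    """
    # get_server_certificate does not validate the certificate by default,
    # which suits a self-signed test certificate.
    served = ssl.get_server_certificate((host, port))
    with open(cert_path) as cert_file:
        expected = cert_file.read()
    return same_certificate(served, expected)
```

An integration test would then assert something like hub_serves_certificate("127.0.0.1", 443, path_to_cert), where the host, port and path depend on the test setup.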

Instructions to resume/restart install

I ran the installer but it crashed after the miniconda install as my minimal Ubuntu VM didn't have git installed yet (oooops!).

After installing git I re-ran the installer command but it crashed:

+ chmod +x /tmp/miniconda-installer.sh
+ MD5SUM=a946ea1d0c4a642ddf0c3a26a18bb16d
+ echo 'a946ea1d0c4a642ddf0c3a26a18bb16d  /tmp/miniconda-installer.sh'
+ md5sum --quiet -c -
+ bash /tmp/miniconda-installer.sh -b -p /opt/tljh/hub
ERROR: File or directory already exists: '/opt/tljh/hub'
If you want to update an existing installation, use the -u option.

Deleting that folder fixes the problem but I wonder whether it would be better to check for the presence of the folder and run it with -u instead?
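A minimal sketch of that check, assuming the installer builds its command as a list before running it. The function name miniconda_install_command is hypothetical; the -b, -p and -u flags are the ones shown in the output above.

```python
import os


def miniconda_install_command(installer_path, prefix):
    """Return the installer command, adding -u when the prefix already exists."""
    command = ["bash", installer_path, "-b"]
    if os.path.exists(prefix):
        # -u asks the installer to update an existing installation,
        # as the error message above suggests.
        command.append("-u")
    command += ["-p", prefix]
    return command
```

For example, miniconda_install_command("/tmp/miniconda-installer.sh", "/opt/tljh/hub") includes -u only when /opt/tljh/hub is already present, so a re-run after a crash would not stop at the "already exists" error.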

Validate requirements before installing

The installer should check for:

  1. Appropriate Ubuntu server versions
  2. Root access
  3. Python version check
  4. git installed / installable
  5. venv installed / installable

We're currently supporting Ubuntu 18.04+ only. This is primarily because we wanna use system python + venv for the hub environment (rather than conda, for #62). 16.04's python is too old.
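The checks listed above could be sketched roughly like this. Everything here is an assumption for illustration: the function names are hypothetical, the minimum Python version of 3.6 is a guess based on what Ubuntu 18.04 ships, and the venv check is only a rough importability test.

```python
import importlib.util
import os
import shutil
import sys


def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"')
    return info


def ubuntu_problems(os_release_text, minimum=(18, 4)):
    """Return problems with the distribution; an empty list means it looks fine."""
    info = parse_os_release(os_release_text)
    if info.get("ID") != "ubuntu":
        return ["not running Ubuntu"]
    version_id = info.get("VERSION_ID", "0.0")
    version = tuple(int(part) for part in version_id.split("."))
    if version < minimum:
        return ["Ubuntu %s is too old" % version_id]
    return []


def environment_problems(min_python=(3, 6)):
    """Check root access, Python version, git and venv on the current machine."""
    problems = []
    if os.geteuid() != 0:
        problems.append("installer must be run as root")
    if sys.version_info < min_python:
        problems.append("Python %d.%d or newer is required" % min_python)
    if shutil.which("git") is None:
        problems.append("git is not installed")
    if importlib.util.find_spec("venv") is None:
        problems.append("the venv module is not available")
    return problems
```

An installer could read /etc/os-release, combine both lists, and exit with a readable message before touching the system if either list is non-empty.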

Document upgrade strategy

People should be able to upgrade 1 minor version of TLJH installations.

  • Decide on & document our upgrade strategy
    • Users are to run bootstrap.py script from tljh's main branch from where they can specify the version they wish to install/upgrade to.
    • The hub environment's Python packages get upgraded, but the user env is touched as little as possible
  • Document various components & how they would react to upgrades
    • The changelog in #888 mentions this
  • Write upgrade documentation
  • Document what to do if we want a 'wipe everything and start over, but preserve user data' upgrade
    • Still needed as of 2023-08-10, this issue is closed by #932 without addressing this

Write installation / bootstrap logs to somewhere persistent

It'd be helpful if we had language specifically about how to debug the installation process. The biggest "black box" moment of this is the curl script, and in this case users don't know if anything works until everything works (more or less).

The logs section describes how to debug a functioning JupyterHub, but what happens if something goes wrong with the install process? It looks like the python script is printing things to stdout, but that isn't being written to a place that users can access via SSH in case the JupyterHub doesn't pop up as expected. Is that right?
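As a sketch of what persistent install logging might look like, the function below sends everything at DEBUG level to a file while keeping the terminal at INFO level. The logger name and any log path passed in are assumptions for illustration, not TLJH's actual configuration.

```python
import logging
import os


def configure_install_logging(log_path):
    """Log DEBUG and above to `log_path`, and INFO and above to the terminal."""
    directory = os.path.dirname(os.path.abspath(log_path))
    os.makedirs(directory, exist_ok=True)

    logger = logging.getLogger("tljh-install-sketch")  # hypothetical name
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()  # avoid duplicate handlers if called twice

    file_handler = logging.FileHandler(log_path)
    file_handler.setLevel(logging.DEBUG)
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(file_handler)

    console_handler = logging.StreamHandler()
    console_handler.setLevel(logging.INFO)
    logger.addHandler(console_handler)
    return logger
```

With something like this in place, a user whose hub never appears could SSH in and read the log file to see how far the installer got.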

Document using arbitrary authenticators with TLJH

We should offer custom guidance in TLJH for most popular authenticators (Google, LTI, Dummy, LDAP?). However, we don't have to special case any authenticator in TLJH, so we should also document how to use an arbitrary authenticator with TLJH.

Spelling error in installer.py

I was looking over the code to understand the installer and found the following English typo:

print("Grainting passwordless sudo to JupyterHub admins...")

should be

print("Granting passwordless sudo to JupyterHub admins...")

Support additional authenticators

Currently we only support dummy & firstuse authenticator.

I want us to support arbitrary authenticators without having to add custom code for each of them. z2jh requires we do a bunch of extraneous work mapping YAML config to authenticator config for it to work. I want us to not have to do that, somehow.

Trying jupyter hub for the first time

This is not a real issue I just want to express my gratitude for this project.

I had never even ssh'd into a server before today but within 20 minutes I had a jupyter hub instance on the cloud and I could even do this:
image_of_my_phone
I have no practical use to code on my phone but I think it really illustrates how powerful this could be, especially in the hands of beginners.
The original Jupyter Hub seemed doable but a little daunting to me. This 'littlest' jupyter hub I spun up in like 3 commands. I am amazed and truly grateful. I think great things will come from this.

I used to work at a middle school where most kids have access to a chromebook which they cannot use to code (at least not easily) but this is incredibly easy to set up. Imagine how many schools could use a tool like this, not only here in the US but around the world.
It cost me nothing to put this together because of how much credit these cloud services give you when you sign up but even when that runs out it is incredibly cheap to keep the service going.
I grew up in Mexico where most schools don't have the resources to provide each student with access to a computer much less having an IT person around. With this, it feels like any teacher who really wants to could set up a jupyter hub and all their students need is internet.

I am excited to see what comes out of this.

Thank you so much, 👍

Document/move configuration

Currently config is located in $PREFIX/config.yaml, which is where admin users are set, and presumably authenticator will be chosen when that's exposed, etc. This file doesn't seem to appear in the docs yet.

It would probably be good to put all user-editable files (config.yml, mainly) to a different prefix (e.g. /etc/tljh) so that users upgrading/reinstalling can easily trash the whole tljh directory and start over without losing their admin list, etc.

Add support to install libraries during TLJH installation

Currently, to configure the user environment with libraries, we log in as admin and install them one by one. Given that this is likely to be a common procedure, an alternative is to allow for a requirements.txt/environment.yml file to be included in the TLJH installation process.

Ballpark estimates for VM resources in the docs

It'd be great to have suggestions in the docs on the number, size and type of resources for typical use cases. For example, "if you expect 50 users crunching big numbers, you might need at least X CPUs with Y RAM, but if you're only having 5 users learning to code for the first time, you probably only need 1 CPU with this much RAM". I can imagine a lot of users (including myself) don't have enough background to get this right.

Exception at install time (while checking for JupyterHub start)

When installing TLJH I stumbled on this :

Setting up JupyterHub...
twisted 17.5.0 has requirement Automat>=0.3.0, but you'll have automat 0.0.0 which is incompatible.
Created symlink /etc/systemd/system/multi-user.target.wants/jupyterhub.service → /etc/systemd/system/jupyterhub.service.
Created symlink /etc/systemd/system/multi-user.target.wants/configurable-http-proxy.service → /etc/systemd/system/configurable-http-proxy.service.
Waiting for JupyterHub to come up (1/4 tries)
Waiting for JupyterHub to come up (2/4 tries)
Traceback (most recent call last):
  File "/opt/tljh/hub/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/tljh/hub/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/tljh/hub/lib/python3.6/site-packages/tljh/installer.py", line 182, in <module>
    main()
  File "/opt/tljh/hub/lib/python3.6/site-packages/tljh/installer.py", line 176, in main
    ensure_jupyterhub_running()
  File "/opt/tljh/hub/lib/python3.6/site-packages/tljh/installer.py", line 145, in ensure_jupyterhub_running
    urlopen('http://127.0.0.1')
  File "/opt/tljh/hub/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/opt/tljh/hub/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/opt/tljh/hub/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/opt/tljh/hub/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/opt/tljh/hub/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/opt/tljh/hub/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

Despite this error, JupyterHub starts just fine. I wonder if catching 404s in the retry loop would help -- this looks like a transient.
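A sketch of a retry loop that treats a 404 as "not ready yet" might look like the following. It is not the installer's actual ensure_jupyterhub_running; the name wait_for_hub, the delay, and the choice to retry only on 404 and connection errors are assumptions.

```python
import time
import urllib.request
from urllib.error import HTTPError, URLError


def wait_for_hub(url="http://127.0.0.1", tries=4, delay=1.0,
                 opener=urllib.request.urlopen, sleep=time.sleep):
    """Return True once `url` answers, retrying on 404s and connection errors."""
    for attempt in range(1, tries + 1):
        print("Waiting for JupyterHub to come up (%d/%d tries)" % (attempt, tries))
        try:
            opener(url)
            return True
        except HTTPError as error:
            # HTTPError must be caught before URLError, its parent class.
            # A 404 is treated as transient; other HTTP errors propagate.
            if error.code != 404:
                raise
        except URLError:
            # Connection refused and similar: the proxy is not listening yet.
            pass
        if attempt < tries:
            sleep(delay)
    return False
```

The caller can then decide what to do when this returns False, instead of the installer ending in a traceback.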

Supporting ARM architectures

It would be 💯 to get TLJH running on Raspberry Pis and other small boards. Unfortunately miniconda doesn't support their ARM processor architectures, so some changes to the installer are needed.

As discussed with @yuvipanda a fix for this would be using virtualenv instead of conda. The user environment should be configurable to use either conda or virtualenv. Furthermore nodesource also supports ARM architectures.

Populating user home directories with files

In addition to #49, a second way in which people might like to configure the user environment is by making certain files (datasets, notebooks, etc.) available to all users. Ideally these could be cloned straight from GitHub, with an option to do so during TLJH installation.

Multiple users on one account simultaneously?

Our use case is for introductory astronomy students to run notebooks. I would prefer execution be automatic (via the JupyterHub extension appmode), but will accept it might not be. I currently have created a TLJH user who can log in without a password. In that account, I have made all the notebook files read-only and owned by another user, so that students can’t change the files. This allows me to pass the students a link that automatically executes the notebooks in ‘appmode’ and thus requires rather minimal preparation for the students. I am not considering this ‘security’, just a means of avoiding students accidentally changing the files and causing problems for other students in the shared account.

In this scenario, can more than one person be signed in on the same account simultaneously without ill consequences? NOTE: None of the notebooks interact with the server to create files (other than pycache files and the like that are automatically created).

Google cloud instructions resulted in a non-accessible JupyterHub

The gcloud instructions gave me a JupyterHub that wasn't accessible at the external IP:

Here's the link (I created it in binder sandbox so we could share instance info) https://console.cloud.google.com/compute/instancesDetail/zones/us-west1-b/instances/test-jupyterhub?project=binder-sandbox-194621

It seems like JupyterHub is running (if I SSH into it and run sudo systemctl restart jupyterhub I see JupyterHub is now running at http://:80/) but hitting the external IP at https://35.185.197.86/ isn't responding.

Install does NOT create YAML config file

The installation instructions (specifically the "Administrative Access" page) state:

Admin users are specified in the YAML config file at /opt/tljh/config.yaml. This file is created upon installing tljh.

This is not true for me, at least on the installations I have done. It is easy enough to manually create the file from scratch, but it should be clearly stated.

Setup OOMScoreAdj for hub & user services

We want user processes to be killed more frequently when system runs out of RAM, not system processes or hubs. This should normally 'just work', but as additional precautions we should default to:

  1. 90% default memory limit on all spawned user servers
  2. Positive OOMAdj score on user services

Currently the default limit is 1G, and that should go.
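The two defaults could be computed along these lines. This is a sketch under assumptions: the function names are hypothetical, the positive score of 100 is an arbitrary illustrative value, and whether MemoryMax or an older memory directive applies depends on the systemd version in use.

```python
import os


def total_memory_bytes():
    """Total physical memory of this machine, via sysconf (Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def user_service_properties(total_bytes, memory_percent=90, oom_score_adjust=100):
    """Build systemd-style properties for a spawned user server.

    MemoryMax and OOMScoreAdjust are systemd directive names; the values
    chosen here are assumptions for illustration.
    """
    return {
        # Integer arithmetic avoids floating point rounding surprises.
        "MemoryMax": str(total_bytes * memory_percent // 100),
        # A positive score makes the kernel prefer killing this process.
        "OOMScoreAdjust": str(oom_score_adjust),
    }
```

Calling user_service_properties(total_memory_bytes()) gives each user server a limit of 90% of the machine's RAM instead of a fixed 1G.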

Clearer instructions on admin access for IT departments

The Administrative Access page of the docs ends with

This should give you admin access from JupyterHub! You can verify this by:

  1. Opening a Terminal in your JupyterHub and checking if sudo works
  2. Opening your JupyterHub Control Panel and checking for the Admin tab

From now on, you can use the JupyterHub to do most configuration changes.

The problem is that when my IT department looked at this, they had no clue how to access "JupyterHub". It would be clearer to explain: "You should now be ready to access the JupyterHub server. Open a web browser to http://hostname (where hostname is the name of the server you are configuring) and you should see the JupyterHub login page! You can access it using the same credentials as your login account." I would also suggest adding a screenshot of what to expect, and explaining HOW to open the terminal in JupyterHub or the control panel.

Use more emojis

Everything needs more emojis, especially documentation.

I learnt everything I know about this from @rgbkrk and nteract.

Provide a commandline tljh command for setting config

Editing YAML files on the terminal is not the easiest thing for new people. Indents get screwed up often, and lots of people are not used to nano / vim.

We should offer an easy way to interact with this YAML file from the commandline. Inspiration for this comes from the git config command, which lets you edit your .gitconfig files easily, without having to manually edit the files. This also helps with validation a lot.

Instead of asking users to add:

user_environment:
   default_app: jupyterlab

to the yaml file, they can instead run:

tljh config set user_environment.default_app jupyterlab

This command would:

  1. Load the current config.yaml file
  2. Set the requested key to requested value
  3. Validate the new config
  4. Save config
  5. Reload JupyterHub to let config take effect.

For lists, you can do:

tljh config add users.admin yuvipanda

or

tljh config set-list users.admin yuvipanda test1 test2

(using set-list instead of set so people don't accidentally clean out their lists and replace them with one item)

View items with:

tljh config view users.admin

You can reset items with:

tljh config reset users.admin
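The core of such a command is mapping a dotted key onto the nested config. Here is a minimal sketch of the set and add operations on an in-memory dict; loading, validating and saving the YAML file and reloading JupyterHub are left out, and the function names are hypothetical.

```python
def _parent_of(config, key_path):
    """Walk to the dict holding the last key, creating levels as needed."""
    *parents, leaf = key_path.split(".")
    node = config
    for part in parents:
        node = node.setdefault(part, {})
    return node, leaf


def set_item(config, key_path, value):
    """Set a dotted key such as 'user_environment.default_app'."""
    node, leaf = _parent_of(config, key_path)
    node[leaf] = value
    return config


def add_item(config, key_path, value):
    """Append to the list at a dotted key, creating it and skipping duplicates."""
    node, leaf = _parent_of(config, key_path)
    items = node.setdefault(leaf, [])
    if value not in items:
        items.append(value)
    return config
```

Starting from an empty dict, set_item(config, "user_environment.default_app", "jupyterlab") followed by add_item(config, "users.admin", "yuvipanda") produces the same structure as the hand-written YAML shown earlier.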

TLJH login failures

I managed to get logged in to my TLJH server via the web browser. Played around with it and installed some packages via the terminal. However once I logged out it will not let me log back in again. JupyterHub is insisting I am entering the wrong username and password when the same username and password work via SSH.

Syslog states

Jul 9 19:32:47 jupyterhub python3[859]: [W 2018-07-09 19:32:47.993 JupyterHub base:503] Failed login for juan

Attempted solutions

  • Checked /opt/tljh/config.yaml for problems, appears intact with contents:
      users:
        admin:
          - juan
  • Restarting server software didn't help.
  • Restarting the VM entirely didn't help.
  • Changing user password from the command line and trying the new password didn't help.
  • Changed web browsers, it didn't help.

So I have the login page, but it is not recognizing the account. This suggests a serious potential problem with how JupyterHub is authenticating, since it appears to have broken.

Feedback from testing https://twitter.com/Juan_Kinda_Guy

Yuvi,

I had a few minutes (OK an hour) this morning, so I decided to take a crack at installing TLJH. I’m copying Matt Craig and our IT folks on this so they have this as well. I’ll be gone next week, so this is my contribution to the effort for now.

I’ve downloaded the ubuntu-18.04-live-server ISO and installed a VM of it under Parallels on my Mac. I had some issues with the installation instructions.

Step 1 went fine and the installation script seemed to run without problems.

For "Step 2: Add admin user”, the instructions say to ‘open’ the /opt/tljh/config.yaml file. The file does NOT exist to start with, and the language was unclear. Should there have been a pre-existing YAML file or not?

Also, after trying to start the jupyter server using 'sudo systemctl restart jupyterhub’, there is no feedback (I know when I start a web server, normally I get one line of feedback that the server started). I don’t know if the server actually started or not. I suspect it didn’t start, but the instructions don’t provide a way to check if the server is running other than connecting to it, and they don’t specify the port to use to connect. Is there a default port? Sorry if this should be obvious…

Step 3 presented a bunch of issues. conda doesn’t appear to be in the default path for my admin user. Apparently my admin's bash profile was not modified during the install. Since the binaries were installed in /opt/tljh/user/bin, I modified my .profile to add that to my path and ran the conda configuration script by adding this to the end of my .profile file:

if [ -d "/opt/tljh/user/bin" ] ; then
    PATH="/opt/tljh/user/bin:$PATH"
    . /opt/tljh/user/etc/profile.d/conda.sh
fi

However, despite the sudo -E (which is supposed to preserve the environment), it reported ‘command not found’. Looks like the directory conda is installed in is not considered safe for sudo (it is NOT listed in the secure_path variable in the default /etc/sudoers file). So I did a ‘sudo visudo’ to edit the /etc/sudoers file and added /opt/tljh/user/bin to the secure_path variable. (Actually, I see you addressed this sudo permission problem on the ‘Customizing User Environment’ page, by using the workaround

sudo PATH=${PATH} conda install -c conda-forge gdal

However, I figured I would bring it up, since I was just following the instructions when this happened).

At that point the following worked (I had a hiccup with a network disconnection, but rerunning the conda install command worked fine):

sudo -E conda install -c conda-forge gdal

However, when I did:

sudo -E pip install numpy

it said I had already installed it. I confirmed this by opening up an ipython shell and importing numpy

I still don’t know if I managed to get the installation working since I didn’t know how to access the Jupyter notebook server, but I hope this helps clarify possible installation issues from the point of view of a user.

Juan

update documentation on disk size

Step 13 says:

In Google Cloud, the higher your disk size the faster your disk is. The default (10GB) is pretty low, so you might want to increase it.

Please add guidance or suggestions on how much to increase it based on # of students or hours.

Adding capabilities to TLJH to set up authenticated, automatically set up, accounts

As an alternative to the approach I outlined in Issue #46 (multiple users sharing one account), which frankly, seems dodgy, I am looking at getting TLJH running with HTTPS, LDAP Authentication, and automatic Git retrieval of the code for new users. So, a few questions I have:

  1. Would you recommend creating a new Ubuntu 18.04 VM as a starting point for performing an upgrade to newer versions of TLJH? Not a big deal, but it seems with the many structural changes I am seeing to how the configuration files will be set up in the near future this makes sense.

  2. I assume TLJH replaces Apache as a server, so taking the Apache approach to creating security certificates will not work for HTTPS. We don't want to run HTTPS through a proxy if we can avoid it (which I think excludes Let's Encrypt via Træfik). Is there a way to get HTTPS up so our authentication isn't compromised?

  3. To install an LDAP authenticator, can I just use the conda installation of https://github.com/jupyterhub/ldapauthenticator with TLJH or is it different enough from JupyterHub to require a different approach.

  4. A similar question about nbgitpuller, can I just follow the installation instructions for it on JupyterHub? Is it typical to set it up so they follow the link, it creates the account (after LDAP authentication), and then does the automatic git pull?

Serve a temporary "TLJH is building" page while TLJH is building

Would it be straightforward for TLJH to first set up a really short-and-sweet "please be patient while your Littlest JupyterHub is built" page? This would help users disambiguate "I just need to wait a bit longer for my jupyterhub to set up" and "something went wrong with the startup script and I need to debug"
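One possible shape for this, using only the Python standard library, is a tiny HTTP server that answers every request with a "please wait" page until the real proxy takes over the port. The page text, class name and the choice of a 503 status with a Retry-After header are assumptions for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

PLACEHOLDER_PAGE = (
    b"<!DOCTYPE html><html><body>"
    b"<h1>Please be patient while your Littlest JupyterHub is built</h1>"
    b"</body></html>"
)


class BuildingPageHandler(BaseHTTPRequestHandler):
    """Answer every GET with the placeholder page."""

    def do_GET(self):
        # 503 tells browsers and scripts the service is temporarily
        # unavailable rather than permanently missing.
        self.send_response(503)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PLACEHOLDER_PAGE)))
        self.send_header("Retry-After", "30")
        self.end_headers()
        self.wfile.write(PLACEHOLDER_PAGE)


def serve_placeholder(port=80):
    """Serve the placeholder until interrupted (port 80 needs root)."""
    HTTPServer(("", port), BuildingPageHandler).serve_forever()
```

The installer would need to stop this server before starting the real proxy on the same port, which is the part that makes the idea less than trivial.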

Support HTTPS with let's encrypt

HTTPS is important, and we should do it right!

We could start by putting https://traefik.io/ in front of CHP for HTTPS termination. Eventually, we could either just use Traefik for the proxy, or add Let's Encrypt functionality to CHP.

Explain 'sudo -E' someplace accessible

Issue triage update by Erik 2021-10-22: this remains relevant.

I suggest a topic guide entry in our documentation about the choices we've made to add exempt_group in sudoers to preserve PATH for those users, and how the -E flag is relevant.


It's important, but need to figure out where to explain it without complicating things

Write stdout / stderr from all subprocess commands to log

In bootstrap / installer, we call a lot of external commands. However, we do not log their output to the logs - this means we lose very important information there. We should be logging all that too - but not outputting it to stdout.
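A small wrapper around subprocess.run is one way to capture that output. The sketch below merges stderr into stdout and writes both to a logger at DEBUG level instead of the terminal; the logger name and the function name run_logged are hypothetical.

```python
import logging
import subprocess
import sys

logger = logging.getLogger("tljh-subprocess-sketch")  # hypothetical name


def run_logged(command):
    """Run `command`, logging its combined stdout/stderr instead of printing it."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave stderr with stdout
        text=True,
    )
    logger.debug("ran %s (exit code %d)", command, result.returncode)
    for line in result.stdout.splitlines():
        logger.debug("  %s", line)
    # Raise CalledProcessError only after the output has been logged,
    # so a failing command still leaves its output in the log.
    result.check_returncode()
    return result.stdout


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    run_logged([sys.executable, "-c", "print('hello from a subprocess')"])
```

Paired with a file handler on the same logger, this keeps the terminal quiet while preserving the full command output for later debugging.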
