
dandi-hub's Issues

Investigate how to test the playbook

This issue is to

  1. describe the remaining problems with full automation
  2. identify components which should be separated (to simplify testing)
  3. estimate the cost for full e2e tests on Amazon (this can be done later)
  4. identify what can be tested outside of EC2

Package request

Dear Satra and dandihub team,
I have tried the DANDI Hub platform and made a proof of concept of spike sorting there.
Thanks for this, it is very impressive and powerful.
The VNC is also a really good idea.

Regarding the VNC, I have a Linux package request.

I have some data-exploration tools that are still PyQt based. I should port them to Jupyter widgets, but having them working in the VNC would be extremely convenient.

Unfortunately this does not work, because Qt cannot find its backend on the xfce platform.
I get this error:

qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.

Available platform plugins are: eglfs, linuxfb, minimal, minimalegl, offscreen, vnc, wayland-egl, wayland, wayland-xcomposite-egl, wayland-xcomposite-glx, webgl, xcb.

In my lab we have a similar resource (Jupyter + VNC).
I experienced the same issue there (PyQt5 not working over VNC).
With the sysadmin we installed some more packages (CentOS based): dnf install xcb-util-wm xcb-util-image xcb-util-keysyms xcb-util-renderutil
This fixed the issue: all the Qt-based viewers are now working over VNC in my lab.
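
As an aside, a minimal sketch for diagnosing which shared libraries the xcb plugin is missing (QT_DEBUG_PLUGINS is a standard Qt environment variable; it must be set before Qt initializes):

import os

# Must be set before Qt loads; the plugin resolution log, including any shared
# libraries the xcb plugin cannot find, is printed to stderr.
os.environ["QT_DEBUG_PLUGINS"] = "1"

from PyQt5.QtWidgets import QApplication
app = QApplication([])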

So my question: would it be possible to have these (small) Linux packages on dandihub as well?
In the spike sorting field many viewer tools are still Qt based (phy, spikeinterface-gui, ...) and it would be very convenient to be able to look at the output of spike sorting tools.

In the same vein, if installing more packages is not too complicated, I would love to have htop to monitor resources, which is more convenient than the classical top. But this second request is more of a bonus.

Thanks a lot

Desktop VNC leads to 404

Selecting the "Desktop" option from the launcher

[screenshot]

leads to

[screenshot]

Also, the thumbnail for that ImJoy elFinder is missing (and it terminates in the same error).

provide user.name and user.email git config settings

I wonder if it is somehow magically possible, since users log in via GitHub, to pre-populate .gitconfig in "their" account with information from their GitHub account. That would be really slick.
If not, we should at least populate ~/.gitconfig with some stub values like "DANDI hub user" and "[email protected]". But I really hope that magic is real! ;)
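
A minimal sketch of what the spawner could do (gh_user is a hypothetical dict standing in for whatever profile payload the GitHub OAuth login provides; the noreply address follows GitHub's standard privacy-preserving form):

import subprocess

def seed_gitconfig(gh_user: dict) -> None:
    # gh_user is hypothetical: a dict built from the GitHub OAuth profile.
    name = gh_user.get("name") or "DANDI hub user"
    email = gh_user.get("email") or f"{gh_user['login']}@users.noreply.github.com"
    # Only seed stub values if the user has not configured git themselves.
    if subprocess.run(["git", "config", "--global", "user.name"],
                      capture_output=True).returncode != 0:
        subprocess.run(["git", "config", "--global", "user.name", name], check=True)
        subprocess.run(["git", "config", "--global", "user.email", email], check=True)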

Optimizing Parallelization

Hey all,

As I'm running a big task on the Hub, I'm trying to get the most out of the available resources on an extra-large spawn.

[screenshot]

However, both my own scripts and dandi upload trigger the above display, and no matter how many jobs I try to parallelize over, the actual I/O seems capped at 8 concurrent processes according to top (the other tasks show up as queued, in that they have PIDs but zero activity).

Is this intended?
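
One possible explanation (an assumption, not confirmed for the hub): inside a Kubernetes pod, the container's cgroup CPU quota, not the node's CPU count, bounds how many processes can make progress at once. A minimal check:

import os

def cgroup_cpu_limit():
    # Best-effort read of the container CPU quota (cgroup v1 first, then v2).
    try:
        quota = int(open("/sys/fs/cgroup/cpu/cpu.cfs_quota_us").read())
        period = int(open("/sys/fs/cgroup/cpu/cpu.cfs_period_us").read())
        return quota / period if quota > 0 else None
    except FileNotFoundError:
        pass
    try:
        quota, period = open("/sys/fs/cgroup/cpu.max").read().split()
        return int(quota) / int(period) if quota != "max" else None
    except FileNotFoundError:
        return None

print("host CPUs:", os.cpu_count(), "| container CPU limit:", cgroup_cpu_limit())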

sudo password

@satra - I again have a problem with permissions when allocating ports, this time using singularity-compose:

(sing_comp) jovyan@jupyter-djarecka:~/spyglass_test$ singularity-compose up
[sudo] password for jovyan:
Sorry, try again.

desktop login screen

I left the desktop inactive for a few minutes. When I came back I saw this screen:

[screenshot]

Is there any way for me to sign in? If not, can we disable this locking feature?

Desktop screen in GPU instances

It seems that I can't run a Desktop screen from a GPU instance; is this a bug or expected behavior?
I would like to test DeepLabCut using their GUI, which I have already managed to run with singularity in a non-GPU instance, but I get some backend errors printed out and I suspect the processing will not work without a GPU.

The first response I get is this:
[screenshot]

Then if I refresh the page:
[screenshot]

[Bug]: Cannot spawn 'Extra Large'

From the main DANDIHub page, I try to spawn an 'Extra Large' instance to run some heavy-duty file analysis.

[screenshot]

It then stalls out for about 10 minutes on this page

[screenshot]

but eventually leads to this page, where it stalls another 10 minutes or so

[screenshot]

and ultimately errors out with

[screenshot]

Helper (cookie cutter) to easily create "service launchers"

I guess those should be some kind of JupyterLab extension: https://jupyterlab.readthedocs.io/en/stable/extension/extension_tutorial.html . But I have never made one, and it looks quite elaborate. The critical information needed to "template" such services is

  • script to run to start the service, which would serve from some http://localhost:<port> (or https)
  • port it will run on (well, ideally it should be communicated somehow by the script so there could be multiple instances, but maybe it is not that easy to allow for a range of ports in JupyterLab?)
  • name
  • icon

and based on that info we should template additional "launchers" to appear among those available. Then we should be able to easily create one for RAVE or any other service; see the sketch below.
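
For what it's worth, jupyter-server-proxy already templates exactly these fields (command, port, name, icon) and adds a launcher tile that proxies http://localhost:<port>. A sketch for a jupyter_server_config.py, where the start script and icon path are hypothetical:

# If "port" is omitted, the proxy picks a free port and substitutes it for {port}.
c.ServerProxy.servers = {
    "rave": {
        "command": ["start-rave.sh", "--port", "{port}"],  # hypothetical script
        "timeout": 30,  # seconds to wait for the port to start accepting connections
        "launcher_entry": {
            "title": "RAVE",
            "icon_path": "/opt/icons/rave.svg",  # hypothetical icon location
        },
    },
}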

Desktop VNC requires login

Follow-up to the now-working 'Desktop' button from #39, it now leads to this page

[screenshot]

I tried my GitHub password (since my DANDI account was made via my GitHub account) but that did not work. I also tried my DANDI API key, that did not work either.

can't open directory: /opt/conda/lib/hdf5/plugin

Hi all, I'm trying to use the h5py ros3 driver in the python stack on hub.dandiarchive.org with the following code:

import h5py
url="https://dandiarchive.s3.amazonaws.com/blobs/9ce/1d4/9ce1d405-323f-4f6d-a5a8-63fd59c3dfe0"
with h5py.File(url, "r", driver="ros3") as fd:
    ds = fd["0"]
    print("Dataset shape = %s", ds.shape)
    img = ds[0, 0, 1024:1025,1109:1609,19406:19906]

and I get the following error stack:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
/tmp/ipykernel_375/2585987517.py in <module>
      4     ds = fd["0"]
      5     print("Dataset shape = %s", ds.shape)
----> 6     img = ds[0, 0, 1024:1025,1109:1609,19406:19906]

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

/opt/conda/lib/python3.9/site-packages/h5py/_hl/dataset.py in __getitem__(self, args, new_dtype)
    788         mspace = h5s.create_simple(selection.mshape)
    789         fspace = selection.id
--> 790         self.id.read(mspace, fspace, arr, mtype, dxpl=self._dxpl)
    791 
    792         # Patch up the output for NumPy

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/h5d.pyx in h5py.h5d.DatasetID.read()

h5py/_proxy.pyx in h5py._proxy.dset_rw()

OSError: Can't read data (can't open directory: /opt/conda/lib/hdf5/plugin)

It looks like I may not have read permissions for the HDF5 plugin directory?

Thanks,
Lee
https://gist.github.com/LeeKamentsky/e6585bcc5695bee0328f88e68ca1ddfb
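
If it helps, a possible workaround (an assumption, not verified on the hub): HDF5 honors the HDF5_PLUGIN_PATH environment variable, so pointing it at a readable directory before h5py initializes might avoid the failing directory scan:

import os

# HDF5_PLUGIN_PATH overrides the compiled-in plugin directory; the path below
# is a placeholder -- any directory the user can read should do.
os.environ["HDF5_PLUGIN_PATH"] = os.path.expanduser("~/hdf5_plugins")
os.makedirs(os.environ["HDF5_PLUGIN_PATH"], exist_ok=True)

import h5py  # import after setting the variable so HDF5 picks it up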

Consider a filesystem with CoW (e.g. BTRFS) wherever possible

With git-annex/DataLad in the picture, it might be beneficial for any deployment (e.g. of a girder server) to have a local file system with CoW support (such as BTRFS), so large data arrays could be efficiently copied/curated without incurring an additional storage penalty.
I have been using BTRFS for over 5 years now. Some quirks do come up from time to time, but they all seem manageable.
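
For illustration, a reflink copy on a CoW filesystem shares extents with the original, so "copying" a large array costs almost no space or time (a sketch with hypothetical filenames; --reflink=auto falls back to a regular copy on non-CoW filesystems):

import subprocess

# On BTRFS the new file shares data extents with the original until either
# side is modified; the paths here are hypothetical.
subprocess.run(
    ["cp", "--reflink=auto", "raw/big_array.h5", "curated/big_array.h5"],
    check=True,
)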

"unminimize" since we do have people login into the system?

failed to get basic man to work:

jovyan@jupyter-yarikoptic:~/fscacher/src/fscacher$ man nl
This system has been minimized by removing packages and content that are
not required on a system that users do not log into.

To restore this content, including manpages, you can run the 'unminimize'
command. You will still need to ensure the 'man-db' package is installed.
jovyan@jupyter-yarikoptic:~/fscacher/src/fscacher$ nl --elp
nl: unrecognized option '--elp'
Try 'nl --help' for more information.
jovyan@jupyter-yarikoptic:~/fscacher/src/fscacher$ unminimize
This system has been minimized by removing packages and content that are
not required on a system that users do not log into.

This script restores content and packages that are found on a default
Ubuntu server system in order to make this system more suitable for
interactive use.

Reinstallation of packages may fail due to changes to the system
configuration, the presence of third-party packages, or for other
reasons.

This operation may take some time.

Would you like to continue? [y/N] y

Re-enabling installation of all documentation in dpkg...
mv: cannot move '/etc/dpkg/dpkg.cfg.d/excludes' to '/etc/dpkg/dpkg.cfg.d/excludes.dpkg-tmp': Permission denied

Termination handler fails to install

When deploying DandiHub from scratch in a new AWS account, following the instructions in the repo, I get the following error.

Release "aws-node-termination-handler" does not exist. Installing it now.
Error: unable to build kubernetes objects from release manifest: resource mapping not found
for name: "aws-node-termination-handler" namespace: "" from "":
no matches for kind "PodSecurityPolicy" in version "policy/v1beta1"
ensure CRDs are installed first

This error traces to the installation of aws-node-termination-handler in z2jh.yml, i.e.

  - name: Add termination handler
    shell: helm upgrade --install aws-node-termination-handler \
      --namespace kube-system \
      eks/aws-node-termination-handler

Digging a little bit into the error message, in particular into no matches for kind "PodSecurityPolicy" in version "policy/v1beta1" I found that Kubernetes has deprecated v1beta1:

The policy/v1beta1 API version of PodDisruptionBudget will no longer be served in v1.25.

    Migrate manifests and API clients to use the policy/v1 API version, available since v1.21.
    All existing persisted objects are accessible via the new API
    Notable changes in policy/v1:
        an empty spec.selector ({}) written to a policy/v1 PodDisruptionBudget selects all pods in the namespace
         (in policy/v1beta1 an empty spec.selector selected no pods). An unset spec.selector selects
          no pods in either API version.

PodSecurityPolicy

PodSecurityPolicy in the policy/v1beta1 API version will no longer be served in v1.25, and the
 PodSecurityPolicy admission controller will be removed.

Migrate to Pod Security Admission or a 3rd party admission webhook. For a migration guide, 
see Migrate from PodSecurityPolicy to the Built-In PodSecurity Admission Controller. 
For more information on the deprecation, see PodSecurityPolicy Deprecation: Past, Present, and Future.

Also see:
PodDisruptionBudget, PodSecurityPolicy in https://kubernetes.io/docs/reference/using-api/deprecation-guide/#v1-25.

Apparently the kubectl version installed when running the z2jh.yml instruction for downloading kubectl is v1.25. I.e. this instruction in z2jh.yml

wget -O kubectl https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl

leads to the installation of kubectl v1.25, which, per the links above, no longer serves v1beta1.

On the cluster end, if I run all the commands and get the hub running without installing aws-node-termination-handler, I still get some errors due to the policy deprecation.
In particular, this is what I get in the user-scheduler, which might just be a consequence of not having the helm chart for aws-node-termination-handler:

E1206 10:07:45.986565       1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1beta1.PodDisruptionBudget: failed to list *v1beta1.PodDisruptionBudget: the server could not find the requested resource

UPDATE 1
This issue can be narrowed down to the rbac.pspEnabled option in the helm chart since running

helm install --namespace kube-system aws-node-termination-handler eks/aws-node-termination-handler --set rbac.pspEnabled=false

leads to a successful installation.
Reviewing the chart config, I can see that there is still a dependency on policy/v1beta1 in https://github.com/aws/eks-charts/blob/8e82f74d75221964d604d3c7b8c70da10160b88e/stable/aws-node-termination-handler/templates/psp.yaml#L2.

UPDATE 2
Changing https://github.com/aws/eks-charts/blob/8e82f74d75221964d604d3c7b8c70da10160b88e/stable/aws-node-termination-handler/templates/psp.yaml#L2 from policy/v1beta1 to policy/v1 still leads to the same error.

FOI: docker image fails to build - Yarik is cursed by all the nodes

I am cursed... for the 3rd day I am dealing with nodejs/npm gotchas across 3 projects I had touched and which apparently use nodejs/npm (openneuro client, sparkle, and now this). Such luck never happened before -- I am afraid nodejs is winning the world.

To follow up on #14 with additional tune-ups (removal of apt listings, and maybe avoiding some layering), I decided to build the beast locally and failed with

(git)smaug:~/proj/dandi/dandihub[dandi]docker
$> git describe                     
fatal: No names found, cannot describe anything.

$> git describe --always
fdd8e47

$> docker build -t dandihub:orig-1 .
...
Removing intermediate container 9020b91bbd8c
Step 16/18 : RUN jupyter labextension install @jupyter-widgets/jupyterlab-manager   jupyter-matplotlib jupyterlab-datawidgets [email protected] jupyterlab-plotly   plotlywidget jupyter-threejs --no-build   && expo
rt NODE_OPTIONS=--max-old-space-size=4096   && jupyter lab build &&      jupyter lab clean &&      jlpm cache clean &&      npm cache clean --force &&      rm -rf $HOME/.node-gyp &&      rm -rf $HOME/.local && rm 
-rf /tmp/*
 ---> Running in ec6e20b5d27d
[LabBuildApp] JupyterLab 3.0.14
[LabBuildApp] Building in /opt/conda/share/jupyter/lab
[LabBuildApp] Building jupyterlab assets (production, minimized)
Build failed.
Troubleshooting: If the build failed due to an out-of-memory error, you
may be able to fix it by disabling the `dev_build` and/or `minimize` options.

If you are building via the `jupyter lab build` command, you can disable
these options like so:

jupyter lab build --dev-build=False --minimize=False

You can also disable these options for all JupyterLab builds by adding these
lines to a Jupyter config file named `jupyter_config.py`:

c.LabBuildApp.minimize = False
c.LabBuildApp.dev_build = False

If you don't already have a `jupyter_config.py` file, you can create one by
adding a blank file of that name to any of the Jupyter config directories.
The config directories can be listed by running:

jupyter --paths

Explanation:

- `dev-build`: This option controls whether a `dev` or a more streamlined
`production` build is used. This option will default to `False` (i.e., the
`production` build) for most users. However, if you have any labextensions
installed from local files, this option will instead default to `True`.
Explicitly setting `dev-build` to `False` will ensure that the `production`
build is used in all circumstances.

- `minimize`: This option controls whether your JS bundle is minified
during the Webpack build, which helps to improve JupyterLab's overall
performance. However, the minifier plugin used by Webpack is very memory
intensive, so turning it off may help the build finish successfully in
low-memory environments.

An error occured.
RuntimeError: npm dependencies failed to install
See the log file for details:  /tmp/jupyterlab-debug-xeq6vch7.log
The command '/bin/bash -o pipefail -c jupyter labextension install @jupyter-widgets/jupyterlab-manager   jupyter-matplotlib jupyterlab-datawidgets [email protected] jupyterlab-plotly   plotlywidget jupyter-threejs
 --no-build   && export NODE_OPTIONS=--max-old-space-size=4096   && jupyter lab build &&      jupyter lab clean &&      jlpm cache clean &&      npm cache clean --force &&      rm -rf $HOME/.node-gyp &&      rm -r
f $HOME/.local && rm -rf /tmp/*' returned a non-zero code: 1

[FEATURE] Nwbwidgets dashboard service for visualization of files in DANDI archive

Hi,
we've recently been working on improving the ways people can use nwbwidgets and, among other options, we've come up with a containerized service (using a Jupyter + Voila + nwbwidgets stack) that renders the nwbwidgets view of any NWB file as a dashboard, that is, without access to the code cells. We believe this has the potential to become a powerful data exploration feature in the DANDI archive.

  • It works better than the current NWBExplorer service, which has a file size limit and is very slow. This improvement comes from using fsspec to stream data from DANDI.
  • Although users can already run nwbwidgets to visualize any DANDI set from within DANDI Hub, this would be a much easier option for visually exploring specific files, and it would not expose the computing resources to users running custom code in notebook cells.
  • The computing resources necessary to run a Voila process with such a widgets panel are actually modest, because we don't load the entire file into memory and nwbwidgets doesn't perform complex data operations. Probably the Tiny server would already do a good enough job.
  • I'm not 100% sure of this, but I suspect the performance might get even better if the DANDI archive storage were mounted as a file system in the running container.

With this container service (see here) we can directly create a single NWB file dashboard with the typical result of running nwbwidgets (see gif below). This would be useful if we had a clickable link next to each NWB file in DANDI archive (like the one illustrated below) that would make the service run for that specific file.

Illustration of the link for triggering the service for an individual file:
[screenshot]

This link would then spin up a temporary container serving the widgets rendering the data for that specific file. This can be done, for example, by running:

$ docker run -p 8866:8866 -e S3_URL_NWBFILE=https://dandiarchive.s3.amazonaws.com/blobs/d78/740/d7874048-7192-4f64-92cb-24d7ff67f530 nwbwidgets-voila

which opens a view like this:
[animated gif]

The Dockerfile that builds the image for the service can be found here (it will soon be merged), and it should be ready for testing.

We would like to hear your impressions, if you see potential for this becoming integrated in DANDI and, if so, what we could do to make this happen.
fyi @bendichter

Desktop blank screen

Last time I was using the Desktop mode on DANDI Hub, I clicked the button "Send CtrlAltDel" in the upper right corner. Now, I get this screen, even after shutting down the server, logging out, and logging back in

[screenshot]

satra/dandihub:e3b52ec0 image too big?

Trying to start a "tiny" instance on the hub. http://hub.dandiarchive.org/hub/spawn-pending/yarikoptic is taking too long (minutes):
[screenshot]

NB it did finish eventually

docker pulling on a hefty/speedy smaug shows that some layers are huge (>=600MB)... and actually I even ran out of space
(on smaug)

failed to register layer: ApplyLayer exit status 1 stdout:  stderr: write /opt/conda/envs/ros3/lib/python3.9/ensurepip/_bundled/setuptools-49.2.1-py3-none-any.whl: no space left on device

not entirely sure why yet since

$> df /opt -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/md0        147G  125G   15G  90% /

https://hub.docker.com/r/satra/dandihub/tags?page=1&ordering=last_updated says it is 3.6GB.

I wonder if it couldn't be minimized -- it sounds too big FWIW

[Warning]: Jupyter platform paths

Any time I run a Python command from the CLI, from any kernel, I get this warning printed out

[screenshot]

Following the instructions of course resolves the warning, but this gets incredibly annoying to do every time I launch a new environment or restart the server

plotly-based nwb widgets not working

NWB Widgets relies pretty heavily on plotly. Currently, none of the figures that rely on plotly are showing up:

[screenshot]

This same code used to work as expected months ago but at some point broke on the hub side. The odd thing is that stand-alone plotly graphs do work:

import plotly.express as px

fig = px.line(x=["a","b","c"], y=[1,3,2], title="sample figure")
print(fig)
fig.show()

[screenshot]

This thread suggests it might be an issue with strict package version requirements.
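
A quick sanity check worth running (a sketch; in JupyterLab, plotly figures render through the jupyterlab-plotly frontend extension, whose version generally has to match the installed plotly package):

# Compare these backend versions against the frontend extensions reported by
# `jupyter labextension list`.
import plotly
import ipywidgets

print("plotly:", plotly.__version__)
print("ipywidgets:", ipywidgets.__version__)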

MATLAB not working

I go on hub.dandiarchive.org and click on the MATLAB option. I see the MATLAB icon on the launcher, but when I click on it I get

[screenshot]

determine if gpu is enabled on hub

Some containerized spike sorters require an NVIDIA GPU. When these containers are run, SpikeInterface first checks whether the user has an NVIDIA GPU installed by running nvidia-smi. Googling around indicates this is the best way to determine if a GPU is available. However, this command is not available on DANDI Hub, even when a GPU is present, so the SpikeInterface check fails with a warning saying that no GPU is present. This is blocking us from running containerized spike sorters on DANDI Hub.

Is there a reason this command does not work in Jupyter? Is there another method, via the command line or via Python, to check that a GPU is available? I'd like to keep the concept of checking for a GPU before going through the trouble of pulling a container, and I would be fine with adding some additional check logic that works on DANDI.
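
A possible Python-side check, as a sketch (the /proc file is created by the NVIDIA kernel driver, so it can exist even when the nvidia-smi binary is absent from the image):

import shutil
import subprocess
from pathlib import Path

def gpu_available() -> bool:
    # Prefer nvidia-smi when the binary exists...
    if shutil.which("nvidia-smi"):
        return subprocess.run(["nvidia-smi"], capture_output=True).returncode == 0
    # ...otherwise fall back to the file the NVIDIA kernel driver exposes.
    return Path("/proc/driver/nvidia/version").exists()

print("GPU available:", gpu_available())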

GPU instances fail to build singularity containers

When I try to build a singularity container on a GPU instance, it fails:

$ singularity build deeplabcut.sif deeplabcut.def 
FATAL:   failed to mount proc filesystem: operation not permitted

The same instruction runs fine in the CPU instances. I tried cleaning the singularity cache and using apptainer, as well as different build options, but the error remains.

redo matlab container image

The following profile allows running MATLAB in the hub, but the container does not provide other goodies (git, datalad, dandi, etc.). The container should be extended to include some of the DANDI tools.

    - display_name: "Base (MATLAB)"
      description: "6 CPU / 16 GB upto 12C/32G. May take up to 15 mins to start. This requires your own license."
      kubespawner_override:
        singleuser_image_spec: 'ghcr.io/mathworks-ref-arch/matlab-integration-for-jupyter/jupyter-byoi-matlab-notebook:r2022a'
        image_pull_policy: Always
        cpu_limit: 12
        cpu_guarantee: 6
        mem_limit: 32G
        mem_guarantee: 16G

testhub doesn't have ros3 enabled

s3_path = 'https://dandiarchive.s3.amazonaws.com/blobs/33f/72b/33f72ba7-5ad8-4e42-b52b-e3a6b309aefd'
io = NWBHDF5IO(s3_path, mode='r', load_namespaces=True, driver='ros3')
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/opt/conda/lib/python3.9/site-packages/h5py/_hl/files.py in make_fapl(driver, libver, rdcc_nslots, rdcc_nbytes, rdcc_w0, **kwds)
    134     try:
--> 135         set_fapl = _drivers[driver]
    136     except KeyError:

KeyError: 'ros3'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/tmp/ipykernel_476/2308709919.py in <module>
      1 s3_path = 'https://dandiarchive.s3.amazonaws.com/blobs/33f/72b/33f72ba7-5ad8-4e42-b52b-e3a6b309aefd'
----> 2 io = NWBHDF5IO(s3_path, mode='r', load_namespaces=True, driver='ros3')

/opt/conda/lib/python3.9/site-packages/hdmf/utils.py in func_call(*args, **kwargs)
    581             def func_call(*args, **kwargs):
    582                 pargs = _check_args(args, kwargs)
--> 583                 return func(args[0], **pargs)
    584         else:
    585             def func_call(*args, **kwargs):

/opt/conda/lib/python3.9/site-packages/pynwb/__init__.py in __init__(self, **kwargs)
    228 
    229             tm = get_type_map()
--> 230             super(NWBHDF5IO, self).load_namespaces(tm, path, file=file_obj, driver=driver)
    231             manager = BuildManager(tm)
    232 

/opt/conda/lib/python3.9/site-packages/hdmf/utils.py in func_call(*args, **kwargs)
    581             def func_call(*args, **kwargs):
    582                 pargs = _check_args(args, kwargs)
--> 583                 return func(args[0], **pargs)
    584         else:
    585             def func_call(*args, **kwargs):

/opt/conda/lib/python3.9/site-packages/hdmf/backends/hdf5/h5tools.py in load_namespaces(cls, **kwargs)
    142             'namespace_catalog', 'path', 'namespaces', 'file', 'driver', kwargs)
    143 
--> 144         open_file_obj = cls.__resolve_file_obj(path, file_obj, driver)
    145         if file_obj is None:  # need to close the file object that we just opened
    146             with open_file_obj:

/opt/conda/lib/python3.9/site-packages/hdmf/backends/hdf5/h5tools.py in __resolve_file_obj(path, file_obj, driver)
    117             if driver is not None:
    118                 file_kwargs.update(driver=driver)
--> 119             file_obj = File(path, 'r', **file_kwargs)
    120         return file_obj
    121 

/opt/conda/lib/python3.9/site-packages/h5py/_hl/files.py in __init__(self, name, mode, driver, libver, userblock_size, swmr, rdcc_nslots, rdcc_nbytes, rdcc_w0, track_order, fs_strategy, fs_persist, fs_threshold, **kwds)
    421 
    422             with phil:
--> 423                 fapl = make_fapl(driver, libver, rdcc_nslots, rdcc_nbytes, rdcc_w0, **kwds)
    424                 fid = make_fid(name, mode, userblock_size,
    425                                fapl, fcpl=make_fcpl(track_order=track_order, fs_strategy=fs_strategy,

/opt/conda/lib/python3.9/site-packages/h5py/_hl/files.py in make_fapl(driver, libver, rdcc_nslots, rdcc_nbytes, rdcc_w0, **kwds)
    135         set_fapl = _drivers[driver]
    136     except KeyError:
--> 137         raise ValueError('Unknown driver type "%s"' % driver)
    138     else:
    139         set_fapl(plist, **kwds)

ValueError: Unknown driver type "ros3"

The same lines run without error on the non-test hub.
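
For reference, a quick way to check whether the h5py build on a given hub has ros3 compiled in (the driver is only present when the underlying HDF5 library was built with the read-only S3 VFD):

import h5py

# 'ros3' appears here only when HDF5 was built with S3 support.
print("ros3" in h5py.registered_drivers())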

Warning when running notebooks (Failed to load cfgrib)

So I noticed this warning whenever I run a notebook on the Hub lately

/opt/conda/lib/python3.10/site-packages/xarray/backends/cfgrib_.py:29: UserWarning: Failed to load cfgrib - most likely there is a problem accessing the ecCodes library. Try `import cfgrib` to get the full error message
  warnings.warn(

So I tried the suggested import directly, which gives the full traceback:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Input In [2], in <cell line: 1>()
----> 1 import cfgrib

File /opt/conda/lib/python3.10/site-packages/cfgrib/__init__.py:20, in <module>
     18 # cfgrib core API depends on the ECMWF ecCodes C-library only
     19 from .abc import Field, Fieldset, Index, MappingFieldset
---> 20 from .cfmessage import COMPUTED_KEYS
     21 from .dataset import (
     22     Dataset,
     23     DatasetBuildError,
   (...)
     27     open_from_index,
     28 )
     29 from .messages import FieldsetIndex, FileStream, Message

File /opt/conda/lib/python3.10/site-packages/cfgrib/cfmessage.py:29, in <module>
     26 import attr
     27 import numpy as np
---> 29 from . import abc, messages
     31 LOG = logging.getLogger(__name__)
     33 # taken from eccodes stepUnits.table

File /opt/conda/lib/python3.10/site-packages/cfgrib/messages.py:28, in <module>
     25 import typing as T
     27 import attr
---> 28 import eccodes  # type: ignore
     29 import numpy as np
     31 from . import abc

File /opt/conda/lib/python3.10/site-packages/eccodes/__init__.py:13, in <module>
      1 #
      2 # (C) Copyright 2017- ECMWF.
      3 #
   (...)
     10 #
     11 #
---> 13 from .eccodes import *  # noqa
     14 from .highlevel import *

File /opt/conda/lib/python3.10/site-packages/eccodes/eccodes.py:12, in <module>
      1 #
      2 # (C) Copyright 2017- ECMWF.
      3 #
   (...)
     10 #
     11 #
---> 12 from gribapi import (
     13     CODES_PRODUCT_ANY,
     14     CODES_PRODUCT_BUFR,
     15     CODES_PRODUCT_GRIB,
     16     CODES_PRODUCT_GTS,
     17     CODES_PRODUCT_METAR,
     18 )
     19 from gribapi import GRIB_CHECK as CODES_CHECK
     20 from gribapi import GRIB_MISSING_DOUBLE as CODES_MISSING_DOUBLE

File /opt/conda/lib/python3.10/site-packages/gribapi/__init__.py:13, in <module>
      1 #
      2 # (C) Copyright 2017- ECMWF.
      3 #
   (...)
     10 #
     11 #
---> 13 from .gribapi import *  # noqa
     14 from .gribapi import __version__, lib
     16 # The minimum recommended version for the ecCodes package

File /opt/conda/lib/python3.10/site-packages/gribapi/gribapi.py:34, in <module>
     30 from functools import wraps
     32 import numpy as np
---> 34 from gribapi.errors import GribInternalError
     36 from . import errors
     37 from .bindings import ENC

File /opt/conda/lib/python3.10/site-packages/gribapi/errors.py:16, in <module>
      1 #
      2 # (C) Copyright 2017- ECMWF.
      3 #
   (...)
      9 # does it submit to any jurisdiction.
     10 #
     12 """
     13 Exception class hierarchy
     14 """
---> 16 from .bindings import ENC, ffi, lib
     19 class GribInternalError(Exception):
     20     """
     21     @brief Wrap errors coming from the C API in a Python exception object.
     22 
     23     Base class for all exceptions
     24     """

File /opt/conda/lib/python3.10/site-packages/gribapi/bindings.py:35, in <module>
     33 library_path = findlibs.find("eccodes")
     34 if library_path is None:
---> 35     raise RuntimeError("Cannot find the ecCodes library")
     37 # default encoding for ecCodes strings
     38 ENC = "ascii"

RuntimeError: Cannot find the ecCodes library

error when downloading dandiset on MATLAB instance

Running on the DANDI Hub MATLAB instance:

jovyan@jupyter-bendichter:~$ dandi download DANDI:000067/0.210812.1457
2023-05-25 15:56:16,815 [    INFO] Logs saved in /home/jovyan/.cache/dandi-cli/log/20230525155615Z-1006.log
Error: module 'lib' has no attribute 'OpenSSL_add_all_algorithms'
jovyan@jupyter-bendichter:~$ cat /home/jovyan/.cache/dandi-cli/log/20230525155615Z-1006.log
2023-05-25T15:56:15+0000 [INFO    ] dandi 1006:140414409291584 dandi v0.55.1, hdmf v3.5.2, pynwb v2.3.1, h5py v3.8.0
2023-05-25T15:56:15+0000 [INFO    ] dandi 1006:140414409291584 sys.argv = ['/opt/conda/bin/dandi', 'download', 'DANDI:000067/0.210812.1457']
2023-05-25T15:56:15+0000 [INFO    ] dandi 1006:140414409291584 os.getcwd() = /home/jovyan
2023-05-25T15:56:15+0000 [DEBUG   ] urllib3.connectionpool 1006:140414409291584 Starting new HTTPS connection (1): rig.mit.edu:443
2023-05-25T15:56:15+0000 [DEBUG   ] urllib3.connectionpool 1006:140414409291584 https://rig.mit.edu:443 "GET /et/projects/dandi/dandi-cli HTTP/1.1" 200 579
2023-05-25T15:56:15+0000 [DEBUG   ] dandi 1006:140414409291584 No newer (than 0.55.1) version of dandi/dandi-cli found available
2023-05-25T15:56:15+0000 [DEBUG   ] h5py._conv 1006:140414409291584 Creating converter from 7 to 5
2023-05-25T15:56:15+0000 [DEBUG   ] h5py._conv 1006:140414409291584 Creating converter from 5 to 7
2023-05-25T15:56:15+0000 [DEBUG   ] h5py._conv 1006:140414409291584 Creating converter from 7 to 5
2023-05-25T15:56:15+0000 [DEBUG   ] h5py._conv 1006:140414409291584 Creating converter from 5 to 7
2023-05-25T15:56:15+0000 [DEBUG   ] numcodecs 1006:140414409291584 Registering codec 'zlib'
2023-05-25T15:56:15+0000 [DEBUG   ] numcodecs 1006:140414409291584 Registering codec 'gzip'
2023-05-25T15:56:15+0000 [DEBUG   ] numcodecs 1006:140414409291584 Registering codec 'bz2'
2023-05-25T15:56:15+0000 [DEBUG   ] numcodecs 1006:140414409291584 Registering codec 'lzma'
2023-05-25T15:56:15+0000 [DEBUG   ] numcodecs 1006:140414409291584 Registering codec 'blosc'
2023-05-25T15:56:15+0000 [DEBUG   ] numcodecs 1006:140414409291584 Registering codec 'zstd'
2023-05-25T15:56:15+0000 [DEBUG   ] numcodecs 1006:140414409291584 Registering codec 'lz4'
2023-05-25T15:56:15+0000 [DEBUG   ] numcodecs 1006:140414409291584 Registering codec 'zfpy'
2023-05-25T15:56:15+0000 [DEBUG   ] numcodecs 1006:140414409291584 Registering codec 'astype'
2023-05-25T15:56:15+0000 [DEBUG   ] numcodecs 1006:140414409291584 Registering codec 'delta'
2023-05-25T15:56:15+0000 [DEBUG   ] numcodecs 1006:140414409291584 Registering codec 'quantize'
2023-05-25T15:56:15+0000 [DEBUG   ] numcodecs 1006:140414409291584 Registering codec 'fixedscaleoffset'
2023-05-25T15:56:15+0000 [DEBUG   ] numcodecs 1006:140414409291584 Registering codec 'packbits'
2023-05-25T15:56:15+0000 [DEBUG   ] numcodecs 1006:140414409291584 Registering codec 'categorize'
2023-05-25T15:56:15+0000 [DEBUG   ] numcodecs 1006:140414409291584 Registering codec 'pickle'
2023-05-25T15:56:15+0000 [DEBUG   ] numcodecs 1006:140414409291584 Registering codec 'base64'
2023-05-25T15:56:15+0000 [DEBUG   ] numcodecs 1006:140414409291584 Registering codec 'shuffle'
2023-05-25T15:56:15+0000 [DEBUG   ] numcodecs 1006:140414409291584 Registering codec 'bitround'
2023-05-25T15:56:15+0000 [DEBUG   ] numcodecs 1006:140414409291584 Registering codec 'msgpack2'
2023-05-25T15:56:15+0000 [DEBUG   ] numcodecs 1006:140414409291584 Registering codec 'crc32'
2023-05-25T15:56:15+0000 [DEBUG   ] numcodecs 1006:140414409291584 Registering codec 'adler32'
2023-05-25T15:56:15+0000 [DEBUG   ] numcodecs 1006:140414409291584 Registering codec 'json2'
2023-05-25T15:56:15+0000 [DEBUG   ] numcodecs 1006:140414409291584 Registering codec 'vlen-utf8'
2023-05-25T15:56:15+0000 [DEBUG   ] numcodecs 1006:140414409291584 Registering codec 'vlen-bytes'
2023-05-25T15:56:15+0000 [DEBUG   ] numcodecs 1006:140414409291584 Registering codec 'vlen-array'
2023-05-25T15:56:15+0000 [DEBUG   ] numcodecs 1006:140414409291584 Registering codec 'n5_wrapper'
2023-05-25T15:56:16+0000 [DEBUG   ] dandi 1006:140414409291584 Caught exception module 'lib' has no attribute 'OpenSSL_add_all_algorithms'
2023-05-25T15:56:16+0000 [INFO    ] dandi 1006:140414409291584 Logs saved in /home/jovyan/.cache/dandi-cli/log/20230525155615Z-1006.log
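
For what it's worth, this AttributeError pattern is commonly reported as a version mismatch between pyOpenSSL and the cryptography package (an assumption here, not verified on the MATLAB image). A quick check:

# If these two are out of step (an older pyOpenSSL against a newer
# cryptography), upgrading pyOpenSSL (pip install -U pyopenssl) is the
# commonly reported fix.
import OpenSSL
import cryptography

print("pyOpenSSL:", OpenSSL.__version__)
print("cryptography:", cryptography.__version__)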

DANDIset data not visible in dandiset folder on DANDIHub

It seems that the DANDIset data are not visible in the dandiset folders.
This behavior is the same in the containers for all images (the default/base one, the GPU one, and the MATLAB one).

We are not sure where the problem comes from, but we suspect that either the remote folder where the DANDIset data are cloned is not synced/cloned properly (only the "root" metarepository is cloned, not each subrepository), or the dandi-io plugin is failing.

This impacts all the containers started from any docker image of this repository.

Please find here two videos showing that the dandiset subfolders do not contain any data, in the base container and in the GPU container.

nothing-base-2023-04-24_15.33.35.mp4
nothing-2023-04-24_15.31.28.mp4

feature: install MatNWB for MATLAB instances

In general it's hard to know what packages MATLAB users are going to want; there aren't any packages that are as ubiquitous as numpy and scipy are for Python. But it seems to me that someone opening an instance on DANDI Hub will probably want MatNWB installed. Otherwise, the first steps for anyone using MATLAB will be to install it themselves.

What's the best way to set this up? One option is to make a startup.m script that git clones MatNWB and adds it to the path. The advantage of doing it that way is that you would always have the latest version.

drag/drop files from dandiarchive assets browser INTO notebooks/widget

The idea came while thinking about that iframe issue dandi/dandiarchive-legacy#745: if there were a way for a notebook (or a widget in it) to react to a "drop" event, it would be very handy to have the web-UI navigation pane on the left and then just drag/drop elements into a widget or notebook, which would react by e.g. updating or creating a new widget to browse.

ping @bendichter @waxlamp since this might be of interest and you might not be monitoring this repo.

Slow ROS3 streaming for files with lots of timeseries objects

I noticed that DANDI operations tend to become very slow for files containing hundreds of time series objects, even if those files are small in size (~tens of megabytes).

A good example of files like that: https://dandiarchive.org/dandiset/000239

I noticed this on two occasions:

  • pre-validating when performing dandi upload
  • ros3 streaming to notebooks running either in dandihub or locally

The first point is not so problematic, but the second one is very limiting, taking several minutes just to read the file.

To reproduce the Issue:

from dandi.dandiapi import DandiAPIClient
from pynwb import NWBHDF5IO
from nwbwidgets import nwb2widget

# 1 - This is a small file in size, but with many timeseries objects
dandiset_id = "000239" 
filepath = "sub-MX180701/sub-MX180701_ses-20180916T121311_behavior.nwb"

# 2 - This is a much larger file in size, but with few timeseries objects
# dandiset_id = "000233"
# filepath = "sub-CGM1/sub-CGM1_ses-CGM1-0um-181130-112307_ecephys.nwb"

with DandiAPIClient() as client:
    asset = client.get_dandiset(dandiset_id, "draft").get_asset_by_path(filepath)
    s3_url = asset.get_content_url(follow_redirects=1, strip_query=True)

io = NWBHDF5IO(s3_url, mode='r', load_namespaces=True, driver='ros3')
nwb = io.read()

print("file read... rendering nwbwidgets...")

nwb2widget(nwb)

The excessive delay happens at nwb = io.read() for file 1, but the read is very quick for file 2.
From that point on, any operation, such as exploring the file with widgets, works well in both cases.
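
A possible workaround, as a sketch (assuming fsspec and aiohttp are available): reading through fsspec with a block cache batches the many small metadata requests that files with hundreds of objects trigger, which may be what makes io.read() so slow here:

import fsspec
import h5py
from pynwb import NWBHDF5IO

# s3_url as obtained above; blockcache fetches fixed-size blocks once and
# reuses them instead of issuing one HTTP request per tiny metadata read.
fs = fsspec.filesystem("http")
f = fs.open(s3_url, "rb", cache_type="blockcache", block_size=1024 * 1024)
h5file = h5py.File(f, "r")
io = NWBHDF5IO(file=h5file, mode="r", load_namespaces=True)
nwb = io.read()  # keep f / h5file / io open while exploring nwb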

Neovim

Hi,

Is it possible to install neovim on dandihub? Copilot is now integrated with neovim and it would be great to have access to it in the terminal.

Thanks
Nima

Establish live script example library repository

The MatNWB working group has been working towards establishing an organized library of example live scripts. The consensus view was that it would best be hosted by the DANDI archive organization (or an affiliated one).

Based on discussions, it is proposed to tackle this issue in stages:

  • 1. Stand up repository on a third-party organization, e.g. the INCF
  • 2. Document the repository organization guidelines, fine-tune if needed, and implement
  • 3. Upload initial examples (following the submitter guidelines per above)
  • 4. Transfer repository to the DANDI archive organization (or an affiliated one)
  • 5. Configure the DANDIHub MATLAB environment to pre-install the repository & set this as the initial working directory for new users.

See comments below and/or sub-issues for further notes regarding each step.

Running latest NWB Widgets release on ipykernel

This issue is currently resolvable through manual installation of very particular versions of secondary dependencies

It does not currently affect the default import and use of the NWB Widgets in the ipykernel environment

The issues can also likely be resolved entirely within https://github.com/neurodatawithoutborders/nwbwidgets together with a fresh release to fix the issue when using the 'latest' release on the DANDI Hub

I'm posting and leaving this here in case anyone experiences a similar problem in the future

Problem

On the ipykernel, the default version of NWB Widgets that is available on starting up a server on the DANDI Hub is 0.9.1. It is accompanied by ipywidgets==8.0.4.

If you upgrade this to the latest (pip install -U nwbwidgets==0.10.0, which replaces ipywidgets==8.0.4 with ipywidgets==7.7.2) and then try to use either Panel (new feature of 0.10.0) or the classic nwb2widget, you see this pop up

[screenshot]

and expanding it gives the traceback

[Open Browser Console for more detailed log - Double click to close this message]
Failed to load model class 'VBoxModel' from module '@jupyter-widgets/controls'
Error: Module @jupyter-widgets/controls, version ^1.5.0 is not registered, however,         2.0.0 is
    at f.loadClass (https://hub.dandiarchive.org/user/codycbakerphd/lab/extensions/@jupyter-widgets/jupyterlab-manager/static/134.40eaa5b8e976096d50b2.js?v=40eaa5b8e976096d50b2:1:74977)
    at f.loadModelClass (https://hub.dandiarchive.org/user/codycbakerphd/lab/extensions/@jupyter-widgets/jupyterlab-manager/static/150.b0e841b75317744a7595.js?v=b0e841b75317744a7595:1:10729)
    at f._make_model (https://hub.dandiarchive.org/user/codycbakerphd/lab/extensions/@jupyter-widgets/jupyterlab-manager/static/150.b0e841b75317744a7595.js?v=b0e841b75317744a7595:1:7517)
    at f.new_model (https://hub.dandiarchive.org/user/codycbakerphd/lab/extensions/@jupyter-widgets/jupyterlab-manager/static/150.b0e841b75317744a7595.js?v=b0e841b75317744a7595:1:5137)
    at f.handle_comm_open (https://hub.dandiarchive.org/user/codycbakerphd/lab/extensions/@jupyter-widgets/jupyterlab-manager/static/150.b0e841b75317744a7595.js?v=b0e841b75317744a7595:1:3894)
    at _handleCommOpen (https://hub.dandiarchive.org/user/codycbakerphd/lab/extensions/@jupyter-widgets/jupyterlab-manager/static/134.40eaa5b8e976096d50b2.js?v=40eaa5b8e976096d50b2:1:73393)
    at b._handleCommOpen (https://hub.dandiarchive.org/user/codycbakerphd/static/lab/jlab_core.4566032f8b9a1bbebc97.js?v=4566032f8b9a1bbebc97:2:1001335)
    at async b._handleMessage (https://hub.dandiarchive.org/user/codycbakerphd/static/lab/jlab_core.4566032f8b9a1bbebc97.js?v=4566032f8b9a1bbebc97:2:1003325)

Solution

If you force an upgrade back to ipywidgets==8.0.4 (pip install ipywidgets==8.0.4) and restart the kernel, the issue is resolved.
