governance's Issues

RFD - Managing Nebari dependencies

Status: Open for comments 💬
Author(s): @iameskild
Date Created: 2022-11-28
Date Last updated: 2023-03-15
Decision deadline: ---

Managing Nebari dependencies

Summary

Pin all the things

Let me start by stating that Nebari is not your typical Python package. For Python packages that are intended to be installed alongside other packages, pinning all of your dependencies will likely cause version conflicts and result in failed environment builds.

Nebari, on the other hand, is a package that is responsible for managing your infrastructure, and the last thing you want is for the packages that Nebari relies on to introduce breaking changes. This has happened twice in a single week (the week of 2023-01-16; see issue 1622 and issue 1623).

As part of this RFD, I propose pinning all packages that Nebari requires or uses. This includes the following:

  • Python package dependencies set in pyproject.toml and making sure the package can be built on conda-forge
  • Maximum acceptable Kubernetes version
  • Terraform provider versions
    • Already pinned (see the next header for a proposal to centralize these as much as possible)
  • Docker image tags (used by Nebari services not the images in nebari-docker-images repo)
  • Helm chart release versions

Set pinned dependencies used by Terraform in constants.py

In Nebari, the Python code is used primarily to pass the input variables to the Terraform scripts. As such, I propose that any of the pinned versions - be they pinned Terraform providers, image/tags combinations, etc. - used by Terraform be set somewhere in the Python code and then passed to Terraform.

As an example, I recently did this with the version of Traefik we use:

https://github.com/nebari-dev/nebari/blob/bd777e6448b5e2d6339bc3d9ef35672163ae1945/nebari/constants.py#L4

Which is then used as input for this Terraform variable:

https://github.com/nebari-dev/nebari/blob/bd777e6448b5e2d6339bc3d9ef35672163ae1945/nebari/template/stages/04-kubernetes-ingress/variables.tf#L19-L25

https://github.com/nebari-dev/nebari/blob/bd777e6448b5e2d6339bc3d9ef35672163ae1945/nebari/template/stages/04-kubernetes-ingress/modules/kubernetes/ingress/main.tf#L215
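
A minimal sketch of that pattern, where the constant and the input-variable helper below are illustrative placeholders rather than the actual Nebari code:

# constants.py -- single place where pinned versions live
DEFAULT_TRAEFIK_IMAGE_TAG = "v2.9.1"  # illustrative value, not the real pin


# input_vars.py (hypothetical) -- hand the pin to the 04-kubernetes-ingress stage
def kubernetes_ingress_input_vars(config):
    """Build the input variables passed to the Terraform ingress stage."""
    return {
        "traefik-image": {
            "image": "traefik",
            "tag": DEFAULT_TRAEFIK_IMAGE_TAG,
        },
    }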

Regularly review and upgrade dependencies

Once packages start getting pinned, it's important to regularly review and upgrade these dependencies in order to keep up to date with upstream changes. We have already discussed the importance of testing these dependencies, and I believe we should continue with that work (see issue 1339).

As part of this RFD, I propose we review, test and upgrade our dependencies once per quarter as part of our release process.

Although we may not need to update each dependency for every release, we might want to consider updating dependencies in a staggered fashion.

  • For release X: update all Python dependencies in the pyproject.toml and ensure that the package is buildable on conda-forge.
  • For release X+1: update the maximum Kubernetes version and any Helm charts
  • For release X+2: update Terraform provider versions
  • ... and repeat

We don't necessarily need to make the update process this rigid but the idea is to update a few things at a time and ensure that nothing breaks. And if things do break, fix them promptly to avoid running into situations where we are forced to make last-minute fixes.

User benefit

In my opinion, there are a few benefits to this approach:

  • Increased platform stability; running Nebari version X will work the day it is released and two years from now.
  • Instead of having pinned versions scattered through the Terraform scripts, we can centralize their location. This makes it easier to quickly check what version of what is being used.
  • This can be the start of dependency tracking. With a centralized location for all pinned dependencies, we can more easily write a script that updates and tests these dependencies.

Design Proposal

The design proposal is fairly straightforward, simply move the pinned version of the Terraform provider or image-tag used to the constants.py. This would likely require an additional input variable (as demonstrated by the Traefik example above).

User impact

We can be sure that when we perform our release testing and cut a release, that version will be stable from then on out. This is currently NOT the case.

What to do with the other Nebari repos?

This RFD is mostly concerned with the main nebari package and doesn't really cover how we should handle:

  • nebari-docker-images
  • nebari-jupyterhub-theme

I think these are less of a concern for us. The nebari-jupyterhub-theme is included in the Nebari JupyterHub Docker image, and once the images are built they don't change, so there is little chance that users will be negatively affected by dependency updates. The only exception would be if users pull the image tag main, which is updated with every new merge into nebari-docker-images - this does not follow best practices and we will continue to advise against it.

Unresolved questions

I still need to test if this is possible for the pinned version of a particular Terraform provider used, such as:
https://github.com/nebari-dev/nebari/blob/bd777e6448b5e2d6339bc3d9ef35672163ae1945/nebari/template/stages/04-kubernetes-ingress/versions.tf#L1-L13

  • Tried this recently and from what I can tell this is not possible (at least not without relying on some kind of templating). Therefore, Terraform provider versions will need to be set directly in their respective required_providers block (usually in the versions.tf file).
    • This might be possible with a tool like tfupdate.

RFD - Upgrade Integration tests - WIP

Status: Draft 🚧 / Open for comments 💬 / Accepted ✅ / Implemented 🚀 / Obsolete 🗃 / Rejected ⛔️
Author(s): @viniciusdc
Date Created: 02-02-2023
Date Last updated: --
Decision deadline: --

Summary

Currently, our integration tests are responsible for deploying a target version of Nebari (generally based on main/develop) to test stability and confirm that the code is deployable in all cloud providers. These tests can be divided into three categories: "Deploy", "User-Interaction," and "Teardown".

The user interaction is executed by using Cypress to mimic the steps a user would take to use the basic functionalities of Nebari.

(Workflow diagram: Deploy → User-Interaction → Teardown)

The general gist of the workflow can be seen in the diagram above. Some providers, like GCP, have yet another intermediate job right after the deployment, where a small change is made to the nebari-config.yaml to assert that the inner actions (those that come with Nebari) work as expected.

While the above does help when testing and asserting everything "looks" OK, we still need to double-check every release by doing yet another independent deployment to carefully test all features/services and ensure everything works as expected. This is extra work that takes time to complete (remember that a new deployment on each cloud provider takes around 15~20 min, plus any additional checks).

That said, there are still a lot of functionalities that are part of the daily use of Nebari which we need to remember to test, and making sure all of that works in all providers by hand would become impractical.

Design proposal

Below is what we could do to enhance our current testing suite. These changes are divided into three major updates:

Stabilizing/backend test

Refactor the "deploy" phase of the workflow so instead of executing the full deployment in serial (aka. just run nebari deploy), we could instead deploy each stage of nebari in parts, and this would give us the freedom to do more testing around each new artifact/resource added in each stage. This can now be easily done due to the recent additions of a Nebari dev command in the CLI. A way to achieve this would be adding an extra dev flag to the neabari deploy command to stop at certain checkpoints (which in this case, are the beginning of a new stage)

  • CI runs nebari deploy -c .... --stop-at 1. This would be responsible for deploying Nebari up to the first stage (generating the corresponding Terraform state files for state tracking). The CI would then execute a specialized test suite (could be pytest, Python scripts...) to assert that:
    • The cloud resources created are indeed present in the cloud infrastructure (can be done using the cloud provider CLI tools)
    • The Kubernetes-related resources exist as expected (extra kubectl command checks)
    • All available endpoints exist, running appropriate functions against each API (in the case of extensions/services like Argo, etc.)
  • After the above tests are complete, execute nebari deploy -c .... --stop-at 2, which would refresh the previous resources and create the new ones, then stop and run tests accordingly....
    • ...
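
A rough sketch of what one of these per-stage checks could look like, assuming the proposed --stop-at flag and a plain pytest suite; the namespace name and the stage at which it exists are placeholders:

import json
import subprocess


def kubectl_get(kind, name):
    """Return the parsed JSON for a Kubernetes object, failing loudly if it is missing."""
    out = subprocess.run(
        ["kubectl", "get", kind, name, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def test_namespace_exists_after_stage():
    # Run after `nebari deploy -c nebari-config.yaml --stop-at <n>` has finished.
    namespace = kubectl_get("namespace", "dev")  # "dev" is a placeholder
    assert namespace["status"]["phase"] == "Active"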

End-to-End testing (User experience)

Now that the infrastructure exists and is working as planned, we can mimic user interaction by running a bigger testing suite with Cypress (we could also migrate to another tool for easier maintenance). Those tests would then be responsible for checking that the Jupyter-related services work, as well as Dask and any extra services like Argo, kbatch, VS Code, Dashboards, conda-store...

Teardown

Once all of this completes, we can then move on to destroying all the components. Right now there are no extra changes to this step, but some additions that would be beneficial are:

  • Develop cloud-specific scripts for removing lingering resources in case nebari destroy fails
  • Save information about the error (why it failed) as artifacts, such as the status of the cluster, roles, etc., that could help us identify why some resources linger after destruction and how we could reduce that (or at least catalog those cases in the docs)

User benefit

The user, in this case, would be the maintainers and developers of Nebari, who would be able to trust the integration tests more and retrieve more information from each run. This reduces a lot of the time spent testing all features and increases confidence that all services and resources were tested and validated before release.

Alternatives or approaches considered (if any)

Best practices

User impact

Unresolved questions

RFD - Allow users to customize and use their own images

Status: Open for comments 💬
Author(s): @iameskild
Date Created: 2022-11-22
Date Last updated: --
Decision deadline: --

Allow users to customize and use their own images

Summary

At present, this repo builds and pushes standard images for JupyterHub, JupyterLab and Dask-Workers. These images are the default used by all Nebari deployments.

However, many users have expressed an interest in adding custom packages (conda, apt or otherwise) to their default JupyterLab image, and doing so at the moment is not really feasible (at least not without a decent amount of extra legwork). To accommodate users, we have often simply resorted to adding their preferred packages to these default images. This solution is not scalable.

User benefit

By giving Nebari users the ability to customize these images, we greatly open up what is possible for them. This will give users further control over what packages get installed and how they use and interact with their Nebari cluster.

I have already heard from a decent number of users that this would be a much-appreciated feature.

Design Proposal

Ultimately, we want to allow users to add whatever packages (and possibly other configuration changes) they want to their JupyterHub, JupyterLab, and Dask-Worker images. We also want to make this process as simple and straightforward as possible.

Users should NOT need to know:

  • how to write a Dockerfile
  • how to use docker or build images
  • how to push or pull from a registry

In the nebari code base we already have a way of generating gitops and nebari-linter workflows for GitHub-Actions and GitLab-CI (for clusters that leverage ci_cd redeployments). We currently do this by building up these workflows from basic pydantic classes that were modeled off of the JSON schema for GitHub-Actions workflows and GitLab-CI pipelines respectively.

Why not do the same thing for building and pushing docker images?

With some additional work, we can render a build-push workflow (or pipeline) that can do just that. This proposed build-push workflow would look something like:

  1. Using the existing default Nebari docker image as a base image, add user specific packages.
    • users might add/remove packages in an environment.yaml, apt.txt, etc. that resides in an images folder in their repo
  2. Use the docker/build-push-action (or similar for GitLab-CI) to build and push images to GHCR (or similar for GitLab-CI)
  • This new workflow would live in the same repo as the deployment, so there is no need to manage multiple repos.

As I currently see it, this would require:

  • an added section to the nebari-config.yaml (perhaps under the ci_cd section) that can be used as a trigger to render this new workflow file
  • a way to render this new build-push workflow file
    • as mentioned above, this can be completed in a similar manner to how we render gitops or nebari-linter (see the sketch after this list)
  • a Dockerfile template for each image (JHub, JLab, Dask)
    • that pulls a base image from quay.io/nebari
  • a template folder (images) that contains an environment.yaml, apt.txt, etc.
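
To make the rendering idea concrete, here is a sketch of what pydantic-based models for the build-push job could look like; the class and field names are hypothetical, not the existing nebari classes:

from typing import Dict, List, Optional

import yaml
from pydantic import BaseModel, Field


class Step(BaseModel):
    name: str
    uses: Optional[str] = None
    with_: Optional[Dict[str, str]] = Field(default=None, alias="with")


class Job(BaseModel):
    runs_on: str = Field(alias="runs-on")
    steps: List[Step]


def render_build_push_workflow(image_tag: str) -> str:
    """Render a minimal GitHub Actions workflow that builds and pushes a user image."""
    job = Job(
        **{"runs-on": "ubuntu-latest"},
        steps=[
            Step(name="Checkout", uses="actions/checkout@v3"),
            Step(
                name="Build and push",
                uses="docker/build-push-action@v4",
                **{"with": {"context": "./images", "push": "true", "tags": image_tag}},
            ),
        ],
    )
    workflow = {
        "name": "build-push",
        "on": "push",
        "jobs": {"build-push": job.dict(by_alias=True, exclude_none=True)},
    }
    return yaml.safe_dump(workflow, sort_keys=False)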

Alternatives or approaches considered (if any)

Best practices

User impact

No user impact unless they decide to use this feature.

Unresolved questions / other considerations

There are a few other enhancements that we could make:

  • some users may want their images pushed to private registries
  • allow users to add additional Dockerfile stanzas for even more customization

[DOC] - Decision-making process

Preliminary Checks

Summary

Create guidelines on how we make decisions as a team, including:

  • How/when to open RFD issues, and what should the deadline be?
  • What are the expectations for discussions?
  • How to build consensus?
  • When is an RFD accepted?

Steps to Resolve this Issue

TBD

RFD - Bitnami retention policy considerations

Status: Draft 🚧
Author(s): @viniciusdc
Date Created: 08-12-2022
Date Last updated: 08-12-2022
Decision deadline: NA

Title

Considerations around Bitnami retention policies

Summary

Just a note regarding using Bitnami as the repo source for Helm charts: as happened in the past with MinIO, they have a 6-month retention policy for their repo index, which means that old versions are dropped from the main index after that period. That is, our deployments are bound to break in the future if a pinned chart version is no longer found by Helm.

User benefit

  • Right now, we are bound to have broken deployments of old Nebari versions in the future; as an example, v0.4.0 and v0.4.1 are still (at this date) broken because their source code points to a MinIO chart version that no longer exists in the main index.yaml (fixed in v0.4.2).

Design Proposal

Alternatives or approaches considered (if any)

There are some ways to mitigate this problem, each with its own pros and cons:

  • Every six months, we update our chart versions or validate them somehow, as originally proposed here (see the sketch after this list)
  • We pin the repository source for each service to the last available hash of the index, the same as we did for MinIO
  • Increase the release cadence to have monthly releases...
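
For the validation option, a small script along these lines could run on a schedule and flag pinned chart versions that have dropped out of the published index; the pins dict below is a placeholder, not the actual Nebari pin list:

import requests
import yaml

# Placeholder pins; the real values live in the Helm/Terraform definitions.
PINNED_CHARTS = {"minio": "6.7.4"}


def check_bitnami_pins(index_url="https://charts.bitnami.com/bitnami/index.yaml"):
    """Print a warning for every pinned chart version missing from the Bitnami index."""
    index = yaml.safe_load(requests.get(index_url, timeout=30).text)
    for chart, pinned in PINNED_CHARTS.items():
        available = {entry["version"] for entry in index["entries"].get(chart, [])}
        if pinned not in available:
            print(f"{chart}=={pinned} is no longer available in the Bitnami index")


if __name__ == "__main__":
    check_bitnami_pins()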

Best practices

User impact

Unresolved questions

[WIP] Update "Team" on GitHub

Update the team structure and verify permissions for each team:

Note: The following plan is tentative to get us started, and will be updated after further discussion.

  • Rename "Maintainers" to "Core"
  • Rename "Emeritus maintainers" to "Emeritus core"
  • Create a new "Owners" team (name TBD) -- folks who have owner role and can add/remove team members. Alternatively, we can give "Core" team these rights?
  • Verify permissions for each team:
    • "Triage" team should be able to add/remove issue and pr labels across all repos in the org, ensure they can transfer issues across repos
    • "Design" team should have merge rights only on nebari-dev/design
    • "Documentation" team should have merge rights only on nebari-dev/nebari-doc
    • "Contributors" team should have merge rights across all repos in the org
    • "Core" team can update things at the org level in addition to "Contributor" privileges (?) Alternatively, this team can be for recognizing decision makers.
    • "Emeritus core" members should have no special access, this team is for recognizing past contributions.

References:

cc @trallard

Bi-weekly community meeting for Nebari

Context

As we adopt a community-first approach to Nebari development, it would be nice to open our team syncs (which are currently internal to Quansight) to everyone.

Proposal:

  • Timing: Every other Tuesday, 3:30-4pm GMT
  • Notes, options:
    • Open an issue (with a dedicated label) against this repo?
    • Open google/notion/hackmd doc?

Value and/or benefit

...

Anything else?

No response

[ENH] - Adopt Code of conduct and enforcement procedures

We want to foster an inclusive, supportive and safe environment for all our community members. We need to adopt a CoC covering the following:

  • 1. Explicit: acceptable and unacceptable behaviour
  • 2. Scope: where is this applicable
  • 3. Enforcement
  • 4. Reporting
  • 5. Social rules
  • 6. Other items that might not fit or are borderline CoC
  • 7. CoC response protocol

[DOC] Revisit CoC

Since we are moving to a more community-driven project, we should revisit the CoC

[DOC] - Write team compass

Items to add:

  • Reviewer guidelines
  • Onboarding/offboarding new team members @costrouc
  • Release guidelines #2
  • Development workflow - git
  • Contribution guidelines
  • Code of conduct #5
  • Triaging flow
  • Inclusivity statement #6
  • GH guidelines and conventions
  • Roadmap
  • Styleguide
  • Making changes to live infra

RFD - Make `nebari` internals aggressively private

Status: Open for comments 💬
Author(s): @pmeier
Date Created: 07-04-2023
Date Last updated: 07-04-2023
Decision deadline: xx

Make nebari internals aggressively private

Summary

Currently, all internals of nebari are public, and with that comes a set of expectations from users, the main one being backwards compatibility (BC). Although there is no such thing as truly private functionality in Python, it is the canonical understanding of the community that a leading underscore in a module / function / class name implies privacy and thus no BC guarantees.

AFAIK, nebari does not have an API. Thus, I propose to "prefix" everything with a leading underscore to avoid needing to keep BC for that.

User benefit

This proposal brings no benefit to the user, but rather to the developers. As explained above, having a fully public API brings BC guarantees with it; at least, that is what users expect. With those expectations in place, it can be really hard to refactor or change internals later on, even though we never intended them to be public.

Design Proposal

The canonical understanding for privacy in Python is that it is implied by a leading underscore somewhere in the "path". For example

  • _foo
  • foo._bar
  • _foo.bar
  • foo._Bar.baz
  • foo.Bar._baz
  • _foo.Bar.baz

are all considered private. This gives us multiple options to approach this:

  1. Make all endpoints private: prefix every function / method / class with an underscore. This is fairly tedious and also somewhat impacts readability.
  2. Make all namespaces under the main nebari package private, e.g. nebari._schema rather than nebari.schema. Since we aren't exposing anything from the main namespace this would effectively make everything private.
  3. Inject an intermediate private namespace into the package, i.e. create nebari._internal and move everything under that. This is what pip does.
  4. Rename the main package to _nebari, but still provide the script under the nebari name. This makes it a little awkward to invoke it through Python, i.e. python -m _nebari. If this is something we want to support, we can also create a coexisting nebari package that does nothing else but import the CLI functionality from _nebari. This is what pytest does (see the sketch after this list).

These options are listed in increasing order of my preference.
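
For option 4, the public nebari package could be a thin shim along these lines; this is a minimal sketch and the _nebari.cli module path is an assumption:

# nebari/__init__.py -- public shim; everything real lives in the private _nebari package
from _nebari.cli import main  # assumed location of the CLI entry point

__all__ = ["main"]


# nebari/__main__.py -- keeps `python -m nebari` working
from _nebari.cli import main

if __name__ == "__main__":
    main()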

Alternatives or approaches considered (if any)

Instead of fixing our code to be private, we could also put a disclaimer in the documentation stating that we consider all internals private and thus make no BC guarantees. However, we need to be honest with ourselves here: although this would suffice from a governance standpoint, we would be making it easy for users to shoot themselves in the foot. And that is rarely a good thing.

User impact

If we want to adopt this proposal or something similar, we need to do it sooner rather than later. Since this change is BC breaking for everyone who is already importing our internals, we should do it while the user base is fairly small and thus even fewer people (hopefully none) are doing something we don't want for the future.

Depending on how much disruption we anticipate, we could also go through a deprecation cycle with the prompt to get in touch with us in case a user depends on our internals. Maybe there is actually a use case for a public API?

RFD - Ways to Limit Argo Workflows Permissions - Mounting Volumes [Draft]

Status: Draft 🚧
Author(s): Adam-D-Lewis
Date Created: 03-31-2023
Date Last updated: 03-31-2023
Decision deadline: ?

In Argo Workflows, users with permission to use Argo Workflows can mount any other user's home directory. This is not acceptable. I discuss some options to limit this behavior below. Some options include:

  1. Use a Kubernetes operator to limit what subPaths can be mounted by particular pods (or put users in their own namespaces, then limit which subPaths can be mounted in that namespace with a CRD and an operator)
    • The problem with this is that we could only kill the Workflow after it's created, potentially allowing something bad to happen in the meantime (delete all users' files, etc.)
  2. Limit users to running particular Argo Workflow templates
  3. Argo Workflows has plugins which could allow us to crash any workflows with the wrong volumes mounted.
    • We'd have to use this together with restricting users to templates, which has the same disadvantages as above.
  4. Create Nebargo, a FastAPI server that all users submit workflows to. It examines the workflow to see if the user is mounting volumes they shouldn't and forwards the request to argo-server or not accordingly.
    • this limits what tools you can use - no hera, no argo CLI :(
  5. AdmissionController
  6. Pod Security Admission/Pod Security Standards
    1. https://kubernetes.io/docs/concepts/security/pod-security-policy/
    2. Might work, but I'm not sure it's flexible enough
  7. Limit users to their own namespace
    • Because PVs are cluster-wide, I don't think this would help with preventing users from mounting volumes to pods that they shouldn't.

I think the AdmissionController is the best way forward at the moment (a rough sketch follows).
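
To sketch what that could look like, a validating admission webhook could reject workflow pods that mount a home-directory subPath belonging to a different user. A minimal FastAPI-based sketch, where the creator label and the home/ subPath convention are assumptions:

from fastapi import FastAPI, Request

app = FastAPI()


@app.post("/validate")
async def validate(request: Request):
    """Deny pods that mount another user's home subPath (AdmissionReview v1)."""
    review = await request.json()
    pod = review["request"]["object"]
    # Assumption: the submitting user is recorded on the pod as a label.
    user = pod["metadata"].get("labels", {}).get("workflows.argoproj.io/creator", "")
    allowed = True
    for container in pod["spec"].get("containers", []):
        for mount in container.get("volumeMounts", []):
            sub_path = mount.get("subPath", "")
            if sub_path.startswith("home/") and sub_path != f"home/{user}":
                allowed = False
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {"uid": review["request"]["uid"], "allowed": allowed},
    }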

RFD - Vault for Deployment and Dynamic User Secrets

Status: Open for comments 💬
Author(s): costrouc
Date Created: 14-02-2022
Date Last updated: 14-02-2022
Decision deadline: 14-02-2022

Vault for Deployment and Dynamic User Secrets

Summary

I have spent around 2 days familiarizing myself with Vault and trying it out using HashiCorp's managed Vault deployment. It has the features that we would need to allow:

  • storing deployment secrets
  • allowing users/groups to create their own secrets which they can then update/delete and share with other services all with a rich permissions model

User benefit

There are two kinds of users I have in mind for this proposal: end users, e.g. regular users/developers on Nebari, and DevOps/IT sysadmins managing the deployment of Nebari. This proposal would satisfy both.

Design Proposal

Implementation

  • would be nice to have monitoring (prometheus/grafana) in a separate stage since vault can export metrics to prometheus (optional)
  • 2 new stages before 07-kubernetes-services:
    • vault deployment via helm
    • configure vault
  • after this is done, we store all the secrets created during the deployment in Vault using Kubernetes authentication via a service account created during the deployment.

Notice that the user does not have to store/remember any secrets!

How would we configure vault:

  • auth provider:
    • kubernetes auth using kubernetes service accounts (for deployments and services)
    • oidc auth using keycloak (for users)
  • policies would be:
    • created for users/groups to have paths, e.g. users/<username>/* and group/<group>/*, to write arbitrary secrets
    • created for services so that a given service only has access to its specific secrets
  • how to mount the secrets: vault has several options (the sidecar looks the most promising since it allows for dynamically updating secrets) https://www.hashicorp.com/blog/injecting-vault-secrets-into-kubernetes-pods-via-a-sidecar
# patch-basic-annotations.yaml
spec:
  template:
    metadata:
      annotations:
        vault.hashicorp.com/agent-inject: "true"
        vault.hashicorp.com/agent-inject-secret-helloworld: "secrets/helloworld"
        vault.hashicorp.com/role: "myapp"

Kubernetes service accounts are at the heart of this. We would assign identities to users/services by attaching a service account, e.g. <namespace>/service-<service-name> or <namespace>/user-<username>.
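
On the user side, interacting with those per-user paths could look roughly like this with the hvac client; the address, token and mount point are placeholders, and in practice the token would come from the Kubernetes or OIDC auth methods above:

import hvac

# Placeholder address/token; in-cluster this would come from the kubernetes/oidc auth method.
client = hvac.Client(url="https://vault.example.com", token="s.placeholder")

# Write a secret under the user's own path, e.g. users/<username>/my-db
client.secrets.kv.v2.create_or_update_secret(
    path="users/alice/my-db",
    secret={"username": "alice", "password": "hunter2"},
    mount_point="secret",
)

# Read it back
result = client.secrets.kv.v2.read_secret_version(path="users/alice/my-db", mount_point="secret")
print(result["data"]["data"]["username"])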

Alternatives or approaches considered (if any)

There is currently a proposal for using SOPS for secret management #29.

  • SOPS to me has several downsides
    • yet another place to store the secrets
    • managing private keys to encrypt/decrypt secrets
    • support for multiple public/private keys?
    • does not address the regular user use case of wanting dynamic secrets on the cluster
    • additional things to manage in the nebari github repo
    • dangers of committing secrets to the repo unencrypted
  • Advantages to me:
    • significantly simpler to deploy

Best practices

User impact

Unresolved questions

Do we use separate namespaces for users?

[DOC] - Remove `Releases.md`

Preliminary Checks

Summary

We have a new release process documented here: https://www.nebari.dev/docs/community/maintainers/release-process-branching-strategy

We can have the documentation page (at nebari.dev) as the source of truth and move undocumented details from the governance repo to the official docs. :)

Steps to Resolve this Issue

...

RFD - Support gitops staging/development/production deployments

Status: Open for comments 💬
Author(s): @costrouc
Date Created: 12/08/2023
Date Last updated: 12/08/2023
Decision deadline: 22/08/2023

Title

Support gitops staging/development/production deployments

Summary

This RFD is constructed from issue nebari-dev/nebari#924. We need the ability to easily deploy several Nebari clusters to represent dev/staging/production etc. within a gitops model. Whatever solution we adopt should be backwards compatible and easy to adopt.

User benefit

There are several benefits:

  • testing changes before forcing them on users
  • cost savings, since it might be possible to deploy on the same kubernetes cluster
  • for larger enterprise customers this is a must-have

Design Proposal

I propose using folders for the different nebari deployments. The current folder structure is:

.github/workflows/nebari-ops.yaml
stages/...
nebari-config.yaml

For backwards compatibility we keep this structure and add new namespaced ones based on the filename extension.

For example, nebari-config.dev.yaml would imply the following files are written:

.github/workflows/nebari-ops.dev.yaml
dev/stages/...

The GitHub/GitLab workflows will be templated to watch and trigger only on updates to the specific files for that environment (a sketch of the mapping follows below). This approach is independent of git branching.
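
A sketch of how the filename-extension namespacing could be resolved during render; the function name and return shape are illustrative:

import pathlib


def resolve_deployment_paths(config_filename: str) -> dict:
    """Map nebari-config[.<env>].yaml to the workflow file and stages directory it owns."""
    parts = pathlib.Path(config_filename).name.split(".")
    # "nebari-config.yaml" -> default namespace; "nebari-config.dev.yaml" -> "dev"
    env = parts[1] if len(parts) == 3 else None
    if env is None:
        return {"workflow": ".github/workflows/nebari-ops.yaml", "stages": "stages"}
    return {"workflow": f".github/workflows/nebari-ops.{env}.yaml", "stages": f"{env}/stages"}


print(resolve_deployment_paths("nebari-config.dev.yaml"))
# {'workflow': '.github/workflows/nebari-ops.dev.yaml', 'stages': 'dev/stages'}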

Alternatives or approaches considered (if any)

  • Separate branches for production/dev/staging. This approach is, in my mind, the strongest contender, but I strongly oppose it. Oftentimes dev/prod/staging intentionally have different configuration, e.g. dev would have smaller node groups. Thus dev -> staging -> prod is not always how changes flow. It is also hard to compare production vs. dev side by side without diffs.
  • Separate repository per deployment. This is possible as-is, but in our experience so far it is not easy to manage.

Best practices

This would provide an easy way for users to have different deployments on the same git repository.

User impact

This change would not affect any existing nebari deployments as far as I am aware and would be backwards compatible.

Unresolved questions

GitLab doesn't support multiple files for CI; it wants a single entrypoint, .gitlab-ci.yml. Pipelines would allow us to do this, but then the separate stages would all have to write to the same gitlab-ci.yml file. This is solvable.

RFD - Include SOPS for secret management

Status: Open for comments 💬
Author(s): @iameskild
Date Created: 2023-01-15
Date Last updated: 2023-02-06
Decision deadline: 2023-02-13

Summary

See relevant discussion:

Design Proposal

SOPS is a command-line tool for encrypting and decrypting secrets on your local machine.

In the context of Nebari, SOPS can potentially solve the following high-level issues:

  • allow Nebari administrators to manage sensitive secrets
    • this includes the ability to store these secrets in git as part of a GitOps workflow
  • create (shared) kubernetes secrets that can be mounted to JupyterLab pods and other kubernetes resources
    • this requires some additional work but should be worth the effort

Workflow

Starting point: a Nebari admin has a new secret that some of their users may need (such as credentials for an external data source). They have the appropriate cloud credentials available.

  1. Generate a KMS key (or PGP key) - only needs to be performed once
  2. Encrypts the secret locally
  3. Add the encrypted secret to the Nebari infrastructure folder
  4. Redeploy Nebari in order to create Kubernetes secrets and associate those secrets with resources that need them

Handling secrets locally

Items 1 and 2 from the workflow outlined above can be performed directly using the cloud provider CLI (aws kms create-key) and the SOPS CLI (sops --encrypt <file>).

To make it easier for Nebari admins, I propose we add a new CLI command, nebari secret, to handle items 1 and 2. This might look something like:

# requires cloud credentials
nebari secret create-kms-key -c nebari-config.yaml --name <kms-name>  
  • This command would call the cloud provider API and generate the necessary KMS. In the process, this command could also generate the .sops.yaml configuration file to store the KMS and creation_rules.
  • It looks like SOPS doesn't have support for a DO KMS (or DO doesn't have a KMS product?) and will likely need to rely on another method, such as PGP / age keys.
  • Local deployments should also rely on PGP / age keys.
# encrypt secrets stored as a file
nebari secret encrypt --name <secret-name> --file <path/to/file>
# or from a literal string
nebari secret encrypt --name <secret-name> --literal <tOkeN>

# a decrypt command can be included as well
nebari secret decrypt --name <secret-name>
  • The encrypt command encrypts the secret and stores the encrypted secret in the designated location in the repo (./secrets.yaml).
  • The decrypt command decrypts the secret and prints it to stdout.
  • Anyone running these commands on their local machine must have a cloud user that can use that KMS key (a sketch of the encrypt command follows).
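
A hypothetical sketch of the encrypt command as a thin wrapper over the sops CLI; the command layout, option names and the secrets.yaml location are placeholders:

import pathlib
import subprocess

import typer  # assuming the command is added to a typer-based CLI

app = typer.Typer()


@app.command()
def encrypt(name: str, file: pathlib.Path, output: pathlib.Path = pathlib.Path("secrets.yaml")):
    """Encrypt a secret file with sops (keys come from the .sops.yaml creation_rules)."""
    encrypted = subprocess.run(
        ["sops", "--encrypt", str(file)],
        check=True, capture_output=True, text=True,
    ).stdout
    output.write_text(encrypted)
    typer.echo(f"Stored encrypted secret '{name}' in {output}")


if __name__ == "__main__":
    app()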

Include these secrets in the Nebari cluster

Items 3 and 4 from the workflow outlined above refer to how these secrets get included in the Nebari cluster so that they can be used by those who need them.

There exists this SOPS terraform provider, which can decrypt these encrypted secrets during the deployment. To grab and use these secrets, we can create a secrets module in stage/07-kubernetes-services that returns the output (i.e. the secret), which can be used to create kubernetes_secret resources as such:

  1. Read/decrypt the data from the secrets.yaml:
data "sops_file" "secrets" {
  source_file = "/path/to/secrets.yaml"
}

output "my-password" {
  value     = data.sops_file.secrets.data["password"]
  sensitive = true
}
  2. Consume the above output to create a Kubernetes secret (in the parent module):
resource "kubernetes_secret" "k8s-secret" {
	metadata {
		name = "sops-demo-secret"
	}
	data = {
		username = module.sops.my-password
	}
}

At this point, the kubernetes secrets exist (encoded, NOT encrypted) on the Nebari cluster.

Including the secrets in the user's environment

Including secrets in KubeSpawner's c.extra_pod_config (in 03-profiles.py) will allow us to mount those secrets into the user's JupyterLab pod, thereby making them usable by users.

c.extra_pod_config = {
    # as environment variables
    "containers": [
        {"env": []}
    ],
    # to pull images from private registries
    "image_pull_secret": {},
    # as mounted files
    "volumes": [
        {"secret": {}}
    ]
}

How these secrets are configured on the pod (as a file, env var, etc.), and which Keycloak groups have access to these secrets (if we want to add some basic "role-based access"), can be configured in the nebari-config.yaml.

Something like this:

secrets:
- name: <my-secret>
  type: file
  keycloak_group_access:
  - admin
- name: <my-second_secret>
  type: image_pull_secret
  ...

To accomplish this, we will need to add another callable that is used in c.kube_spawner_overrides in 03-profiles.py:render_profiles (a rough sketch follows).
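
A rough sketch of what that callable could produce, following the config shape above; the function name and the group check are placeholders:

def render_secret_pod_config(secrets_config, user_groups):
    """Turn nebari-config `secrets` entries into extra_pod_config fragments for users
    whose Keycloak groups grant access (illustrative only)."""
    volumes, image_pull_secrets = [], []
    for secret in secrets_config:
        allowed_groups = secret.get("keycloak_group_access", [])
        if allowed_groups and not set(allowed_groups) & set(user_groups):
            continue  # user is not in any group that may see this secret
        if secret["type"] == "file":
            volumes.append({"name": secret["name"], "secret": {"secretName": secret["name"]}})
        elif secret["type"] == "image_pull_secret":
            image_pull_secrets.append({"name": secret["name"]})
    return {"volumes": volumes, "image_pull_secrets": image_pull_secrets}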

Alternatives or approaches considered (if any)

There are many specifics that can be modified, such as how users are granted access or how the secrets are consumed by the deployment.

As for a different usage of SOPS, I can think of one more. That would be to create the kubernetes secret from the encrypted file directly and then have the users decrypt the secret in their JupyterLab pod. This would eliminate the need for the sops-terraform-provider above.

It might be possible to create tiered secret files that are then associated with Keycloak groups. This would introduce multiple KMS keys.

The question that then becomes hard to answer is how to safely and conveniently distribute the KMS key to those who need to access the secrets.

Best practices

User impact

Users gain access to secrets they may need in order to access job-specific resources.

Unresolved questions

Given that SOPS is a GitOps tool, it's important to ensure that admins don't accidentally commit plain-text secret files to their repos. Adding stricter filters to the .gitignore will help a little, but there's always a chance for mistakes.

RFD - Move Nebari infrastructure code from HCL to python using terraformpy

Status: Open for comments 💬
Author(s): @viniciusdc
Date Created: 13-03-2023
Date Last updated: 13-03-2023
Decision deadline: --/--/--

Summary

Nebari heavily depends on Terraform to handle all of our IaC needs. While HCL (the .tf files) is a great language for describing infrastructure, it is not the best language for writing code where multiple ecosystems are involved. We see cases where adding a simple new feature requires us to re-write the same piece of code multiple times in HCL (e.g. the variables that are used across different modules).

Our main code that handles most of the execution of the Terraform binaries is already written in Python (a subprocess is responsible for running terraform plan and terraform apply), and almost all of our interaction with the already-deployed cluster during testing is also done in Python. Due to the complexity of our ecosystem, having to write a lot of HCL code to handle our edge cases is not only time consuming but also error prone. In this RFD I would like to suggest moving our infrastructure code to Python using terraformpy to make it easier to maintain and extend.

Benefit

There are multiple benefits to this change:

  • Easier to maintain and extend the codebase, as we can use the full power of Python to write the code
  • Easier to test the code, as functions would be easier to import and test
  • Python would grant us more flexibility when adding new features, as we would be able to point to a Terraform resource as an object and then call its methods to make the required changes (no need for extra variables and outputs to move data around)
  • Parsing our code base would be easier.
    • As a quick example of how that would benefit us: right now all Helm charts are deployed via the helm provider, which is wonderful from the deploy perspective... though linting the files and keeping track of version updates is really complex, as it requires inspecting all files in the repo tree and using some regex to identify the charts. If we move to Python we can import the helm provider and then call its methods to get the list of charts and their versions (or save them in a list to be exported somewhere else), which would make it easier to keep track of their versions and also to update them in other tests (e.g., the upgrade test -- eval broken Bitnami charts --)
  • Easier to get people onboarded to the project, as they would not need to learn HCL to contribute.

Drawbacks

  • We would need to rewrite our entire codebase in Python using the terraformpy library
  • It requires some time to get used to the new syntax
  • We would need to re-think how we call each stage of the deployment (though I think this migration would not be as terrible as it sounds)
  • The terraformpy library has had few updates in the last two years, but it is still maintained.

Approaches considered (if any)

Right now, to add a simple new variable, we need to do something like this:

# in the variables.tf file in the main.tf root directory
variable "my_var" {
  type = string
  default = "my_value"
}
-----------------------
# in the main.tf file in the main.tf root directory
module "my_module" {
  source = "./my_module"
  my_var = var.my_var
}
-----------------------
# in the variables.tf file in the my_module directory
variable "my_var" {
  type = string
}

And we also need to make sure we are passing it over in input_vars.py. This is a lot of code to write for a simple variable that we need to pass to a module (imagine when we need to pass outputs to different stages).

With Python we would instead have a function that receives the vars as input and passes them over to the correct module under the hood. This would make the code much easier to maintain and extend. For example:

from terraformpy import Module
from .vars import my_var


def pass_vars_to_module(my_var):
    Module(
        source="./my_module",
        my_var=my_var,
    )

That's it. Of course, this example is very simple and does not take into consideration the full complexity of the codebase, but I think it is a good starting point to see how we could simplify the code.

User impact

  • The user would not see any changes in the way they usually interact with the project, though this would be a breaking change for the project itself, as we would need to rewrite the entire codebase in Python.
  • Our CI tests would be more reliable as we could test the code in a more isolated way.

Unresolved questions

[DOC] - Analytics with Plausible

Preliminary Checks

Summary

We can document how to access Plausible analytics for Nebari.

Steps to Resolve this Issue

  1. Shall we make the analytics public?
  2. If yes, we can share a link to the public site in the readme (and/or documentation)
  3. If not, we can share a way for community members to gain access to the analytics. Example: open an issue/discussion or send an email to <> to gain access to analytics.

RFD - Extension Mechanism for Nebari

Status: Accepted ✅
Author(s): @costrouc
Date Created: 03-28-2023
Date Last updated: 03-28-2023
Decision deadline: 04-15-2023

Title

Extension Mechanism for Nebari

Summary

Over the past 3 years we have consistently run into the issue that extending and customizing Nebari is a hard task. Several approaches have been added:

  • the addition of stages to the nebari deployment, which made it easier to isolate pieces and was groundwork for moving toward an easier extension mechanism
  • usage of the terraform_overrides and helm_overrides keywords to allow for arbitrary overrides of helm values
  • helm_extensions in stage 8, which allow for the addition of arbitrary helm charts
  • tf_extensions, which integrate oauth2 and ingress to deploy a single docker image

Despite these features we still have needs from users that we are not addressing. Additionally, when we want to add a new service, it typically has to be added directly to the core of Nebari. We want to solve this by making extensions first class in Nebari.

User benefit

I see quite a few benefits from this proposal:

  • easier to extend Nebari making it easier to split development of nebari into smaller teams e.g. core Nebari team, feature-x team
  • easier customization of stages since the extension mechanism will solidify the interfaces between stages
  • easier adoption of new ways to deploy stages. Personally excited about this feature since it could make adoption of terraformpy easier.
  • ad-hoc client customizations will be significantly easier
  • ways to have proprietary additions to nebari that do not require deep customization

Design Proposal

Overall I propose we adopt pluggy. Pluggy has been adopted by many major projects, including datasette, conda, (TODO list more). Pluggy would allow us to expose a plugin interface and "install" extensions via setuptools entry points, making extension installation as easy as pip install ...

Usage from a high level user standpoint

pip install nebari
pip install nebari-ext-clearml
pip install nebari-ext-helm
pip install nebari-ext-cost

Once a user installs the extensions we can view the installed extensions via:

$ nebari extensions list
Name                       Description
---------------------------------------------------------------------------
nebari-ext-clearml "ClearML integration into nebari"
nebari-ext-helm     "Helm extensions"
....

Plugin Interfaces

Within nebari we will expose several plugins:

Subcommands

A plugin interface for arbitrary additional typer commands. All commands will be passed the nebari config along with all command-line arguments specified by the user. Conda has a similar approach with typer for their system.

nebari cost

Stages

import pathlib
import typing

# `schema` refers to nebari's configuration schema module


class Stage:
    name: str
    description: str
    priority: str    # defaults to value of name

    def validate(self, config: schema.NebariConfig):
        """Perform additional validation of the nebari configuration specific to this stage"""

    def render(self, config: schema.NebariConfig) -> typing.Union[typing.Dict[str, bytes], pathlib.Path]:
        """Given a configuration, render a set of files

        Returns
        -------
        typing.Union[typing.Dict[str, bytes], pathlib.Path]
            Either a directory of files to copy over or a dictionary mapping keys to file bytes
        """
        ...

    def deploy(self, directory: pathlib.Path, stages: typing.Dict[str, typing.Any]) -> typing.Any:
        """Deploy all resources within the stage"""
        ...

    def destroy(self, directory: pathlib.Path):
        """Destroy all resources within the stage"""
        ...

Nebari will use pluggy within its core and separate each stage into a pluggy Stage. Each stage will keep its original name. (A rough sketch of the pluggy wiring follows.)
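
For reference, the pluggy wiring could look roughly like this; the hook name, entry-point group and plugin class are illustrative, not a settled design:

import pluggy

hookspec = pluggy.HookspecMarker("nebari")
hookimpl = pluggy.HookimplMarker("nebari")


class NebariSpecs:
    """Hook specifications exposed by nebari core (illustrative)."""

    @hookspec
    def nebari_stage(self):
        """Return the stages contributed by this plugin."""


# In an extension package such as nebari-ext-clearml:
class ClearMLPlugin:
    @hookimpl
    def nebari_stage(self):
        # would return Stage instances as sketched above; a name stands in here
        return ["08-clearml"]


# In nebari core:
pm = pluggy.PluginManager("nebari")
pm.add_hookspecs(NebariSpecs)
pm.register(ClearMLPlugin())              # or, for pip-installed extensions:
pm.load_setuptools_entrypoints("nebari")  # discovers plugins via entry points
stages = [stage for result in pm.hook.nebari_stage() for stage in result]
print(stages)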

Alternatives or approaches considered (if any)

As far as plugin/extension systems go I am only aware of two major ones within the python ecosystem:

  • pluggy
  • traitlets :: I have used traitlets on several projects and do not feel it is a good fit here because:
    • traitlets is extremely invasive to the codebase; it has opinions on class structure/class creation
    • it exposes a CLI
    • it has an opinionated way to perform customization

Best practices

This will encourage the practice of extending nebari via extensions instead of direct PRs to the core.

User impact

It is possible to make this transition seamless to the user without changing behavior.

Unresolved questions

I feel confident in this approach since I have seen other projects use pluggy successfully for similar work.

RFD - User Friendly Method for Jupyter users to run an Argo Workflow [Draft]

Status: Draft 🚧
Author(s): Adam-D-Lewis
Date Created: 02-03-2023
Date Last updated: 02-03-2023
Decision deadline: ?

This is very much a Draft but I welcome feedback already if you want.

User Friendly Method for Jupyter users to run an Argo Workflow (Draft)

Summary

The current method of running Argo Workflows from within JupyterLab is not particularly user friendly. We'd like to have a beginner-friendly way of running simple Argo Workflows, even if this method has limitations making it inappropriate for more complex/large workflows.

User benefit

Many users have asked for ways to run/schedule workflows. This would fill many of those needs.

Design Proposal

  1. Users would need to create a conda environment (or we add a new default base environment, argo_workflows) that has the python, python-kubernetes, argo-workflows, and hera-workflows packages.
  2. We pass some needed pod spec fields (image, container, initContainers, volumes, securityContext) into the pod as environment variables. We do this via a KubeSpawner traitlet.
  3. Enable --auth-mode=client on Argo Workflows in addition to --auth-mode=sso. Then, when users log in, KubeSpawner should map them to a service account consistent with their Argo permissions, and set auto_mount_service_token to True in KubeSpawner as well. An example according to ChatGPT is below, though I don't know if it's hallucinating. The details around authentication via Jupyter vs Keycloak are still a bit hazy to me.
from kubespawner import KubeSpawner
import json

class MySpawner(KubeSpawner):
    def pre_spawn_start(self, user, spawner_options):
        # Get the JWT token from the authentication server
        token = self.user_options.get('token', {}).get('id_token', '')

        # Decode the JWT token to obtain the OIDC claims
        decoded_token = json.loads(self.api.jwt.decode(token)['payload'])

        # Extract the OIDC groups from the claims
        groups = decoded_token.get('groups', [])

        # Modify the notebook server configuration based on the OIDC groups
        if 'group1' in groups:
            self.user_options['profile'] = 'group1_profile'

        # Call the parent pre_spawn_start method to perform any additional modifications
        super().pre_spawn_start(user, spawner_options)
  4. Users with permissions can then submit Argo workflows, since /var/run/secrets/kubernetes.io/serviceaccount/token holds the token needed to submit workflows.
  5. Write a new library (nebari_workflows) with usage like:
import nebari_workflows as wf
from nebari_workflows.hera import Task, Workflow, set_global_host, set_global_token, set_global_verify_ssl, GlobalConfig, get_global_verify_ssl

# maybe make a widget like the dask cluster one
wf.settings(
  conda_environment='',  # uses same as user submitting it by default
  instance_type='',  # uses same as user submitting it by default
)

with Workflow("two-tasks") as w:  # this uses a service with the global token and host
    Task("a", p, [{"m": "hello"}], node_selectors={"beta.kubernetes.io/instance-type": "n1-standard-4"})
    Task("b", p, [{"m": "hello"}], node_selectors={"beta.kubernetes.io/instance-type": "n1-standard-8"})

wf.submit(w)

Alternatives or approaches considered (if any)

Here

Best practices

User impact

Unresolved questions

Here's what I've done so far

  1. Created a conda environment that has the python, python-kubernetes, argo-workflows, and hera-workflows packages.
  2. Added a role (get-pod permissions) and role binding to the default service account in dev
  3. Changed the instance type profile to automount credentials for all users so they get the get-pod permissions
  4. Copied the image, container, initContainers, volumes, securityContext (in 2 places), resources, and HOME env var from the pod spec and put them in an Argo workflow (think jinja to insert them in the right places)
  5. Copied the ARGO_TOKEN and other env vars from the Argo Server UI and sourced them in a JupyterLab terminal.
  6. Ran a short script using the argo_workflows Python API to submit the workflow. It has access to the user conda environments (conda run -n myEnv) and all of the user and shared directories.
    1. the process started in / instead of at HOME, not sure why yet
    2. I ran ["conda", "run", "-n", "nebari-git-dask", "python", "/home/ad/dask_version.py"]
    3. I read and wrote a file to the user's home directory successfully

So deviations from that are still untested.
