
funcx-container-service's People

Contributors

bengalewsky, blue442, elgohr, kylechard, mtn, stevegoldstein, tshaffe1

funcx-container-service's Issues

Restore CI Workflow

Problem

The current main branch is no longer building

Approach

It looks like a library is failing the safety check. Fix this and ensure that the docker image is published.

Build Container WebApp Endpoint

As a DLHub user I want to submit a specification so my container can be built for me

We will add a new endpoint where a user can provide a Repo2Docker style specification for a new container to be built.

Payload to the new POST endpoint /containers/build/:

{
  "type": "docker",
  "specification": {
    "apt": ["apt package 1", "apt package 2"],
    "pip": ["pypi library", "pypi library==specific version"],
    "conda": ["conda dependency", "conda dependency==specific version"]
  }
}

Acceptance Criteria

Given I have a valid docker container specification JSON When I submit to the WebService Then I receive a container UUID and a container record is created and the specification is persisted with a status of Submitted.

Assumptions

  1. Does not support advanced build (tarball) containers
  2. type must be "docker"
  3. Add specification property to Container model: JSON container build specification
  4. Add a status property to the Container model. Valid states are "Submitted", "Building", "Build Failed", "Ready"
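
A minimal sketch of how the endpoint might validate the payload and create the record, using only the standard library. The function name `create_container_record` and the in-memory `CONTAINERS` dictionary are hypothetical stand-ins; the real service would persist to the Container table.

```python
import uuid

ALLOWED_SPEC_KEYS = {"apt", "pip", "conda"}

# Hypothetical in-memory stand-in for the Container table.
CONTAINERS = {}


def create_container_record(payload):
    """Validate a build request and store it with status "Submitted"."""
    if payload.get("type") != "docker":
        raise ValueError('type must be "docker"')
    spec = payload.get("specification")
    if not isinstance(spec, dict):
        raise ValueError("specification must be a JSON object")
    unknown = set(spec) - ALLOWED_SPEC_KEYS
    if unknown:
        raise ValueError(f"unsupported specification keys: {sorted(unknown)}")
    for key, value in spec.items():
        if not (isinstance(value, list) and all(isinstance(v, str) for v in value)):
            raise ValueError(f"{key} must be a list of strings")
    # The UUID is what the caller receives back from the endpoint.
    container_id = str(uuid.uuid4())
    CONTAINERS[container_id] = {"specification": spec, "status": "Submitted"}
    return container_id
```

Rejecting unknown keys is a design choice made for this sketch, not something the issue requires.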

Reuse identical containers

As a container Service User I want to re-use identical containers so I can conserve resources

Assumptions

  1. FuncX web service will compute a hash of the container spec
  2. The hash will be done after some basic standardizations (sort the lists) - this will increase the likelihood of matching up identical containers
  3. The hash will be saved in the container table
  4. New container requests will have a hash computed. If the hash matches and the build status of that container is ready then just return the container UUID of the previous container request
  5. If no successful match is found, go ahead and pass the request on to the container service.
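
The normalize-then-hash step could look roughly like the sketch below. The function name `container_spec_hash`, the choice of SHA-256, and the decision to treat empty lists like missing keys are assumptions made for illustration.

```python
import hashlib
import json


def container_spec_hash(spec):
    """Return a SHA-256 hex digest of a normalized container spec."""
    # Sort each dependency list and drop duplicates so ordering does not matter.
    # Empty lists are dropped so {"apt": []} hashes the same as a missing key.
    normalized = {key: sorted(set(values)) for key, values in spec.items() if values}
    # sort_keys gives a stable key order; separators remove whitespace variance.
    canonical = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

This only catches specs that differ in ordering or duplication. Two specs that resolve to the same packages through different version strings would still hash differently.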

Acceptance Criteria

Given I have successfully built a container for a spec When I submit a container build request for a functionally identical spec Then the container service is not invoked And I receive the UUID of the previously built container

Given I have successfully built a container for a spec When I submit a container build request for a different spec Then the container service is invoked And I receive a new UUID for the new container

Make python version of container explicit

Problem

It's hard to specify the python version of the built container.

You add this to the container spec:

            conda=[
                "python=3.10"
            ]

It would be better if the SDK took care of this and let you externalize the specification.
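
One way an SDK helper might handle this is sketched below. The name `with_python_version` is hypothetical and is not part of the funcX SDK; it only shows how an explicit version argument could be turned into the conda pin shown above.

```python
def with_python_version(conda_deps, python_version):
    """Return a copy of the conda list pinned to the requested Python version."""
    # Drop any existing python entry such as "python", "python=3.9" or "python>=3.8".
    others = [
        dep for dep in conda_deps
        if dep.split("=")[0].split("<")[0].split(">")[0].strip() != "python"
    ]
    return [f"python={python_version}"] + others
```

For example, `with_python_version(["numpy", "python=3.8"], "3.10")` replaces the old pin and leaves `numpy` in place.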

WebApp invokes the Container Service Build

As a DLHub user I want the container service to build a docker image based on my specification so I can find an image to run my model

Acceptance Criteria

Given I have a valid docker container specification JSON When I submit to the WebService Then the container service starts the build and the status of the container record will be Building

Assumptions

  1. Will invoke the container service /build operation. The payload will include the UUID of the container along with the container type (always docker) and the specification.
  2. The WebService will have a /containers/<uuid>/status PUT operation to allow the Container service to update the status
  3. To avoid problems with jobs running in multiple pods there can only be one instance of the container service
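
The status PUT operation could guard against out-of-order updates with a small transition table. This is a sketch only: the issue names the states but not the allowed transitions, so the `ALLOWED_TRANSITIONS` table and the `update_status` helper are assumptions.

```python
# Assumed transitions between the states named in the Container model.
ALLOWED_TRANSITIONS = {
    "Submitted": {"Building", "Build Failed"},
    "Building": {"Ready", "Build Failed"},
    "Build Failed": set(),
    "Ready": set(),
}


def update_status(record, new_status):
    """Apply a status update to a container record, rejecting invalid moves."""
    current = record["status"]
    if new_status not in ALLOWED_TRANSITIONS:
        raise ValueError(f"unknown status: {new_status}")
    if new_status not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current} to {new_status}")
    record["status"] = new_status
    return record
```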

Build Dockerfile from Container Service REST endpoint

As a funcX function author I want the container service to build a Dockerfile based on simple specifications

Assumptions

  1. Will support the following repo2docker specifications
  • environment.yaml
  • Pipfile
  • requirements.txt
  • setup.py
  • apt.txt
  • postBuild
  2. Creates Dockerfile only (doesn't build the image)
  3. Dockerfile is returned in the response.
  4. Only repo2docker errors are trapped and reported.

Acceptance Criteria

  1. Given I have a POST message with a requirements.txt body When I submit it to the endpoint then I should see a correct Dockerfile in the response.
  2. When I docker build this Dockerfile I get a valid image
  3. Given I submit an invalid request, then I should receive a useful error message

Isolate Container Service to nodes that don't run any other funcX components

For security purposes we don't want to allow the container service to be able to access any other components in the funcX stack. There is a potential vulnerability due to docker build running as root.

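We already have a node tag for the specially prepared nodes that can run the container service. Update the helm chart so the other services are kept off nodes with that tag. Note that a plain nodeSelector can only require labels, so excluding the tag needs a nodeAffinity rule (for example with the NotIn operator) or a taint on those nodes.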

Remove Unused Dependencies from ContainerService

Problem

There is a dangling dependency on SQLAlchemy in requirements.txt - this library now has a vulnerability and is failing the safety check. It's not a good idea to have libraries floating around that we don't need.

Approach

Remove this and see if there are other unused libraries.

Also look at the Dockerfile to see if we need the apt packages that are installed:

RUN apt-get update && \
    apt-get install -y  gcc musl-dev && \
    apt-get install -y  postgresql libffi-dev g++ make git

Asynchronous Container Service

As a funcX function author I want the container service to operate asynchronously so I can get on with my work while the container builds

Assumptions

  1. One or more worker threads to actually do the container build

Acceptance Criteria

  1. The create container operation will return immediately with the UID assigned to the request
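
A minimal sketch of the worker-thread pattern with the standard library follows. The names `submit_build`, `build_image` and the in-memory `build_status` dictionary are hypothetical; `build_image` is a placeholder for the real, slow build.

```python
import queue
import threading
import uuid

build_queue = queue.Queue()
build_status = {}  # request id -> status string
status_lock = threading.Lock()


def build_image(spec):
    """Placeholder for the real (slow) container build."""
    return None


def worker():
    """Pull build requests off the queue and record their outcome."""
    while True:
        request_id, spec = build_queue.get()
        try:
            with status_lock:
                build_status[request_id] = "Building"
            build_image(spec)
            with status_lock:
                build_status[request_id] = "Ready"
        except Exception:
            with status_lock:
                build_status[request_id] = "Build Failed"
        finally:
            build_queue.task_done()


def submit_build(spec):
    """Queue a build and return its id immediately, without waiting."""
    request_id = str(uuid.uuid4())
    with status_lock:
        build_status[request_id] = "Submitted"
    build_queue.put((request_id, spec))
    return request_id


# One daemon worker; start more threads for parallel builds.
threading.Thread(target=worker, daemon=True).start()
```

The caller gets the id back as soon as the request is queued and can poll `build_status` later.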

Version endpoint

Add a /version endpoint to the service so the WebService can interrogate the version. It will also be used as a healthz check.

Assumptions

  1. Will be a GET on the /version endpoint
  2. Version will be set in a python file version.py
  3. Result will be a JSON document:
{
   "version": 1.0
}

Raise exception when user attempts to get status of container not built by container service

Problem

The container build status REST endpoint only makes sense for containers that were built by the container service. Right now if the endpoint is invoked on a manually created container it throws a ContainerNotFound exception which is not exactly correct or easy to make sense of.

Approach

Create a new exception class ContainerStatusNotValid to report this correctly
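
A sketch of how the two cases might be separated. Only the exception names come from the issue. The `get_build_status` helper and the rule used to detect a manually created container (no stored build specification) are assumptions.

```python
class ContainerNotFound(Exception):
    """No container exists with the requested UUID."""


class ContainerStatusNotValid(Exception):
    """The container exists but was not built by the container service."""


def get_build_status(containers, container_id):
    """Return the build status, distinguishing 'missing' from 'not built here'."""
    container = containers.get(container_id)
    if container is None:
        raise ContainerNotFound(container_id)
    # Assumption: manually registered containers carry no build specification.
    if container.get("specification") is None:
        raise ContainerStatusNotValid(
            f"container {container_id} was not built by the container service"
        )
    return container["status"]
```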

Allow Conda Channels in ContainerSpec

As a container service user I want to be able to use conda packages that are not available in the default channel so I can get the libraries I need

Assumptions

  1. Add conda_channels property to the container_spec. It will be an optional list of conda channels
  2. Add values from here to the channels property in the generated environment.yaml file
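
As an illustration, the generated environment.yaml could be rendered along these lines. The function `render_environment_yaml` is a hypothetical helper that emits text directly rather than using a YAML library.

```python
def render_environment_yaml(conda_deps, conda_channels=None):
    """Render a conda environment.yaml document as text."""
    lines = []
    # The channels section is optional, matching the optional conda_channels property.
    if conda_channels:
        lines.append("channels:")
        lines.extend(f"  - {channel}" for channel in conda_channels)
    lines.append("dependencies:")
    lines.extend(f"  - {dep}" for dep in conda_deps)
    return "\n".join(lines) + "\n"
```

When `conda_channels` is omitted, no `channels` section is written and conda falls back to its configured defaults.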

Correctly report container size in bytes as int

Kevin ran across container_size (in funcx-services/web-service/funcx_web_service/schemas/container.py, specified in ContainerBuildStatusUpdate):

class ContainerBuildStatusUpdate(BaseModel):
    ...
    container_size: float = Field(default=0.0, description="Container size in bytes")
    ...

That is, the description lists "bytes" as the unit, but the data type is listed as a float? Does that seem correct to you? I would rather expect that to be a non-negative integer, like:

    container_size: NonNegativeInt = Field(default=0, description="...")

But that matches the spec in the funcX Container Service code, which is, currently:

class CompletionSpec(BaseModel):
    ...
    container_size: float = 0
    ...

Add container service to funcX helm chart

As a funcX developer I want to deploy the container service via helm chart so I can easily deploy the full stack

Acceptance Criteria

  1. There is a boolean value in values.yaml that enables the container service
  2. Setting this value to true and doing a helm install results in ContainerService pod being deployed
  3. No errors in the ContainerService pod log

Assumptions

  1. No Ingress will be provided. The container service will only be accessible inside the cluster or with kubectl port-forward

Notes:
Start by perfecting the deployment of a dev environment following the instructions in the helm chart repo

This should just be new deployment and service templates. Hopefully the service can be further configured with a config file mounted in the pod in the same way

Hello, Container Service

As an author of funcX functions I want there to be a service to build custom containers so I can offer functions that run across compute environments

Acceptance Criteria

  1. New repo funcx_container_service
  2. Basic flask app
  3. Dockerfile that runs flask app inside a uWSGI container
  4. Initial pytests
  5. GitHub Actions to test, lint with flake8, and publish docker image
  6. Apache license
  7. README

Hide Internal FuncX Routes From Internet

Problem

Microservices in funcX may need to communicate with the funcX app, however they do not have any credentials to access the endpoints.

Approach

Make these REST endpoints free from any auth checks and only allow access from inside the Kubernetes cluster.

Update the Ingress to add a /v2/internal route.

      - path: /v2/internal
        pathType: Prefix
        backend:
          service:
            name: {{ .Values.app.ingress.defaultBackend }}
            port: 
              number: 80

This causes any attempts to reach these endpoints to route users to an error page.

Update the app and the container service to use this new route

Container Service Persistence

As a funcX administrator I want the container service to persist container build requests so I can understand the types of containers being used by deployed functions

Assumptions

  1. Will use SQLAlchemy to persist to a relational database
  2. New REST endpoint to retrieve container build results as well as the full docker image reference (my.registry.address:port/repositoryname)
  3. Save specification as well as build metrics (time)
  4. Save a hash of the build specification

Acceptance Criteria

  1. When I submit a container build request I get back a UID for the request
  2. When I request the results for the UID I get back information about the build
  3. When I request a log for the UID I get back the detailed build log
  4. When I submit the same specification a second time, the build is skipped and I'm given a reference to the existing image

Routine Garbage Collection of Build Images

Problem

Docker images are built by the container service and are kept in the local docker image registry forever.

Approach

Add a cron job to the FastAPI server to occasionally issue a docker system prune --all command to clear old layers out. This will make the next build slow since we lose the expensive repo2docker base image.

Make the timing of the purge job configurable through the FastAPI config file.
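
A rough sketch of a repeating prune job using a standard-library timer instead of a cron library. The names `prune_images` and `schedule_prune` are hypothetical, and the `--force` flag is added here so the command does not wait for interactive confirmation.

```python
import subprocess
import threading

PRUNE_COMMAND = ["docker", "system", "prune", "--all", "--force"]


def prune_images(run=subprocess.run):
    """Run the prune command once; `run` is injectable for testing."""
    return run(PRUNE_COMMAND, check=False)


def schedule_prune(interval_seconds, run=subprocess.run):
    """Prune now, then re-arm a timer so the job repeats every interval."""
    prune_images(run)
    timer = threading.Timer(
        interval_seconds, schedule_prune, args=(interval_seconds, run)
    )
    timer.daemon = True  # do not keep the server alive just for the timer
    timer.start()
    return timer
```

The interval would come from the service's config file. Calling `cancel()` on the returned timer stops the next scheduled run.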

Remove exit() from the code

There are a few error conditions that end with a sys.exit(-1).

These will cause the entire server to stop. Remove them.

Container Service Accepts Zip File

As a container Service user I want a provided zip file bundled into the image so I can use assets that are otherwise not installable

Assumptions

  1. The container spec will include a readable URL to the zip file. This will usually be a signed S3 URL
  2. Creating the signed URL is out of scope of funcX
  3. We will extend DLHub API to create a writable URL for the client to save the zip file and then a readable URL for container service to load the zip file.
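
On the container service side, fetching and unpacking the zip might look like the sketch below. The function name `fetch_and_unpack` and the idea of extracting into the build directory are assumptions; the archive is read fully into memory, which may not suit very large files.

```python
import io
import urllib.request
import zipfile
from pathlib import Path


def fetch_and_unpack(zip_url, build_dir):
    """Download the zip at zip_url and extract it into build_dir."""
    with urllib.request.urlopen(zip_url) as response:
        data = response.read()
    target = Path(build_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        archive.extractall(target)
        # Return the member names so the caller can log what was bundled.
        return archive.namelist()
```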
