basetenlabs / truss
The simplest way to serve AI/ML models in production
Home Page: https://truss.baseten.co
License: MIT License
When developing a Truss, the model code has access to a set of python requirements, bundled packages and external packages. It would be great to make it easy to create a virtual environment for a Truss, where the python requirements are installed and the other paths are added to the system path.
Virtual environments have many advantages for truss development.
The path to the virtual environment could be customizable. To begin with, the venv path could be required to be outside the truss. Once there is support for ignore files, it could be allowed to be inside the truss with its path added to the ignore list, so that it doesn't get bundled into the docker image.
Some commands, such as run-examples, take the truss directory path directly, but others, like predict, require it to be passed as --target_directory. We should use the former everywhere to have a uniform interface.
Describe the bug
Loading states in a running truss server result in 503 error codes, which are problematic from a monitoring and alerting perspective (they are often bundled with other 5xx codes that are used for alerting on-call).
We should find an appropriate 4xx-level error code here that works well with the liveness/readiness probes on the k8s side but doesn't result in 5xx error codes.
Hey, I just got here via the TLDR Newsletter (https://tldr.tech)
This project looks great! However, some screenshots or even a demo would be a nice showcase of what's expected to happen on deployment. Currently it's hard to picture what it would look like.
If there is something like this, it's not obvious; I skimmed the documentation and the readme and couldn't find it.
Thanks!
Description
Upon first use of truss, users will often run truss init to create an empty truss to start developing on. Currently, we naively write all the properties of the TrussConfig object to the generated config.yaml, which can be overwhelming for first-time users. We should look to simplify the generated config in order to make this first experience better.
Solution
The current generated config is as follows:
base_image: null
bundled_packages_dir: packages
data_dir: data
description: null
environment_variables: {}
examples_filename: examples.yaml
external_package_dirs: []
input_type: Any
live_reload: false
model_class_filename: model.py
model_class_name: Model
model_framework: custom
model_metadata: {}
model_module_dir: model
model_name: null
model_type: custom
python_version: py39
requirements: []
resources:
  accelerator: null
  cpu: 500m
  memory: 512Mi
  use_gpu: false
secrets: {}
spec_version: '2.0'
system_packages: []
train:
  resources:
    accelerator: null
    cpu: 500m
    memory: 512Mi
    use_gpu: false
  training_class_filename: train.py
  training_class_name: Train
  training_module_dir: train
  variables: {}
We should look to reduce the config to the settings that our users interact with most:
environment_variables: {}
external_package_dirs: []
model_metadata: {}
model_name: null
python_version: py39
requirements: []
resources:
  accelerator: null
  cpu: 500m
  memory: 512Mi
  use_gpu: false
secrets: {}
system_packages: []
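The trimming itself is straightforward; here is a minimal sketch (the key list and helper name are assumptions for illustration, not the actual truss implementation):

```python
# Hypothetical sketch: trim a full TrussConfig dict down to the keys that
# new users actually touch. The key list and function name are assumptions.
USER_FACING_KEYS = [
    "environment_variables",
    "external_package_dirs",
    "model_metadata",
    "model_name",
    "python_version",
    "requirements",
    "resources",
    "secrets",
    "system_packages",
]

def minimal_config(full_config: dict) -> dict:
    """Return only the user-facing subset of the config, preserving order."""
    return {k: full_config[k] for k in USER_FACING_KEYS if k in full_config}
```

Everything omitted here would still be settable by adding the key back to config.yaml, since the full TrussConfig defaults remain in effect.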
If the size contribution of the control plane can be reduced to less than 100 MB, then it can just be bundled into all generated base images by default, cutting the number of needed base images in half. Generating all these base images makes image generation and push very slow. Getting rid of the extra base images will also reduce complexity and maintenance.
Currently, if multiple python packages are added via live-reload, there's a possibility that some python requirements get installed while others fail. In such a case, since the truss container cannot be allowed to be in an inconsistent state, a full redeploy/rebuild is triggered. This rebuild itself will fail due to the bad python requirement. Overall, this is not a great experience. It will be great to install the packages in a transactional manner, such that if any package installation fails, then the original python requirements can be restored.
There are a few ways to achieve this. One way would be to keep the python requirements in a virtual environment, create a copy of the virtual environment for installation via live-reload, and only swap it in for the active virtual environment if all installations succeed. This is a clean approach but requires creating a copy of the entire virtual environment, which could be slow and run into disk space issues.
Perhaps an easier option would be to record the pip state and try to restore that state in case of failure.
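A rough sketch of the record-and-restore approach, assuming hypothetical helper names (the pip invocations themselves are standard):

```python
# Sketch of "record pip state, restore on failure". Function names are
# illustrative, not truss internals; the pip commands are standard.
import subprocess
import sys
import tempfile

def freeze() -> str:
    """Snapshot currently installed requirements via `pip freeze`."""
    return subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        capture_output=True, text=True, check=True,
    ).stdout

def install_transactionally(requirements: list) -> None:
    snapshot = freeze()
    try:
        subprocess.run(
            [sys.executable, "-m", "pip", "install", *requirements],
            check=True,
        )
    except subprocess.CalledProcessError:
        # Roll back by reinstalling the recorded snapshot. Note: packages
        # added by the failed install but absent from the snapshot would
        # still need a separate uninstall pass to fully restore state.
        with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
            f.write(snapshot)
        subprocess.run(
            [sys.executable, "-m", "pip", "install", "-r", f.name],
            check=True,
        )
        raise
```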
Is your feature request related to a problem? Please describe.
When dealing with large weight files (around ~5 GB), truss appears to hang while it's calculating hashes and building the docker context. It would be great to have a spinner and some communication about what is happening while this is being calculated.
Describe the solution you'd like
We could add a spinner that says "Preparing docker context" or something to that effect while doing this, so it doesn't look like it hangs.
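A minimal spinner sketch (the class and message are illustrative, not the actual truss CLI code):

```python
# Illustrative spinner: a background thread cycles through frames on
# stderr while slow foreground work (hashing, context assembly) runs.
import itertools
import sys
import threading
import time

class Spinner:
    def __init__(self, message: str = "Preparing docker context"):
        self.message = message
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._spin, daemon=True)

    def _spin(self) -> None:
        for frame in itertools.cycle("|/-\\"):
            if self._stop.is_set():
                break
            sys.stderr.write(f"\r{self.message} {frame}")
            sys.stderr.flush()
            time.sleep(0.1)
        # Clear the spinner line when done.
        sys.stderr.write("\r" + " " * (len(self.message) + 2) + "\r")

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
```

Usage would look like: with Spinner(): compute_hashes_and_build_context(...).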
It failed on the first run here, which was surprising as the change didn't touch this flow. It succeeded on retry. There's likely some flakiness in the test.
Describe the bug
Throwing AttributeError: 'NoneType' object has no attribute 'split' when trying to mk_truss.
To Reproduce
Directly taken from "Quickstart: making a truss"
...
rfc.fit(data_x, data_y)
# Create the Truss (serializing & packaging model)
tr = truss.mk_truss(rfc, target_directory="iris_rfc_truss") # this line fails
...
Expected behavior
Expected to see the working sample from the quickstart.
Screenshots/Logs
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[149], line 15
12 rfc.fit(data_x, data_y)
14 # Create the Truss (serializing & packaging model)
---> 15 tr = truss.mk_truss(rfc, target_directory="iris_rfc_truss")
17 # Serve a prediction from the model
18 # tr.server_predict({"inputs": [[0, 0, 0, 0]]})
File ~/.virtualenvs/datasciense/lib/python3.8/site-packages/truss/build.py:267, in mk_truss(model, target_directory, data_files, requirements_file, bundled_packages)
257 def mk_truss(
258 model: Any,
259 target_directory: str = None,
(...)
264 # Some model objects can are callable (like Keras models)
265 # so we first attempt to make Truss via a model object
--> 267 model_scaffold = mk_truss_from_model_with_exception_handler(
268 model, target_directory, data_files, requirements_file, bundled_packages
269 )
270 if model_scaffold:
271 return model_scaffold
File ~/.virtualenvs/datasciense/lib/python3.8/site-packages/truss/build.py:203, in mk_truss_from_model_with_exception_handler(*args)
200 def mk_truss_from_model_with_exception_handler(*args):
201 # returns None if framework not supported, otherwise the Truss
202 try:
--> 203 return mk_truss_from_model(*args)
204 except FrameworkNotSupportedError:
205 return None
File ~/.virtualenvs/datasciense/lib/python3.8/site-packages/truss/build.py:99, in mk_truss_from_model(model, target_directory, data_files, requirements_file, bundled_packages)
97 else:
98 target_directory_path = Path(target_directory)
---> 99 model_framework.to_truss(model, target_directory_path)
100 scaf = TrussHandle(target_directory_path)
101 _update_truss_props(scaf, data_files, requirements_file, bundled_packages)
File ~/.virtualenvs/datasciense/lib/python3.8/site-packages/truss/model_framework.py:69, in ModelFramework.to_truss(self, model, target_directory)
61 python_version = map_to_supported_python_version(infer_python_version())
63 # Create config
64 config = TrussConfig(
65 model_name=self.model_name(model),
66 model_type=self.model_type(model),
67 model_framework=self.typ(),
68 model_metadata=self.model_metadata(model),
---> 69 requirements=self.requirements_txt(),
70 python_version=python_version,
71 )
72 with (target_directory / CONFIG_FILE).open("w") as config_file:
73 yaml.dump(config.to_dict(), config_file)
File ~/.virtualenvs/datasciense/lib/python3.8/site-packages/truss/model_framework.py:29, in ModelFramework.requirements_txt(self)
27 def requirements_txt(self) -> List[str]:
---> 29 return list(infer_deps(must_include_deps=self.required_python_depedencies()))
File ~/.virtualenvs/datasciense/lib/python3.8/site-packages/truss/environment_inference/requirements_inference.py:44, in infer_deps(must_include_deps)
41 if not must_include_deps:
42 must_include_deps = set()
---> 44 pkg_candidates = _extract_packages_from_frame(relevant_stack[0].frame)
45 imports = must_include_deps.union(pkg_candidates)
46 requirements = set([])
File ~/.virtualenvs/datasciense/lib/python3.8/site-packages/truss/environment_inference/requirements_inference.py:98, in _extract_packages_from_frame(frame)
96 pkg_name = val.__name__.split(".")[0]
97 elif hasattr(val, "__module__"):
---> 98 pkg_name = val.__module__.split(".")[0]
99 else:
100 continue
AttributeError: 'NoneType' object has no attribute 'split'
Additional context: none.
Line 19 in ead203d: there should be verbose kwargs that you can set, I think.
While debugging another issue, we found that term signals are not handled properly in Truss server.
For both Inference and Control server:
The dockerfile dependency in particular is having issues on Windows.
Plan of action:
Describe the bug
When I run add_python_requirement and the requirement is already listed in config.yaml, it is listed again.
e.g.
requirements:
- joblib==1.0.0
- joblib==1.0.0
- joblib==1.1.0
To Reproduce
Run the add_python_requirement command twice with the same requirement
import truss
truss.init("test_truss")
tr = truss.from_directory("test_truss")
tr.add_python_requirement("joblib==1.0.0")
tr.add_python_requirement("joblib==1.0.0")
tr.add_python_requirement("joblib==1.1.0")
Expected behavior
If the package is already a requirement, don't add it again.
If the package is already a requirement but I am specifying a different version, update the version rather than duplicating the entry.
Expected output of the above code sample:
requirements:
- joblib==1.1.0
Additional information
Investigate whether the same thing happens for system requirements and fix it there, too!
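The dedup-or-update behavior could be sketched like this (a hypothetical standalone helper; the real add_python_requirement lives on the handle):

```python
# Sketch of idempotent requirement addition: match on the package name,
# replace an existing pin instead of appending a duplicate entry.
import re

# Split off version specifiers / extras: ==, >=, <, !=, ~=, [extra]
_SPEC_SPLIT = r"[=<>!~\[]"

def add_requirement(requirements: list, new_req: str) -> list:
    name = re.split(_SPEC_SPLIT, new_req, maxsplit=1)[0].strip().lower()
    out = []
    replaced = False
    for req in requirements:
        existing = re.split(_SPEC_SPLIT, req, maxsplit=1)[0].strip().lower()
        if existing == name:
            if not replaced:
                out.append(new_req)  # update the version in place
                replaced = True
            # drop any further duplicates of the same package
        else:
            out.append(req)
    if not replaced:
        out.append(new_req)
    return out
```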
Typical flow is like this:
handle = mk_truss(model, target_directory='some_dir')
# Make changes to truss, like modify `model.py`
# Recreate handle to load changes
handle = from_directory('some_dir')
That last step is easy to forget and should not be needed. It would be great if the handle automatically reflected changes to the underlying truss.
The journey from ML model to serving model can be made even easier by generating deployment artifacts such as k8s deployment specs and helm charts.
A k8s deployment manifest could be a good starting point. It will save users the work of figuring out how to deploy a Truss docker image to k8s. Even if they want to customize it, it will provide them a good starting point.
We could start with something basic:
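A minimal sketch of such a basic Deployment manifest, built as a Python dict (field choices such as the container port and replica count are assumptions; serializing the dict with PyYAML's yaml.safe_dump would yield the spec file):

```python
# Illustrative generator for a basic k8s Deployment manifest for a Truss
# docker image. Port, labels, and naming scheme are assumptions.
def k8s_deployment_manifest(model_name: str, image: str, replicas: int = 1) -> dict:
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": f"{model_name}-deployment"},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": model_name}},
            "template": {
                "metadata": {"labels": {"app": model_name}},
                "spec": {
                    "containers": [{
                        "name": model_name,
                        "image": image,
                        "ports": [{"containerPort": 8080}],
                    }],
                },
            },
        },
    }
```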
As a follow-up, it would be good to extend support with a section to support common options such as the number of replicas. As much as possible we should capture succinct specifications in the Truss config to auto-generate the k8s specs. But, of course, users can customize the generated deployment specs the way they want.
A helm chart would be a great way of packaging these k8s specs. This should perhaps be captured in a separate issue and done as a follow-up.
We want to integration-test this. For that, we'll need to update the codespace setup to install minikube, so we can start a local minikube cluster for these tests.
Ultimately, we should support live reload for all parts of a truss. The most important remaining part is the data section of the truss.
A key challenge is transferring such large payloads. The current patch endpoint can't support this; it would be very clumsy to cram big binary blobs into the patch JSON. A separate endpoint is likely needed on the Control Proxy to upload large blobs. The patch endpoint can then be enhanced to accept the content hash of the blob.
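The hash-addressed blob idea could look roughly like this (the patch shape and field names are assumptions for illustration):

```python
# Sketch of the hash-addressed blob scheme: large data files are uploaded
# to a separate blob endpoint keyed by content hash; the patch payload
# then carries only the hash, not the bytes.
import hashlib

def content_hash(path, chunk_size: int = 1 << 20) -> str:
    """sha256 of a file, streamed in chunks so large weights fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def data_patch(relative_path: str, blob_hash: str) -> dict:
    """Hypothetical patch entry referencing an already-uploaded blob."""
    return {"action": "ADD", "path": relative_path, "content_hash": blob_hash}
```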
Is your feature request related to a problem? Please describe.
Currently, when I use external_package_dirs, it copies what's inside the folder, and I have to restructure the whole repo to get imports to work (by moving everything inside one directory that has the correct import paths as they would appear after things are copied into the truss).
However, oftentimes I work with open source repos that use top-level imports, and moving that many things around is impractical, especially when I want to contribute the truss back to the repo to increase reach.
Describe the solution you'd like
We can add a config called keep_dir_names that preserves the directory names when copying, so that imports in other copied directories work without modification and it's easier to avoid conflicts without changing everything else in the repo.
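A sketch of the proposed behavior next to the current one (the function and flag names are hypothetical):

```python
# Illustrative copy logic for external package dirs. With keep_dir_names,
# the source directory itself is copied (preserving its name) so that
# top-level imports like `import <dirname>` keep working.
import shutil
from pathlib import Path

def copy_external_package(src: Path, packages_dir: Path,
                          keep_dir_names: bool = False) -> Path:
    packages_dir.mkdir(parents=True, exist_ok=True)
    if keep_dir_names:
        dest = packages_dir / src.name  # packages/<repo_dir>/...
        shutil.copytree(src, dest)
        return dest
    # Current behavior as described above: copy only the contents.
    for child in src.iterdir():
        target = packages_dir / child.name
        if child.is_dir():
            shutil.copytree(child, target)
        else:
            shutil.copy2(child, target)
    return packages_dir
```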
Potential avenues of improvement
Is your feature request related to a problem? Please describe.
It would be great if people could easily copy instructions like this from repos into their truss config, instead of worrying about how to make everything work with our requirements:
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
Describe the solution you'd like
A setup_script multiline config in truss where I can type up a script that is run after the requirements and system_packages have been installed.
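A sketch of how such a hook might be executed after installs (setup_script is the proposal above; the runner itself is an assumption):

```python
# Illustrative runner for a user-supplied setup script, executed after
# python requirements and system packages are installed during the build.
import subprocess

def run_setup_script(script: str) -> None:
    # bash -e / -u / -o pipefail: fail the build fast if any command in
    # the user's script fails, references an unset variable, or a piped
    # command errors.
    subprocess.run(["bash", "-euo", "pipefail", "-c", script], check=True)
```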
Is your feature request related to a problem? Please describe.
While I can configure almost anything in my truss, the cuda versions are fixed, which makes it hard to copy the configuration of some open source models that depend on specific versions of cuda.
Describe the solution you'd like
NVIDIA pretty much has tags for all the supported cuda versions. It would be great if we could support these in the config.yaml, so that I can quickly bring repos from github into truss and deploy them.
It's better to stick to the semantic version schema, as it's more intuitive to end users. The way to transition would be to support both py39 and 3.9 style versions for a while; ultimately, support for the py39 style can be dropped.
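The transition shim could be a small normalizer that accepts both styles (the helper name is illustrative):

```python
# Illustrative normalizer for the transition period: accept both the
# legacy "py39" style and the semantic "3.9" style, returning the
# internal pyXY form either way.
import re

def normalize_python_version(version: str) -> str:
    if re.fullmatch(r"py\d\d+", version):
        return version  # already legacy pyXY style
    match = re.fullmatch(r"(\d)\.(\d+)", version)
    if match:
        return f"py{match.group(1)}{match.group(2)}"
    raise ValueError(f"Unrecognized python version: {version!r}")
```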
Truss eases turning raw models into services. Model services normally shouldn't leak exception information into prediction error responses, but doing so can be useful during development; Baseten draft models are an example. It would be great to provide optional support for verbose prediction responses, where the entire stack trace could be returned alongside the prediction error response.
A tests directory under truss with access to model code, bundled packages and external packages. To begin with, pytest can be used as the standard way of writing tests. A follow-up would be to support dev-only python requirements, to allow the flexibility of adding any other helper libraries for testing.
The tests directory should not affect live-reload and should not go into the docker images generated for the truss.
Add a warning when a symlink is detected while building the build context, instead of failing on copy.
Reproduce:
truss build-image ./path/to/truss/w/venv
Desired behavior:
We don't enforce any resource limits right now, but we should, to mimic production deployments and let one tune these settings well locally.
Description
If one calls a live-reloadable truss while it's starting up for the first time, one sees a lot of 503 errors. This is the control proxy retrying while waiting for the inference server to start up. The original call may ultimately succeed, but the many 503 logs are confusing; one may think that a lot of calls are failing.
To Reproduce
Create a truss with, say, a sleep of 10 seconds in model.load. Call this truss right after the build finishes and service deployment begins. Observe the logs.
Expected behavior
Only the ultimate success or failure of the incoming request should be logged. Internal retry logs shouldn't be there, or should clearly indicate what they're for.
Is your feature request related to a problem? Please describe.
After some exploration, it seems we can definitely compose various trusses into one image so that they can share a GPU. This has many advantages for server costs, but also for local development when I want to work with multiple trusses but my machine only has one GPU.
This issue is to open up a discussion on what the goals could be for bringing this into truss, both in the near term and the long term.
The context builder image is kept lean to reduce size; all it's meant for is generating the docker build context from a Truss. It doesn't include numpy, python_on_whales etc., so those imports in truss_handle are not at the top level. This is very easy to break without realizing it, though. We should add an integration test to catch such issues proactively.
Features such as live reload monitor for any changes to the Truss, and if the changes are not supported, fall back to a full build. Even for regular builds, the whole truss directory is bundled, which means hidden directories such as .git also get bundled. It would be great to ignore hidden files and directories, as well as pycache files, in both cases. More could be added to this list; e.g. when support for tests is added, the designated tests directory could belong here.
In the long term, support for a .trussignore file could be useful, but it's not clear if there's an immediate need for it.
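The ignore rules could start as a simple path filter (the function name and rule set are illustrative):

```python
# Illustrative bundling filter: skip hidden files/directories, __pycache__
# directories, and compiled .pyc files anywhere under the truss directory.
from pathlib import Path

def should_bundle(path: Path, truss_dir: Path) -> bool:
    rel = path.relative_to(truss_dir)
    for part in rel.parts:
        if part.startswith(".") or part == "__pycache__" or part.endswith(".pyc"):
            return False
    return True
```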
Is your feature request related to a problem? Please describe.
I am trying to deploy a NeMo Megatron based model using truss. The installation has a lot of dependencies that I cannot satisfy using only the system_packages and python_requirements features that truss provides. I have spent two days trying to resolve these issues but keep going down the rabbit hole of installing more things that happen not to work.
However, NVIDIA provides a base image that does work for all these kinds of models and includes all kinds of run-time optimizations that I would like to benefit from.
Describe the solution you'd like
I would like a custom_base_image config in the config.yaml for truss. If this custom image is provided, truss tries to use it for preparing the truss server, instead of the base gpu image.
It's okay for this custom_base_image not to succeed in all cases. For example, if I use a base image that doesn't have a compatible python version, the truss server might fail. I think that's okay, given that this is a very advanced feature.
Describe the bug
Adding python_version: py38 or python_version: 3.8 to the truss config does not result in python version 3.8 being available in the resulting docker image.
To Reproduce
Expected behavior
Should install the configured python version.
I believe the issue is that the base image doesn't build based on the correct python version: https://github.com/basetenlabs/truss/blob/main/docker/base_images/base_image.Dockerfile.jinja#L41
Really like the look of this project. I saw the AWS integration guide, but I was wondering what it would take to integrate with SageMaker (https://aws.amazon.com/sagemaker/).
I suspect there might be quite a bit of additional work in creating training and serving containers in the format that SageMaker can consume: https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html
In addition to that, I believe the officially supported containers are framework specific and have specific drivers that connect the frameworks to the available GPUs, so I don't know how easy it'd be to even maintain a single set of containers that worked for all frameworks.
live_reload is great for quickly making changes to model serving code on a truss running on docker. But making changes to model.py often creates a pycache directory; the live_reload mechanism doesn't understand this change, resulting in a full build. This only happens the first time, as the pycache directory does not change afterwards, but it's still annoying. It would be great if we could simply ignore any pycache directories.
The control server should try to restart the inference server if it crashes for some reason. Right now, if the inference server crashes, there's no way for the situation to be corrected.
Right now, accessing a secret that does not exist results in a KeyNotFound error, which is unclear to users. Let's return a better error, and potentially consider logging the available secret names.
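A sketch of a friendlier secrets lookup (the class and error names are hypothetical, not the actual truss API):

```python
# Illustrative secrets wrapper that raises a descriptive error naming the
# available secret keys (names only; values are never exposed).
class SecretNotFoundError(KeyError):
    pass

class Secrets:
    def __init__(self, secrets: dict):
        self._secrets = dict(secrets)

    def __getitem__(self, name: str) -> str:
        try:
            return self._secrets[name]
        except KeyError:
            available = ", ".join(sorted(self._secrets)) or "(none)"
            raise SecretNotFoundError(
                f"Secret '{name}' not found. Available secrets: {available}"
            ) from None
```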
Is your feature request related to a problem? Please describe.
When creating a truss, which I usually use for serving, I get a lot of training configs that I always delete since they are not relevant to my needs. It would be great if I only got these configs when I want them.
Describe the solution you'd like
Respect the same trainable flag that we use in create, which decides whether or not to include the training template files. If the training files are not included, the training config should also not be included.