basetenlabs / truss
The simplest way to serve AI/ML models in production
Home Page: https://truss.baseten.co
License: MIT License
When developing a Truss, the model code has access to a set of python requirements, bundled packages and external packages. It would be great to make it easy to create a virtual environment for a Truss, where the python requirements are installed and the other paths are added to the system path.
Virtual environments have many advantages for truss development.
The path to the virtual environment could be customizable. To begin with, the venv path could be required to be outside the truss. Once there is support for ignore files, it could be allowed to be inside the truss with its path added to the ignore list, so that it doesn't get bundled into the docker image.
Some commands, such as run-examples, take the truss directory path directly, but others, like predict, require it to be passed as --target_directory. We should use the former everywhere to have a uniform interface.
Describe the bug
Loading states in a running truss server result in 503 error codes, which are problematic from a monitoring and alerting perspective (they are often bundled with other 5xx codes that are used for alerting on-call).
We should find an appropriate 4xx-level error code here that works well with the liveness/readiness probes on the k8s side but doesn't result in 5xx error codes.
Hey, I just got here via the TLDR Newsletter (https://tldr.tech)
This project looks great! However, some screenshots or even a demo would be a nice showcase of what's expected to happen on deployment. Currently it's hard to picture what it would look like.
If there is something like this, it's not obvious; I skimmed the documentation and the readme and couldn't find it.
Thanks!
Description
Upon first use of truss, users will often run truss init to create an empty truss to start developing on. Currently, we naively write all the properties of the TrussConfig object to the generated config.yaml, which can be overwhelming for first-time users. We should look to simplify the generated config in order to make this first experience better.
Solution
The current generated config is as follows:
base_image: null
bundled_packages_dir: packages
data_dir: data
description: null
environment_variables: {}
examples_filename: examples.yaml
external_package_dirs: []
input_type: Any
live_reload: false
model_class_filename: model.py
model_class_name: Model
model_framework: custom
model_metadata: {}
model_module_dir: model
model_name: null
model_type: custom
python_version: py39
requirements: []
resources:
  accelerator: null
  cpu: 500m
  memory: 512Mi
  use_gpu: false
secrets: {}
spec_version: '2.0'
system_packages: []
train:
  resources:
    accelerator: null
    cpu: 500m
    memory: 512Mi
    use_gpu: false
  training_class_filename: train.py
  training_class_name: Train
  training_module_dir: train
  variables: {}
We should look to reduce the config to the settings that our users interact with most:
environment_variables: {}
external_package_dirs: []
model_metadata: {}
model_name: null
python_version: py39
requirements: []
resources:
  accelerator: null
  cpu: 500m
  memory: 512Mi
  use_gpu: false
secrets: {}
system_packages: []
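The trimming itself is straightforward; here is a minimal sketch (the key list and helper name are assumptions for illustration, not the actual truss implementation):

```python
# Hypothetical sketch: trim a full TrussConfig dict down to the keys that
# new users actually touch. The key list and function name are assumptions.
USER_FACING_KEYS = [
    "environment_variables",
    "external_package_dirs",
    "model_metadata",
    "model_name",
    "python_version",
    "requirements",
    "resources",
    "secrets",
    "system_packages",
]

def minimal_config(full_config: dict) -> dict:
    """Return only the user-facing subset of the config, preserving order."""
    return {k: full_config[k] for k in USER_FACING_KEYS if k in full_config}
```

Everything omitted here would still be settable by adding the key back to config.yaml, since the full TrussConfig defaults remain in effect.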
If the size contribution of the control plane can be reduced to less than 100 MB, then it can just be bundled into all generated base images by default, cutting the number of needed base images in half. Generating all these base images makes image generation and push very slow. Getting rid of the extra base images will also reduce complexity and maintenance.
Currently, if multiple python packages are added via live-reload, there's a possibility that some python requirements get installed while others fail. In such a case, since the truss container cannot be allowed to be in an inconsistent state, a full redeploy/rebuild is triggered. This rebuild itself will fail due to the bad python requirement. Overall, this is not a great experience. It will be great to install the packages in a transactional manner, such that if any package installation fails, then the original python requirements can be restored.
There are a few ways to achieve this. One way would be to keep the python requirements in a virtual environment, create a copy of the virtual environment for installation via live-reload, and only swap it in for the active virtual environment if all installations succeed. This is a clean approach but requires creating a copy of the entire virtual environment, which could be slow and run into disk space issues.
Perhaps an easier option would be to record the pip state and try to restore that state in case of failure.
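A rough sketch of the record-and-restore approach, assuming hypothetical helper names (the pip invocations themselves are standard):

```python
# Sketch of "record pip state, restore on failure". Function names are
# illustrative, not truss internals; the pip commands are standard.
import subprocess
import sys
import tempfile

def freeze() -> str:
    """Snapshot currently installed requirements via `pip freeze`."""
    return subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        capture_output=True, text=True, check=True,
    ).stdout

def install_transactionally(requirements: list) -> None:
    snapshot = freeze()
    try:
        subprocess.run(
            [sys.executable, "-m", "pip", "install", *requirements],
            check=True,
        )
    except subprocess.CalledProcessError:
        # Roll back by reinstalling the recorded snapshot. Note: packages
        # added by the failed install but absent from the snapshot would
        # still need a separate uninstall pass to fully restore state.
        with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
            f.write(snapshot)
        subprocess.run(
            [sys.executable, "-m", "pip", "install", "-r", f.name],
            check=True,
        )
        raise
```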
Is your feature request related to a problem? Please describe.
When dealing with large weight files (around ~5 GB), truss appears to hang while it's calculating hashes and building the docker context. It would be great to have a spinner and some communication about what is happening while this is being calculated.
Describe the solution you'd like
We could add a spinner that says "Preparing docker context" or something to that effect while doing this, so it doesn't look like it hangs.
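A minimal spinner sketch (the class and message are illustrative, not the actual truss CLI code):

```python
# Illustrative spinner: a background thread cycles through frames on
# stderr while slow foreground work (hashing, context assembly) runs.
import itertools
import sys
import threading
import time

class Spinner:
    def __init__(self, message: str = "Preparing docker context"):
        self.message = message
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._spin, daemon=True)

    def _spin(self) -> None:
        for frame in itertools.cycle("|/-\\"):
            if self._stop.is_set():
                break
            sys.stderr.write(f"\r{self.message} {frame}")
            sys.stderr.flush()
            time.sleep(0.1)
        # Clear the spinner line when done.
        sys.stderr.write("\r" + " " * (len(self.message) + 2) + "\r")

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
```

Usage would look like: with Spinner(): compute_hashes_and_build_context(...).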
It failed on the first run here, which was surprising as the change didn't touch this flow. It succeeded on retry. There's likely some flakiness in the test.
Describe the bug
Throwing AttributeError: 'NoneType' object has no attribute 'split' when trying to mk_truss.
To Reproduce
Directly taken from "Quickstart: making a truss"
...
rfc.fit(data_x, data_y)
# Create the Truss (serializing & packaging model)
tr = truss.mk_truss(rfc, target_directory="iris_rfc_truss") # this line fails
...
Expected behavior
Expected to see the working sample from the quickstart.
Screenshots/Logs
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[149], line 15
12 rfc.fit(data_x, data_y)
14 # Create the Truss (serializing & packaging model)
---> 15 tr = truss.mk_truss(rfc, target_directory="iris_rfc_truss")
17 # Serve a prediction from the model
18 # tr.server_predict({"inputs": [[0, 0, 0, 0]]})
File ~/.virtualenvs/datasciense/lib/python3.8/site-packages/truss/build.py:267, in mk_truss(model, target_directory, data_files, requirements_file, bundled_packages)
257 def mk_truss(
258 model: Any,
259 target_directory: str = None,
(...)
264 # Some model objects can are callable (like Keras models)
265 # so we first attempt to make Truss via a model object
--> 267 model_scaffold = mk_truss_from_model_with_exception_handler(
268 model, target_directory, data_files, requirements_file, bundled_packages
269 )
270 if model_scaffold:
271 return model_scaffold
File ~/.virtualenvs/datasciense/lib/python3.8/site-packages/truss/build.py:203, in mk_truss_from_model_with_exception_handler(*args)
200 def mk_truss_from_model_with_exception_handler(*args):
201 # returns None if framework not supported, otherwise the Truss
202 try:
--> 203 return mk_truss_from_model(*args)
204 except FrameworkNotSupportedError:
205 return None
File ~/.virtualenvs/datasciense/lib/python3.8/site-packages/truss/build.py:99, in mk_truss_from_model(model, target_directory, data_files, requirements_file, bundled_packages)
97 else:
98 target_directory_path = Path(target_directory)
---> 99 model_framework.to_truss(model, target_directory_path)
100 scaf = TrussHandle(target_directory_path)
101 _update_truss_props(scaf, data_files, requirements_file, bundled_packages)
File ~/.virtualenvs/datasciense/lib/python3.8/site-packages/truss/model_framework.py:69, in ModelFramework.to_truss(self, model, target_directory)
61 python_version = map_to_supported_python_version(infer_python_version())
63 # Create config
64 config = TrussConfig(
65 model_name=self.model_name(model),
66 model_type=self.model_type(model),
67 model_framework=self.typ(),
68 model_metadata=self.model_metadata(model),
---> 69 requirements=self.requirements_txt(),
70 python_version=python_version,
71 )
72 with (target_directory / CONFIG_FILE).open("w") as config_file:
73 yaml.dump(config.to_dict(), config_file)
File ~/.virtualenvs/datasciense/lib/python3.8/site-packages/truss/model_framework.py:29, in ModelFramework.requirements_txt(self)
27 def requirements_txt(self) -> List[str]:
---> 29 return list(infer_deps(must_include_deps=self.required_python_depedencies()))
File ~/.virtualenvs/datasciense/lib/python3.8/site-packages/truss/environment_inference/requirements_inference.py:44, in infer_deps(must_include_deps)
41 if not must_include_deps:
42 must_include_deps = set()
---> 44 pkg_candidates = _extract_packages_from_frame(relevant_stack[0].frame)
45 imports = must_include_deps.union(pkg_candidates)
46 requirements = set([])
File ~/.virtualenvs/datasciense/lib/python3.8/site-packages/truss/environment_inference/requirements_inference.py:98, in _extract_packages_from_frame(frame)
96 pkg_name = val.__name__.split(".")[0]
97 elif hasattr(val, "__module__"):
---> 98 pkg_name = val.__module__.split(".")[0]
99 else:
100 continue
AttributeError: 'NoneType' object has no attribute 'split'
Additional context: none.
Line 19 in ead203d: there should be verbose kwargs that you can set, I think.
While debugging another issue, we found that term signals are not handled properly in Truss server.
For both Inference and Control server:
The dockerfile dependency in particular is having issues on Windows.
Plan of action:
Describe the bug
When I run add_python_requirement and the requirement is already listed in config.yaml, it is listed again.
e.g.
requirements:
- joblib==1.0.0
- joblib==1.0.0
- joblib==1.1.0
To Reproduce
Run the add_python_requirement command twice with the same requirement
import truss
truss.init("test_truss")
tr = truss.from_directory("test_truss")
tr.add_python_requirement("joblib==1.0.0")
tr.add_python_requirement("joblib==1.0.0")
tr.add_python_requirement("joblib==1.1.0")
Expected behavior
If the package is already a requirement, don't add it again.
If the package is already a requirement but I am specifying a different version, update the version rather than duplicating the entry.
Expected output of the above code sample:
requirements:
- joblib==1.1.0
Additional information
Investigate whether the same thing happens for system requirements and fix it there, too!
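The dedup-or-update behavior could be sketched like this (a hypothetical standalone helper; the real add_python_requirement lives on the handle):

```python
# Sketch of idempotent requirement addition: match on the package name,
# replace an existing pin instead of appending a duplicate entry.
import re

# Split off version specifiers / extras: ==, >=, <, !=, ~=, [extra]
_SPEC_SPLIT = r"[=<>!~\[]"

def add_requirement(requirements: list, new_req: str) -> list:
    name = re.split(_SPEC_SPLIT, new_req, maxsplit=1)[0].strip().lower()
    out = []
    replaced = False
    for req in requirements:
        existing = re.split(_SPEC_SPLIT, req, maxsplit=1)[0].strip().lower()
        if existing == name:
            if not replaced:
                out.append(new_req)  # update the version in place
                replaced = True
            # drop any further duplicates of the same package
        else:
            out.append(req)
    if not replaced:
        out.append(new_req)
    return out
```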
Typical flow is like this:
handle = mk_truss(model, target_directory='some_dir')
# Make changes to truss, like modify `model.py`
# Recreate handle to load changes
handle = from_directory('some_dir')
That last step is easy to forget and should not be needed. It would be great if the handle automatically reflected changes to the underlying truss.
The journey from ML model to serving model can be made even easier by generating deployment artifacts such as k8s deployment specs and helm charts.
A k8s deployment manifest could be a good starting point. It will save users the work of figuring out how to deploy a Truss docker image to k8s. Even if they want to customize it, it will provide them a good starting point.
We could start with something basic:
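A minimal sketch of such a basic Deployment manifest, built as a Python dict (field choices such as the container port and replica count are assumptions; serializing the dict with PyYAML's yaml.safe_dump would yield the spec file):

```python
# Illustrative generator for a basic k8s Deployment manifest for a Truss
# docker image. Port, labels, and naming scheme are assumptions.
def k8s_deployment_manifest(model_name: str, image: str, replicas: int = 1) -> dict:
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": f"{model_name}-deployment"},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": model_name}},
            "template": {
                "metadata": {"labels": {"app": model_name}},
                "spec": {
                    "containers": [{
                        "name": model_name,
                        "image": image,
                        "ports": [{"containerPort": 8080}],
                    }],
                },
            },
        },
    }
```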
As a follow-up, it would be good to extend support with a section to support common options such as the number of replicas. As much as possible we should capture succinct specifications in the Truss config to auto-generate the k8s specs. But, of course, users can customize the generated deployment specs the way they want.
A helm chart would be a great way of packaging these k8s specs. This should perhaps be captured in a separate issue and done as a follow-up.
We want to integration-test this. For that, we'll need to update the codespace setup to install minikube, so we can start a local minikube cluster for these tests.
Ultimately, we should support live reload for all parts of a truss. The most important remaining part is the data section of the truss.
A key challenge is transferring such large payloads. The current patch endpoint can't support this; it would be very clumsy to cram big binary blobs into the patch JSON. A separate endpoint is likely needed on the Control Proxy to upload large blobs. The patch endpoint can then be enhanced to accept the content hash of the blob.
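The hash-addressed blob idea could look roughly like this (the patch shape and field names are assumptions for illustration):

```python
# Sketch of the hash-addressed blob scheme: large data files are uploaded
# to a separate blob endpoint keyed by content hash; the patch payload
# then carries only the hash, not the bytes.
import hashlib

def content_hash(path, chunk_size: int = 1 << 20) -> str:
    """sha256 of a file, streamed in chunks so large weights fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def data_patch(relative_path: str, blob_hash: str) -> dict:
    """Hypothetical patch entry referencing an already-uploaded blob."""
    return {"action": "ADD", "path": relative_path, "content_hash": blob_hash}
```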
Is your feature request related to a problem? Please describe.
Currently, when I use external_package_dirs, it copies what's inside the folder, and I have to restructure the whole repo to get imports to work (by moving everything inside one directory that has the correct import paths as they would appear after things are copied into the truss).
However, oftentimes I work with open source repos that use top-level imports, and moving that many things around is impractical, especially when I want to contribute the truss back to the repo to increase reach.
Describe the solution you'd like
We can add a config called keep_dir_names that preserves the directory names when copying, so that imports in other copied directories work without modification and it's easier to avoid conflicts without changing everything else in the repo.
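A sketch of the proposed behavior next to the current one (the function and flag names are hypothetical):

```python
# Illustrative copy logic for external package dirs. With keep_dir_names,
# the source directory itself is copied (preserving its name) so that
# top-level imports like `import <dirname>` keep working.
import shutil
from pathlib import Path

def copy_external_package(src: Path, packages_dir: Path,
                          keep_dir_names: bool = False) -> Path:
    packages_dir.mkdir(parents=True, exist_ok=True)
    if keep_dir_names:
        dest = packages_dir / src.name  # packages/<repo_dir>/...
        shutil.copytree(src, dest)
        return dest
    # Current behavior as described above: copy only the contents.
    for child in src.iterdir():
        target = packages_dir / child.name
        if child.is_dir():
            shutil.copytree(child, target)
        else:
            shutil.copy2(child, target)
    return packages_dir
```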
Potential avenues of improvement
Is your feature request related to a problem? Please describe.
It would be great if people could easily copy instructions like this from repos into their truss config, instead of worrying about how to make everything work with our requirements:
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
Describe the solution you'd like
A setup_script multiline config in truss where I can type up a script that is run after the requirements and system_packages have been installed.
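A sketch of how such a hook might be executed after installs (setup_script is the proposal above; the runner itself is an assumption):

```python
# Illustrative runner for a user-supplied setup script, executed after
# python requirements and system packages are installed during the build.
import subprocess

def run_setup_script(script: str) -> None:
    # bash -e / -u / -o pipefail: fail the build fast if any command in
    # the user's script fails, references an unset variable, or a piped
    # command errors.
    subprocess.run(["bash", "-euo", "pipefail", "-c", script], check=True)
```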
Is your feature request related to a problem? Please describe.
While I can configure almost anything in my truss, the cuda versions are fixed, which makes it hard to copy the configuration of some open source models that depend on specific versions of cuda.
Describe the solution you'd like
NVIDIA pretty much has tags for all the supported cuda versions. It would be great if we could support these in the config.yaml, so that I can quickly bring repos from github into truss and deploy them.
It's better to stick to the semantic version schema, as it's more intuitive to end users. The way to transition would be to support both py39 and 3.9 style versions for a while; ultimately, support for the py39 style can be dropped.
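The transition shim could be a small normalizer that accepts both styles (the helper name is illustrative):

```python
# Illustrative normalizer for the transition period: accept both the
# legacy "py39" style and the semantic "3.9" style, returning the
# internal pyXY form either way.
import re

def normalize_python_version(version: str) -> str:
    if re.fullmatch(r"py\d\d+", version):
        return version  # already legacy pyXY style
    match = re.fullmatch(r"(\d)\.(\d+)", version)
    if match:
        return f"py{match.group(1)}{match.group(2)}"
    raise ValueError(f"Unrecognized python version: {version!r}")
```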
Truss eases turning raw models into services. Model services normally shouldn't leak exception information into prediction error responses, but doing so can be useful during development; Baseten draft models are an example. It would be great to provide optional support for verbose prediction responses, where the entire stack trace could be returned alongside the prediction error response.
A tests directory under truss with access to model code, bundled packages and external packages. To begin with, pytest can be used as the standard way of writing tests. A follow-up would be to support dev-only python requirements, to allow the flexibility of adding any other helper libraries for testing.
The tests directory should not affect live-reload and should not go into the docker images generated for the truss.
Add a warning when a symlink is detected while building the build context, instead of failing on copy.
Reproduce:
truss build-image ./path/to/truss/w/venv
Desired behavior:
We don't enforce any resource limits right now, but we should, to mimic production deployments and let one tune these settings well locally.
Description
If one calls a live-reloadable truss while it's starting up for the first time, one sees a lot of 503 errors. This is the control proxy retrying while waiting for the inference server to start up. The original call may ultimately succeed, but the many 503 logs are confusing; one may think that a lot of calls are failing.
To Reproduce
Create a truss with, say, a sleep of 10 seconds in model.load. Call this truss right after the build finishes and service deployment begins. Observe the logs.
Expected behavior
Only the ultimate success or failure of the incoming request should be logged. Internal retry logs shouldn't be there, or should clearly indicate what they're for.
Is your feature request related to a problem? Please describe.
After some exploration, it seems we can definitely compose various trusses into one image so that they can share a GPU. This has many advantages for server costs, but also for local development when I want to work with multiple trusses but my machine only has one GPU.
This issue is to open up a discussion on what the goals could be for bringing this into truss, both in the near term and the long term.
The context builder image is kept lean to reduce size; all it's meant for is generating the docker build context from a Truss. It doesn't include numpy, python_on_whales etc., so those imports in truss_handle are not at the top level. This is very easy to break without realizing it, though. We should add an integration test to catch such issues proactively.
Features such as live reload monitor for any changes to the Truss, and if the changes are not supported, fall back to a full build. Even for regular builds, the whole truss directory is bundled, which means hidden directories such as .git also get bundled. It would be great to ignore hidden files and directories, as well as pycache files, in both cases. More could be added to this list; e.g. when support for tests is added, the designated tests directory could belong here.
In the long term, support for a .trussignore file could be useful, but it's not clear if there's an immediate need for it.
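The ignore rules could start as a simple path filter (the function name and rule set are illustrative):

```python
# Illustrative bundling filter: skip hidden files/directories, __pycache__
# directories, and compiled .pyc files anywhere under the truss directory.
from pathlib import Path

def should_bundle(path: Path, truss_dir: Path) -> bool:
    rel = path.relative_to(truss_dir)
    for part in rel.parts:
        if part.startswith(".") or part == "__pycache__" or part.endswith(".pyc"):
            return False
    return True
```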
Is your feature request related to a problem? Please describe.
I am trying to deploy a NeMo Megatron based model using truss. The installation has a lot of dependencies that I cannot satisfy using only the system_packages and python_requirements features that truss provides. I have spent two days trying to resolve these issues but keep going down the rabbit hole of installing more things that happen not to work.
However, NVIDIA provides a base image that does work for all these kinds of models and includes all kinds of run-time optimizations that I would like to benefit from.
Describe the solution you'd like
I would like a custom_base_image config in the config.yaml for truss. If this custom image is provided, truss tries to use it for preparing the truss server, instead of the base gpu image.
It's okay for this custom_base_image not to succeed in all cases. For example, if I use a base image that doesn't have a compatible python version, the truss server might fail. I think that's okay, given that this is a very advanced feature.
Describe the bug
Adding python_version: py38 or python_version: 3.8 to the truss config does not result in python version 3.8 being available in the resulting docker image.
To Reproduce
Expected behavior
Should install the configured python version.
I believe the issue is that the base image doesn't build based on the correct python version: https://github.com/basetenlabs/truss/blob/main/docker/base_images/base_image.Dockerfile.jinja#L41
Really like the look of this project. I saw the AWS integration guide, but I was wondering what it would take to integrate with SageMaker (https://aws.amazon.com/sagemaker/).
I suspect there might be quite a bit of additional work in creating training and serving containers in the format that SageMaker can consume: https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html
In addition to that, I believe the officially supported containers are framework specific and have specific drivers that connect the frameworks to the available GPUs, so I don't know how easy it'd be to even maintain a single set of containers that worked for all frameworks.
live_reload is great for quickly making changes to model serving code on a truss running on docker. But making changes to model.py often creates a pycache directory; the live_reload mechanism doesn't understand this change, resulting in a full build. This only happens the first time, as the pycache directory does not change afterwards, but it's still annoying. It would be great if we could simply ignore any pycache directories.
The control server should try to restart the inference server if it crashes for some reason. Right now, if the inference server crashes, there's no way for the situation to be corrected.
Right now, accessing a secret that does not exist results in a KeyNotFound error, which is unclear to users. Let's return a better error, and potentially consider logging the available secret names.
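A sketch of a friendlier secrets lookup (the class and error names are hypothetical, not the actual truss API):

```python
# Illustrative secrets wrapper that raises a descriptive error naming the
# available secret keys (names only; values are never exposed).
class SecretNotFoundError(KeyError):
    pass

class Secrets:
    def __init__(self, secrets: dict):
        self._secrets = dict(secrets)

    def __getitem__(self, name: str) -> str:
        try:
            return self._secrets[name]
        except KeyError:
            available = ", ".join(sorted(self._secrets)) or "(none)"
            raise SecretNotFoundError(
                f"Secret '{name}' not found. Available secrets: {available}"
            ) from None
```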
Is your feature request related to a problem? Please describe.
When creating a truss, which I usually use for serving, I get a lot of training configs that I always delete since they are not relevant to my needs. It would be great if I only got these configs when I want them.
Describe the solution you'd like
Respect the same trainable flag that we use in create, which decides whether or not to include the training template files. If the training files are not included, the training config should also not be included.