voltaml / voltaml

⚡VoltaML is a lightweight library to convert and run your ML/DL models in high-performance inference runtimes like TensorRT, TorchScript, ONNX and TVM.

License: Apache License 2.0


voltaml's Introduction

Accelerate your machine learning and deep learning models by up to 10X.

🔥 UPDATE: Stable Diffusion/DreamBooth acceleration, with up to 2.5X speed-up in inference 🔥

voltaML is a lightweight open-source library that accelerates your machine learning and deep learning models. It can optimize, compile, and deploy models to your target CPU and GPU devices with just one line of code.

Out-of-the-box support for:

✅ FP16 Quantization

✅ Int8 Quantization*

✅ Hardware specific compilation


voltaML has compilation support for the following inference runtimes: TensorRT, TorchScript, ONNX, and TVM.

Installation

Own setup:

Requirements:

  • CUDA version > 11.x
  • TensorRT == 8.4.1.2
  • PyTorch == 1.12 (CUDA 11.x build)
  • NVIDIA driver version > 510

git clone https://github.com/VoltaML/voltaML.git
cd voltaML
python setup.py install
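
To sanity-check the environment against these pins, here is a minimal check (assuming PyTorch and the TensorRT Python package are already installed):

import torch
import tensorrt

print(torch.__version__, torch.version.cuda)  # expect 1.12.x built against CUDA 11.x
print(tensorrt.__version__)                   # expect 8.4.x
print(torch.cuda.is_available())              # True if the NVIDIA driver is working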

Docker Container 🐳

docker pull voltaml/voltaml:v0.4
docker run -it --gpus=all -p "8888:8888" voltaml/voltaml:v0.4 \ 
        jupyter lab --port=8888 --no-browser --ip 0.0.0.0 --allow-root

Usage

import torch
from voltaml.compile import VoltaGPUCompiler, VoltaCPUCompiler, TVMCompiler
from voltaml.inference import gpu_performance, cpu_performance

model = torch.load("path/to/model/dir")

# compile the model by giving paths
compiler = VoltaGPUCompiler(
        model=model,
        output_dir="destination/path/of/compiled/model",
        input_shape=(1, 3, 224, 224), # example input shape
        precision="fp16" # specify precision[fp32, fp16, int8] - Only for GPU compiler
        target="llvm" # specify target device - Only for TVM compiler
    )

# returns the compiled model
compiled_model = compiler.compile()

# compute and compare performance
gpu_performance(compiled_model, model, input_shape=(1, 3, 224, 224))
cpu_performance(compiled_model, model, compiler="voltaml", input_shape=(1, 3, 224, 224))
cpu_performance(compiled_model, model, compiler="tvm", input_shape=(1, 3, 224, 224))
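
The CPU and TVM paths follow the same pattern. A minimal sketch, assuming VoltaCPUCompiler and TVMCompiler accept the same constructor arguments as the GPU example above (this mirroring is an assumption, not confirmed API):

# assumed to mirror VoltaGPUCompiler's constructor
cpu_compiler = VoltaCPUCompiler(
        model=model,
        output_dir="destination/path/of/compiled/model",
        input_shape=(1, 3, 224, 224),
    )
compiled_cpu_model = cpu_compiler.compile()

tvm_compiler = TVMCompiler(
        model=model,
        output_dir="destination/path/of/compiled/model",
        input_shape=(1, 3, 224, 224),
        target="llvm", # target device, per the comment in the example above
    )
compiled_tvm_model = tvm_compiler.compile()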

Notebooks

  1. ResNet-50 image classification
  2. DeeplabV3_MobileNet_v3_Large segmentation
  3. YOLOv5 object detection
  4. YOLOv6 object detection
  5. Bert_Base_Uncased (Hugging Face)

Benchmarks

🖼️ Classification Models Inference Latency (on GPU) ⏱️

Classification benchmarks were run on ImageNet data with batch size = 1 and image size = 224 on an NVIDIA RTX 2080 Ti. For int8 models, we have not seen a drop of more than 1% in top-1 or top-5 accuracy. The speed-up column is PyTorch latency divided by VoltaGPU int8 latency (e.g. 6.6 ms / 0.5 ms ≈ 13.2x for resnet50).

| Model | PyTorch (ms) | VoltaGPU FP16 (ms) | VoltaGPU int8 (ms) | PyTorch vs int8 speed-up |
| --- | --- | --- | --- | --- |
| squeezenet1_1 | 1.6 | 0.2 | 0.2 | 8.4x |
| resnet18 | 2.7 | 0.4 | 0.3 | 9.0x |
| resnet34 | 4.5 | 0.7 | 0.5 | 9.0x |
| resnet50 | 6.6 | 0.7 | 0.5 | 13.2x |
| resnet101 | 13.6 | 1.3 | 1.0 | 13.6x |
| densenet121 | 15.7 | 2.4 | 2.0 | 7.9x |
| densenet169 | 22.0 | 4.4 | 3.8 | 5.8x |
| densenet201 | 26.8 | 6.3 | 5.0 | 5.4x |
| vgg11 | 2.0 | 0.9 | 0.5 | 4.0x |
| vgg16 | 3.5 | 1.2 | 0.7 | 5.0x |

🧐 Object Detection (YOLO) Models Inference Latency (on GPU) ⏱️

Object detection inference was run on dummy data with image size = 640 and batch size = 1 on an NVIDIA RTX 2080 Ti.

| Model | PyTorch (ms) | VoltaGPU FP16 (ms) | PyTorch vs FP16 speed-up |
| --- | --- | --- | --- |
| YOLOv5n | 5.2 | 1.2 | 4.3x |
| YOLOv5s | 5.1 | 1.6 | 3.2x |
| YOLOv5m | 9.1 | 3.2 | 2.8x |
| YOLOv5l | 15.3 | 5.1 | 3.0x |
| YOLOv5x | 30.8 | 6.4 | 4.8x |
| YOLOv6s | 8.8 | 3.0 | 2.9x |
| YOLOv6l_relu | 23.4 | 5.5 | 4.3x |
| YOLOv6l | 18.1 | 4.1 | 4.4x |
| YOLOv6n | 9.1 | 1.6 | 5.7x |
| YOLOv6t | 8.6 | 2.4 | 3.6x |
| YOLOv6m | 15.5 | 3.5 | 4.4x |

🎨 Segmentation Models Inference Latency (on GPU) ⏱️

Segmentation inference was run on dummy data with image size = 224 and batch size = 1 on an NVIDIA RTX 2080 Ti.

| Model | PyTorch (ms) | VoltaGPU FP16 (ms) | VoltaGPU int8 (ms) | Speed-up (X) |
| --- | --- | --- | --- | --- |
| FCN_Resnet50 | 8.3 | 2.3 | 1.8 | 3.6x |
| FCN_Resnet101 | 14.7 | 3.5 | 2.5 | 5.9x |
| DeeplabV3_Resnet50 | 12.1 | 2.5 | 1.3 | 9.3x |
| DeeplabV3_Resnet101 | 18.7 | 3.6 | 2.0 | 9.4x |
| DeeplabV3_MobileNetV3_Large | 6.1 | 1.5 | 0.8 | 7.6x |
| DeeplabV3Plus_ResNet50 | 6.1 | 1.1 | 0.8 | 7.6x |
| DeeplabV3Plus_ResNet34 | 4.7 | 0.9 | 0.8 | 5.9x |
| UNet_ResNet50 | 6.2 | 1.3 | 1.0 | 6.2x |
| UNet_ResNet34 | 4.3 | 1.1 | 0.8 | 5.4x |
| FPN_ResNet50 | 5.5 | 1.2 | 1.0 | 5.5x |
| FPN_ResNet34 | 4.2 | 1.1 | 1.0 | 4.2x |

🤗 Accelerating Huggingface Models using voltaML

We're adding support to accelerate Hugging Face NLP models with voltaML. This work is inspired by ELS-RD's work. It is still in the early stages, and only the few models listed in the table below are supported. We're working on adding more models soon.

from voltaml.compile import VoltaNLPCompile
from voltaml.inference import nlp_performance

model = 'bert-base-cased'
backend = ["tensorrt", "onnx"]
seq_len = [1, 1, 1]     # example sequence-length settings
task = "classification"
batch_size = [1, 1, 1]  # defined in the original example but not passed below

# compile the model for each backend
VoltaNLPCompile(model=model, device='cuda', backend=backend, seq_len=seq_len)

# measure and compare latency across backends
nlp_performance(model=model, device='cuda', backend=backend, seq_len=seq_len)

| Model | PyTorch (ms) | VoltaML FP16 (ms) | Speed-up |
| --- | --- | --- | --- |
| bert-base-uncased | 6.4 | 1.0 | 6.4x |
| Jean-Baptiste/camembert-ner | 6.3 | 1.0 | 6.3x |
| gpt2 | 6.6 | 1.2 | 5.5x |
| xlm-roberta-base | 6.4 | 1.08 | 5.9x |
| roberta-base | 6.6 | 1.09 | 6.1x |
| bert-base-cased | 6.2 | 0.9 | 6.9x |
| distilbert-base-uncased | 3.5 | 0.6 | 5.8x |
| roberta-large | 11.9 | 2.4 | 5.0x |
| deepset/xlm-roberta-base-squad2 | 6.2 | 1.08 | 5.7x |
| cardiffnlp/twitter-roberta-base-sentiment | 6.0 | 1.07 | 5.6x |
| sentence-transformers/all-MiniLM-L6-v2 | 3.2 | 0.42 | 7.6x |
| bert-base-chinese | 6.3 | 0.97 | 6.5x |
| distilbert-base-uncased-finetuned-sst-2-english | 3.4 | 0.6 | 5.7x |
| albert-base-v2 | 6.7 | 1.0 | 6.7x |

voltaTrees ⚡🌴 -> https://github.com/VoltaML/volta-trees

An LLVM-based compiler for XGBoost and LightGBM decision trees.

voltaTrees converts trained XGBoost and LightGBM models to optimized machine code, speeding up prediction by ≥10x.

Example

import voltatrees as vt

# load a trained XGBoost model dump, compile it to machine code, then predict
model = vt.XGBoostRegressor.Model(model_file="NYC_taxi/model.txt")
model.compile()
model.predict(df)  # df: a DataFrame/array of features

Installation

git clone https://github.com/VoltaML/volta-trees.git
cd volta-trees/
pip install -e .

Benchmarks

On smaller datasets, voltaTrees is 2-3X faster than DMLC's Treelite. Testing on large-scale datasets has not yet been conducted.
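
For a rough local comparison, here is a minimal timing harness (a sketch under assumptions: the API is taken from the example above, and n_features is a placeholder for your model's actual feature count):

import time
import numpy as np
import voltatrees as vt

model = vt.XGBoostRegressor.Model(model_file="NYC_taxi/model.txt")
model.compile()

n_features = 10  # placeholder: set to your model's feature count
X = np.random.rand(100_000, n_features)

start = time.perf_counter()
model.predict(X)
print(f"compiled predict: {time.perf_counter() - start:.4f}s")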

Enterprise Platform 🛣️

Enterprise customers who would like a fully managed solution hosted on their own cloud, please contact us at [email protected]

  • Fully managed and cloud-hosted optimization engine.
  • Hardware-targeted optimized Docker images for maximum performance.
  • One-click deployment of the compiled models.
  • Cost-benefit analysis dashboard for optimal deployment.
  • NVIDIA Triton optimized Docker images for large-scale GPU deployment.
  • Quantization-Aware Training (QAT).

voltaml's People

Contributors

bishakh17, harishprabhala, mshr-h, pranavmkoundinya, riteshgangnani10, the-neural-networker

voltaml's Issues

TensorRT requirement download doesn't exist

Edit: I learned that the readme is quite old and voltaML doesn't even use TensorRT anymore.

The readme says "TensorRT == 8.4.1.2".
How strict is this? Is it possible to use Ubuntu 22.04 with a newer version of TensorRT, like 8.5.1?

If strict, then the readme is wrong and probably refers to 8.4.1.5, because the TensorRT downloads page only offers:

  • 8.4 EA (8.4.0.6)
  • 8.4 GA (8.4.1.5)
  • 8.4 GA update 1 (8.4.2.4)
  • 8.4 GA update 2 (8.4.3.1)

https://developer.nvidia.com/nvidia-tensorrt-8x-download

win?

A bit of a tricky question, but does this also work on Windows?

Update requirements.txt and use optional dependencies

It would be good to have an updated requirements.txt, since several dependencies are not listed (e.g. onnxoptimizer, nvidia-tensorrt, onnxruntime, transformer-deploy), and TVM is also required to be installed.

If any of these are not core to VoltaML, it might be worth adopting an installation method that allows the user to select what is needed, e.g.:

pip install voltaml[minimal]
pip install voltaml[tvm]
etc.

If TVM is a core requirement, it would probably be worth documenting that, since building TVM from source may not be trivial in some environments (e.g. Kaggle GPU notebooks, Colab, etc.).
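
For illustration, such optional extras could be declared along these lines in setup.py (a hypothetical sketch; the extras names and pins below are illustrative, not voltaML's actual packaging):

from setuptools import setup, find_packages

setup(
    name="voltaml",
    packages=find_packages(),
    install_requires=["numpy", "torch==1.12.0", "onnx"],  # hypothetical core set
    extras_require={
        "tvm": ["apache-tvm"],                                 # pip install voltaml[tvm]
        "tensorrt": ["nvidia-pyindex", "nvidia-tensorrt<8.5"], # pip install voltaml[tensorrt]
    },
)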

This repo is impossible to install locally for developing purposes

I have tried almost everything, and I have followed the few instructions you provide.

I think the list of requirements should be something like this:

numpy
scipy
pillow
torch==1.12.0 # This is a requirement from you
torchvision==0.13.0
decorator 
attrs
tornado
psutil
#xgboost # Just in case you want to optimize an XGBoost model
cloudpickle
onnx
onnxoptimizer
tqdm
#time # There is no package with this name


# Other packages
nvidia-pyindex  # TensorRT index
nvidia-tensorrt<8.5 # I couldn't install the TensorRT version your repo requires
PyYAML
pandas
opencv-python
matplotlib
seaborn
transformers
onnxruntime
apache-tvm

I tried everything and I am still getting the following error:
ModuleNotFoundError: No module named 'transformer_deploy'

I think this should be installed using pip install sentence-transformer, but I still find that some packages are not installed.

discord bot not sending image

First and foremost, the discord package wasn't installed in the first place, and installing it manually could be the cause of this.

But when running "/dream prompt: banana model: AnythingV3", the bot responds with the message normally, but for some reason (I believe it's the base64 encoding/decoding) it fails to send the final image.

error: HybridCommandError: Hybrid command raised an error: Command 'dream' raised an exception: Error: Incorrect padding
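
"Incorrect padding" usually means a base64 string whose length is not a multiple of 4. A generic workaround (a sketch, not the bot's actual code) is to re-pad before decoding:

import base64

def b64decode_padded(data: str) -> bytes:
    # pad the string to a multiple of 4 before decoding
    return base64.b64decode(data + "=" * (-len(data) % 4))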

Model running on CPU and GPU

Hi everyone,

I am trying to optimize distilbert-multilingual-uncased from Hugging Face. As suggested in the examples, I have tried using VoltaNLPCompile, adding the quantization parameter, and converting only to ONNX. When testing the model on CPU everything works like a charm; there is a speed-up of around 3.4x with data processing included. Unfortunately, my use case is quite specific. DistilBERT serves as an "embedding generator" used during training, which runs on GPU (later, in production, it runs on CPU for faster inference).
On GPU, the model runs approximately 30x slower than DistilBERT converted with plain ONNX. This is a blocker, because the embedding vectors generated by this model must stay the same; otherwise the performance decrease is quite substantial.

My first guess is that this is caused by some CPU-specific optimization, but I don't know enough about this matter.
My question is: is this a feature, a bug, or something fixable during model compilation?

Thank you in advance,
Best Regards,
Tom
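
One thing worth ruling out (a guess, not a confirmed diagnosis): if the exported ONNX model is loaded with only the CPU execution provider, GPU inference will be drastically slower. With ONNX Runtime you can check which provider a session actually uses:

import onnxruntime as ort

# request CUDA first, with CPU as fallback; the model path is a placeholder
sess = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(sess.get_providers())  # CUDAExecutionProvider should be listed first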

Is this project dead? Suggestion about using existing models

The last update to the code was in November. Is this project dead?
Many new and cool models have come out since then: Clarity, Deliberate, and others.

Also, downloading models from the internet is not the most efficient; I only get download speeds of 15 MB/s on a gigabit internet connection. And considering I already have all the models I need, it's really only wasting time. I should be able to mount a Docker volume with the models I already have.
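
For reference, mounting a local model directory into the container would look something like this (the in-container path /workspace/models is hypothetical; check where the image actually expects models):

docker run -it --gpus=all -p "8888:8888" \
        -v /path/to/local/models:/workspace/models \
        voltaml/voltaml:v0.4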

usable in existing Stable Diffusion (2.1) code?

Hello!

Is it possible to use voltaML to accelerate SD without the GUI? Or, as an alternative to the other voltaML repository, can I use voltaML-fast-stable-diffusion via the command line, as is also possible with AUTOMATIC1111?

I am very impressed by your performance benchmarks! It simply doubles the inference speed of a fully driver-optimized 4090.

Best regards
Marc
