
This project forked from microsoft/olive






OLive - ONNX Runtime Go Live

OLive, meaning ONNX Runtime(ORT) Go Live, is a python package that automates the process of accelerating models with ONNX Runtime(ORT). It contains two parts including model conversion to ONNX with correctness checking and auto performance tuning with ORT. Users can run these two together through a single pipeline or run them independently as needed.

Model conversion to ONNX

OLive simplifies the conversion experience from multiple frameworks to ONNX by integrating existing ONNX conversion tools into a single package and validating the converted models' correctness. Currently supported frameworks are PyTorch and TensorFlow.

  • TensorFlow: OLive supports converting TensorFlow models in saved model, frozen graph, and checkpoint formats. For frozen graph and checkpoint conversion, users need to provide the inputs' and outputs' names.
  • PyTorch: Users need to provide the inputs' names and shapes to convert a PyTorch model. For TorchScript PyTorch models, the outputs' names and shapes are also required.

Auto performance tuning with ORT

ONNX Runtime (ORT) is a high-performance inference engine for running ONNX models. It exposes many advanced tuning knobs that let users further optimize inference performance. OLive heuristically explores the optimization search space in ORT to select the best ORT settings for a specific model on specific hardware, and outputs the option combination with the best performance for latency or for throughput.
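The idea behind the search can be sketched as a loop over candidate setting combinations that keeps the fastest one. This is an illustrative placeholder, not OLive's actual implementation: the knob names mirror ORT session options, and the latency function is a stand-in for real timed inference runs.

```python
import itertools

# Hypothetical search space over ORT tuning knobs (names are illustrative).
search_space = {
    "execution_mode": ["sequential", "parallel"],
    "intra_op_num_threads": [1, 2, 4],
    "graph_optimization_level": ["basic", "all"],
}

def measure_latency(settings):
    # Placeholder: a real tuner would time inference runs with these settings.
    # Here we pretend more threads and more graph optimization mean lower latency.
    bonus = 5 if settings["graph_optimization_level"] == "all" else 0
    return 100 / settings["intra_op_num_threads"] - bonus

# Enumerate every combination and keep the one with the lowest latency.
candidates = [
    dict(zip(search_space, combo))
    for combo in itertools.product(*search_space.values())
]
best_settings = min(candidates, key=measure_latency)
```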

Optimization fields:

Getting Started

The OLive package can be installed with the command pip install onnxruntime_olive==0.5.0 -f https://olivewheels.azureedge.net/oaas/onnxruntime-olive . Supported Python versions: 3.7, 3.8, 3.9.

Users need to install CUDA and cuDNN dependencies for performance tuning with OLive on GPU. The table below shows the ORT version and the required CUDA and cuDNN versions in the latest OLive.

ONNX Runtime | CUDA | cuDNN
1.11.0       | 11.4 | 8.2

There are three ways to use OLive:

  1. Use With Command Line: Run OLive from the command line using Python.
  2. Use With Jupyter Notebook: Quickstart tutorial for OLive using a Jupyter Notebook.
  3. Use With OLive Server: Set up a local OLive server for model conversion, optimization, and visualization services.

Run inference on your model with the OLive auto performance tuning result

  1. Get the best tuning result with best_test_name, which includes inference session settings, environment variable settings, and the latency result.
  2. Set the related environment variables in your environment.
    • OMP_WAIT_POLICY
    • OMP_NUM_THREADS
    • KMP_AFFINITY
    • OMP_MAX_ACTIVE_LEVELS
    • ORT_TENSORRT_FP16_ENABLE
  3. Create an onnxruntime inference session with the related settings.
    • inter_op_num_threads
    • intra_op_num_threads
    • execution_mode
    • graph_optimization_level
    • execution_provider
    import onnxruntime as ort

    # Apply the session settings reported by the best tuning result.
    sess_options = ort.SessionOptions()
    sess_options.inter_op_num_threads = inter_op_num_threads
    sess_options.intra_op_num_threads = intra_op_num_threads
    sess_options.execution_mode = execution_mode
    sess_options.graph_optimization_level = ort.GraphOptimizationLevel(graph_optimization_level)

    # Create the session on the tuned execution provider.
    onnx_session = ort.InferenceSession(model_path, sess_options, providers=[execution_provider])
    
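Step 2 above can be sketched with os.environ. The values below are illustrative only and must come from your own tuning result; note that OpenMP variables generally need to be set before onnxruntime is imported in order to take effect.

```python
import os

# Illustrative values only; use the settings from your best_test_name result.
tuning_env = {
    "OMP_WAIT_POLICY": "ACTIVE",
    "OMP_NUM_THREADS": "4",
    "KMP_AFFINITY": "granularity=fine,compact,1,0",
    "OMP_MAX_ACTIVE_LEVELS": "1",
    "ORT_TENSORRT_FP16_ENABLE": "0",
}

# Apply the variables to the current process environment.
os.environ.update(tuning_env)
```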

Key Updates

10/28/2021

Updated OLive from docker-container-based usage to python-package-based usage for more flexibility.

Enabled more optimization options for performance tuning with ORT, including INT8 quantization, mixed precision in ORT-TensorRT, and transformer model optimization.

Contributing

We welcome your contributions to OLive. Please refer to CONTRIBUTING.md.

License

Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

