audeering / audonnx

Deploy models in ONNX format
Home Page: https://audeering.github.io/audonnx/
License: Other
I recently tried running an audonnx model on the GPU and installed the latest version (v1.10.0) of onnxruntime-gpu. Unfortunately, model loading crashed with:

ValueError: This ORT build has ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'] enabled. Since ORT 1.9, you are required to explicitly set the providers parameter when instantiating InferenceSession. For example, onnxruntime.InferenceSession(..., providers=['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'], ...)

The issue was resolved after downgrading to v1.9.0. Judging by the error, I guess it's a simple instantiation problem of InferenceSession.
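Following the error message, the fix on the onnxruntime side is to pass the providers explicitly when the session is created. A minimal sketch of the raw onnxruntime call (the model path and provider list are placeholders):

import onnxruntime

# Explicitly list the execution providers (required since ORT 1.9).
# 'model.onnx' is a placeholder path.
session = onnxruntime.InferenceSession(
    'model.onnx',
    providers=['CUDAExecutionProvider', 'CPUExecutionProvider'],
)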
At the moment you cannot use audonnx with numpy 2.0 as onnxruntime does not support it, see microsoft/onnxruntime#21063.
I cannot find any option for DMLExecutionProvider in ort.py. I tried to replace CPUExecutionProvider with DMLExecutionProvider, but it did not work as it kills the kernel.
Kindly provide a way to run the model with DMLExecutionProvider so I can run it on my AMD RX 570 GPU!
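As a sketch of a possible workaround: onnxruntime ships DirectML support in the separate onnxruntime-directml package, where the provider is spelled DmlExecutionProvider. Whether audonnx forwards such a provider list through its device argument would need to be verified; at the raw onnxruntime level it would look like this:

import onnxruntime

# Requires the onnxruntime-directml package; note the provider is spelled
# 'DmlExecutionProvider', with the CPU provider as fallback.
# 'model.onnx' is a placeholder path.
session = onnxruntime.InferenceSession(
    'model.onnx',
    providers=['DmlExecutionProvider', 'CPUExecutionProvider'],
)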
As the following simple example shows, we currently only replace 'time' in the shape of input nodes, but we should do the same for output nodes:
import torch

import audonnx


class TorchModel(torch.nn.Module):
    def __init__(
        self,
    ):
        super().__init__()

    def forward(self, x: torch.Tensor):
        return x


torch_model = TorchModel()
input = torch.randn(1, 100, 4)  # batch, time, feature
output = torch_model(input)

onnx_path = 'model.onnx'
torch.onnx.export(
    torch_model,
    input,
    onnx_path,
    input_names=['input'],
    output_names=['output'],
    dynamic_axes={
        'input': {1: 'time'},
        'output': {1: 'time'},
    },
    opset_version=12,
)
audonnx.Model(onnx_path)
Input:
  input:
    shape: [1, -1, 4]
    dtype: tensor(float)
    transform: None
Output:
  output:
    shape: [1, time, 4]  # should be [1, -1, 4]
    dtype: tensor(float)
    labels: [output-0, output-1, output-2, output-3]
When trying to install the packages we get:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
onnx 1.13.0 requires protobuf<4,>=3.20.2, but you have protobuf 3.20.1 which is incompatible.

The problem is that we specify

protobuf <=3.20.1  # avoid TypeError: Descriptors cannot not be created directly

in docs/requirements.txt.
We need to check how to make this work for newer versions of protobuf; for now I will restrict onnx to <1.13.0 in docs/requirements.txt in #47.
When loading a model on the CPU and running it, onnxruntime will use as many resources as it can get.
This can be avoided by specifying intra_op_num_threads and inter_op_num_threads.
Maybe we can tackle both by adding a num_workers argument to audonnx.load() and audonnx.Model, as we have in other packages?
It seems also reasonable to set the default to 1.

In #81 we added the possibility to set the number of workers by using the sess_options argument of onnxruntime.InferenceSession. As it supports more settings, of which we at least want to use enable_cpu_mem_arena, it might be a good idea to provide an argument to set those options in audonnx.Model and audonnx.load().
We need to decide whether an onnxruntime_sess_options argument should then overwrite num_workers or the other way around.
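For reference, this is how the relevant settings look at the plain onnxruntime level; a sketch only, the audonnx argument names discussed above are still open:

import onnxruntime

opts = onnxruntime.SessionOptions()
opts.intra_op_num_threads = 1       # threads used within a single operator
opts.inter_op_num_threads = 1       # threads used to run operators in parallel
opts.enable_cpu_mem_arena = False   # disable the CPU memory arena

# 'model.onnx' is a placeholder path.
session = onnxruntime.InferenceSession(
    'model.onnx',
    sess_options=opts,
    providers=['CPUExecutionProvider'],
)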
As discussed in #34 (comment) it might be good to have a method to collect a list of labels corresponding to specific output nodes.
This could replace
outputs = ['gender', 'confidence']
labels = sum([model.outputs[o].labels for o in outputs], [])
with
outputs = ['gender', 'confidence']
labels = model.labels(outputs=['gender', 'confidence'])
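A minimal sketch of how such a method could be implemented, assuming the proposed name Model.labels() and the existing model.outputs mapping; defaulting to the labels of all output nodes is an assumption:

def labels(self, outputs=None):
    # Collect the labels of the given output nodes (all nodes by default)
    # into a single flat list, preserving node order.
    if outputs is None:
        outputs = list(self.outputs)
    labels = []
    for name in outputs:
        labels.extend(self.outputs[name].labels)
    return labels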
Currently audonnx.Model only takes a signal as input.
https://github.com/audeering/audonnx/blob/main/audonnx/core/model.py#L198-L209
For some models, however, it might be necessary to provide additional information.
E.g. in the case of an LSTM model, we would like to provide the last hidden state and the memory state as additional input.
This is currently impossible as all additional inputs are derived directly from the signal.
It would be great if we could provide an option to enable additional model inputs alongside the signal.
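To illustrate what is needed, at the raw onnxruntime level such extra inputs are simply additional entries in the feed dictionary; the node names 'signal', 'h0' and 'c0' and the shapes below are made up for the example:

import numpy as np
import onnxruntime

# 'lstm.onnx' is a placeholder path.
session = onnxruntime.InferenceSession(
    'lstm.onnx',
    providers=['CPUExecutionProvider'],
)

signal = np.zeros((1, 16000), dtype=np.float32)
h0 = np.zeros((1, 1, 256), dtype=np.float32)  # last hidden state
c0 = np.zeros((1, 1, 256), dtype=np.float32)  # last memory state

outputs = session.run(
    None,  # return all output nodes
    {'signal': signal, 'h0': h0, 'c0': c0},
)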
Currently we do not support assigning labels if the last dimension is dynamic:
import torch

import audonnx


class TorchModel(torch.nn.Module):
    def __init__(
        self,
    ):
        super().__init__()

    def forward(self, x: torch.Tensor):
        return torch.cat([x, x])


torch_model = TorchModel()
input = torch.randn((1, 10))  # batch, time
output = torch_model(input)
print(output)

onnx_path = 'model.onnx'
torch.onnx.export(
    torch_model,
    input,
    onnx_path,
    input_names=['input'],
    output_names=['output'],
    dynamic_axes={
        'input': {1: 'time'},
        'output': {1: 'time'},
    },
    opset_version=12,
)

onnx_model = audonnx.Model(
    onnx_path,
    labels=['left', 'right'],
)
ValueError: Cannot assign 2 labels to output 'output' with dimension -1.
Since we usually expect time last in our packages (e.g. audinterface), we should support this and ignore the dynamic axis when assigning the labels.
It might be useful to have the option to add some description to a node. This could be used to store additional information like value range, frame length, etc.
A pull request has been created at #54.
This does not work yet, as onnxruntime does not support Python 3.11, see microsoft/onnxruntime#13482.
See #23 (comment)
As described in https://onnxruntime.ai/docs/api/python/api_summary.html#data-on-device one can use IO bindings to move data to a CUDA device, which most likely makes execution faster.
We should test if this is the case and, if yes, integrate it in audonnx.
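A sketch of what IO binding looks like with the plain onnxruntime API (the node names 'input' and 'output' and the model path are placeholders); whether this actually speeds up audonnx models is exactly what would need to be measured:

import numpy as np
import onnxruntime

session = onnxruntime.InferenceSession(
    'model.onnx',
    providers=['CUDAExecutionProvider', 'CPUExecutionProvider'],
)

signal = np.zeros((1, 16000), dtype=np.float32)

io_binding = session.io_binding()
io_binding.bind_cpu_input('input', signal)  # copied to the device once
io_binding.bind_output('output')            # stays on the device until fetched
session.run_with_iobinding(io_binding)
result = io_binding.copy_outputs_to_cpu()[0]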
Currently we have the hidden function audonnx.core.model._device_to_providers(), which is useful when creating providers for an onnxruntime.InferenceSession. I would propose to simply make it part of the official API under audonnx.device_to_providers().
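For illustration, a minimal sketch of what such a helper might do; the exact mapping implemented in audonnx.core.model._device_to_providers() may differ:

def device_to_providers(device):
    # Translate a device string into an onnxruntime provider list.
    if device == 'cpu':
        return ['CPUExecutionProvider']
    if device.startswith('cuda'):
        # Optional device id, e.g. 'cuda:1'
        device_id = int(device.split(':')[1]) if ':' in device else 0
        return [('CUDAExecutionProvider', {'device_id': device_id})]
    # Otherwise assume the caller already passed a provider name
    return [device]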
When inspecting a loaded model it is possible to have a dimension that has no fixed size, e.g.
cnn_features:
  shape: [1, 512, -1]
  dtype: tensor(float)
  labels: [cnn_features-0, cnn_features-1, cnn_features-2, (...), cnn_features-509,
    cnn_features-510, cnn_features-511]
In order to be able to use such a model with audinterface one needs to know the hop duration (and window duration) of the output, so it would be nice if this were stored as part of the model metadata.
In the unit tests we currently create torch models that we then convert to ONNX. Instead we can use onnx.helper to directly create the ONNX graphs. This will make it easier to create and test models on the fly.
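A sketch of how a minimal graph (here a single Identity node with a dynamic time axis) can be built directly with onnx.helper, without going through torch:

import onnx

# Describe one float input and one float output with a dynamic time axis.
input_info = onnx.helper.make_tensor_value_info(
    'input', onnx.TensorProto.FLOAT, [1, 'time'],
)
output_info = onnx.helper.make_tensor_value_info(
    'output', onnx.TensorProto.FLOAT, [1, 'time'],
)

# A graph with a single Identity node connecting input and output.
node = onnx.helper.make_node('Identity', ['input'], ['output'])
graph = onnx.helper.make_graph([node], 'test', [input_info], [output_info])
model = onnx.helper.make_model(graph)

onnx.checker.check_model(model)
onnx.save(model, 'model.onnx')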
Currently we don't test for Windows as we get the following two errors:

relpath under Windows

As discussed in #34 (comment) it would be useful to add a squeeze argument to audonnx.Model.__call__(), as this would allow simplifying adding interfaces to it, e.g.
interface = audinterface.Process(
    process_func=process_func,
    process_func_args={'squeeze': True},
)

instead of

def process_func(signal, sampling_rate):
    return model(signal, sampling_rate)[0][0]

interface = audinterface.Process(process_func=process_func)
The expected behavior of squeeze could be something like this:

[[0]] -> 0
[[0, 1]] -> [0, 1]
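This matches what numpy's squeeze already does, so a sketch of the intended behavior is simply:

import numpy as np

y = np.array([[0]])
print(np.squeeze(y))   # 0 (zero-dimensional array)

y = np.array([[0, 1]])
print(np.squeeze(y))   # [0 1]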
The results you get back when running a model can depend on the device, and can even vary across several calls on the same device. It might be a good idea to add a "Reproducibility" section to the documentation in which we discuss these issues.
For example, let us use the model introduced in w2v2-how-to:
import audeer
import audonnx
import numpy as np
url = 'https://zenodo.org/record/6221127/files/w2v2-L-robust-12.6bc4a7fd-1.1.0.zip'
cache_root = audeer.mkdir('cache')
model_root = audeer.mkdir('model')
archive_path = audeer.download_url(url, cache_root, verbose=True)
audeer.extract_archive(archive_path, model_root)
np.random.seed(1)
sampling_rate = 16000
signal = np.random.normal(size=sampling_rate).astype(np.float32)
Now, let us execute the model on the CPU:
>>> model = audonnx.load(model_root, device='cpu')
>>> model(signal, sampling_rate)['logits']
array([[0.6832043 , 0.64673305, 0.49750742]], dtype=float32)
>>> model(signal, sampling_rate)['logits']
array([[0.6832043 , 0.64673305, 0.49750742]], dtype=float32)
>>> model(signal, sampling_rate)['logits']
array([[0.6832043 , 0.64673305, 0.49750742]], dtype=float32)
When using the CPU we always get back the same result when executing the model multiple times.
Then let's switch to the GPU:
>>> model = audonnx.load(model_root, device='cuda:0')
>>> model(signal, sampling_rate)['logits']
array([[0.68319285, 0.64667934, 0.49738473]], dtype=float32)
>>> model(signal, sampling_rate)['logits']
array([[0.68317926, 0.6466613 , 0.4974225 ]], dtype=float32)
>>> model(signal, sampling_rate)['logits']
array([[0.683162 , 0.64668435, 0.4973961 ]], dtype=float32)
We see that we get different results after the fifth decimal place for each run, and the average result deviates from the CPU-based result by:
array([[-2.62856483e-05, -5.79953194e-05, -1.06304884e-04]], dtype=float32)
This is a known ONNX limitation (microsoft/onnxruntime#9704).
In microsoft/onnxruntime#4611 (comment) they propose to select a fixed convolution algorithm to improve this behavior, see also https://pytorch.org/docs/stable/notes/randomness.html#cuda-convolution-benchmarking.
With audonnx we can achieve this by
>>> providers = [("CUDAExecutionProvider", {'cudnn_conv_algo_search': 'DEFAULT'})]
>>> model = audonnx.load(model_root, device=providers)
>>> model(signal, sampling_rate)['logits']
array([[0.683191 , 0.64670646, 0.4973919 ]], dtype=float32)
>>> model(signal, sampling_rate)['logits']
array([[0.6830938 , 0.6466217 , 0.49734592]], dtype=float32)
>>> model(signal, sampling_rate)['logits']
array([[0.6831656 , 0.64666504, 0.497427 ]], dtype=float32)
It does not really improve results.
It seems that we can only recommend the following when reproducibility is desired:
array([[0.68, 0.65, 0.50]], dtype=float32)
/cc @audeerington
When using onnxruntime==1.13.1 I see the following when loading a model with audonnx:

2022-12-20 12:35:32.566080649 [W:onnxruntime:, session_state.cc:1030 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2022-12-20 12:35:32.566108698 [W:onnxruntime:, session_state.cc:1032 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
With the newest version of onnxruntime (1.16.0) we are getting the following warning when building the docs:

WARNING: Cell printed to stderr:
WARNING:root:Please consider to run pre-processing before quantization. Refer to example: https://github.com/microsoft/onnxruntime-inference-examples/blob/main/quantization/image_classification/cpu/ReadMe.md

They suggest to first pre-process the model, e.g.

$ python -m onnxruntime.quantization.preprocess --input mobilenetv2-7.onnx --output mobilenetv2-7-infer.onnx

Unfortunately, onnxruntime.quantization.preprocess is not available as a Python API, which means we would need to update the documentation on how to quantize a model to run commands in bash. This seems very strange to me, as it also makes it much harder to write scripts that automatically quantize models.

It's available as onnxruntime.quantization.quant_pre_process().
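So the pre-processing can stay in Python after all; a sketch, assuming quant_pre_process() takes the input and output model paths as its first two arguments:

from onnxruntime.quantization import quant_pre_process

# Equivalent to the command line call shown above; the positional
# input/output arguments are an assumption about the signature.
quant_pre_process('mobilenetv2-7.onnx', 'mobilenetv2-7-infer.onnx')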
We can use audonnx.Function to include any custom function as a transform to a model.
The downside at the moment is that we lose some information on what the feature extractor might do.
Maybe we could improve here by also showing the name of the function. Assuming we used a function called mfcc, we could change the display to

transform: audonnx.core.function.Function(mfcc)
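For context, a sketch of how audonnx.Function is typically used as a transform, assuming it wraps a callable taking the signal and sampling rate; the mfcc helper below is a placeholder, not a real feature extractor:

import numpy as np

import audonnx


def mfcc(signal, sampling_rate):
    # Placeholder feature extractor; a real one would compute MFCCs here.
    return signal.reshape(1, -1)


transform = audonnx.Function(mfcc)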