audeering / audonnx

Deploy models in ONNX format
Home Page: https://audeering.github.io/audonnx/
License: Other
I recently tried running an audonnx model on the GPU and installed the latest version (v1.10.0) of onnxruntime-gpu. Unfortunately, model loading crashed with:

ValueError: This ORT build has ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'] enabled. Since ORT 1.9, you are required to explicitly set the providers parameter when instantiating InferenceSession. For example, onnxruntime.InferenceSession(..., providers=['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'], ...)

The issue was resolved after downgrading to v1.9.0. Judging by the error, I guess it's a simple instantiation problem of InferenceSession.
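Following the error message, the fix on the onnxruntime side is to pass the providers explicitly when the session is created. A minimal sketch of the raw onnxruntime call (the model path and provider list are placeholders):

import onnxruntime

# Explicitly list the execution providers (required since ORT 1.9).
# 'model.onnx' is a placeholder path.
session = onnxruntime.InferenceSession(
    'model.onnx',
    providers=['CUDAExecutionProvider', 'CPUExecutionProvider'],
)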
At the moment you cannot use audonnx with numpy 2.0 as onnxruntime does not support it, see microsoft/onnxruntime#21063.
I cannot find any option for DMLExecutionProvider in ort.py. I tried to replace CPUExecutionProvider with DMLExecutionProvider, but it did not work as it kills the kernel.
Kindly provide a way to run the model with DMLExecutionProvider so I can run it on my AMD RX 570 GPU!
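As a sketch of a possible workaround: onnxruntime ships DirectML support in the separate onnxruntime-directml package, where the provider is spelled DmlExecutionProvider. Whether audonnx forwards such a provider list through its device argument would need to be verified; at the raw onnxruntime level it would look like this:

import onnxruntime

# Requires the onnxruntime-directml package; note the provider is spelled
# 'DmlExecutionProvider', with the CPU provider as fallback.
# 'model.onnx' is a placeholder path.
session = onnxruntime.InferenceSession(
    'model.onnx',
    providers=['DmlExecutionProvider', 'CPUExecutionProvider'],
)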
As the following simple example shows, we currently only replace 'time' in the shape of input nodes, but we should do the same for output nodes:
import torch

import audonnx


class TorchModel(torch.nn.Module):
    def __init__(
        self,
    ):
        super().__init__()

    def forward(self, x: torch.Tensor):
        return x


torch_model = TorchModel()
input = torch.randn(1, 100, 4)  # batch, time, feature
output = torch_model(input)

onnx_path = 'model.onnx'
torch.onnx.export(
    torch_model,
    input,
    onnx_path,
    input_names=['input'],
    output_names=['output'],
    dynamic_axes={
        'input': {1: 'time'},
        'output': {1: 'time'},
    },
    opset_version=12,
)
audonnx.Model(onnx_path)
Input:
  input:
    shape: [1, -1, 4]
    dtype: tensor(float)
    transform: None
Output:
  output:
    shape: [1, time, 4]  # should be [1, -1, 4]
    dtype: tensor(float)
    labels: [output-0, output-1, output-2, output-3]
When trying to install the packages we get:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
onnx 1.13.0 requires protobuf<4,>=3.20.2, but you have protobuf 3.20.1 which is incompatible.

The problem is that we specify

protobuf <=3.20.1  # avoid TypeError: Descriptors cannot not be created directly

in docs/requirements.txt.
We need to check how to make this work for newer versions of protobuf; for now I will restrict onnx to <1.13.0 in docs/requirements.txt in #47.
When loading a model on the CPU and running it, onnxruntime will use as many resources as it can get.
This can be avoided by specifying intra_op_num_threads and inter_op_num_threads.
Maybe we can tackle both by adding a num_workers argument to audonnx.load() and audonnx.Model, as we have in other packages?
It seems also reasonable to set the default to 1.

In #81 we added the possibility to set the number of workers by using the sess_options argument of onnxruntime.InferenceSession. As it supports more settings, of which we at least want to use enable_cpu_mem_arena, it might be a good idea to provide an argument to set those options in audonnx.Model and audonnx.load().
We need to decide whether an onnxruntime_sess_options argument should then overwrite num_workers or the other way around.
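For reference, this is how the relevant settings look at the plain onnxruntime level; a sketch only, the audonnx argument names discussed above are still open:

import onnxruntime

opts = onnxruntime.SessionOptions()
opts.intra_op_num_threads = 1       # threads used within a single operator
opts.inter_op_num_threads = 1       # threads used to run operators in parallel
opts.enable_cpu_mem_arena = False   # disable the CPU memory arena

# 'model.onnx' is a placeholder path.
session = onnxruntime.InferenceSession(
    'model.onnx',
    sess_options=opts,
    providers=['CPUExecutionProvider'],
)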
As discussed in #34 (comment) it might be good to have a method to collect a list of labels corresponding to specific output nodes.
This could replace
outputs = ['gender', 'confidence']
labels = sum([model.outputs[o].labels for o in outputs], [])
with
outputs = ['gender', 'confidence']
labels = model.labels(outputs=['gender', 'confidence'])
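A minimal sketch of how such a method could be implemented, assuming the proposed name Model.labels() and the existing model.outputs mapping; defaulting to the labels of all output nodes is an assumption:

def labels(self, outputs=None):
    # Collect the labels of the given output nodes (all nodes by default)
    # into a single flat list, preserving node order.
    if outputs is None:
        outputs = list(self.outputs)
    labels = []
    for name in outputs:
        labels.extend(self.outputs[name].labels)
    return labels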
Currently audonnx.Model only takes a signal as input.
https://github.com/audeering/audonnx/blob/main/audonnx/core/model.py#L198-L209
For some models, however, it might be necessary to provide additional information.
E.g. in the case of an LSTM model, we would like to provide the last hidden state and the memory state as additional input.
This is currently impossible as all additional inputs are derived directly from the signal.
It would be great if we could provide an option to enable additional model inputs alongside the signal.
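To illustrate what is needed, at the raw onnxruntime level such extra inputs are simply additional entries in the feed dictionary; the node names 'signal', 'h0' and 'c0' and the shapes below are made up for the example:

import numpy as np
import onnxruntime

# 'lstm.onnx' is a placeholder path.
session = onnxruntime.InferenceSession(
    'lstm.onnx',
    providers=['CPUExecutionProvider'],
)

signal = np.zeros((1, 16000), dtype=np.float32)
h0 = np.zeros((1, 1, 256), dtype=np.float32)  # last hidden state
c0 = np.zeros((1, 1, 256), dtype=np.float32)  # last memory state

outputs = session.run(
    None,  # return all output nodes
    {'signal': signal, 'h0': h0, 'c0': c0},
)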
Currently we do not support assigning labels if the last dimension is dynamic:
import torch

import audonnx


class TorchModel(torch.nn.Module):
    def __init__(
        self,
    ):
        super().__init__()

    def forward(self, x: torch.Tensor):
        return torch.cat([x, x])


torch_model = TorchModel()
input = torch.randn((1, 10))  # batch, time
output = torch_model(input)
print(output)

onnx_path = 'model.onnx'
torch.onnx.export(
    torch_model,
    input,
    onnx_path,
    input_names=['input'],
    output_names=['output'],
    dynamic_axes={
        'input': {1: 'time'},
        'output': {1: 'time'},
    },
    opset_version=12,
)

onnx_model = audonnx.Model(
    onnx_path,
    labels=['left', 'right'],
)
ValueError: Cannot assign 2 labels to output 'output' with dimension -1.
Since we usually expect time last in our packages (e.g. audinterface), we should support this and ignore the dynamic axis when assigning the labels.
It might be useful to have the option to add some description to a node. This could be used to store additional information like value range, frame length, etc.
A pull request has been created at #54.
This does not work yet, as onnxruntime does not support Python 3.11, see microsoft/onnxruntime#13482.
See #23 (comment)
As described in https://onnxruntime.ai/docs/api/python/api_summary.html#data-on-device one can use IO bindings to move data to a CUDA device, which most likely makes execution faster.
We should test if this is the case and, if yes, integrate it in audonnx.
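A sketch of what IO binding looks like with the plain onnxruntime API (the node names 'input' and 'output' and the model path are placeholders); whether this actually speeds up audonnx models is exactly what would need to be measured:

import numpy as np
import onnxruntime

session = onnxruntime.InferenceSession(
    'model.onnx',
    providers=['CUDAExecutionProvider', 'CPUExecutionProvider'],
)

signal = np.zeros((1, 16000), dtype=np.float32)

io_binding = session.io_binding()
io_binding.bind_cpu_input('input', signal)  # copied to the device once
io_binding.bind_output('output')            # stays on the device until fetched
session.run_with_iobinding(io_binding)
result = io_binding.copy_outputs_to_cpu()[0]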
Currently we have the hidden function audonnx.core.model._device_to_providers(), which is useful when creating providers for an onnxruntime.InferenceSession. I would propose to simply make it part of the official API under audonnx.device_to_providers().
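For illustration, a minimal sketch of what such a helper might do; the exact mapping implemented in audonnx.core.model._device_to_providers() may differ:

def device_to_providers(device):
    # Translate a device string into an onnxruntime provider list.
    if device == 'cpu':
        return ['CPUExecutionProvider']
    if device.startswith('cuda'):
        # Optional device id, e.g. 'cuda:1'
        device_id = int(device.split(':')[1]) if ':' in device else 0
        return [('CUDAExecutionProvider', {'device_id': device_id})]
    # Otherwise assume the caller already passed a provider name
    return [device]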
When inspecting a loaded model it is possible to have a dimension that has no fixed size, e.g.
cnn_features:
  shape: [1, 512, -1]
  dtype: tensor(float)
  labels: [cnn_features-0, cnn_features-1, cnn_features-2, (...), cnn_features-509,
    cnn_features-510, cnn_features-511]
In order to be able to use such a model with audinterface one needs to know the hop duration (and window duration) of the output, so it would be nice if this were stored as part of the model metadata.
In the unit tests we currently create torch models that we then convert to ONNX. Instead we can use onnx.helper to directly create the ONNX graphs. This will make it easier to create and test models on the fly.
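A sketch of how a minimal graph (here a single Identity node with a dynamic time axis) can be built directly with onnx.helper, without going through torch:

import onnx

# Describe one float input and one float output with a dynamic time axis.
input_info = onnx.helper.make_tensor_value_info(
    'input', onnx.TensorProto.FLOAT, [1, 'time'],
)
output_info = onnx.helper.make_tensor_value_info(
    'output', onnx.TensorProto.FLOAT, [1, 'time'],
)

# A graph with a single Identity node connecting input and output.
node = onnx.helper.make_node('Identity', ['input'], ['output'])
graph = onnx.helper.make_graph([node], 'test', [input_info], [output_info])
model = onnx.helper.make_model(graph)

onnx.checker.check_model(model)
onnx.save(model, 'model.onnx')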
Currently we don't test for Windows as we get the following two errors:

relpath under Windows

As discussed in #34 (comment) it would be useful to add a squeeze argument to audonnx.Model.__call__(), as this would allow simplifying adding interfaces to it, e.g.
interface = audinterface.Process(
    process_func=process_func,
    process_func_args={'squeeze': True},
)

instead of

def process_func(signal, sampling_rate):
    return model(signal, sampling_rate)[0][0]

interface = audinterface.Process(process_func=process_func)
The expected behavior of squeeze could be something like this:

[[0]] -> 0
[[0, 1]] -> [0, 1]
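This matches what numpy's squeeze already does, so a sketch of the intended behavior is simply:

import numpy as np

y = np.array([[0]])
print(np.squeeze(y))   # 0 (zero-dimensional array)

y = np.array([[0, 1]])
print(np.squeeze(y))   # [0 1]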
The results you get back when running a model can depend on the device, and can even vary across several calls on the same device. It might be a good idea to add a "Reproducibility" section to the documentation in which we discuss these issues.
For example, let us use the model introduced in w2v2-how-to:
import audeer
import audonnx
import numpy as np
url = 'https://zenodo.org/record/6221127/files/w2v2-L-robust-12.6bc4a7fd-1.1.0.zip'
cache_root = audeer.mkdir('cache')
model_root = audeer.mkdir('model')
archive_path = audeer.download_url(url, cache_root, verbose=True)
audeer.extract_archive(archive_path, model_root)
np.random.seed(1)
sampling_rate = 16000
signal = np.random.normal(size=sampling_rate).astype(np.float32)
Now, let us execute the model on the CPU:
>>> model = audonnx.load(model_root, device='cpu')
>>> model(signal, sampling_rate)['logits']
array([[0.6832043 , 0.64673305, 0.49750742]], dtype=float32)
>>> model(signal, sampling_rate)['logits']
array([[0.6832043 , 0.64673305, 0.49750742]], dtype=float32)
>>> model(signal, sampling_rate)['logits']
array([[0.6832043 , 0.64673305, 0.49750742]], dtype=float32)
When using the CPU we always get back the same result when executing the model multiple times.
Then let's switch to the GPU:
>>> model = audonnx.load(model_root, device='cuda:0')
>>> model(signal, sampling_rate)['logits']
array([[0.68319285, 0.64667934, 0.49738473]], dtype=float32)
>>> model(signal, sampling_rate)['logits']
array([[0.68317926, 0.6466613 , 0.4974225 ]], dtype=float32)
>>> model(signal, sampling_rate)['logits']
array([[0.683162 , 0.64668435, 0.4973961 ]], dtype=float32)
We see that we get different results after the fifth decimal place for each run, and the average result deviates from the CPU-based result by:
array([[-2.62856483e-05, -5.79953194e-05, -1.06304884e-04]], dtype=float32)
This is a known ONNX limitation (microsoft/onnxruntime#9704).
In microsoft/onnxruntime#4611 (comment) they propose to select a fixed convolution algorithm to improve this behavior, see also https://pytorch.org/docs/stable/notes/randomness.html#cuda-convolution-benchmarking.
With audonnx we can achieve this by
>>> providers = [("CUDAExecutionProvider", {'cudnn_conv_algo_search': 'DEFAULT'})]
>>> model = audonnx.load(model_root, device=providers)
>>> model(signal, sampling_rate)['logits']
array([[0.683191 , 0.64670646, 0.4973919 ]], dtype=float32)
>>> model(signal, sampling_rate)['logits']
array([[0.6830938 , 0.6466217 , 0.49734592]], dtype=float32)
>>> model(signal, sampling_rate)['logits']
array([[0.6831656 , 0.64666504, 0.497427 ]], dtype=float32)
It does not really improve results.
It seems that we can only recommend the following when reproducibility is desired:
array([[0.68, 0.65, 0.50]], dtype=float32)
/cc @audeerington
When using onnxruntime==1.13.1 I see the following when loading a model with audonnx:

2022-12-20 12:35:32.566080649 [W:onnxruntime:, session_state.cc:1030 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2022-12-20 12:35:32.566108698 [W:onnxruntime:, session_state.cc:1032 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
With the newest version of onnxruntime (1.16.0) we are getting the following warning when building the docs:

WARNING: Cell printed to stderr:
WARNING:root:Please consider to run pre-processing before quantization. Refer to example: https://github.com/microsoft/onnxruntime-inference-examples/blob/main/quantization/image_classification/cpu/ReadMe.md

They suggest to first pre-process the model, e.g.

$ python -m onnxruntime.quantization.preprocess --input mobilenetv2-7.onnx --output mobilenetv2-7-infer.onnx

Unfortunately, onnxruntime.quantization.preprocess is not available as a Python API, which means we would need to update the documentation on how to quantize a model to run commands in bash. This seems very strange to me, as it also makes it much harder to write scripts that automatically quantize models.

It's available as onnxruntime.quantization.quant_pre_process().
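So the pre-processing can stay in Python after all; a sketch, assuming quant_pre_process() takes the input and output model paths as its first two arguments:

from onnxruntime.quantization import quant_pre_process

# Equivalent to the command line call shown above; the positional
# input/output arguments are an assumption about the signature.
quant_pre_process('mobilenetv2-7.onnx', 'mobilenetv2-7-infer.onnx')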
We can use audonnx.Function to include any custom function as a transform to a model.
The downside at the moment is that we lose some information on what the feature extractor might do.
Maybe we could improve here by also showing the name of the function. Assuming we used a function called mfcc, we could change the display to

transform: audonnx.core.function.Function(mfcc)
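For context, a sketch of how audonnx.Function is typically used as a transform, assuming it wraps a callable taking the signal and sampling rate; the mfcc helper below is a placeholder, not a real feature extractor:

import numpy as np

import audonnx


def mfcc(signal, sampling_rate):
    # Placeholder feature extractor; a real one would compute MFCCs here.
    return signal.reshape(1, -1)


transform = audonnx.Function(mfcc)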