onnxruntime.jl's Introduction

ONNXRunTime

ONNXRunTime provides unofficial Julia bindings for onnxruntime. It exposes both a low level interface that mirrors the official C-API and a high level interface.

Contributions are welcome.

Usage

The high level API works as follows:

julia> import ONNXRunTime as ORT

julia> path = ORT.testdatapath("increment2x3.onnx"); # path to a toy model

julia> model = ORT.load_inference(path);

julia> input = Dict("input" => randn(Float32,2,3))
Dict{String, Matrix{Float32}} with 1 entry:
  "input" => [1.68127 1.18192 -0.474021; -1.13518 1.02199 2.75168]

julia> model(input)
Dict{String, Matrix{Float32}} with 1 entry:
  "output" => [2.68127 2.18192 0.525979; -0.135185 2.02199 3.75168]

For GPU usage, the CUDA and cuDNN packages are required, and the CUDA runtime needs to be set to 11.8 or a later 11.x version. To set this up, do

pkg> add CUDA cuDNN

julia> import CUDA

julia> CUDA.set_runtime_version!(v"11.8")
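
The preference written by set_runtime_version! only takes effect in a fresh Julia session. After restarting, the active runtime can be checked with CUDA.runtime_version() from CUDA.jl:

julia> import CUDA

julia> CUDA.runtime_version()  # should now report the selected 11.x runtime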

Then GPU inference is simply

julia> import CUDA, cuDNN

julia> ORT.load_inference(path, execution_provider=:cuda)

CUDA provider options can be specified

julia> ORT.load_inference(path, execution_provider=:cuda,
                          provider_options=(;cudnn_conv_algo_search=:HEURISTIC))
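
The option names are assumed to be passed through unchanged to onnxruntime's CUDA execution provider, so other provider options documented by onnxruntime, e.g. device_id for selecting a GPU, should follow the same pattern (a hedged sketch, not verified here):

julia> ORT.load_inference(path, execution_provider=:cuda,
                          provider_options=(;device_id=1, cudnn_conv_algo_search=:HEURISTIC))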

Memory allocated by a model is released automatically once the model object goes out of scope and is collected by the garbage collector. It can also be released immediately with release(model).
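
For example, continuing the session from the usage example above (a minimal sketch):

julia> model = ORT.load_inference(path);

julia> model(Dict("input" => randn(Float32, 2, 3)));

julia> ORT.release(model)  # free the native memory now instead of waiting for the GC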

The low level API mirrors the official C-API. The above example looks like this:

using ONNXRunTime.CAPI
using ONNXRunTime: testdatapath

api = GetApi();
env = CreateEnv(api, name="myenv");
so = CreateSessionOptions(api);
path = testdatapath("increment2x3.onnx");
session = CreateSession(api, env, path, so);
mem = CreateCpuMemoryInfo(api);
input_array = randn(Float32, 2,3)
input_tensor = CreateTensorWithDataAsOrtValue(api, mem, vec(input_array), size(input_array));
run_options = CreateRunOptions(api);
input_names = ["input"];
output_names = ["output"];
inputs = [input_tensor];
outputs = Run(api, session, run_options, input_names, inputs, output_names);
output_tensor = only(outputs);
output_array = GetTensorMutableData(api, output_tensor);

Alternatives

Breaking Changes in version 0.4.

  • Support for CUDA.jl is changed from version 3 to versions 4 and 5.

  • Support for Julia versions below 1.9 is dropped. The reason is that the conditional GPU support has been switched from the Requires package to a package extension. As a consequence, ONNXRunTime's GPU support can now be precompiled and the supported CUDA.jl versions can be properly controlled via the [compat] section.

Setting the CUDA Runtime Version in Tests

For GPU tests using ONNXRunTime, the tests naturally must depend on and import CUDA and cuDNN. Additionally, a supported CUDA runtime version needs to be used, which can be somewhat tricky to set up for the tests.

First, some background. What CUDA.set_runtime_version!(v"11.8") effectively does is to

  1. Add a LocalPreferences.toml file containing

     [CUDA_Runtime_jll]
     version = "11.8"

  2. In Project.toml, add

     [extras]
     CUDA_Runtime_jll = "76a88914-d11a-5bdc-97e0-2f5a05c973a2"

If your test environment is defined by a test target in the top Project.toml, you need to

  1. Add a LocalPreferences.toml in your top directory with the same contents as above.

  2. Add CUDA_Runtime_jll to the extras section of Project.toml.

  3. Add CUDA_Runtime_jll to the test target of Project.toml.

If your test environment is defined by a Project.toml in the test directory, you instead need to (see the sketch after this list)

  1. Add a test/LocalPreferences.toml file with the same contents as above.

  2. Add CUDA_Runtime_jll to the extras section of test/Project.toml.
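
For concreteness, a minimal sketch of the second variant (only the parts relevant to the CUDA runtime preference are shown; other test dependencies are omitted):

# test/LocalPreferences.toml
[CUDA_Runtime_jll]
version = "11.8"

# test/Project.toml (excerpt)
[extras]
CUDA_Runtime_jll = "76a88914-d11a-5bdc-97e0-2f5a05c973a2"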

onnxruntime.jl's Issues

multi-thread friendly?

julia> @time let input = Dict("dense_input" => randn(Float32, 1, 78))
           Threads.@threads for _ = 1:10000
               model(input)
           end
       end
ERROR: TaskFailedException
Stacktrace:
 [1] wait
   @ ./task.jl:334 [inlined]
 [2] threading_run(func::Function)
   @ Base.Threads ./threadingconstructs.jl:38
 [3] macro expansion
   @ ./threadingconstructs.jl:97 [inlined]
 [4] macro expansion
   @ ./REPL[9]:2 [inlined]
 [5] top-level scope
   @ ./timing.jl:220 [inlined]
 [6] top-level scope
   @ ./REPL[9]:0

    nested task error: ArgumentError: array must be non-empty
    Stacktrace:
     [1] pop!
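
Not an authoritative answer, but one workaround sketch, assuming a single InferenceSession is not safe for concurrent calls: serialize access with a lock (or alternatively load one session per thread). Here model and input are the objects from the snippet above.

lk = ReentrantLock()
Threads.@threads for _ = 1:10000
    # Only one task at a time calls into the shared session.
    lock(lk) do
        model(input)
    end
end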

Loading existing model test data?

The examples show generating random input data. What if I've already got an onnx model that comes with a test_data_set_0/ directory which contains input_0.pb and output_0.pb files? How do I go about loading those as the inputs/outputs?

Error: could not load library ... onnxruntime.dll ... The specified module could not be found.

julia 1.7 + ONNXRunTime v0.3.0
The .dll is definitely there. I deleted my .julia folder and did a fresh install with only ONNXRunTime.jl and Images.jl installed. I have had the code running in WSL, so it appears to be a Windows issue. Please let me know if I can provide any more info.

ERROR: could not load library "C:\Users\AlexC\.julia\artifacts\a16d01a5b64dd0f036283f2d89b242e69a0ee03c\onnxruntime-win-x64-1.9.0\lib\onnxruntime.dll"
The specified module could not be found. 
Stacktrace:
  [1] dlopen(s::String, flags::UInt32; throw_error::Bool)
    @ Base.Libc.Libdl .\libdl.jl:117
  [2] dlopen (repeats 2 times)
    @ .\libdl.jl:117 [inlined]
  [3] set_lib!(path::String, execution_provider::Symbol)
    @ ONNXRunTime.CAPI C:\Users\AlexC\.julia\packages\ONNXRunTime\Ad1yw\src\capi.jl:43
  [4] make_lib!(execution_provider::Symbol)
    @ ONNXRunTime.CAPI C:\Users\AlexC\.julia\packages\ONNXRunTime\Ad1yw\src\capi.jl:78
  [5] libptr
    @ C:\Users\AlexC\.julia\packages\ONNXRunTime\Ad1yw\src\capi.jl:95 [inlined]
  [6] OrtGetApiBase(; execution_provider::Symbol)
    @ ONNXRunTime.CAPI C:\Users\AlexC\.julia\packages\ONNXRunTime\Ad1yw\src\capi.jl:312
  [7] #GetApi#2
    @ C:\Users\AlexC\.julia\packages\ONNXRunTime\Ad1yw\src\capi.jl:330 [inlined]
  [8] load_inference(path::String; execution_provider::Symbol, envname::String)
    @ ONNXRunTime C:\Users\AlexC\.julia\packages\ONNXRunTime\Ad1yw\src\highlevel.jl:63
  [9] load_inference(path::String)
    @ ONNXRunTime C:\Users\AlexC\.julia\packages\ONNXRunTime\Ad1yw\src\highlevel.jl:63
 [10] top-level scope
    @ c:\Users\AlexC\workspace\BarTracking.jl\BarTracking.jl:4

My code was only:

using ONNXRunTime, Images
model = ONNXRunTime.load_inference("model.onnx")

If it is model related, my model is here:
https://github.com/cluffa/bar_tracking/blob/main/BarTracking/timm-regnetx_002_model.onnx

Incorrect results for matrix multiplication

I may be just doing something stupid, but it looks like many operations, including Gemm/MatMul, started to produce incorrect results. Here's a reproducer. First, we create a model with a single MatMul operation (I use Python here to avoid my own mistakes, but ONNX.jl essentially generates the same graph):

import torch
from torch import nn


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()

    def forward(self, A, B):
        y = torch.matmul(B, A)   # reverse order since Julia does it this way
        return y


model = Net()
args = (torch.ones(1, 9),)

import numpy as np

pyA = np.array([[1, 4.],
                [2., 5],
                [3., 6]])
pyB = np.array([
                [7., 9, 11],
                [8., 10, 12]])
args = (torch.tensor(pyA), torch.tensor(pyB))
model(*args)
torch.onnx.export(model, args, "simple_net.onnx")

Running the model on these args gives the following result:

model(*args)
# tensor([[ 58., 139.],
#         [ 64., 154.]], dtype=torch.float64)

Python's onnxruntime produces the same output:

import onnxruntime as ort

ort_sess = ort.InferenceSession('simple_net.onnx')
ort_sess.run(None, {"onnx::MatMul_0": pyA, "onnx::MatMul_1": pyB})
[array([[ 58., 139.],
        [ 64., 154.]])]

But if I load it in Julia, the result is different:

import ONNXRunTime as OX

function ort_run(path, ort_args...)
    model = OX.load_inference(path)
    ort_inputs = Dict([OX.input_names(model)[i] => ort_args[i] for i=1:length(ort_args)])
    return model(ort_inputs)
end

oA = [1.0 4.0; 2.0 5.0; 3.0 6.0]
oB = [7.0 9.0 11.0; 8.0 10.0 12.0]

ort_run(expanduser("~/Downloads/simple_net.onnx"), oA, oB)["2"]
# 2×2 Matrix{Float64}:
#   76.0  103.0
#  100.0  136.0

I discovered this while porting ONNX.jl to ProtoBuf v1 in FluxML/ONNX.jl#74, where multiple tests were failing, but also confirmed it on master in GitHub Actions: current test (failing), 3-month-old test (passing).

I don't see any changes in this repo newer than 4 months, and the tests in ONNXRunTime itself pass, but is there a chance something has changed in published versions or artifacts that could result in this behavior?

Can't load simple model (with 8bit and 16bit inputs)

Hi nice package!

I'm trying to add new ops to ONNX.jl, and I use this package to test whether the onnx file is valid (loadable and returns the right results).
I'm using the ONNX backend test suite and I think there is a bug in this package, but I'm not sure.

Here is a minimal working example:
This is the simple model from the "test_min_int16" case (model.zip). Basically, it's a min(x, y) graph where x and y are INT16.

import ONNXRunTime as OX
OX.load_inference("model.onnx") #<test_dir>/data/node/test_min_int16/model.onnx

And I get this:

ERROR: Could not find an implementation for Min(13) node with name ''

I think that should work.

For "test_min_int32" is it working. (the whole test suit)

Incompatibility with cuDNN 1.3.1

The recently released cuDNN 1.3.1 bumps the CUDNN_jll dependency to 9.0, which provides libcudnn.so.9. This is not compatible with the ONNXRunTime CUDA extension, which needs libcudnn.so.8 to be loaded before loading libonnxruntime_providers_cuda.so. Things might still work if libcudnn.so.8 happens to be found in system libraries but that's obviously not something we want to depend on.

For future releases (until we can be compatible with CUDNN 9) we can upper bound cuDNN with

cuDNN = "~1.1, ~1.2, =1.3.0"

but I suspect we had better also retroactively cap the dependency for the existing versions (0.4 and up) in the General registry. I know how to prepare a registry PR to that effect if that sounds good.

Manual release of memory

ONNXRunTime objects release their memory through a finalizer, which calls the CAPI.release function. As noted in the docstring, garbage collection should normally handle this automatically. However if you have a large GPU model you might have extreme GPU memory pressure without any CPU memory pressure at all, and the garbage collection won't be in any hurry to destroy your objects, which would release your GPU memory.

Obviously CAPI.release gives you the tool to handle this manually, but it's quite clunky both to dip down into the C API and to separately extract the api and session fields for the release call.

My proposal is to add a high-level release function/method for InferenceSession. Thoughts?
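
Not the package's actual implementation, just a rough sketch of what such a method could look like, assuming the InferenceSession fields are named api and session as described above:

# Hypothetical high-level method; the field names are assumptions.
function release(o::InferenceSession)
    CAPI.release(o.api, o.session)
end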

Contribution: JLL for support for additional platforms

I've worked on getting a BinaryBuilder setup for building onnxruntime for all BB-supported platforms: https://github.com/IHPSystems/onnxruntime_jll_builder

The current master only builds for CPU.

I need to figure out how to deploy both the CPU-only and the CUDA-dependent libraries (e.g. as two JLLs), but there is a WIP branch: https://github.com/IHPSystems/onnxruntime_jll_builder/tree/feature/cuda

My main aim has been to get ONNX Runtime with TensorRT support on Nvidia Jetson (aarch64), but an automated deployment of those binaries will likely require some form of re-packaging of other binaries, which is why the build_tarballs.jl script is in its own repo and not in Yggdrasil (yet).

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!
