ucb-bar / onnxruntime-riscv
73 stars · 20 forks · 26 open issues · 437.88 MB

Fork of upstream onnxruntime focused on supporting RISC-V accelerators

License: MIT License

Batchfile 0.07% Shell 0.41% CMake 1.35% C# 5.94% C++ 65.19% C 5.26% Objective-C 0.36% PowerShell 0.01% Python 11.09% Cuda 2.42% Assembly 3.09% Java 0.84% PureBasic 0.01% HLSL 0.03% Jupyter Notebook 0.90% Pascal 0.01% JavaScript 0.06% TypeScript 2.78% Objective-C++ 0.15% Roff 0.05%


onnxruntime-riscv's Issues

Prediction of `images/dog.jpg` with `resnet50_opt_quant.onnx`

Describe the bug
I set up Chipyard and Gemmini following https://github.com/ucb-bar/gemmini/blob/master/README.md, and onnxruntime-riscv following this repo. When I run `spike --extension=gemmini pk ort_test -m resnet50_opt_quant.onnx -i images/dog.jpg -p caffe2 -x 2 -O 99`, I get predictions like:

0.045356 window screen
0.067687 tray
0.068786 switch, electric switch, electrical switch
0.145288 stopwatch, stop watch
0.182146 cup

However, the expected prediction should be:

0.031456 giant schnauzer
0.075702 curly-coated retriever
0.087432 Great Dane
0.271946 Labrador retriever
0.361813 Rottweiler

as shown in the tutorial. A similar prediction mismatch was also reported in #90.

I also tried `-x 0` in the spike command above. For resnet50, `-x 0` and `-x 2` produce different predictions, and `-x 0` matches the result shown in the tutorial, which gives Rottweiler. For googlenet, the two options produce the same prediction, which is:

0.010029 Newfoundland, Newfoundland dog
0.017131 curly-coated retriever
0.083233 Great Dane
0.288698 flat-coated retriever
0.548024 Labrador retriever

in my run.
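For reference, a small script along these lines reruns both modes side by side (a sketch only: it assumes the same spike/pk setup and file paths as above, and that -x 0 selects CPU-only execution):

import subprocess

def top5(xmode):
    cmd = ["spike", "--extension=gemmini", "pk", "ort_test",
           "-m", "resnet50_opt_quant.onnx", "-i", "images/dog.jpg",
           "-p", "caffe2", "-x", str(xmode), "-O", "99"]
    out = subprocess.run(cmd, stdout=subprocess.PIPE,
                         universal_newlines=True).stdout
    return out.splitlines()[-6:]   # crude: the runner prints its top-5 near the end

for mode in (0, 2):
    print("-x %d:" % mode)
    print("\n".join(top5(mode)))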

Could you please help figure out why?
Thanks!

(all models downloaded from https://github.com/ucb-bar/onnxruntime-riscv/releases)

Urgency
Hopefully ASAP.

System information

  • OS Platform and Distribution: Linux Ubuntu 18.04
  • ONNX Runtime installed from: onnxruntime-riscv
  • ONNX Runtime version: 2021-12-23
  • Python version: 3.6.9
  • Visual Studio version (if applicable): None
  • GCC/Compiler version (if compiling from source): 7.4.0
  • CUDA/cuDNN version: None
  • GPU model and memory: None


Problem when I run ./build.sh --parallel


Support maxpool/relu fuse in fp32

We currently only do this in int8, but it should be trivial to support fp32 as well. (The reason we haven't bothered with fp32 yet is that training doesn't make use of maxpool fusion, and in training we usually have conv -> BN -> relu, which can't be fused.)
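A minimal sketch of what the fp32 pattern match could look for (illustrative only; the model path is a placeholder):

import onnx

model = onnx.load("model_fp32.onnx")   # hypothetical fp32 model
producer = {out: n for n in model.graph.node for out in n.output}
for node in model.graph.node:
    if node.op_type == "Relu":
        prev = producer.get(node.input[0])
        if prev is not None and prev.op_type == "MaxPool":
            print("fusable:", prev.name or "MaxPool", "->", node.name or "Relu")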

Error building the repo

Describe the bug
I am trying to follow the Gemmini IISWC tutorial to learn how to use ONNX Runtime with Gemmini.
I was executing the script gemmini/scripts/build-onnx-inference.sh, which fails when building the onnxruntime-riscv repo.
The build command is as follows:
./build.sh --for_firesim --parallel --enable_training --config=Debug --cmake_extra_defines onnxruntime_USE_SYSTOLIC=ON onnxruntime_SYSTOLIC_INT8=ON onnxruntime_SYSTOLIC_FP32=OFF

I added the --for_firesim option because I am running it on FireSim.
The build fails with two errors, one at 6% and one at 15%.

The first error comes from flake8. I had installed all the requirements*.txt files under /onnx-runtime, and I found that flake8 needs importlib-metadata<4.3. So I installed version 4.2, but that produced an error/warning from sphinx, which needs importlib-metadata>=4.4. I ignored the sphinx warning because I think my problem is with flake8.
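The failing pep8_check target can be reproduced outside of CMake (a sketch: it assumes flake8 is importable in the build's environment and that the repo root is the working directory):

import subprocess, sys

# pep8_check runs flake8 over the tree; the two E302 hits on build.py
# are what abort the build here
subprocess.run([sys.executable, "-m", "flake8", "--select=E302",
                "tools/ci_build/build.py"])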

I paste the build log below.

-- Build files have been written to: /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/build/Debug
2022-03-30 11:39:42,529 util.run [DEBUG] - Subprocess completed. Return code: 0
2022-03-30 11:39:42,529 build [INFO] - Building targets for Debug configuration
2022-03-30 11:39:42,530 util.run [INFO] - Running subprocess in '/home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv'
['/usr/local/bin/cmake', '--build', 'build/Debug', '--config', 'Debug', '--', '-j16']
[ 0%] Running cpp protocol buffer compiler on /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/cmake/external/tensorboard/tensorboard/compat/proto/allocation_description.proto
[ 0%] Running cpp protocol buffer compiler on /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/cmake/external/tensorboard/tensorboard/compat/proto/versions.proto
[ 0%] Running cpp protocol buffer compiler on /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/cmake/external/tensorboard/tensorboard/compat/proto/attr_value.proto
[ 0%] Running cpp protocol buffer compiler on /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/cmake/external/tensorboard/tensorboard/compat/proto/api_def.proto
[ 0%] Generating cpp/platform/posix/src/per_thread_waiter.c
[ 0%] Running cpp protocol buffer compiler on /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/cmake/external/tensorboard/tensorboard/compat/proto/cluster.proto
[ 0%] Building CXX object CMakeFiles/onnxruntime_providers_shared.dir/home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/providers/shared/common.cc.o
[ 0%] Building CXX object external/flatbuffers/CMakeFiles/flatbuffers.dir/src/idl_parser.cpp.o
[ 0%] Running gen_proto.py on onnx/onnx.in.proto
Checking python scripts for PEP8 conformance using flake8
[ 0%] Building CXX object external/googletest/googletest/CMakeFiles/gtest.dir/src/gtest-all.cc.o
[ 0%] Building CXX object CMakeFiles/custom_op_library.dir/home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/test/testdata/custom_op_library/custom_op_library.cc.o
[ 0%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/mlas/lib/platform.cpp.o
[ 0%] Generating cpp/internal/common.c
[ 0%] Building CXX object external/protobuf/cmake/CMakeFiles/libprotobuf-lite.dir/__/src/google/protobuf/any_lite.cc.o
[ 0%] Building CXX object external/re2/CMakeFiles/re2.dir/re2/bitstate.cc.o
[ 0%] Building CXX object external/flatbuffers/CMakeFiles/flatc.dir/src/idl_parser.cpp.o
[ 0%] Building CXX object external/protobuf/cmake/CMakeFiles/libprotobuf-lite.dir/__/src/google/protobuf/arena.cc.o
[ 0%] Building CXX object external/protobuf/cmake/CMakeFiles/libprotobuf-lite.dir/__/src/google/protobuf/arenastring.cc.o
[ 0%] Generating cpp/internal/counter.c
[ 0%] Building CXX object external/protobuf/cmake/CMakeFiles/libprotobuf-lite.dir/__/src/google/protobuf/extension_set.cc.o
[ 0%] Generating cpp/internal/cv.c
[ 0%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/mlas/lib/threading.cpp.o
[ 0%] Linking CXX shared library libonnxruntime_providers_shared.so
[ 0%] Generating cpp/internal/debug.c
[ 0%] Running cpp protocol buffer compiler on /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/cmake/external/tensorboard/tensorboard/compat/proto/config.proto
[ 1%] Generating cpp/internal/dll.c
[ 1%] Generating cpp/internal/mu.c
Processing /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/cmake/external/onnx/onnx/onnx.in.proto
Writing /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/build/Debug/external/onnx/onnx/onnx-ml.proto
[ 1%] Generating cpp/internal/mu_wait.c
Writing /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/build/Debug/external/onnx/onnx/onnx-ml.proto3
generating /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/build/Debug/external/onnx/onnx/onnx_pb.py
[ 1%] Generating cpp/internal/note.c
[ 1%] Running C++ protocol buffer compiler on /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/build/Debug/external/onnx/onnx/onnx-ml.proto
[ 1%] Generating cpp/internal/once.c
[ 1%] Built target onnxruntime_providers_shared
[ 1%] Building CXX object external/protobuf/cmake/CMakeFiles/libprotobuf-lite.dir/__/src/google/protobuf/generated_enum_util.cc.o
[ 1%] Generating cpp/internal/sem_wait.c
[ 1%] Running cpp protocol buffer compiler on /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/cmake/external/tensorboard/tensorboard/compat/proto/cost_graph.proto
[ 1%] Generating cpp/internal/time_internal.c
[ 1%] Generating cpp/internal/wait.c
[ 1%] Generating cpp/platform/c++11/src/nsync_panic.cc
[ 1%] Running cpp protocol buffer compiler on /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/cmake/external/tensorboard/tensorboard/compat/proto/cpp_shape_inference.proto
[ 1%] Generating cpp/platform/c++11/src/time_rep_timespec.cc
[ 1%] Generating cpp/platform/c++11/src/yield.cc
[ 1%] Generating cpp/platform/linux/src/nsync_semaphore_futex.c
[ 1%] Running cpp protocol buffer compiler on /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/cmake/external/tensorboard/tensorboard/compat/proto/debug.proto
[ 2%] Building CXX object external/nsync/CMakeFiles/nsync_cpp.dir/cpp/internal/common.c.o
[ 2%] Built target gen_onnx_proto
[ 2%] Running cpp protocol buffer compiler on /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/cmake/external/tensorboard/tensorboard/compat/proto/event.proto
[ 2%] Building CXX object external/flatbuffers/CMakeFiles/flatbuffers.dir/src/idl_gen_text.cpp.o
[ 2%] Running cpp protocol buffer compiler on /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/cmake/external/tensorboard/tensorboard/compat/proto/function.proto
[ 2%] Running cpp protocol buffer compiler on /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/cmake/external/tensorboard/tensorboard/compat/proto/graph.proto
[ 2%] Running cpp protocol buffer compiler on /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/cmake/external/tensorboard/tensorboard/compat/proto/meta_graph.proto
[ 3%] Running cpp protocol buffer compiler on /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/cmake/external/tensorboard/tensorboard/compat/proto/node_def.proto
[ 3%] Running cpp protocol buffer compiler on /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/cmake/external/tensorboard/tensorboard/compat/proto/op_def.proto
[ 3%] Running cpp protocol buffer compiler on /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/cmake/external/tensorboard/tensorboard/compat/proto/resource_handle.proto
[ 3%] Running cpp protocol buffer compiler on /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/cmake/external/tensorboard/tensorboard/compat/proto/rewriter_config.proto
[ 3%] Running cpp protocol buffer compiler on /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/cmake/external/tensorboard/tensorboard/compat/proto/saved_object_graph.proto
[ 3%] Running cpp protocol buffer compiler on /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/cmake/external/tensorboard/tensorboard/compat/proto/saver.proto
[ 3%] Running cpp protocol buffer compiler on /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/cmake/external/tensorboard/tensorboard/compat/proto/step_stats.proto
[ 3%] Running cpp protocol buffer compiler on /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/cmake/external/tensorboard/tensorboard/compat/proto/struct.proto
[ 3%] Running cpp protocol buffer compiler on /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/cmake/external/tensorboard/tensorboard/compat/proto/summary.proto
[ 3%] Running cpp protocol buffer compiler on /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/cmake/external/tensorboard/tensorboard/compat/proto/tensor.proto
[ 3%] Running cpp protocol buffer compiler on /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/cmake/external/tensorboard/tensorboard/compat/proto/tensor_description.proto
[ 3%] Running cpp protocol buffer compiler on /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/cmake/external/tensorboard/tensorboard/compat/proto/tensor_shape.proto
[ 4%] Running cpp protocol buffer compiler on /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/cmake/external/tensorboard/tensorboard/compat/proto/tfprof_log.proto
[ 4%] Running cpp protocol buffer compiler on /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/cmake/external/tensorboard/tensorboard/compat/proto/trackable_object_graph.proto
[ 4%] Running cpp protocol buffer compiler on /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/cmake/external/tensorboard/tensorboard/compat/proto/types.proto
[ 4%] Building CXX object external/nsync/CMakeFiles/nsync_cpp.dir/cpp/internal/counter.c.o
[ 4%] Running cpp protocol buffer compiler on /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/cmake/external/tensorboard/tensorboard/compat/proto/variable.proto
[ 4%] Running cpp protocol buffer compiler on /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/cmake/external/tensorboard/tensorboard/compat/proto/verifier_config.proto
[ 4%] Building CXX object tensorboard/CMakeFiles/tensorboard.dir/tensorboard/compat/proto/allocation_description.pb.cc.o
[ 4%] Building CXX object external/protobuf/cmake/CMakeFiles/libprotobuf-lite.dir/__/src/google/protobuf/generated_message_table_driven_lite.cc.o
[ 4%] Building CXX object external/protobuf/cmake/CMakeFiles/libprotobuf-lite.dir/__/src/google/protobuf/generated_message_util.cc.o
[ 4%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/mlas/lib/sgemm.cpp.o
[ 4%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/mlas/lib/qgemm.cpp.o
[ 4%] Linking CXX shared module libcustom_op_library.so
[ 4%] Building CXX object external/nsync/CMakeFiles/nsync_cpp.dir/cpp/internal/cv.c.o
[ 6%] Building CXX object external/protobuf/cmake/CMakeFiles/libprotobuf-lite.dir/__/src/google/protobuf/implicit_weak_message.cc.o
[ 6%] Building CXX object external/re2/CMakeFiles/re2.dir/re2/compile.cc.o
[ 6%] Built target custom_op_library
[ 6%] Building CXX object external/protobuf/cmake/CMakeFiles/libprotobuf-lite.dir/__/src/google/protobuf/io/coded_stream.cc.o
[ 6%] Building CXX object external/protobuf/cmake/CMakeFiles/libprotobuf-lite.dir/__/src/google/protobuf/io/io_win32.cc.o
[flake8 PEP8 ERROR] /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/tools/ci_build/build.py:1716:1: E302 expected 2 blank lines, found 1
[flake8 PEP8 ERROR] /home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/tools/ci_build/build.py:1719:1: E302 expected 2 blank lines, found 1
[ 6%] Building CXX object external/protobuf/cmake/CMakeFiles/libprotobuf-lite.dir/__/src/google/protobuf/io/strtod.cc.o
CMakeFiles/pep8_check.dir/build.make:82: recipe for target 'CMakeFiles/pep8_check' failed
gmake[2]: *** [CMakeFiles/pep8_check] Error 1
CMakeFiles/Makefile2:1959: recipe for target 'CMakeFiles/pep8_check.dir/all' failed
gmake[1]: *** [CMakeFiles/pep8_check.dir/all] Error 2
gmake[1]: *** Waiting for unfinished jobs....
[ 6%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/mlas/lib/qdwconv.cpp.o
[ 6%] Building CXX object external/nsync/CMakeFiles/nsync_cpp.dir/cpp/internal/debug.c.o
[ 6%] Building CXX object external/protobuf/cmake/CMakeFiles/libprotobuf-lite.dir/__/src/google/protobuf/io/zero_copy_stream.cc.o
[ 6%] Building CXX object external/nsync/CMakeFiles/nsync_cpp.dir/cpp/internal/dll.c.o
[ 6%] Building CXX object tensorboard/CMakeFiles/tensorboard.dir/tensorboard/compat/proto/api_def.pb.cc.o
[ 6%] Building CXX object external/protobuf/cmake/CMakeFiles/libprotobuf-lite.dir/__/src/google/protobuf/io/zero_copy_stream_impl.cc.o
[ 6%] Building CXX object external/protobuf/cmake/CMakeFiles/libprotobuf-lite.dir/__/src/google/protobuf/io/zero_copy_stream_impl_lite.cc.o
[ 6%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/mlas/lib/convolve.cpp.o
[ 6%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/mlas/lib/pooling.cpp.o
[ 6%] Building CXX object external/flatbuffers/CMakeFiles/flatbuffers.dir/src/reflection.cpp.o
[ 6%] Building CXX object external/nsync/CMakeFiles/nsync_cpp.dir/cpp/internal/mu.c.o
[ 6%] Building CXX object external/protobuf/cmake/CMakeFiles/libprotobuf-lite.dir/__/src/google/protobuf/map.cc.o
[ 6%] Building CXX object external/protobuf/cmake/CMakeFiles/libprotobuf-lite.dir/__/src/google/protobuf/message_lite.cc.o
[ 6%] Building CXX object external/re2/CMakeFiles/re2.dir/re2/dfa.cc.o
[ 6%] Building CXX object external/protobuf/cmake/CMakeFiles/libprotobuf-lite.dir/__/src/google/protobuf/parse_context.cc.o
[ 6%] Building CXX object external/protobuf/cmake/CMakeFiles/libprotobuf-lite.dir/__/src/google/protobuf/repeated_field.cc.o
[ 6%] Building CXX object external/nsync/CMakeFiles/nsync_cpp.dir/cpp/internal/mu_wait.c.o
[ 6%] Building CXX object external/protobuf/cmake/CMakeFiles/libprotobuf-lite.dir/__/src/google/protobuf/stubs/bytestream.cc.o
[ 6%] Building CXX object external/flatbuffers/CMakeFiles/flatbuffers.dir/src/util.cpp.o
[ 6%] Building CXX object external/protobuf/cmake/CMakeFiles/libprotobuf-lite.dir/__/src/google/protobuf/stubs/common.cc.o
[ 6%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/mlas/lib/transpose.cpp.o
[ 6%] Building CXX object external/nsync/CMakeFiles/nsync_cpp.dir/cpp/internal/note.c.o
[ 6%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/mlas/lib/reorder.cpp.o
[ 7%] Building CXX object external/protobuf/cmake/CMakeFiles/libprotobuf-lite.dir/__/src/google/protobuf/stubs/int128.cc.o
[ 7%] Building CXX object external/protobuf/cmake/CMakeFiles/libprotobuf-lite.dir/__/src/google/protobuf/stubs/status.cc.o
[ 7%] Building CXX object external/flatbuffers/CMakeFiles/flatc.dir/src/idl_gen_text.cpp.o
[ 7%] Building CXX object external/nsync/CMakeFiles/nsync_cpp.dir/cpp/internal/once.c.o
[ 7%] Building CXX object external/protobuf/cmake/CMakeFiles/libprotobuf-lite.dir/__/src/google/protobuf/stubs/statusor.cc.o
[ 7%] Building CXX object external/protobuf/cmake/CMakeFiles/libprotobuf-lite.dir/__/src/google/protobuf/stubs/stringpiece.cc.o
[ 7%] Building CXX object external/protobuf/cmake/CMakeFiles/libprotobuf-lite.dir/__/src/google/protobuf/stubs/stringprintf.cc.o
[ 7%] Building CXX object external/protobuf/cmake/CMakeFiles/libprotobuf-lite.dir/__/src/google/protobuf/stubs/structurally_valid.cc.o
[ 7%] Building CXX object external/protobuf/cmake/CMakeFiles/libprotobuf-lite.dir/__/src/google/protobuf/stubs/strutil.cc.o
[ 7%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/mlas/lib/snchwc.cpp.o
[ 7%] Building CXX object external/nsync/CMakeFiles/nsync_cpp.dir/cpp/internal/sem_wait.c.o
[ 7%] Building CXX object external/protobuf/cmake/CMakeFiles/libprotobuf-lite.dir/__/src/google/protobuf/stubs/time.cc.o
[ 7%] Building CXX object external/flatbuffers/CMakeFiles/flatc.dir/src/reflection.cpp.o
[ 7%] Building CXX object tensorboard/CMakeFiles/tensorboard.dir/tensorboard/compat/proto/attr_value.pb.cc.o
[ 8%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/mlas/lib/activate.cpp.o
[ 8%] Building CXX object external/re2/CMakeFiles/re2.dir/re2/filtered_re2.cc.o
[ 8%] Building CXX object external/protobuf/cmake/CMakeFiles/libprotobuf-lite.dir/__/src/google/protobuf/wire_format_lite.cc.o
[ 8%] Building CXX object external/nsync/CMakeFiles/nsync_cpp.dir/cpp/internal/time_internal.c.o
[ 8%] Building CXX object external/nsync/CMakeFiles/nsync_cpp.dir/cpp/internal/wait.c.o
[ 8%] Building CXX object external/nsync/CMakeFiles/nsync_cpp.dir/cpp/platform/linux/src/nsync_semaphore_futex.c.o
[ 9%] Building CXX object external/nsync/CMakeFiles/nsync_cpp.dir/cpp/platform/posix/src/per_thread_waiter.c.o
[ 9%] Building CXX object external/nsync/CMakeFiles/nsync_cpp.dir/cpp/platform/c++11/src/yield.cc.o
[ 9%] Building CXX object external/nsync/CMakeFiles/nsync_cpp.dir/cpp/platform/c++11/src/time_rep_timespec.cc.o
[ 9%] Building CXX object external/nsync/CMakeFiles/nsync_cpp.dir/cpp/platform/c++11/src/nsync_panic.cc.o
[ 9%] Building CXX object external/re2/CMakeFiles/re2.dir/re2/mimics_pcre.cc.o
[ 9%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/mlas/lib/logistic.cpp.o
[ 9%] Building CXX object external/flatbuffers/CMakeFiles/flatc.dir/src/util.cpp.o
[ 9%] Building CXX object tensorboard/CMakeFiles/tensorboard.dir/tensorboard/compat/proto/cluster.pb.cc.o
[ 9%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/mlas/lib/tanh.cpp.o
[ 9%] Building CXX object external/re2/CMakeFiles/re2.dir/re2/nfa.cc.o
[ 9%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/mlas/lib/erf.cpp.o
[ 9%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/mlas/lib/compute.cpp.o
[ 9%] Linking CXX static library libnsync_cpp.a
[ 9%] Built target nsync_cpp
[ 9%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/mlas/lib/quantize.cpp.o
[ 9%] Linking CXX static library libprotobuf-lited.a
[ 9%] Built target libprotobuf-lite
[ 9%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/mlas/lib/qladd.cpp.o
[ 9%] Building CXX object external/re2/CMakeFiles/re2.dir/re2/onepass.cc.o
[ 9%] Building CXX object external/flatbuffers/CMakeFiles/flatc.dir/src/idl_gen_cpp.cpp.o
[ 9%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/mlas/lib/qlmul.cpp.o
[ 10%] Building CXX object external/flatbuffers/CMakeFiles/flatc.dir/src/idl_gen_csharp.cpp.o
[ 10%] Linking CXX static library ../../../lib/libgtestd.a
[ 10%] Built target gtest
[ 10%] Building CXX object external/flatbuffers/CMakeFiles/flatc.dir/src/idl_gen_dart.cpp.o
[ 10%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/mlas/lib/qpostprocessor.cpp.o
[ 10%] Building CXX object tensorboard/CMakeFiles/tensorboard.dir/tensorboard/compat/proto/config.pb.cc.o
[ 10%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/mlas/lib/qlgavgpool.cpp.o
[ 10%] Building CXX object tensorboard/CMakeFiles/tensorboard.dir/tensorboard/compat/proto/cost_graph.pb.cc.o
[ 10%] Building CXX object external/re2/CMakeFiles/re2.dir/re2/parse.cc.o
[ 10%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/mlas/lib/power/SgemmKernelPower.cpp.o
[ 10%] Building CXX object tensorboard/CMakeFiles/tensorboard.dir/tensorboard/compat/proto/cpp_shape_inference.pb.cc.o
[ 10%] Building CXX object external/re2/CMakeFiles/re2.dir/re2/perl_groups.cc.o
[ 12%] Building CXX object tensorboard/CMakeFiles/tensorboard.dir/tensorboard/compat/proto/debug.pb.cc.o
[ 12%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/mlas/lib/systolic/systolic.cpp.o
[ 12%] Building CXX object external/re2/CMakeFiles/re2.dir/re2/prefilter.cc.o
[ 12%] Building CXX object external/re2/CMakeFiles/re2.dir/re2/prefilter_tree.cc.o
[ 13%] Building CXX object external/re2/CMakeFiles/re2.dir/re2/prog.cc.o
[ 13%] Building CXX object tensorboard/CMakeFiles/tensorboard.dir/tensorboard/compat/proto/event.pb.cc.o
[ 13%] Building CXX object external/re2/CMakeFiles/re2.dir/re2/re2.cc.o
[ 13%] Building CXX object external/re2/CMakeFiles/re2.dir/re2/regexp.cc.o
[ 13%] Building CXX object external/re2/CMakeFiles/re2.dir/re2/set.cc.o
[ 13%] Building CXX object tensorboard/CMakeFiles/tensorboard.dir/tensorboard/compat/proto/function.pb.cc.o
[ 13%] Building CXX object tensorboard/CMakeFiles/tensorboard.dir/tensorboard/compat/proto/graph.pb.cc.o
[ 13%] Building CXX object tensorboard/CMakeFiles/tensorboard.dir/tensorboard/compat/proto/meta_graph.pb.cc.o
[ 13%] Building CXX object external/re2/CMakeFiles/re2.dir/re2/simplify.cc.o
[ 13%] Building CXX object external/flatbuffers/CMakeFiles/flatc.dir/src/idl_gen_kotlin.cpp.o
[ 13%] Building CXX object external/re2/CMakeFiles/re2.dir/re2/stringpiece.cc.o
[ 13%] Linking CXX static library libonnxruntime_mlas.a
[ 13%] Built target onnxruntime_mlas
[ 13%] Building CXX object external/re2/CMakeFiles/re2.dir/re2/tostring.cc.o
[ 13%] Building CXX object external/re2/CMakeFiles/re2.dir/re2/unicode_casefold.cc.o
[ 13%] Building CXX object external/flatbuffers/CMakeFiles/flatc.dir/src/idl_gen_go.cpp.o
[ 13%] Building CXX object tensorboard/CMakeFiles/tensorboard.dir/tensorboard/compat/proto/node_def.pb.cc.o
[ 13%] Building CXX object tensorboard/CMakeFiles/tensorboard.dir/tensorboard/compat/proto/op_def.pb.cc.o
[ 13%] Building CXX object external/re2/CMakeFiles/re2.dir/re2/unicode_groups.cc.o
[ 13%] Building CXX object external/re2/CMakeFiles/re2.dir/util/rune.cc.o
[ 13%] Building CXX object external/re2/CMakeFiles/re2.dir/util/strutil.cc.o
[ 13%] Building CXX object tensorboard/CMakeFiles/tensorboard.dir/tensorboard/compat/proto/resource_handle.pb.cc.o
[ 13%] Building CXX object tensorboard/CMakeFiles/tensorboard.dir/tensorboard/compat/proto/rewriter_config.pb.cc.o
[ 13%] Building CXX object tensorboard/CMakeFiles/tensorboard.dir/tensorboard/compat/proto/saved_object_graph.pb.cc.o
[ 13%] Building CXX object external/flatbuffers/CMakeFiles/flatc.dir/src/idl_gen_java.cpp.o
[ 13%] Building CXX object external/flatbuffers/CMakeFiles/flatc.dir/src/idl_gen_js_ts.cpp.o
[ 13%] Linking CXX static library libre2.a
[ 13%] Building CXX object tensorboard/CMakeFiles/tensorboard.dir/tensorboard/compat/proto/saver.pb.cc.o
[ 13%] Built target re2
[ 13%] Building CXX object tensorboard/CMakeFiles/tensorboard.dir/tensorboard/compat/proto/step_stats.pb.cc.o
[ 13%] Linking CXX static library libflatbuffers.a
[ 13%] Built target flatbuffers
[ 13%] Building CXX object tensorboard/CMakeFiles/tensorboard.dir/tensorboard/compat/proto/struct.pb.cc.o
[ 13%] Building CXX object external/flatbuffers/CMakeFiles/flatc.dir/src/idl_gen_php.cpp.o
[ 14%] Building CXX object tensorboard/CMakeFiles/tensorboard.dir/tensorboard/compat/proto/summary.pb.cc.o
[ 14%] Building CXX object tensorboard/CMakeFiles/tensorboard.dir/tensorboard/compat/proto/tensor.pb.cc.o
[ 14%] Building CXX object tensorboard/CMakeFiles/tensorboard.dir/tensorboard/compat/proto/tensor_description.pb.cc.o
[ 14%] Building CXX object tensorboard/CMakeFiles/tensorboard.dir/tensorboard/compat/proto/tensor_shape.pb.cc.o
[ 14%] Building CXX object tensorboard/CMakeFiles/tensorboard.dir/tensorboard/compat/proto/tfprof_log.pb.cc.o
[ 14%] Building CXX object external/flatbuffers/CMakeFiles/flatc.dir/src/idl_gen_python.cpp.o
[ 14%] Building CXX object tensorboard/CMakeFiles/tensorboard.dir/tensorboard/compat/proto/trackable_object_graph.pb.cc.o
[ 14%] Building CXX object tensorboard/CMakeFiles/tensorboard.dir/tensorboard/compat/proto/types.pb.cc.o
[ 14%] Building CXX object external/flatbuffers/CMakeFiles/flatc.dir/src/idl_gen_lobster.cpp.o
[ 14%] Building CXX object external/flatbuffers/CMakeFiles/flatc.dir/src/idl_gen_lua.cpp.o
[ 14%] Building CXX object tensorboard/CMakeFiles/tensorboard.dir/tensorboard/compat/proto/variable.pb.cc.o
[ 14%] Building CXX object external/flatbuffers/CMakeFiles/flatc.dir/src/idl_gen_rust.cpp.o
[ 14%] Building CXX object tensorboard/CMakeFiles/tensorboard.dir/tensorboard/compat/proto/verifier_config.pb.cc.o
[ 14%] Building CXX object tensorboard/CMakeFiles/tensorboard.dir/tensorboard/compat/proto/versions.pb.cc.o
[ 14%] Building CXX object external/flatbuffers/CMakeFiles/flatc.dir/src/idl_gen_fbs.cpp.o
[ 14%] Building CXX object external/flatbuffers/CMakeFiles/flatc.dir/src/idl_gen_grpc.cpp.o
[ 15%] Building CXX object external/flatbuffers/CMakeFiles/flatc.dir/src/idl_gen_json_schema.cpp.o
[ 15%] Building CXX object external/flatbuffers/CMakeFiles/flatc.dir/src/idl_gen_swift.cpp.o
[ 15%] Building CXX object external/flatbuffers/CMakeFiles/flatc.dir/src/flatc.cpp.o
[ 15%] Building CXX object external/flatbuffers/CMakeFiles/flatc.dir/src/flatc_main.cpp.o
[ 15%] Building CXX object external/flatbuffers/CMakeFiles/flatc.dir/src/code_generators.cpp.o
[ 15%] Building CXX object external/flatbuffers/CMakeFiles/flatc.dir/grpc/src/compiler/cpp_generator.cc.o
[ 15%] Building CXX object external/flatbuffers/CMakeFiles/flatc.dir/grpc/src/compiler/go_generator.cc.o
[ 15%] Building CXX object external/flatbuffers/CMakeFiles/flatc.dir/grpc/src/compiler/java_generator.cc.o
[ 15%] Building CXX object external/flatbuffers/CMakeFiles/flatc.dir/grpc/src/compiler/python_generator.cc.o
[ 15%] Building CXX object external/flatbuffers/CMakeFiles/flatc.dir/grpc/src/compiler/swift_generator.cc.o
[ 15%] Linking CXX executable flatc
[ 15%] Linking CXX static library libtensorboard.a
[ 15%] Built target flatc
[ 15%] Built target tensorboard
Makefile:165: recipe for target 'all' failed
gmake: *** [all] Error 2
Traceback (most recent call last):
  File "/home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/tools/ci_build/build.py", line 2160, in <module>
    sys.exit(main())
  File "/home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/tools/ci_build/build.py", line 2086, in main
    build_targets(args, cmake_path, build_dir, configs, num_parallel_jobs, args.target)
  File "/home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/tools/ci_build/build.py", line 1087, in build_targets
    run_subprocess(cmd_args, env=env)
  File "/home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/tools/ci_build/build.py", line 591, in run_subprocess
    return run(*args, cwd=cwd, capture_stdout=capture_stdout, shell=shell, env=my_env)
  File "/home/centos/firesim-copy/target-design/chipyard/generators/gemmini/software/onnxruntime-riscv/tools/python/util/run.py", line 44, in run
    env=env, shell=shell)
  File "/usr/lib64/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/usr/local/bin/cmake', '--build', 'build/Debug', '--config', 'Debug', '--', '-j16']' returned non-zero exit status 2.
Building against Debug
Building with mlockall for running on Firesim
riscv64-unknown-linux-gnu-g++: error: ../..//build/Debug/libonnx_test_runner_common.a: No such file or directory
riscv64-unknown-linux-gnu-g++: error: ../..//build/Debug/libonnxruntime_test_utils.a: No such file or directory
riscv64-unknown-linux-gnu-g++: error: ../..//build/Debug/libonnxruntime_training_runner.a: No such file or directory
riscv64-unknown-linux-gnu-g++: error: ../..//build/Debug/libonnxruntime_training.a: No such file or directory
riscv64-unknown-linux-gnu-g++: error: ../..//build/Debug/libonnxruntime_session.a: No such file or directory
riscv64-unknown-linux-gnu-g++: error: ../..//build/Debug/libonnxruntime_optimizer.a: No such file or directory
riscv64-unknown-linux-gnu-g++: error: ../..//build/Debug/libonnxruntime_providers.a: No such file or directory
riscv64-unknown-linux-gnu-g++: error: ../..//build/Debug/libonnxruntime_util.a: No such file or directory
riscv64-unknown-linux-gnu-g++: error: ../..//build/Debug/libonnxruntime_framework.a: No such file or directory
riscv64-unknown-linux-gnu-g++: error: ../..//build/Debug/libonnxruntime_util.a: No such file or directory
riscv64-unknown-linux-gnu-g++: error: ../..//build/Debug/libonnxruntime_graph.a: No such file or directory
riscv64-unknown-linux-gnu-g++: error: ../..//build/Debug/libonnxruntime_providers_systolic.a: No such file or directory
riscv64-unknown-linux-gnu-g++: error: ../..//build/Debug/libonnxruntime_common.a: No such file or directory
riscv64-unknown-linux-gnu-g++: error: ../..//build/Debug/libonnxruntime_flatbuffers.a: No such file or directory
riscv64-unknown-linux-gnu-g++: error: ../..//build/Debug/libonnx_test_data_proto.a: No such file or directory
riscv64-unknown-linux-gnu-g++: error: ../..//build/Debug/external/onnx/libonnx.a: No such file or directory
riscv64-unknown-linux-gnu-g++: error: ../..//build/Debug/external/onnx/libonnx_proto.a: No such file or directory
Makefile:11: recipe for target 'ort_test' failed
make: *** [ort_test] Error 1

Urgency
High

System information

  • OS Platform and Distribution: CentOS Linux 7.9.2009

Expected behavior
The build completes correctly.

Error while building onnxruntime using cross-compilation

Describe the bug
[ 72%] Linking CXX static library libonnxruntime_providers.a
[ 72%] Built target onnxruntime_providers
Scanning dependencies of target onnx_test_runner
Scanning dependencies of target onnxruntime
[ 72%] Building C object CMakeFiles/onnxruntime.dir/generated_source.c.o
Scanning dependencies of target onnxruntime_test_all
[ 72%] Building CXX object CMakeFiles/onnx_test_runner.dir/home/onnxruntime-riscv/onnxruntime/test/onnx/main.cc.o
In file included from /home/onnxruntime-riscv/build/Release/generated_source.c:1:
/home/onnxruntime-riscv/include/onnxruntime/core/providers/systolic/systolic_provider_factory.h:14:137: error: expected ';', ',' or ')' before '=' token
14 | ORT_API_STATUS(OrtSessionOptionsAppendExecutionProvider_Systolic, _In_ OrtSessionOptions* options, int use_arena, char accelerator_mode = 0)
| ^
/home/onnxruntime-riscv/include/onnxruntime/core/session/onnxruntime_c_api.h:168:76: note: in definition of macro 'ORT_API_STATUS'
168 | ORT_EXPORT _Check_return_ _Ret_maybenull_ OrtStatusPtr ORT_API_CALL NAME(__VA_ARGS__) NO_EXCEPTION ORT_MUST_USE_RESULT
| ^~~~~~~~~~~
/home/onnxruntime-riscv/build/Release/generated_source.c: In function 'GetFunctionEntryByName':
/home/onnxruntime-riscv/build/Release/generated_source.c:7:89: error: 'OrtSessionOptionsAppendExecutionProvider_Systolic' undeclared (first use in this function); did you mean 'OrtSessionOptionsAppendExecutionProvider_CPU'?
7 | if(strcmp(name,"OrtSessionOptionsAppendExecutionProvider_Systolic") ==0) return (void*)&OrtSessionOptionsAppendExecutionProvider_Systolic;
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| OrtSessionOptionsAppendExecutionProvider_CPU
/home/onnxruntime-riscv/build/Release/generated_source.c:7:89: note: each undeclared identifier is reported only once for each function it appears in
CMakeFiles/onnxruntime.dir/build.make:90: recipe for target 'CMakeFiles/onnxruntime.dir/generated_source.c.o' failed
make[2]: *** [CMakeFiles/onnxruntime.dir/generated_source.c.o] Error 1
CMakeFiles/Makefile2:724: recipe for target 'CMakeFiles/onnxruntime.dir/all' failed
make[1]: *** [CMakeFiles/onnxruntime.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
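
For what it's worth, generated_source.c is compiled as C, and a default argument such as char accelerator_mode = 0 is a C++-only feature, so any C translation unit that includes this header fails to parse it. A tiny scan for such declarations (illustrative only; the regex is an assumption, not part of the build):

import re, sys

# Flag '=' inside an ORT_API_STATUS(...) parameter list: a default
# argument, legal in C++ but a syntax error in C.
pattern = re.compile(r"ORT_API_STATUS\([^)]*=[^)]*\)", re.S)
text = open(sys.argv[1]).read()   # e.g. systolic_provider_factory.h
for m in pattern.finditer(text):
    print("C-incompatible default argument:", " ".join(m.group(0).split()))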

Urgency
Medium

System information

  • OS Platform and Distribution: Ubuntu 18.04
  • ONNX Runtime installed from: source
  • ONNX Runtime version: 1.2.0
  • Python version: 3.6.9
  • Visual Studio version (if applicable):
  • GCC/Compiler version: 9.2.0
  • CUDA/cuDNN version:
  • GPU model and memory:

To Reproduce
Using the esp-gnu-toolchain for cross-compilation:
./build.sh --config Release --build_shared_lib --parallel

CMake version error

I am trying to build the repo on FireSim.
I am currently running the command ./build.sh --parallel --for_firesim, but it fails with a CMake version error.
I have already updated CMake, but it keeps showing the error even though that makes no sense:

CMake Error at CMakeLists.txt:6 (cmake_minimum_required):
CMake 3.13 or higher is required. You are running version 3.6.2
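
One thing worth checking (a sketch: it assumes build.sh simply invokes the first cmake found on PATH) is which cmake binary is actually being picked up:

import shutil, subprocess

print("cmake resolved to:", shutil.which("cmake"))
out = subprocess.run(["cmake", "--version"], stdout=subprocess.PIPE,
                     universal_newlines=True)
print(out.stdout.splitlines()[0])   # should report 3.13 or higher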

How can I fix this?

Thanks in advance.
Regards, Majo.

Not calling Gemmini after fusing QLinearConv and QLinearRelu

Hello,

I am using onnxruntime-riscv on the 2021-05-12 branch, at commit 7bbd049.

I found that if the QLinearConv and QLinearRelu operators are separate, as shown on the left side of the figure below, the Gemmini systolic array is used as expected. However, if they are fused, as shown on the right side, the Gemmini systolic array is NOT used.
[figure: using_systolic_or_not]
Here are the two models: using_systolic_or_not.zip
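
One quick comparison (a sketch: the file names below are placeholders for the two models in the zip) is to list each graph's operator types and see whether the fused graph contains an op type the Systolic provider does not register:

import onnx

for path in ("separated.onnx", "fused.onnx"):   # hypothetical names
    model = onnx.load(path)
    print(path, sorted({node.op_type for node in model.graph.node}))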

I don't know whether this is the intended scheduling or something is wrong.
Could you help?

Thank you very much!

May I get some help with a compilation error?

When I was compiling with ./build.sh in the top folder (onnxruntime-riscv), I got this assembler error message for systolic_include.h:

Error: unknown pseudo-op: `.insn'

Could you help me get the compilation through?
Thanks a lot.


The onnxruntime-riscv source code is from the 2021-01-18 branch, at commit 85e2cdf.

The esp-tools is from the default master branch, at the latest commit, dcb6012f77101e793948cc90ac31b3735a9f3f6d.
With this esp-tools, I can run gemmini-rocc-tests successfully.
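
A quick probe of whether the assembler in use actually understands .insn (a sketch: it assumes the esp-tools cross assembler is on PATH; the encoding is an arbitrary custom-opcode example, not a Gemmini instruction):

import subprocess

# "--" tells GNU as to assemble from standard input
r = subprocess.run(["riscv64-unknown-linux-gnu-as", "-o", "/dev/null", "--"],
                   input=".insn r 0x7b, 3, 0, x0, x1, x2\n",
                   stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                   universal_newlines=True)
print(".insn supported" if r.returncode == 0 else r.stderr)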


Other system information is summarized below (copied from the output of ./build.sh --parallel).

CMake version : 3.18.0
CMake command : /usr/local/bin/cmake
System : Linux
C++ compiler : /root/Documents/202101/esp-tools-install/bin/riscv64-unknown-linux-gnu-g++
C++ compiler version : 7.2.0
CXX flags : -march=rv64imafdc -mabi=lp64d -Wno-error=attributes -Wnon-virtual-dtor
Build type : Debug
Compile definitions : ENABLE_ORT_FORMAT_LOAD;EIGEN_MPL2_ONLY;USE_EIGEN_FOR_BLAS
CMAKE_PREFIX_PATH :
CMAKE_INSTALL_PREFIX : /usr/local
CMAKE_MODULE_PATH : /home/xxxx/Documents/202102/prj_onnx_riscv/onnxruntime-riscv/cmake/external

ONNX version : 1.8.0
ONNX NAMESPACE : onnx
ONNX_BUILD_TESTS : OFF
ONNX_BUILD_BENCHMARKS : OFF
ONNX_USE_LITE_PROTO : ON
ONNXIFI_DUMMY_BACKEND : OFF
ONNXIFI_ENABLE_EXT : OFF

Protobuf compiler :
Protobuf includes :
Protobuf libraries :
BUILD_ONNX_PYTHON : OFF

Looking for strtof_l
Looking for strtof_l - found
Looking for strtoull_l
Looking for strtoull_l - found
Performing Test HAS_UNUSED_BUT_SET_VARIABLE
Performing Test HAS_UNUSED_BUT_SET_VARIABLE - Success
Performing Test HAS_UNUSED_PARAMETER
Performing Test HAS_UNUSED_PARAMETER - Success
Performing Test HAS_UNUSED_VARIABLE
Performing Test HAS_UNUSED_VARIABLE - Success
Performing Test HAS_CAST_FUNCTION_TYPE
Performing Test HAS_CAST_FUNCTION_TYPE - Failed
Performing Test HAS_PARENTHESES
Performing Test HAS_PARENTHESES - Success
Performing Test HAS_USELESS_CAST
Performing Test HAS_USELESS_CAST - Success
Performing Test HAS_NONNULL_COMPARE
Performing Test HAS_NONNULL_COMPARE - Success
Performing Test HAS_TAUTOLOGICAL_POINTER_COMPARE
Performing Test HAS_TAUTOLOGICAL_POINTER_COMPARE - Failed
Performing Test HAS_CATCH_VALUE
Performing Test HAS_CATCH_VALUE - Failed
Performing Test HAS_MISSING_BRACES
Performing Test HAS_MISSING_BRACES - Success
Performing Test HAS_IGNORED_ATTRIBUTES
Performing Test HAS_IGNORED_ATTRIBUTES - Success
Performing Test HAS_DEPRECATED_COPY
Performing Test HAS_DEPRECATED_COPY - Failed
Performing Test HAS_DEPRECATED_DECLARATIONS
Performing Test HAS_DEPRECATED_DECLARATIONS - Success
Performing Test HAS_CLASS_MEMACCESS
Performing Test HAS_CLASS_MEMACCESS - Failed
Performing Test HAS_MAYBE_UNINITIALIZED
Performing Test HAS_MAYBE_UNINITIALIZED - Success
Looking for clock_gettime in rt
Looking for clock_gettime in rt - found

CMake Warning at flake8.cmake:19 (message):
Could not find 'flake8' to check python scripts. Please install flake8
using pip.
Call Stack (most recent call first):
CMakeLists.txt:1521 (include)

Configuring done
Generating done
CMake Warning:
Manually-specified variables were not used by the project:

onnxruntime_BUILD_WINML_TESTS
onnxruntime_CUDA_HOME
onnxruntime_DNNL_GPU_RUNTIME
onnxruntime_DNNL_OPENCL_ROOT
onnxruntime_MIGRAPHX_HOME
onnxruntime_PYBIND_EXPORT_OPSCHEMA
onnxruntime_ROCM_HOME
onnxruntime_TENSORRT_HOME

Build files have been written to: /home/xxxx/Documents/202102/prj_onnx_riscv/onnxruntime-riscv/build/Debug
2021-02-09 21:33:32,660 util.run [DEBUG] - Subprocess completed. Return code: 0
2021-02-09 21:33:32,660 build [INFO] - Building targets for Debug configuration
2021-02-09 21:33:32,661 util.run [INFO] - Running subprocess in '/home/xxxx/Documents/202102/prj_onnx_riscv/onnxruntime-riscv'
['/usr/local/bin/cmake', '--build', 'build/Debug', '--config', 'Debug', '--', '-j4']

Refactoring: Use the inbuilt im2col NHWC?

More of a code-cleanliness thing: currently, for im2col in NHWC layout, we use our own method that I took from the PyTorch docs. ONNX Runtime also includes its own im2col NHWC implementation (see the CPU math ops). The two functions are basically identical, except that our own im2col_nhwc does all groups at once, whereas the ORT one handles only a single group. So when you refactor, you'll have to move the im2col inside the for-loop where we iterate over the groups.
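
A structural sketch of the refactor in NumPy terms (illustrative only; the names and shapes are not the actual ORT signatures):

import numpy as np

def im2col_nhwc_one_group(x, k):
    # x: (H, W, C_g) for a single group; returns (positions, k*k*C_g),
    # mirroring ORT's single-group NHWC im2col
    H, W, _ = x.shape
    return np.stack([x[i:i + k, j:j + k, :].reshape(-1)
                     for i in range(H - k + 1)
                     for j in range(W - k + 1)])

def conv_cols_grouped(x, groups, k):
    # the per-group slice + call here replaces our old all-groups im2col_nhwc
    cg = x.shape[-1] // groups
    return [im2col_nhwc_one_group(x[..., g * cg:(g + 1) * cg], k)
            for g in range(groups)]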

Assertion failed in gemmini_t::setmode()

Describe the bug
Using CPU-only execution (-x 0), I can run the example model successfully:

[root@localhost imagenet_runner]# spike --extension=gemmini pk ./ort_test -m googlenet_quantized.onnx -i images/cat.jpg -p caffe2 -x 0 -O 0    
... ...
Element count 1000. Top 5 classes:
0.005367 tiger, Panthera tigris
0.007940 lynx, catamount
0.185520 Egyptian cat
0.186814 tiger cat
0.608402 tabby, tabby cat
Done! Inference took 66829139079 cycles 

However, when I use Gemmini's OS mode (-x 1), there's an assertion failure in gemmini_t::setmode():

[root@localhost imagenet_runner]# spike --extension=gemmini pk ./ort_test -m googlenet_quantized.onnx -i images/dog.jpg -p caffe2 -x 1 -O 0
Gemmini extension configured with:
    dim = 16
bbl loader
Loaded runner program
Using systolic in mode 1
Using Onnxruntime C++ API
Number of inputs = 1
Input 0 : name=data_0, type=1, num_dims=4: [1, 3, 224, 224, ]
Number of outputs = 1
Output 0 : name=prob_1, type=1, num_dims=2: [1, 1000, ]
Loading image
Image dimensions: 224 224 3
First few image values 130.061005 126.060997 123.060997
Called into systolic matmul!
Using accelerated matmul with dimensions (64, 12544, 147)
spike: ../gemmini/gemmini.cc:265: void gemmini_t::setmode(reg_t, reg_t): Assertion `new_acc_shift >= 0 && new_acc_shift < sizeof(acc_t)*8' failed.

May I have your help with this issue?
Thanks a lot.

Additional context
I have checked the consistency between systolic_params_int8.h (under /root/Documents/202102/prj_onnx_riscv/onnxruntime-riscv/onnxruntime/core/mlas/lib/systolic) and gemmini_params.h (under riscv-isa-sim/gemmini).
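
For reference, the failing check only admits shifts that fit within the accumulator width, so a width mismatch between the two headers would surface exactly here (a back-of-envelope sketch; the 32-bit acc_t is an assumption):

ACC_T_BITS = 32   # assumption: acc_t is int32_t in gemmini_params.h

def setmode_shift_ok(new_acc_shift):
    # mirrors: new_acc_shift >= 0 && new_acc_shift < sizeof(acc_t)*8
    return 0 <= new_acc_shift < ACC_T_BITS

print(setmode_shift_ok(24), setmode_shift_ok(40))   # True False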

Refactoring: Use the recently added Relu int8 type

Currently we emit our own QLinearRelu type in the quantizer and do fusing based on it. This was done to work around a previous deficiency in the ONNX operator set: Relu did not support int8. I submitted a PR to add int8 Relu support, which has since been merged: onnx/onnx#3141

We can now emit and use this type, for example:
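
A minimal graph using the standard Relu over int8 (a sketch: it assumes an opset recent enough to include onnx/onnx#3141):

import onnx
from onnx import TensorProto, helper

relu = helper.make_node("Relu", inputs=["x"], outputs=["y"])
graph = helper.make_graph(
    [relu], "int8_relu",
    [helper.make_tensor_value_info("x", TensorProto.INT8, [1, 8])],
    [helper.make_tensor_value_info("y", TensorProto.INT8, [1, 8])],
)
model = helper.make_model(graph, producer_name="int8-relu-sketch")
onnx.checker.check_model(model)   # passes once Relu accepts int8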

mnist_training user store segfault in mvout

Describe the bug
Hi! 😀
I tried MNIST training in systolic_runner/mnist_training, but I got a user store segfault in gemmini_extended_mvout.

  • I executed spike --extension=gemmini pk mnist_train --model_name mnist_conv_w_batchnorm.onnx --train_data_dir mnist/mnist_data/ --num_train_steps 10 -x 1.
  • I downloaded train_data from microsoft/onnxruntime#3706 (comment).
  • When I execute with the -x 0 option (CPU only), I can train this model successfully.
  • I generated the ONNX model using my_create_mnist_w_batchnorm.py, which differs slightly from create_mnist_w_batchnorm.py. (The model from the original script raised an error because the dimensions of the training data did not match the model's input.) Below is my patch.
--- create_mnist_w_batchnorm.py	2021-12-27 06:59:56.445299810 +0000
+++ my_create_mnist_w_batchnorm.py	2022-01-10 17:25:47.868769954 +0000
@@ -24,6 +24,8 @@
 
 batch_size = -1
 
+Inputshape = helper.make_tensor(name="Inputshape", data_type=onnx.TensorProto.INT64, dims=[4], vals=[batch_size, 1, 28, 28])
+
 W1_dims = [8, 1, 5, 5]
 W2_dims = [16, 8, 5, 5]
 W3_dims = [256, 10]
@@ -50,7 +52,8 @@
 
 node0 = helper.make_node('BatchNormalization', inputs=['T1', 's', 'bias', 'mean', 'var'], outputs=['T1_bn'])
 
-node1 = helper.make_node('Conv', inputs=['X', 'W1', 'B1'], outputs=['T1'], kernel_shape=[5,5], strides=[1,1], pads=[2,2,2,2])
+reshapenode = helper.make_node("Reshape", inputs=["X", "Inputshape"], outputs=["X1"])
+node1 = helper.make_node('Conv', inputs=['X1', 'W1', 'B1'], outputs=['T1'], kernel_shape=[5,5], strides=[1,1], pads=[2,2,2,2])
 node2 = helper.make_node('Relu', inputs=['T1_bn'], outputs=['T2'])
 node3 = helper.make_node('MaxPool', inputs=['T2'], outputs=['T3'], kernel_shape=[2,2], strides=[2,2])
 
@@ -63,13 +66,13 @@
 node8 = helper.make_node('Gemm', inputs=['T7', 'W3', 'B3'], outputs=['predictions'])
 
 graph = helper.make_graph(
-    [node1, node0, node2, node3, node4, node5, node6, node7, node8],
+    [reshapenode, node1, node0, node2, node3, node4, node5, node6, node7, node8],
     'mnist_conv',
     [ helper.make_tensor_value_info('s', TensorProto.FLOAT, ([8])),
      helper.make_tensor_value_info('bias', TensorProto.FLOAT, ([8])),
      helper.make_tensor_value_info('mean', TensorProto.FLOAT, ([8])),
     helper.make_tensor_value_info('var', TensorProto.FLOAT, ([8])),
-     helper.make_tensor_value_info('X', TensorProto.FLOAT, ([batch_size, 1, 28, 28])),
+     helper.make_tensor_value_info('X', TensorProto.FLOAT, ([batch_size, 784])),
      helper.make_tensor_value_info('W1', TensorProto.FLOAT, W1_dims),
      helper.make_tensor_value_info('W2', TensorProto.FLOAT, W2_dims),
      helper.make_tensor_value_info('W3', TensorProto.FLOAT, W3_dims),
@@ -77,9 +80,10 @@
      helper.make_tensor_value_info('B2', TensorProto.FLOAT, B2_dims),
      helper.make_tensor_value_info('B3', TensorProto.FLOAT, B3_dims),
      helper.make_tensor_value_info('shape', TensorProto.INT64, [2]),
+     helper.make_tensor_value_info('Inputshape', TensorProto.INT64, [4]),
     ],
     [helper.make_tensor_value_info('predictions', TensorProto.FLOAT, ([batch_size, 10]))],
-    [s, bias, mean, var, W1, W2, W3, B1, B2, B3, shape]
+    [s, bias, mean, var, W1, W2, W3, B1, B2, B3, shape, Inputshape]
 )
 original_model = helper.make_model(graph, producer_name='onnx-examples')
  • Below is the error message, including my own printf output for debugging. I used FP32.
Gemmini extension configured with:
    dim = 16
bbl loader
Loaded runner program
Setting up logger
Setting up env
Setting up training params
Setting up data
Loading MNIST data from folder mnist/mnist_data/
Preparing data ...
Preparing data: done
#training set size = 60000 
#test set size = 10000 
Creating training runner
Initializing training runner
1970-01-01 00:00:12.989272862 [W:onnxruntime:, graph.cc:1074 Graph] Initializer s appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
1970-01-01 00:00:12.989337471 [W:onnxruntime:, graph.cc:1074 Graph] Initializer bias appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
1970-01-01 00:00:12.989387912 [W:onnxruntime:, graph.cc:1074 Graph] Initializer mean appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
1970-01-01 00:00:12.989436124 [W:onnxruntime:, graph.cc:1074 Graph] Initializer var appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
1970-01-01 00:00:12.989486858 [W:onnxruntime:, graph.cc:1074 Graph] Initializer W1 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
1970-01-01 00:00:12.989540649 [W:onnxruntime:, graph.cc:1074 Graph] Initializer W2 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
1970-01-01 00:00:12.989590741 [W:onnxruntime:, graph.cc:1074 Graph] Initializer W3 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
1970-01-01 00:00:12.989639219 [W:onnxruntime:, graph.cc:1074 Graph] Initializer B1 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
1970-01-01 00:00:12.989687503 [W:onnxruntime:, graph.cc:1074 Graph] Initializer B2 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
1970-01-01 00:00:12.989736087 [W:onnxruntime:, graph.cc:1074 Graph] Initializer B3 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
1970-01-01 00:00:12.989784748 [W:onnxruntime:, graph.cc:1074 Graph] Initializer shape appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
1970-01-01 00:00:12.989836510 [W:onnxruntime:, graph.cc:1074 Graph] Initializer Xshape appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
1970-01-01 00:00:13.730676103 [W:onnxruntime:, graph.cc:84 MergeShapeInfo] Error merging shape info for output. 'loss' source:{} target:{1}. Falling back to lenient merge.
Starting training
>>>> my_debug gemmini_extended_config_st act: 0, scale : 1.000000, stride_C : 784, sizeof_C : 4
>>>> my debug C_dram_addr : 0x263f5040, C_sp_addr : c0000000, rows : 8, cols : 8
z  0000000000000000 ra 0000000000da65ec sp 0000003fffffa830 gp 0000000001c2a4f8
tp 0000000001c9b500 t0 000000000000000a t1 0000000000002000 t2 0000000000000001
s0 0000003fffffaa00 s1 0000000000000002 a0 000000000000005c a1 0000000000000000
a2 000000000000005c a3 0000000000000000 a4 00080008c0000000 a5 00000000263f5040
a6 0000000000000064 a7 0000000000000040 s2 0000000000000310 s3 0000000000000019
s4 0000000000000000 s5 0000000000000002 s6 0000000000000001 s7 0000000020403280
s8 0000000000000001 s9 000000000000001c sA 000000000000001c sB 0000000000000005
t3 0000000000000000 t4 0000000000000008 t5 0000000000000000 t6 0000000000000020
pc 0000000000da6606 va/inst 3f800000263f5c80 sr 8000000200006020
User store segfault @ 0x3f800000263f5c80

System information

  • OS Platform and Distribution : Linux Ubuntu 18.04
  • ONNX Runtime installed from (source or binary):
  • ONNX Runtime version: 0c8c9b4
  • Python version: 3.6.9
  • GCC/Compiler version (if compiling from source): 9.2.0

To Reproduce

EDIT: I also checked that the pc (0000000000da6606) in the user store segfault message comes from gemmini_extended_mvout() in systolic_include.h:sp_tiled_matmul_os.

Error building the repository

Hello,
I am trying to follow the Gemmini ONNX tutorial.
I just ran gemmini/scripts/build-onnx-inference.sh, which executes the build.sh script in onnxruntime-riscv.
I modified the flags a bit; the ones used are the following:

./build.sh --config=Release --cmake_extra_defines onnxruntime_USE_SYSTOLIC=ON onnxruntime_SYSTOLIC_INT8=ON onnxruntime_SYSTOLIC_FP32=OFF

That execution throws an error. The output of the build follows:

[ 34%] Linking CXX static library libonnxruntime_graph.a
[ 34%] Built target onnxruntime_graph
Scanning dependencies of target onnxruntime_providers_systolic
[ 34%] Building CXX object CMakeFiles/onnxruntime_providers_systolic.dir/home/user/Documents/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/providers/systolic/fusion/qlinearadd_relu_fuse.cc.o
[ 34%] Building CXX object CMakeFiles/onnxruntime_providers_systolic.dir/home/user/Documents/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/providers/systolic/fusion/qlinearconv_relu_fuse.cc.o
[ 35%] Building CXX object CMakeFiles/onnxruntime_providers_systolic.dir/home/user/Documents/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/providers/systolic/math/gemm.cc.o
[ 35%] Building CXX object CMakeFiles/onnxruntime_providers_systolic.dir/home/user/Documents/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/providers/systolic/math/matmul.cc.o
[ 35%] Building CXX object CMakeFiles/onnxruntime_providers_systolic.dir/home/user/Documents/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/providers/systolic/math/quantize_linear_matmul.cc.o
[ 35%] Building CXX object CMakeFiles/onnxruntime_providers_systolic.dir/home/user/Documents/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/providers/systolic/nn/conv.cc.o
[ 35%] Building CXX object CMakeFiles/onnxruntime_providers_systolic.dir/home/user/Documents/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/providers/systolic/nn/maxpool.cc.o
[ 35%] Building CXX object CMakeFiles/onnxruntime_providers_systolic.dir/home/user/Documents/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/providers/systolic/nn/qlinearconv.cc.o
/home/user/Documents/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/providers/systolic/nn/qlinearconv.cc: In member function 'virtual onnxruntime::common::Status onnxruntime::systolic::QLinearConv_nhwc::Compute(onnxruntime::OpKernelContext*) const':
/home/user/Documents/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/providers/systolic/nn/qlinearconv.cc:218:29: error: 'class onnxruntime::profiling::Profiler' has no member named 'StartTime'; did you mean 'GetStartTimeNs'?
  218 |       start_time = profiler.StartTime();
      |                             ^~~~~~~~~
      |                             GetStartTimeNs
/home/user/Documents/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/providers/systolic/nn/qlinearconv.cc:252:31: error: 'class onnxruntime::profiling::Profiler' has no member named 'StartTime'; did you mean 'GetStartTimeNs'?
  252 |         start_time = profiler.StartTime();
      |                               ^~~~~~~~~
      |                               GetStartTimeNs
/home/user/Documents/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/providers/systolic/nn/qlinearconv.cc:283:31: error: 'class onnxruntime::profiling::Profiler' has no member named 'StartTime'; did you mean 'GetStartTimeNs'?
  283 |         start_time = profiler.StartTime();
      |                               ^~~~~~~~~
      |                               GetStartTimeNs
/home/user/Documents/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/providers/systolic/nn/qlinearconv.cc: In member function 'virtual onnxruntime::common::Status onnxruntime::systolic::QLinearConv::Compute(onnxruntime::OpKernelContext*) const':
/home/user/Documents/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/providers/systolic/nn/qlinearconv.cc:458:31: error: 'class onnxruntime::profiling::Profiler' has no member named 'StartTime'; did you mean 'GetStartTimeNs'?
  458 |         start_time = profiler.StartTime();
      |                               ^~~~~~~~~
      |                               GetStartTimeNs
/home/user/Documents/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/providers/systolic/nn/qlinearconv.cc:502:33: error: 'class onnxruntime::profiling::Profiler' has no member named 'StartTime'; did you mean 'GetStartTimeNs'?
  502 |           start_time = profiler.StartTime();
      |                                 ^~~~~~~~~
      |                                 GetStartTimeNs
/home/user/Documents/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/providers/systolic/nn/qlinearconv.cc:530:31: error: 'class onnxruntime::profiling::Profiler' has no member named 'StartTime'; did you mean 'GetStartTimeNs'?
  530 |         start_time = profiler.StartTime();
      |                               ^~~~~~~~~
      |                               GetStartTimeNs
make[2]: *** [CMakeFiles/onnxruntime_providers_systolic.dir/build.make:154: CMakeFiles/onnxruntime_providers_systolic.dir/home/user/Documents/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/providers/systolic/nn/qlinearconv.cc.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:1265: CMakeFiles/onnxruntime_providers_systolic.dir/all] Error 2
make: *** [Makefile:163: all] Error 2
Traceback (most recent call last):
  File "/home/user/Documents/chipyard/generators/gemmini/software/onnxruntime-riscv/tools/ci_build/build.py", line 2160, in <module>
    sys.exit(main())
  File "/home/user/Documents/chipyard/generators/gemmini/software/onnxruntime-riscv/tools/ci_build/build.py", line 2086, in main
    build_targets(args, cmake_path, build_dir, configs, num_parallel_jobs, args.target)
  File "/home/user/Documents/chipyard/generators/gemmini/software/onnxruntime-riscv/tools/ci_build/build.py", line 1087, in build_targets
    run_subprocess(cmd_args, env=env)
  File "/home/user/Documents/chipyard/generators/gemmini/software/onnxruntime-riscv/tools/ci_build/build.py", line 591, in run_subprocess
    return run(*args, cwd=cwd, capture_stdout=capture_stdout, shell=shell, env=my_env)
  File "/home/user/Documents/chipyard/generators/gemmini/software/onnxruntime-riscv/tools/python/util/run.py", line 41, in run
    completed_process = subprocess.run(
  File "/home/user/anaconda3/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/usr/bin/cmake', '--build', 'build/Release', '--config', 'Release']' returned non-zero exit status 2.
Building against Release
riscv64-unknown-linux-gnu-g++: error: ../..//build/Release/libonnxruntime_test_utils.a: No such file or directory
riscv64-unknown-linux-gnu-g++: error: ../..//build/Release/libonnxruntime_session.a: No such file or directory
riscv64-unknown-linux-gnu-g++: error: ../..//build/Release/libonnxruntime_providers.a: No such file or directory
riscv64-unknown-linux-gnu-g++: error: ../..//build/Release/libonnxruntime_util.a: No such file or directory
riscv64-unknown-linux-gnu-g++: error: ../..//build/Release/libonnxruntime_util.a: No such file or directory
riscv64-unknown-linux-gnu-g++: error: ../..//build/Release/libonnxruntime_providers_systolic.a: No such file or directory
riscv64-unknown-linux-gnu-g++: error: ../..//build/Release/libonnxruntime_flatbuffers.a: No such file or directory
riscv64-unknown-linux-gnu-g++: error: ../..//build/Release/external/re2/libre2.a: No such file or directory
riscv64-unknown-linux-gnu-g++: error: ../..//build/Release/external/onnx/libonnx.a: No such file or directory
riscv64-unknown-linux-gnu-g++: error: ../..//build/Release/external/re2/libre2.a: No such file or directory
make: *** [Makefile:11: ort_test] Error 1
Please ignore any dlopen warning above. Glibc hates being statically linked.

Urgency
High.

System information

  • Linux Ubuntu-20.04
  • Python version: 3.9.7
  • GCC/Compiler version (if compiling from source): riscv64-unknown-linux-gnu-* from the riscv-tools esp-tools branch
  • GPU model and memory: Gemmini
  • Chipyard: 1.6.2 (commit 481398b910fa95ec88dd578c67ba358a4d83129d)
  • ONNX-runtime-riscv: commit 7bbd049, origin/2021-05-12
  • Gemmini: commit c47cb7f3eb5c18390f176f3a53c43c8546d487d2

Expected behavior
Successful installation of the repository.
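
A note on the error itself: the systolic provider code here calls profiler.StartTime(), which the profiler API in this tree no longer provides — likely a branch mismatch between the checked-out onnxruntime-riscv and the tutorial script. A minimal sketch of the rename the compiler itself suggests (assuming start_time's type is also updated to match the nanosecond return value, which I have not verified against this branch):

// onnxruntime/core/providers/systolic/nn/qlinearconv.cc (sketch of the rename)
-      start_time = profiler.StartTime();
+      start_time = profiler.GetStartTimeNs();

Checking out the onnxruntime-riscv branch that matches the chipyard/gemmini commit in use may avoid needing to patch anything.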

systolic_params.h file missing

Describe the bug
I am trying to generate a workload for multiple Gemmini configurations (with and without convs, etc.).
I saw in the systolic_runner readme that I must have a systolic_params.h file that matches the gemmini_params.h file describing the Gemmini design I wish to run.

So far I have generated two different designs for Gemmini, specifying the desired Gemmini config in gemmini/configs/GemminiCustomConfigs.scala.

How do I know which configuration of Gemmini corresponds to the gemmini_params.h file I have right now? The last build? Then I would have to generate the test binaries for each design right after generating that design, so that the file does not change.

Also, I am missing that systolic_params.h file, although I built onnxruntime-riscv with the flag --for_firesim.
How should I proceed?

Urgency
High. I am trying to write a paper with this asap.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
    Linux ip-192-168-5-162.ec2.internal 3.10.0-1160.66.1.el7.x86_64 #1 SMP Wed May 18 16:02:34 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
    LSB Version: :core-4.1-amd64:core-4.1-noarch
    Distributor ID: CentOS
    Description: CentOS Linux release 7.9.2009 (Core)
    Release: 7.9.2009
    Codename: Core
  • Firesim version: commit 8c5416c09808c34e674d30ee0d81fa07335d952d (HEAD, tag: 1.14.2)
  • Gemmini version: commit c47cb7f3eb5c18390f176f3a53c43c8546d487d2 (HEAD, tag: v0.6.3)
  • ONNX Runtime installed from: source
  • ONNX Runtime version: commit 7bbd049 (HEAD, origin/2021-05-12)

To Reproduce

  • I just installed firesim with the script provided.
  • Cloned this repo to gemmini/software.
  • Build this repo with the following command ./build.sh --parallel --for_firesim --config=Debug --cmake_extra_defines onnxruntime_USE_SYSTOLIC=ON onnxruntime_SYSTOLIC_INT8=OFF onnxruntime_SYSTOLIC_FP32=ON

Expected behavior
I need to generate a binary for every different configuration of Gemmini.

Gemmini BERT functionality issue

Describe the bug
The BERT inference results from python_inference.py and from Gemmini don't match. Gemmini seems to have a functionality problem with bert-base-cased.onnx.

Urgency
I'd be happy if you could respond by November 23rd, but I'll understand even if you can't.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 20.04.3 w Docker image from iiswc-gemmini tutorial
  • ONNX Runtime installed from (source or binary): pip3 install onnxruntime
  • ONNX Runtime version: 1.9.0
  • Python version: 3.6
  • Visual Studio version (if applicable): none
  • GCC/Compiler version (if compiling from source): 7.5.0
  • CUDA/cuDNN version: none
  • GPU model and memory: none

To Reproduce

  1. I pulled the docker image from the iiswc-gemmini tutorial document, and rebuilt everything mentioned in the tutorial document.
  2. Got the onnx model following the README inside bert_mask_runner. optimize_model.py had no effect on the execution result.
  3. Since I had a problem installing onnxruntime inside docker (this is another big issue), I executed python_inference.py on my local machine (system above). I ran python_inference.py on INPUT.py's last sentence and got ['law', 'treaties', 'agreements', 'arbitration', 'relations'].
  4. Then, I built ort_test, ran gemmini with spike --extension=gemmini pk ort_test -m onnx/bert-base-cased.onnx -x 2 -O 99 after running preprocess.py, and got an output.
  5. I ran postprocess.py and got the garbage values [',', '.', 'the', '##s', '-'].

Expected behavior
['law', 'treaties', 'agreements', 'arbitration', 'relations']

Screenshots
[',', '.', 'the', '##s', '-']

Additional context
I also tried INPUT.py's first sentence, but postprocess.py failed to decode it:
Traceback (most recent call last):
  File "postprocess.py", line 47, in <module>
    print([x["token_str"] for x in result])
UnicodeEncodeError: 'ascii' codec can't encode character '\uff1a' in position 4: ordinal not in range(128)

issue of running onnx model

Describe the bug
Hello,
after patching pk and spike, when I run the command "spike --extension=gemmini pk ort_test -m googlenet_quantized.onnx -i images/cat.jpg -p caffe2 -x 1 -O 0", the terminal throws an error:

Gemmini extension configured with:
dim = 16
bbl loader
bad syscall #230!

System information

  • OS Platform and Distribution: Linux Ubuntu 18.04
  • ONNX Runtime installed from (source or binary):
  • ONNX Runtime version: commit 4298984
  • Python version: 3.6
  • GCC/Compiler version (if compiling from source): riscv64-unknown-elf-g++9.2.0

To Reproduce
1. I followed the instructions here: https://github.com/ucb-bar/onnxruntime-riscv/blob/2021-05-12/systolic_runner/docs/BUILD.md.
2. I rebuilt spike and pk separately.
3. I ran "resnet50-baremetal" from "gemmini-rocc-tests" with spike, and it worked. Thus I guess there is no issue with spike, as it was able to recognize the ISA extension. I suppose pk has problems, but I did exactly the same as the instructions from the above link. I tried your suggestions, like rebuilding esp-tools, but I got another issue that says "old kernel".

Could you help me out? Thanks.

Cannot run some networks with spike

Describe the bug
When running the onnx models from https://github.com/pranav-prakash/onnxruntime-riscv/releases/tag/v0.01, I get bad syscall #131!
I have tried googlenet_quantized.onnx, mobilenet_quantized_optimized.onnx, and resnet50_quantized.onnx; only the resnet runs normally.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux CentOS 7
  • ONNX Runtime installed from (source or binary): source
  • Python version:Python: 2.7.5 Python3: 3.6.8

To Reproduce

  • git clone chipyard and esp-tools
  • follow the instruction in chipyard to install esp-tools in chipyard and follow the instruction in systolic_runner to patch riscv-pk in esp-tools and install spike & pk in esp-tools
  • change PATH to run the patched spike and pk
  • git clone this project
  • follow the instruction in systolic_runner to build onnx and ort_test
  • download googlenet_quantized.onnx
    from https://github.com/pranav-prakash/onnxruntime-riscv/releases/tag/v0.01
  • run spike --extension=gemmini pk ort_test -m googlenet_quantized.onnx -i images/cat.jpg -p caffe2 -x 1 -O 0
    Here is the result
Gemmini extension configured with:
    dim = 16
bbl loader
Loaded runner program
Using systolic in mode 1
Using Onnxruntime C++ API
terminate called after throwing an instance of 'Ort::Exception'
  what():  /home/zenk/onnxruntime-riscv/onnxruntime/core/session/inference_session.cc:245 onnxruntime::InferenceSession::InferenceSession(const onnxruntime::SessionOptions&, const onnxruntime::Environment&, const string&) status.IsOK() was false. Given model could not be parsed while creating inference session. Error message: Protobuf parsing failed.

bad syscall #131!

Expected behavior
Get some outputs.

Additional context
I also have some other questions. I want a network to do face verification, and I found https://github.com/onnx/models/tree/master/vision/body_analysis/arcface. I tried running the onnx file directly, and it gives me the same bad syscall #131. I also tried to quantize this network by running python3 calibrate.py --model_path arcfaceresnet100-8.onnx --dataset_path ./ --output_model_path arcfaceresnet100-8_quantized.onnx --static=True --data_preprocess=mxnet --mode=int8. It also failed.
Here is the result

Traceback (most recent call last):
  File "calibrate.py", line 379, in <module>
    main()
  File "calibrate.py", line 348, in main
    args.data_preprocess)
  File "calibrate.py", line 289, in load_test_data
    preprocess_method)
  File "calibrate.py", line 261, in load_single_test_data
    'Number of input protobufs does not match expected model inputs')
ValueError: Number of input protobufs does not match expected model inputs

Why can't I run some networks, and why do I get the error when trying to quantize a network? How can I fix this? Thanks for helping.

Don't require full rebuild when doing --for_firesim

Whenever we add --for_firesim, this forces a full rebuild. It's very annoying and slows down development speed. This should not be required, because all for_firesim does on the ORT binary is basically flush gemmini at process start (the flag for the runner also does mlockall, but since the runner builds in 1 min anyway that's not an issue).

I propose that instead of controlling whether the flush happens via an ifdef, we just always flush on the first call to gemmini if the execution mode is WS. Basically, have a global variable that stores whether the flush has happened, and then on the very first call to one of the gemmini functions (conv/matmul) we flush if the mode is WS. The get_matmul_type function is a good place for this because all calls go through it; see the sketch below.
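
A minimal sketch of that proposal — the header name and the wiring of the mode check are assumptions, not checked-in code:

#include "systolic_include.h"  // for gemmini_flush(); exact header is an assumption

// Flush lazily on the first Gemmini call in WS mode instead of flushing at
// process start behind an ifdef. Calling this from get_matmul_type covers
// every conv/matmul entry point.
static inline void MaybeFlushGemmini(bool is_ws_mode) {
  static bool flushed = false;  // global "has the flush happened" state
  if (is_ws_mode && !flushed) {
    gemmini_flush(0);
    flushed = true;
  }
}

This also keeps the flush out of the hot path after the first call, since the static flag short-circuits the check.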

RCNN Runner

Hey,

I tried to run the RCNN Runner from this repository, but it always fails for me.
I used the Mask R-CNN model from the onnx model zoo, because I could not quantize the rcnn model from the release with the given python files.
https://github.com/onnx/models/tree/main/vision/object_detection_segmentation/mask-rcnn

Then I ran it with firemarshal in Linux, with the given preprocessed image:
rcnn_runner -m mask_rcnn_R_50_FPN_1x_opt_quant.onnx -i images/preprocessed.jpg -x 2 -d 0
The preprocessing executes correctly, but then it gets stuck at the beginning of the execution.
The last line of my output is:
[I:onnxruntime:, sequential_executor.cc:157 Execute] Begin execution
And then I simply get nothing more: no output, no error, nothing else. The core is still running, as I can stop the execution with ctrl+c and then execute something else.
I also tried CPU mode, with the same result.

Did you ever run the RCNN Runner?
Do you have an idea where the problem is?

Greetings
Raphael Klink

Failed to Build ort_runner with Firesim code added.

Describe the bug
When I added the code to "runner.cpp" to build for firesim, as suggested in BUILD.md, the build fails with the following error:

src/runner.cpp: In function 'int main(int, char**)':
src/runner.cpp:32:16: error: 'MCL_CURRENT' was not declared in this scope
   32 |   if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
      |                ^~~~~~~~~~~
src/runner.cpp:32:30: error: 'MCL_FUTURE' was not declared in this scope
   32 |   if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
      |                              ^~~~~~~~~~
src/runner.cpp:32:7: error: 'mlockall' was not declared in this scope
   32 |   if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
      |
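
mlockall() and the MCL_* flags are declared in <sys/mman.h>, so the FireSim snippet needs that include. A minimal sketch, assuming the code from BUILD.md is otherwise unchanged:

#include <sys/mman.h>  // declares mlockall, MCL_CURRENT, MCL_FUTURE
#include <cstdio>

int main(int argc, char** argv) {
  // Pin all current and future pages in RAM so a FireSim run doesn't page.
  if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
    std::perror("mlockall failed");
  }
  // ... rest of the runner unchanged ...
  return 0;
}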

Better argument validation for the runners

Minor thing: we need better arg validation for a lot of the runners. If you miss -p mxnet, for instance, you get a cryptic

terminate called after throwing an instance of 'std::domain_error'
  what():  No value
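
A sketch of the kind of up-front check that would turn this into a readable message — the function and variable names are illustrative, not the runners' actual option-parsing code:

#include <cstdio>
#include <string>

// Validate required options before first use, so a missing "-p mxnet"
// fails with a clear message instead of std::domain_error.
static bool ValidateArgs(const std::string& preprocess_mode) {
  if (preprocess_mode.empty()) {
    std::fprintf(stderr, "error: -p <preprocess> is required (e.g. -p mxnet)\n");
    return false;
  }
  return true;
}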

Add CPU int8 resize support

For mask-rcnn, performance is better if we do the resize in int8. See the mask-rcnn branch for an example of how to extend the CPU EP for this – I haven't merged it into the main branch because I usually prefer to upstream any CPU EP modifications.

But actually the implementation in our mask-rcnn branch always rounds down, which I don't think is good behavior. Instead it should round to even; see issue onnx/onnx#3390 and the sketch below.
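
For reference, a sketch of round-half-to-even in C++ — std::nearbyintf rounds according to the current floating-point rounding mode, and the default mode (FE_TONEAREST) breaks ties to even, so 2.5f -> 2.0f and 3.5f -> 4.0f:

#include <cmath>

// Round-half-to-even, the tie-breaking the int8 resize should use instead
// of always rounding down. Assumes the default FE_TONEAREST rounding mode.
static inline float RoundHalfToEven(float x) {
  return std::nearbyintf(x);
}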

Runtime Error when running model with Dynamic input size.

Hi,

On one of my own models, one of the inputs has the following shape:
name: "input_ids" type { tensor_type { elem_type: 7 shape { dim { dim_param: "batch" } dim { dim_param: "sequence" } } } }

So the ONNX runtime says that my model input has a negative size:
Input 0 : name=input_ids, type=7, num_dims=2: [-1, -1, ]

And, I got the following error at runtime:

terminate called after throwing an instance of 'Ort::Exception'
  what():  /home/entropy/Development/onnxruntime-riscv/onnxruntime/core/framework/tensor.cc:52 void onnxruntime::Tensor::Init(onnxruntime::MLDataType, const onnxruntime::TensorShape&, void*, onnxruntime::AllocatorPtr, ptrdiff_t) shape.Size() must >=0
Stacktrace:
[0x6926da]
[0x48756]
[0x4083a]
[0x487e8]
[0x40a28]
[0x11a5e]
[0x971258]
[0x12e14]

bad syscall #131!

Is there a way to make the dynamic axes (the dynamic sizes in the inputs and outputs) of my ONNX model a fixed value?

Thank you!
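
One workaround, sketched against the runner's tensor-creation code: pin the symbolic (-1) dims to the real sizes of the data being fed before calling CreateTensor. The concrete values below are assumptions and must match the actual input; memory_info and input_ids are assumed set up as in the runner already:

// Pin "batch" and "sequence" to the actual input being fed (values assumed).
std::vector<int64_t> dims = {1 /* batch */, 128 /* sequence */};
Ort::Value input_ids_tensor = Ort::Value::CreateTensor<int64_t>(
    memory_info, input_ids.data(), input_ids.size(), dims.data(), dims.size());

Alternatively, re-exporting the model without dynamic axes (so the shapes are baked in) avoids the negative sizes entirely.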

Supported Gemmini features

I'm wondering which features from gemmini v0.3 are currently supported by the systolic provider.

Further, I would like to know whether the provider expects a particular gemmini configuration, or whether all possible variants work — always under the assumption that I modify the systolic_params.h file to fit the current configuration.

Bump to quantizer from upstream

The quantizer we currently have (in systolic_runner/quantization) was forked a long time ago and has since diverged significantly from upstream. Now, this isn't really a bug per se, since it works fine (and has been tested/tweaked on all the models we currently need). However, lagging so far behind means we don't get any of the new goodies upstream, like the new histogram-based range calculations (instead of just min/max) or fancy things like the equalization algorithm.

In 5458ce9 (branch merge_upstream_quantizer) I've ported most of the changes I used for int8, and I tested with resnet50 and it works (for some reason the calculated scale factors are ever-so-slightly different. I'll try and find out why, but it's probably just a difference in formula used somewhere).

However, I'm hesitant to actually merge it into the main branch since this hasn't been used as much as the older implementation, so things will probably break. In particular, I'm worried about more esoteric models like BERT and mask-rcnn.

compare running time of the same network (spike --extension=gemmini ***.onnx vs spike --extension=gemmini ***-baremetal)

I successfully used onnxmltools to convert and quantize a network, and used Gemmini for the matmul/convolution.
Now I have decided to run gemmini on an FPGA (with a RISC-V processor).
Our team will use an ASIC if the performance meets expectations.
You said the mobilenet and resnet50 examples in gemmini-rocc-tests were manually crafted after performing post-training quantization on the two networks.
I want to know the running time of the same network (spike ***.onnx vs spike ***-baremetal),
because I am afraid running the .onnx will cost too much time on the CPU and make the gemmini acceleration effect less obvious.
I don't want to manually write .c and .h files for every different pre-trained network;
I hope everyone can use gemmini automatically as long as they have a pre-trained model on their computer.

Unable to build, getting error:libonnxruntime_providers.a(cpu_execution_provider.cc.o) -- riscv

I am building onnxruntime for the RISC-V architecture (U540 target). We are getting the error described below.

Describe the bug
Trying to set up onnxruntime on my OS and I'm getting: undefined reference to `onnxruntime::KernelCreateInfo onnxruntime::BuildKernelCreateInfo<onnxruntime::kCpuExecutionProvider_ConcatFromSequence_kOnnxDomain_ver11>()'

Urgency
Moderate. Needed for testing.

System information

OS Platform and Distribution (e.g., Linux):
ONNX Runtime installed from (source or binary):Source
ONNX Runtime version: 1.3.0
Python version: 3.7
Visual Studio version (if applicable):
GCC/Compiler version (if compiling from source):riscv64-unknown-linux-gnu-gcc/g++ version 8.3
CUDA/cuDNN version: No
GPU model and memory: No, CPU version

To Reproduce
git clone https://github.com/pranav-prakash/onnxruntime-riscv
cd onnxruntime
git submodule update --init --recursive
mkdir build
cd build
cmake ../cmake/ -Donnxruntime_ENABLE_PYTHON=ON -DPYTHON_EXECUTABLE=/usr/bin/python3
make

Expected behavior
Successful build
Error:
/usr/lib/gcc/riscv64-oe-linux/8.3.0/../../../../riscv64-oe-linux/bin/ld: libonnxruntime_providers.a(cpu_execution_provider.cc.o):(.data.rel.ro._ZZN11onnxruntime27RegisterOnnxOperatorKernelsERNS_14KernelRegistryEE14function_table+0x990): undefined reference to `onnxruntime::KernelCreateInfo onnxruntime::BuildKernelCreateInfo<onnxruntime::kCpuExecutionProvider_ConcatFromSequence_kOnnxDomain_ver11>()'
collect2: error: ld returned 1 exit status
make[2]: *** [CMakeFiles/onnxruntime_perf_test.dir/build.make:191: onnxruntime_perf_test] Error 1
make[1]: *** [CMakeFiles/Makefile2:213: CMakeFiles/onnxruntime_perf_test.dir/all] Error 2
make: *** [Makefile:141: all] Error 2

I don't understand this. Can you please help resolve it?

Thanks,

Training: support saving the model after trained in nhwc

After training, the model is reloaded and the weights are copied over by name. But as part of the NHWC transform we change the initializer name for conv, so these aren't copied over properly.

I'm not sure what the cleanest way to handle this is. I can think of two options (see the sketch after this list):

  • Run the nhwc transform on the reloaded model, so the names match and the saved model will be in nhwc format.
  • For names that are not found, check if an nhwc version of the initializer is present. If so, undo the hwio transform — this results in an nchw saved model.
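
A sketch of option 2's transpose, assuming dense row-major buffers; the layout names follow the issue text rather than an existing helper in the tree:

#include <cstdint>

// Undo the HWIO weight transform for an initializer whose original (OIHW)
// name was not found after training, yielding an NCHW-compatible weight.
void UndoHwioTransform(const float* hwio, float* oihw,
                       int64_t O, int64_t I, int64_t H, int64_t W) {
  for (int64_t h = 0; h < H; ++h)
    for (int64_t w = 0; w < W; ++w)
      for (int64_t i = 0; i < I; ++i)
        for (int64_t o = 0; o < O; ++o)
          oihw[((o * I + i) * H + h) * W + w] =
              hwio[((h * W + w) * I + i) * O + o];
}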

problem when I run ./build.sh --parallel

Describe the bug
problem when I run ./build.sh --parallel

System information
[37%] Running cpp protocol buffer compiler (lite) on ****/onnx-ml.proto
[36%] Running cpp protocol buffer compiler (lite) on ****/onnx-operators-ml.proto
/bin/sh: ../external/protobuf/cmake/protoc-3.11.1.0: cannot execute binary file: Exec format error
onnx/CMakeFiles/onnx_proto.dir/build.make:62: recipe for target 'onnx/onnx-ml.pb.h' failed
/bin/sh: ../external/protobuf/cmake/protoc-3.11.1.0: cannot execute binary file: Exec format error
make[2]: *** [onnx/onnx-ml.pb.h] Error 126
make[2]: *** Waiting for unfinished jobs....
onnx/CMakeFiles/onnx_proto.dir/build.make:70: recipe for target 'onnx/onnx-operators-ml.pb.h' failed
make[1]: *** [onnx/CMakeFiles/onnx_proto.dir/all] Error 2
Makefile:140: recipe for target 'all' failed
make: *** [all] Error 2

Expected behavior
Successful build.

Additional context
How can I solve this problem?

"Unsupport type" error in running an onnx model

Describe the bug
I am trying to run onnx models on gemmini via spike. I have successfully run inference on googlenet_quantized.onnx and mobilenet_quantized.onnx, but failed on resnet50_opt_quant.onnx, which reports 'Unsupport type' when 'Called into systolic add'.
In CMakeLists.txt, I set fp32 off and int8 on.
What is the problem?
Urgency
none
System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):Linux Ubuntu 18.04

  • ONNX Runtime installed from (source or binary): source

  • ONNX Runtime version: branch 2021-12-23

  • Python version:3.6.9

  • Visual Studio version (if applicable):

  • GCC/Compiler version (if compiling from source):

  • CUDA/cuDNN version:

  • GPU model and memory:

To Reproduce

  • Describe steps/code to reproduce the behavior.
    I changed the CMakeLists to fp32 off and int8 on, and rebuilt ORT and ort_test:
    spike --extension=gemmini pk ort_test -m resnet50_opt_quant.onnx -i images/cat.jpg -p caffe2 -x 1 -O 0

  • Attach the ONNX model to the issue (where applicable) to expedite investigation.
    The released model: resnet50_opt_quant.onnx

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots

Additional context

Play around with QDQ Support

As mentioned in microsoft/onnxruntime#7033, ORT added support for internally converting QDQ into the quantized equivalent. Since this is done via a graph transform, it should work for Systolic — though I think we would need to change the assigned EP.

As also mentioned in microsoft/onnxruntime#7144, this makes it easy to run quantization-aware-trained models. We can play around with this if we need more accuracy than post-training quantization can give us.

Size Overflow issue of running a model with multiple input nodes

Hi,

What should I do if I want to run a model with multiple input nodes and multiple output nodes?

I tried:

auto memory_info = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
Ort::Value input_0_tensor = Ort::Value::CreateTensor<int64_t>(memory_info, input_0_val, input_0_size, input_node_dims[0].data(), input_node_dims[0].size());
Ort::Value input_1_tensor = Ort::Value::CreateTensor<int64_t>(memory_info, input_1_val, input_1_size, input_node_dims[1].data(), input_node_dims[1].size());

However, I got the following Error:

terminate called after throwing an instance of 'Ort::Exception'
  what():  size overflow
bad syscall #131!

Initially, I thought this was caused by not enough memory being assigned to the spike simulator. So later I tried:

spike -m81920 [some arguments to run]

However, the problem still exists after giving 80 GB of memory to spike.

What do you think might have caused this problem? Should I consider increasing the stack size in pk?

Thank you!
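
A quick sanity check, sketched against the snippet above (input_node_dims and input_0_size assumed as declared there): "size overflow" is typically thrown when the shape handed to CreateTensor still contains a symbolic (-1) dimension or doesn't multiply out to the element count passed alongside it.

#include <cassert>
#include <cstdint>

// Verify the shape is fully concrete and consistent before CreateTensor.
int64_t expected = 1;
for (int64_t d : input_node_dims[0]) {
  assert(d > 0 && "pin symbolic (-1) dims to real sizes before CreateTensor");
  expected *= d;
}
assert(expected == static_cast<int64_t>(input_0_size));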

Failing to run the calibrate

Hello,
I am trying to run an int8-quantized model with a clip node. But after I run calibrate.py with the command below, it seems that I get a broken quantized model.
python calibrate.py --model_path V4.onnx --dataset_path ../../pb --output_model_path model_quantized.onnx --static=True --data_preprocess=None --mode=int8
When I try to execute the minimally modified systolic_runner example on spike, I keep getting the following error:
[screenshot of the error]
Here is the netron view of the broken quantized model. It seems that I lose all of my clip nodes.
[netron screenshot]

Error executing FP32 ONNX model on Gemmini

Describe the bug
I am trying to execute a slightly modified MobilenetV2 model (just the final layer changed from 1000 to 13 labels) in FP32, with the code from systolic_runner/imagenet_runner slightly modified as well (dimensions and preprocessing).
When I simulate the execution on spike with this command:

spike --extension=gemmini pk ort_test -m MobileNetV2_0p5_all.onnx -i images/peon_blanco_fondo_blanco.jpg -p mobilenet -x 2 -O 99

It gives the following error:

Gemmini extension configured with:
    dim = 16
bbl loader
Loaded runner program
Using systolic in mode 2
Using Onnxruntime C++ API
Number of inputs = 1
Input 0 : name=input, type=1, num_dims=4: [1, 224, 224, 3, ]
Number of outputs = 1
Output 0 : name=dense_2, type=1, num_dims=2: [1, 13, ]
Loading image
Image dimensions: 224 224 3
First few image values -0.929412 -0.803922 -0.780392
Created Ort:CreateCpu
Created Ort:Tensor
Called into systolic matmul!
Using accelerated matmul with dimensions (12544, 16, 27)
LOOP_WS bounds were too large for double-buffering

I have also tried to run it with the -x 1 option and it gives the following error:

Gemmini extension configured with:
    dim = 16
bbl loader
Loaded runner program
Using systolic in mode 1
Using Onnxruntime C++ API
Number of inputs = 1
Input 0 : name=input, type=1, num_dims=4: [1, 224, 224, 3, ]
Number of outputs = 1
Output 0 : name=dense_2, type=1, num_dims=2: [1, 13, ]
Loading image
Image dimensions: 224 224 3
First few image values -0.929412 -0.803922 -0.780392
Created Ort:CreateCpu
Created Ort:Tensor
Called into systolic matmul!
Using accelerated matmul with dimensions (12544, 16, 27)
terminate called after throwing an instance of 'std::out_of_range'
  what():  vector::_M_range_check: __n (which is 1024) >= this->size() (which is 1024)

Urgency
High

System information

  • OS Platform and Distribution: Linux Ubuntu-20.04
  • Python version: 3.9.7
  • GCC/Compiler version (if compiling from source): riscv64-unknown-linux-gnu-* from the riscv-tools esp-tools branch
  • GPU model and memory: Gemmini
  • Chipyard: 1.6.2 (commit 481398b910fa95ec88dd578c67ba358a4d83129d)
  • ONNX-runtime-riscv: commit 0c8c9b4 , 2021-12-23
  • Gemmini: commit c47cb7f3eb5c18390f176f3a53c43c8546d487d2

To Reproduce

Expected behavior
A correct run, as it runs with -x 0 (CPU).

Thanks in advance!

Can I generate waveforms from model.onnx

Is your feature request related to a problem? Please describe.
I can generate waveforms in chipyard by running the command (for example)
./simulator-chipyard-GemminiRocketConfig-debug mobilenet-baremetal
So is there a method to use model.onnx to generate waveforms in gemmini?
I want to analyse the performance of gemmini and compare it to other accelerators.

Cannot Run Model with multiple inputs nodes

Hi,

I am running a model with multiple input nodes. So I tried:

Ort::Value batch_input_tensors[3];
batch_input_tensors[0] = input_ids_tensor;
batch_input_tensors[1] = attention_mask_tensor;
batch_input_tensors[2] = token_type_ids_tensor;

auto output_tensors = session.Run(Ort::RunOptions{nullptr}, input_node_names.data(), batch_input_tensors, 3, output_node_names.data(), 2);

However, I got the following error at compilation time:

src/bert_runner.cpp: In function 'int main(int, char**)':
src/bert_runner.cpp:187:35: error: no matching function for call to 'Ort::Value::Value()'
  187 |   Ort::Value batch_input_tensors[3];
      |                                   ^
In file included from src/bert_runner.cpp:9:
../..//include/onnxruntime/core/session/onnxruntime_cxx_api.h:276:3: note: candidate: 'Ort::Value::Value(Ort::Value&&)'
  276 |   Value(Value&&) = default;
      |   ^~~~~
../..//include/onnxruntime/core/session/onnxruntime_cxx_api.h:276:3: note:   candidate expects 1 argument, 0 provided
../..//include/onnxruntime/core/session/onnxruntime_cxx_api.h:275:12: note: candidate: 'Ort::Value::Value(OrtValue*)'
  275 |   explicit Value(OrtValue* p) : Base<OrtValue>{p} {}
      |            ^~~~~
../..//include/onnxruntime/core/session/onnxruntime_cxx_api.h:275:12: note:   candidate expects 1 argument, 0 provided
../..//include/onnxruntime/core/session/onnxruntime_cxx_api.h:274:12: note: candidate: 'Ort::Value::Value(std::nullptr_t)'
  274 |   explicit Value(std::nullptr_t) {}
      |            ^~~~~
../..//include/onnxruntime/core/session/onnxruntime_cxx_api.h:274:12: note:   candidate expects 1 argument, 0 provided
src/bert_runner.cpp:188:28: error: use of deleted function 'Ort::Value& Ort::Value::operator=(const Ort::Value&)'
  188 |   batch_input_tensors[0] = input_ids_tensor;
      |                            ^~~~~~~~~~~~~~~~
In file included from src/bert_runner.cpp:9:
../..//include/onnxruntime/core/session/onnxruntime_cxx_api.h:256:8: note: 'Ort::Value& Ort::Value::operator=(const Ort::Value&)' is implicitly declared as deleted because 'Ort::Value' declares a move constructor or move assignment operator
  256 | struct Value : Base<OrtValue> {
      |        ^~~~~
src/bert_runner.cpp:189:28: error: use of deleted function 'Ort::Value& Ort::Value::operator=(const Ort::Value&)'
  189 |   batch_input_tensors[1] = attention_mask_tensor;
      |                            ^~~~~~~~~~~~~~~~~~~~~
src/bert_runner.cpp:190:28: error: use of deleted function 'Ort::Value& Ort::Value::operator=(const Ort::Value&)'
  190 |   batch_input_tensors[2] = token_type_ids_tensor;
      |                            ^~~~~~~~~~~~~~~~~~~~~

So, what is the correct way to run a model with multiple input nodes?

Thank you!
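
Ort::Value is move-only (no default constructor, deleted copy assignment), which is exactly what the compiler errors above complain about. A sketch that collects the inputs in a std::vector and moves each tensor in — Session::Run takes a pointer and a count, so .data()/.size() slot straight in:

// Assumes the three tensors were created as in the snippet above.
std::vector<Ort::Value> batch_input_tensors;
batch_input_tensors.reserve(3);
batch_input_tensors.push_back(std::move(input_ids_tensor));
batch_input_tensors.push_back(std::move(attention_mask_tensor));
batch_input_tensors.push_back(std::move(token_type_ids_tensor));

auto output_tensors = session.Run(
    Ort::RunOptions{nullptr}, input_node_names.data(),
    batch_input_tensors.data(), batch_input_tensors.size(),
    output_node_names.data(), 2);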

Adding unzip library for requirement.txt

Describe the bug
While building this repo, if the host does not have the unzip utility installed, the build fails. The error occurs at the line below.
https://github.com/ucb-bar/onnxruntime-riscv/blob/2021-12-23/build.sh#:~:text=cd%20build/protoc-,unzip%20protoc.zip,-fi

Urgency
none

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
  • ONNX Runtime installed from (source or binary): source
  • ONNX Runtime version: Up to date
  • Python version: python 3.6.9
  • Visual Studio version (if applicable): N/A
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: N/A
  • GPU model and memory: N/A

To Reproduce

  • This issue can be resolved by adding unzip to the requirements.txt or by modifying the build.sh bash file.

Expected behavior

  • There is no failure when building this repo, even if unzip is not installed in the host environment.

Low Gemmini Utilization on floating point model

Hi,

An issue I am having: when I am running an unquantized model with float32 all over the place, looking at the traces it seems that the systolic array isn't doing anything at all (almost all the computation, including the matrix multiplications, is performed by the CPU). Would those computations be performed on Gemmini if I configured Gemmini to support floating point as the datatype?

The screenshot for the trace is attached below:
[trace screenshot]

Thanks!

What is the method to get my own onnx trace.json file for profiling?

Describe the bug
I just followed the command lines from the Gemmini Tutorial IISWC 2021. I followed the guide provided in this repo and got my own onnx model successfully.

spike --extension=gemmini pk ort_test -m vgg19.onnx -i images/dog.jpg -p caffe2 -x 2 -O 99

However, I couldn't find a method to generate a trace.json to profile the onnx model. The Gemmini Tutorial IISWC 2021 just has us download the trace.json from

https://tinyurl.com/gemmini-resnet50-trace

So, I am wondering how I can get my own trace.json for my custom onnx model?

To get my own trace.json, I tried the approach below, but I ran into an error. Could you give me an example of how to get my own trace.json?

Urgency
I have been working on this issue for about a week, so I think it is urgent.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 18.04): Linux Ubuntu 18.04
  • ONNX Runtime installed from (source or binary): in this repo
  • ONNX Runtime version: in this repo
  • Python version: 3.6

To Reproduce
I have tried to get the trace.json as follows:

  1. I found that after building onnxruntime-riscv with ./build.sh --config=Release --parallel --enable_training in chipyard/generators/gemmini/software/onnxruntime-riscv, I get
    chipyard/generators/gemmini/software/onnxruntime-riscv/build/Release/onnxruntime_perf_test, and the tutorial says I can use it to get my own trace.json.

  2. To check the function of onnxruntime_perf_test, I ran the following command:

spike --extension=gemmini pk build/Release/onnxruntime_perf_test -h

And I got output like the README here:

Gemmini extension configured with:
dim = 4
bbl loader
perf_test [options...] model_path [result_file]
Options:
-m [test_mode]: Specifies the test mode. Value could be 'duration' or 'times'.
Provide 'duration' to run the test for a fix duration, and 'times' to repeated for a certain times.
-M: Disable memory pattern.
-A: Disable memory arena
-I: Generate tensor input binding (Free dimensions are treated as 1.)
-c [parallel runs]: Specifies the (max) number of runs to invoke simultaneously. Default:1.
-e [cpu|cuda|dnnl|tensorrt|openvino|nuphar|dml|acl]: Specifies the provider 'cpu','cuda','dnnl','tensorrt', 'openvino', 'nuphar', 'dml', 'acl', 'nnapi' or 'coreml'. Default:'cpu'.
-b [tf|ort]: backend to use. Default:ort
-r [repeated_times]: Specifies the repeated times if running in 'times' test mode.Default:1000.
-t [seconds_to_run]: Specifies the seconds to run for 'duration' mode. Default:600.
-p [profile_file]: Specifies the profile name to enable profiling and dump the profile data to the file.
-s: Show statistics result, like P75, P90. If no result_file provided this defaults to on.
-v: Show verbose information.
-x [intra_op_num_threads]: Sets the number of threads used to parallelize the execution within nodes. A value of 0 means ORT will pick a default. Must >=0.
-y [inter_op_num_threads]: Sets the number of threads used to parallelize the execution of the graph (across nodes). A value of 0 means ORT will pick a default. Must >=0.
-f [free_dimension_override]: Specifies a free dimension by name to override to a specific value for performance optimization. Syntax is [dimension_name:override_value]. override_value must > 0
-F [free_dimension_override]: Specifies a free dimension by denotation to override to a specific value for performance optimization. Syntax is [dimension_denotation:override_value]. override_value must > 0
-P: Use parallel executor instead of sequential executor.
-o [optimization level]: Default is 1. Valid values are 0 (disable), 1 (basic), 2 (extended), 99 (all).
Please see onnxruntime_c_api.h (enum GraphOptimizationLevel) for the full list of all optimization levels.
-u [optimized_model_path]: Specify the optimized model path for saving.
-d [cudnn_conv_algorithm]: Specify CUDNN convolution algothrithms: 0(benchmark), 1(heuristic), 2(default).
-q: [CUDA only] use separate stream for copy.
-z: Set denormal as zero. When turning on this option reduces latency dramatically, a model may have denormals.
-i: Specify EP specific runtime options as key value pairs. Different runtime options available are:
[OpenVINO only] [device_type]: Overrides the accelerator hardware type and precision with these values at runtime.
[OpenVINO only] [device_id]: Selects a particular hardware device for inference.
[OpenVINO only] [enable_vpu_fast_compile]: Optionally enabled to speeds up the model's compilation on VPU device targets.
[OpenVINO only] [num_of_threads]: Overrides the accelerator hardware type and precision with these values at runtime.
[OpenVINO only] [use_compiled_network]: Can be enabled to directly import pre-compiled blobs if exists. Currently this feature is only supported on MyriadX(VPU) hardware device target.
[OpenVINO only] [blob_dump_path]: Explicitly specify the path where you would like to dump and load the blobs for the use_compiled_network (save/load blob) feature. This overrides the default path.
[Usage]: -e <provider_name> -i '<key1>|<value1> <key2>|<value2>'

[Example] [For OpenVINO EP] -e openvino -i "device_type|CPU_FP32 enable_vpu_fast_compile|true num_of_threads|5 use_compiled_network|true blob_dump_path|"<path>""
[TensorRT only] [use_trt_options]: Overrides TensorRT environment variables (if any) with following settings at runtime.
[TensorRT only] [trt_max_workspace_size]: Set TensorRT maximum workspace size in byte.
[TensorRT only] [trt_fp16_enable]: Enable TensorRT FP16 precision.
[TensorRT only] [trt_int8_enable]: Enable TensorRT INT8 precision.
[TensorRT only] [trt_int8_calibration_table_name]: Specify INT8 calibration table name.
[TensorRT only] [trt_int8_use_native_calibration_table]: Use Native TensorRT calibration table.
[TensorRT only] [trt_force_sequential_engine_build]: Force TensorRT engines to be built sequentially.
[Usage]: -e <provider_name> -i '<key1>|<value1> <key2>|<value2>'

[Example] [For TensorRT EP] -e tensorrt -i 'use_trt_options|true trt_fp16_enable|true trt_int8_enable|true trt_int8_calibration_table_name|calibration.flatbuffers trt_int8_use_native_calibration_table|false trt_force_sequential_engine_build|false'
-h: help
  3. So I tried to get my own trace.json:

spike --extension=gemmini pk build/Release/onnxruntime_perf_test -m duration -p my_own_trace.json systolic_runner/imagenet_runner/model/alexnet.onnx result_file

and get

Gemmini extension configured with:
    dim = 4
bbl loader
/data/sxchen/AutoSoC/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/platform/posix/env.cc:134 onnxruntime::{anonymous}::PosixThread::PosixThread(const char*, int, unsigned int (*)(int, Eigen::ThreadPoolInterface*), Eigen::ThreadPoolInterface*, const onnxruntime::ThreadOptions&) pthread_create failed

Expected behavior
Get my own trace.json through onnxruntime_perf_test.

Additional context

Gemmini is configured as FP32 (defined in the chipyard config), and I can run it without error.
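
For what it's worth, ORT can emit the Chrome-trace JSON itself, without going through onnxruntime_perf_test: enabling profiling on the session options in the runner writes a <prefix>_<timestamp>.json when the session is destroyed. A minimal sketch using the stock C++ API (model path illustrative):

Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "trace");
Ort::SessionOptions session_options;
session_options.EnableProfiling("my_own_trace");  // prefix of the emitted .json
Ort::Session session(env, "model.onnx", session_options);
// ... session.Run(...) as usual; the trace is flushed when the session ends ...

As for the pthread_create failure: pk has no real thread support, so perf_test's default thread pool cannot start; forcing single-threaded execution with its -x 1 -y 1 options may avoid it.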

"bad syscall #98!" error in running inference

Describe the bug
I just followed the command lines from the Gemmini Tutorial IISWC 2021.
1. It worked fine to run ./build.sh --parallel --enable_training --config=Debug --cmake_extra_defines onnxruntime_USE_SYSTOLIC=ON onnxruntime_SYSTOLIC_INT8=ON onnxruntime_SYSTOLIC_FP32=OFF in /gemmini/software/onnxruntime-riscv/.

2. ./build.sh --parallel --enable_training --config=Debug in /gemmini/software/onnxruntime-riscv/systolic_runner/imagenet_runner/ gives:

Building against Debug

/home/gaoyujing/chipyard/esp-tools-install/lib/gcc/riscv64-unknown-linux-gnu/9.2.0/../../../../riscv64-unknown-linux-gnu/bin/ld: ../..//build/Debug/libonnxruntime_common.a(env.cc.o): in function `LoadDynamicLibrary':
/home/gaoyujing/chipyard/generators/gemmini/software/onnxruntime-riscv/onnxruntime/core/platform/posix/env.cc:416: warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
Please ignore any dlopen warning above. Glibc hates being statically linked.

I don't know whether this is correct.

  3. When I use spike --extension=gemmini pk ort_test -m resnet50_opt_quant.onnx -i images/dog.jpg -p caffe2 -x 2 -O 99 in /gemmini/software/onnxruntime-riscv/systolic_runner/imagenet_runner/, the error is:
Gemmini extension configured with:
    dim = 16
bbl loader
bad syscall #98!

I don't know why, and when I use -h for help, it also errors:

$spike --extension=gemmini pk ort_test -h

Gemmini extension configured with:
    dim = 16
bbl loader
bad syscall #98!

So I want to know how to deal with this.

Urgency
I think it's urgent, because our project has been delayed by it.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04
  • ONNX Runtime installed from (source or binary): source
  • ONNX Runtime version: origin/2021-12-23
  • Python version: 2.7.18
  • Visual Studio version (if applicable):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: no
  • GPU model and memory: no

To Reproduce
The ONNX model is just from https://tinyurl.com/gemm-iiswc;
ls in /gemmini/software/onnxruntime-riscv/systolic_runner/imagenet_runner/ gives:

Makefile  README.md  batch_infer.sh  build.sh  images  ort_test  resnet50_opt_quant.onnx  src  tools

Expected behavior

Inference Result:

0.031456 giant schnauzer
0.075702 curly-coated retriever
0.087432 Great Dane
0.271946 Labrador retriever
0.361813 Rottweiler

Format error with ONNX quantized CNN

Hello,
I am trying to run a particular version of the MobilenetV2 CNN. I have quantized it, and netron.app shows that it is correctly quantized using the QDQ format.

But when I try to execute the minimally modified systolic_runner example (just the size of the labels array changed) on firesim, I keep getting the following error:

terminate called after throwing an instance of 'Ort::Exception'
  what():  Unexpected input data type. Actual: (tensor(float)) , expected: (tensor(uint8))

I have tried to run the original Mobilenet_V2 quantized model from the onnx repository and it works fine, but I have noticed that it is quantized with the operator-oriented format.

I attach my network.
MobileNetV2_0p5_all_quant.zip

Thank you in advance.
Kind regards, MªJosé.

QAttention: can accelerate q-k dot products

Currently we only accelerate the first phase of the attention score computation, where we multiply the inputs by weights to compute the Q/K/V vectors [1].

(onnxruntime/contrib_ops/systolic/quantization/attention_quant.cc)

We should also be able to accelerate the subsequent step, where we do the dot products of Q & K.

ComputeAttentionProbs in
(onnxruntime/contrib_ops/cpu/bert/attention_cpu_base.h)

This will also allow us to quantize the bias to int32.

However, you will need to be a bit careful: just like we currently modify Microsoft's QAttention schema to add in q/k/v scale factors (compare against the upstream schema, which doesn't have them), when you define your new schema you will need to change those three to just an attention_probs scale factor (attention_probs being the output of the Q/K dot product), since attention_probs is what feeds into the softmax.

Then during quantization you will have to do something like we do now, where you manually invoke ORT to generate the scale factors (see how this is currently done in systolic's attention_quant.cc). This is because the calibration script can only look at the inputs and outputs of nodes. But we are doing our computations in int8 up until just before the softmax of the attention — since this is all internal to the Attention node, the calibration script is blind to it. Hence we need code that manually emits the scale factors during a run; a sketch of the rescaling involved is below.

[1] http://jalammar.github.io/illustrated-transformer/
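
For concreteness, a sketch of the requantization that the new attention_probs scale factor would drive — the int32 accumulator of an int8 Q·K dot product is rescaled by (q_scale * k_scale / attention_probs_scale) before the softmax. Names are illustrative, not this repo's API:

#include <algorithm>
#include <cmath>
#include <cstdint>

// Rescale one int32 Q.K^T accumulator into the int8 attention_probs domain.
int8_t RequantizeQKDot(int32_t acc, float q_scale, float k_scale,
                       float attention_probs_scale) {
  float real_value = static_cast<float>(acc) * q_scale * k_scale;
  float q = std::nearbyintf(real_value / attention_probs_scale);
  return static_cast<int8_t>(std::clamp(q, -128.0f, 127.0f));
}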
