
k2-fsa / sherpa-onnx

Stars: 818 · Watchers: 28 · Forks: 160 · Size: 3.31 MB

Speech-to-text, text-to-speech, and speaker recognition using next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift.

Home Page: https://k2-fsa.github.io/sherpa/onnx/index.html

License: Apache License 2.0

CMake 6.60% C++ 44.93% Shell 4.90% Python 16.51% C 3.55% Kotlin 6.40% Makefile 0.22% Swift 3.88% Objective-C 0.01% Java 2.17% C# 3.44% HTML 0.50% JavaScript 4.26% Go 2.63%
asr onnx windows linux macos cpp android ios raspberry-pi aarch64

sherpa-onnx's People

Contributors

20246688, bhaswa, chiiyeh, csukuangfj, emreozkose, erquren, frankyoujian, garylaurenceauava, hiedean, jingzhaoou, jinzr, kajimacn, kamirdin, karelvesely84, keanucui, leohuang2013, li563042811, longshiming, lovemefan, mablue, manyeyes, neuxys, nmfisher, nshmyrev, pingfengluo, pkufool, vsd-vector, w11wo, yujinqiu, zhaomingwork


sherpa-onnx's Issues

[Help wanted] Support the latest zipformer (streaming + non-streaming) from icefall

k2-fsa/icefall#1108 adds support for exporting the latest zipformer to ONNX (both streaming and non-streaming are supported).

We need to support the exported models in sherpa-onnx.

@kakashidan


We also need to update the doc:

Hello, the Chinese ONNX model inference seems incorrect.

Hi. I exported the wenet model using this script:

python export_onnx.py --bpe_model weights/icefall_asr_wenetspeech_pruned_transducer_stateless2/data/lang_char/ --pretrained_model weights/icefall_asr_wenetspeech_pruned_transducer_stateless2/exp/pretrained_epoch_10_avg_2.pt

Inference with these weights in PyTorch gives correct results,

but when I run inference with ONNX, I get this result:

wav: ../../../data/test_data/16k16bit.wav search: greedy
loading ../../../weights/onnx/encoder.onnx
loading ../../../weights/onnx/decoder.onnx
loading ../../../weights/onnx/joiner.onnx
loading ../../../weights/onnx/joiner_encoder_proj.onnx
loading ../../../weights/onnx/joiner_decoder_proj.onnx
../../../data/test_data/16k16bit.wav
reading ../../../data/test_data/16k16bit.wav
wav shape: (1,112400)
Elapsed: 0.332 seconds
rtf: 0.0474286
Hyps:
3938-2261-519-3938-376-657-657-4250-4745-637-1449-1449-59-3978-376-249-5294-2480-1449-657-3786-519-249-519-249-519-249-519-249-519-249-1449-2480-1872-657-519-1713-3201-4186-3938-3938-4745-519-3938-376-519-249-3669-3938-4877-5525-3938-4745-519-3938-376-657-657-519-1713-3938-2261-1449-59-3978-376-249-5294-4745-1449-1449-3655-1449-3710-1449-1449-59-3978-376-249-5294-2480-1449-657-3786-519-249-519-249-519-249-519-249-519-249-519-249-519-249-519-249-519-249-519-249-519-249-519-249-1449-2480-3281-519-249-519-249-519-249-519-249-519-249-519-249-519-249-519-249-519-249-519-249-1449-2480-637-1449-1449-3978-858-1852-1574-249-2368-2480-1923-1449-2169-361-2480-1449-657-3786-519-249-519-249-519-249-519-249-1449-2480-1501-657-3786-3669-3938-4877-657-3938-4745-519-3938-376-|
诋犹估诋亲谅谅惬颔扑殊殊就孵亲务邗竖殊谅椎估务估务估务估务估务殊竖瞭谅估顿尬蜕诋诋颔估诋亲估务殓诋蹰蚡诋颔估诋亲谅谅估顿诋犹殊就孵亲务邗颔殊殊挎殊跺殊殊就孵亲务邗竖殊谅椎估务估务估务估务估务估务估务估务估务估务估务估务殊竖撇估务估务估务估务估务估务估务估务估务估务殊竖扑殊殊孵螃园径务咽竖脖殊液立竖殊谅椎估务估务估务估务殊竖迈谅椎殓诋蹰谅诋颔估诋亲

I checked the tokens and they seem normal.

What could be missing?

Asking for guidance on learning ONNX + C++

I have tried your project; it is an excellent ASR model that is very easy to deploy and run! I am a graduate student and would like to learn from you: what basic knowledge is needed to write a project of this quality (or at least come close to it)? I already have deep-learning experience in Python. What else do I need to learn about C++ and ONNX?
Thank you for your answer!

Librispeech trained model for Streaming ASR

Hello,

Are there any pre-trained models like icefall-asr-multidataset-pruned_transducer_stateless7-2023-05-04 (English), trained on GigaSpeech + LibriSpeech + Common Voice 13.0, for streaming ASR?

Also, which is the best pre-trained model you have for streaming ASR?

The recent provider/session refactor is not working for OpenVINO.

I changed GetSessionOptionsImpl() in session.cc to use OpenVINO.

  Ort::SessionOptions sess_opts;
  sess_opts.SetIntraOpNumThreads(num_threads);
  sess_opts.SetInterOpNumThreads(num_threads);
  sess_opts.SetGraphOptimizationLevel(ORT_DISABLE_ALL);
  OrtOpenVINOProviderOptions options;
  options.device_type = "CPU_FP32";
  sess_opts.AppendExecutionProvider_OpenVINO(options);
  return sess_opts;

After that, I ran into errors like

2023-05-16 15:37:00.162490913 [W:onnxruntime:, session_state.cc:1136 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-05-16 15:37:00.162554167 [W:onnxruntime:, session_state.cc:1138 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2023-05-16 15:37:00.573335241 [W:onnxruntime:, session_state.cc:1136 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-05-16 15:37:00.573379616 [W:onnxruntime:, session_state.cc:1138 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
terminate called after throwing an instance of 'Ort::Exception'
  what():  Encountered unknown exception in Run()

I looked at the OnlineZipformerTransducerModel class:

  Ort::SessionOptions sess_opts_;
  Ort::AllocatorWithDefaultOptions allocator_;

While sess_opts_ is initialized using sess_opts_(GetSessionOptions(config)), allocator_ is probably not using the updated session options. That may be the root cause of the above issue.
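
To isolate where it breaks, a minimal standalone check could help (a hedged sketch, not sherpa-onnx code; the model path and thread count are placeholders): build a session directly with the same options and see whether it already fails outside sherpa-onnx.

#include <onnxruntime_cxx_api.h>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "openvino-check");

  Ort::SessionOptions sess_opts;
  sess_opts.SetIntraOpNumThreads(1);
  sess_opts.SetGraphOptimizationLevel(ORT_DISABLE_ALL);

  // Same OpenVINO EP configuration as in the modified GetSessionOptionsImpl()
  OrtOpenVINOProviderOptions options;
  options.device_type = "CPU_FP32";
  sess_opts.AppendExecutionProvider_OpenVINO(options);

  // If constructing this session already throws, the problem lies in the
  // EP/model combination rather than in how sherpa-onnx passes the session
  // options around.
  Ort::Session session(env, "encoder.onnx", sess_opts);
  return 0;
}

If this loads cleanly, the suspicion about allocator_ and member initialization order becomes more plausible.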

I appreciate your help on this issue. Thanks a lot!

i have one question

It is nice work, I really like it.
But when I run this project, I have some questions.

I tried methods 1 and 3 from the web page to install this project.
If I install it with method 1,
it works nicely: the words are recognized and the output is split when the talk has some time with no voice.
But when I install it with method 3, the words are recognized but the output is not split when the talk has a long time with no voice; it outputs a single line.

Can you tell me how to solve this?

Thank you

Missing warmup process as k2-sherpa

In k2-sherpa, during warm-up, the encoder is initialized to the model initial states.

  void WarmUp(torch::Tensor features, torch::Tensor features_length) {
    torch::IValue states = GetEncoderInitStates();
    states = StackStates({states});

However, I don't see a similar warm-up process in sherpa-onnx. This may cause slightly worse performance at the beginning of a stream. It may be tricky to save the model initial states during ONNX export, I guess.
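
A rough sketch of a possible warm-up on the sherpa-onnx side (hedged: it assumes the OnlineRecognizer/OnlineStream C++ API with CreateStream, AcceptWaveform, IsReady, and DecodeStream, and it only primes the session with silence instead of restoring saved initial states):

#include <cstdint>
#include <vector>
#include "sherpa-onnx/csrc/online-recognizer.h"

// Push about one second of 16 kHz silence through a throwaway stream so that
// the first chunk of a real user stream does not pay the one-time setup cost.
void WarmUp(sherpa_onnx::OnlineRecognizer &recognizer) {
  auto s = recognizer.CreateStream();
  std::vector<float> silence(16000, 0.0f);
  s->AcceptWaveform(16000, silence.data(),
                    static_cast<int32_t>(silence.size()));
  while (recognizer.IsReady(s.get())) {
    recognizer.DecodeStream(s.get());
  }
}

This only addresses first-chunk latency; whether it helps accuracy at stream start the way k2-sherpa's state-based warm-up does is a separate question.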

Appreciate any suggestions.

Installing ONNX for Windows gives error for cmake -DCMAKE_BUILD_TYPE=Release ..

(base) C:\Users\K2\sherpa-onnx\build>cmake -DCMAKE_BUILD_TYPE=Release ..
-- Building for: NMake Makefiles
CMake Error at CMakeLists.txt:2 (project):
Running

'nmake' '-?'

failed with:

The system cannot find the file specified

CMake Error: CMAKE_C_COMPILER not set, after EnableLanguage
CMake Error: CMAKE_CXX_COMPILER not set, after EnableLanguage
-- Configuring incomplete, errors occurred!
See also "C:/Users/K2/sherpa-onnx/build/CMakeFiles/CMakeOutput.log".

Benchmarking on Android

I've seen that for icefall, the two ways to export models are ONNX (this project) and NCNN.

Has there been any benchmarking of the two methods? I'm wondering which one would be faster.

I did find that there's this page k2-fsa/sherpa-ncnn#44 which includes some NCNN run times.

Warnings when running the streaming flow on GPU

The offline flow has no warning. However, when I ran the streaming flow, I got the following warnings:

2023-05-25 00:21:23.008301845 [W:onnxruntime:, session_state.cc:1136 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-05-25 00:21:23.008326867 [W:onnxruntime:, session_state.cc:1138 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.

I turned on ONNX verbose output and found these messages:

2023-05-25 00:21:20.004017871 [V:onnxruntime:, session_state.cc:1116 VerifyEachNodeIsAssignedToAnEp] Node placements
2023-05-25 00:21:20.004031728 [V:onnxruntime:, session_state.cc:1122 VerifyEachNodeIsAssignedToAnEp]  Node(s) placed on [CPUExecutionProvider]. Number of nodes: 875
......
2023-05-25 00:21:20.006623532 [V:onnxruntime:, session_state.cc:1122 VerifyEachNodeIsAssignedToAnEp]  Node(s) placed on [CUDAExecutionProvider]. Number of nodes: 2426
......

So, the warnings are correct as some nodes are assigned to CPU while the others are assigned to GPU. ONNX runtime gave the following explanation:

2023-05-25 00:21:19.287699117 [I:onnxruntime:, fallback_cpu_capability.cc:157 GetCpuPreferredNodes] ORT optimization- Force fallback to CPU execution for node: Unsqueeze_5018 because the CPU execution path is deemed faster than overhead involved with execution on other EPs  capable of executing this node

I wonder if we need to do anything about these warnings and would appreciate your feedback. Thanks.

Latest master branch seems to fail style check.

I checked out the latest master branch and ran ./scripts/check_style_cpplint.sh 2. I got the following errors:

......
/workspace/sherpa-onnx/sherpa-onnx/csharp-api/offline-api.h:122:  At least two spaces is best between code and comments  [whitespace/comments] [2]
/workspace/sherpa-onnx/sherpa-onnx/csharp-api/offline-api.h:122:  Could not find a newline character at the end of the file.  [whitespace/ending_newline] [5]
Done processing /Users/jou2019/workspace/sherpa-onnx/sherpa-onnx/csharp-api/offline-api.h
Total errors found: 92
[FAILED] /workspace/sherpa-onnx/sherpa-onnx/csharp-api/offline-api.h

Feature Extraction

In sherpa, features are extracted with kaldifeat, which depends on Torch. How can I compute fbanks without kaldifeat? Should I use Kaldi scripts? To feed fbanks to the ONNX model, I need to obtain inputs as std::vector<float>, like

size_t feature_size = 1 * 600 * 80;    // TODO: dynamic shape
std::vector<float> features(feature_size);
std::vector<int64_t> feature_dims = {1, 600, 80};

size_t feature_length_size = 1; 
std::vector<int64_t> features_lengths = {600}; // TODO: dynamic shape
std::vector<int64_t> feature_length_dims = {1};

std::vector<const char*> output_node_names = {"encoder_out"};

// create input tensor object from data values
auto memory_info = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);

Ort::Value feature_tensor = Ort::Value::CreateTensor<float>(memory_info, features.data(), feature_size, feature_dims.data(), 3);
Ort::Value feature_tensor_length = Ort::Value::CreateTensor<int64_t>(memory_info, features_lengths.data(), feature_length_size, feature_length_dims.data(), 1);
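
For the feature-extraction part, kaldi-native-fbank (already a dependency of sherpa-onnx) can compute the fbanks without Torch. Below is a hedged sketch, not the sherpa-onnx implementation; the helper name ComputeFbank is just for illustration, and samples is assumed to be 16 kHz mono audio as floats. It flattens the 80-dim frames into the row-major std::vector<float> layout expected above.

#include <algorithm>
#include <cstdint>
#include <vector>

#include "kaldi-native-fbank/csrc/online-feature.h"

std::vector<float> ComputeFbank(const std::vector<float> &samples,
                                int32_t *num_frames) {
  knf::FbankOptions opts;
  opts.frame_opts.samp_freq = 16000;
  opts.frame_opts.dither = 0;
  opts.mel_opts.num_bins = 80;

  knf::OnlineFbank fbank(opts);
  fbank.AcceptWaveform(16000, samples.data(),
                       static_cast<int32_t>(samples.size()));
  fbank.InputFinished();  // no more audio; flush the trailing frames

  *num_frames = fbank.NumFramesReady();
  std::vector<float> features(*num_frames * 80);
  for (int32_t i = 0; i != *num_frames; ++i) {
    const float *frame = fbank.GetFrame(i);
    std::copy(frame, frame + 80, features.begin() + i * 80);
  }
  return features;  // shape (num_frames, 80), row-major
}

The returned num_frames would then replace the hard-coded 600 in feature_dims and features_lengths.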

Hi, I cannot reproduce the result (except when using your ONNX models)

Hi, I tried to export the ONNX models myself, but the results are wrong (Chinese model):

./build/bin/sherpa-onnx ../../weights/onnx/tokens.txt ../../weights/onnx/encoder.onnx ../../weights/onnx/decoder.onnx ../../weights/onnx/joiner.onnx ../../weights/onnx/joiner_encoder_proj.onnx ../../weights/onnx/joiner_decoder_proj.onnx ../../data/test_data/16k16bit.wav
wav filename: ../../data/test_data/16k16bit.wav
wav duration (s): 7.025
Recognition result for ../../data/test_data/16k16bit.wav
珲珲珲仪韭韭熏呸鞘熏足绽Z韭珲猢仪梆鸥柜懈爷绽懈珲懈爷铍梆韭珲爷殡韭熏呸爷冥寓桐积珲懈爷铍岑霜坷熏鸥严肱苜珲懈爷铍岑霜坷熏鸥严芦斐涿苜珲懈爷铍岑霜坷熏鸥严肱苜珲懈爷绽懈懈恰韭熏呸览珲缪流苜鸳捆执绽圄懈恰韭熏呸鞘螺熏桐积珲懈爷铍岑霜懈熏恰韭熏呸苜熏桐积珲懈爷铍岑霜坷熏武绽韭韭熏呸鞘熏桐积韭珲猢仪梆乒柜苜苜蜕狰诽铉熏岑苜蜕坷桐懈狩殃武绽韭韭熏呸鞘柜坷蜕

this is how I export:

python export_onnx.py --pretrained_model weights/icefall_asr_wenetspeech_pruned_transducer_stateless2/exp/pretrained_epoch_10_avg_2.pt --bpe_model weights/icefall_asr_wenetspeech_pruned_transducer_stateless2/data/lang_char/

What difference here could cause the incorrect result?

kaldi native io

Hi,

I am trying to use Kaldi Native IO to read wav, but I got this error in knf::OnlineFbank fbank(opts);

start
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)

Reproduce

I wrote a basic main.cpp:

#include <iostream>
#include "kaldi-native-fbank/csrc/online-feature.h"

int main(int argc, char* argv[]) {
    std::cout << "start" << std::endl;

    knf::FbankOptions opts;
    opts.frame_opts.dither = 0;
    opts.frame_opts.frame_shift_ms = 10.0f;
    opts.frame_opts.frame_length_ms = 25.0f;
    opts.mel_opts.num_bins = 80;

    knf::OnlineFbank fbank(opts);
}

It works fine.

Kaldi native io

I cloned and built it from source. After that, I added a CMake file (taken from sherpa) to sherpa-onnx/cmake:

if(DEFINED ENV{KALDI_NATIVE_IO_INSTALL_PREFIX})
  message(STATUS "Using environment variable KALDI_NATIVE_IO_INSTALL_PREFIX: $ENV{KALDI_NATIVE_IO_INSTALL_PREFIX}")
  set(KALDI_NATIVE_IO_CMAKE_PREFIX_PATH $ENV{KALDI_NATIVE_IO_INSTALL_PREFIX})
else()
  # PYTHON_EXECUTABLE is set by cmake/pybind11.cmake
  message(STATUS "Python executable: ${PYTHON_EXECUTABLE}")

  execute_process(
    COMMAND "${PYTHON_EXECUTABLE}" -c "import kaldi_native_io; print(kaldi_native_io.cmake_prefix_path)"
    OUTPUT_STRIP_TRAILING_WHITESPACE
    OUTPUT_VARIABLE KALDI_NATIVE_IO_CMAKE_PREFIX_PATH
  )
endif()

message(STATUS "KALDI_NATIVE_IO_CMAKE_PREFIX_PATH: ${KALDI_NATIVE_IO_CMAKE_PREFIX_PATH}")
list(APPEND CMAKE_PREFIX_PATH "${KALDI_NATIVE_IO_CMAKE_PREFIX_PATH}")

find_package(kaldi_native_io REQUIRED)

message(STATUS "KALDI_NATIVE_IO_FOUND: ${KALDI_NATIVE_IO_FOUND}")
message(STATUS "KALDI_NATIVE_IO_VERSION: ${KALDI_NATIVE_IO_VERSION}")
message(STATUS "KALDI_NATIVE_IO_INCLUDE_DIRS: ${KALDI_NATIVE_IO_INCLUDE_DIRS}")
message(STATUS "KALDI_NATIVE_IO_CXX_FLAGS: ${KALDI_NATIVE_IO_CXX_FLAGS}")
message(STATUS "KALDI_NATIVE_IO_LIBRARIES: ${KALDI_NATIVE_IO_LIBRARIES}")

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${KALDI_NATIVE_IO_CXX_FLAGS}")
message(STATUS "CMAKE_CXX_FLAGS: ${CMAKE_CXX_FLAGS}")

Then

  1. include(cmake/kaldi_native_io.cmake) is added to sherpa-onnx/CMakeLists.txt.
  2. target_link_libraries(capi_test onnxruntime kaldi-native-fbank-core kaldi_native_io_core) is added to sherpa-onnx/sherpa-onnx/csrc/CMakeLists.txt.

After these steps, I can include the headers (#include "kaldi_native_io/csrc/kaldi-io.h", #include "kaldi_native_io/csrc/wave-reader.h") and I can also read a wav file. However, the knf::OnlineFbank fbank(opts); line now throws 'std::bad_alloc'.

How can I solve this? I think the issue is in the CMake file, but I cannot find the reason.

Can't build for iOS

When attempting to build for iOS using Xcode 14.2, I receive the following error:

fatal error: 'coreml_provider_factory.h' file not found

sherpa_onnx.Display has a bug on Windows 11

When I run the Python example on Windows x64 with VS Code,

the terminal output does not display correctly.

I think sherpa_onnx.Display has a bug.

If I add the output with print:

print("print:: " + last_result)
display.print(-1, result)

then the print output works normally.

The Python code was downloaded from

https://github.com/k2-fsa/sherpa-onnx/blob/master/python-api-examples/speech-recognition-from-microphone.py

The package was installed with pip install sherpa-onnx.

print:: 你

print:: 你好
浣犲ソ
print:: 你好啊
浣犲ソ鍟
print:: 你好啊今天天
浣犲ソ鍟婁粖澶╁ぉ
print:: 你好啊今天天气
浣犲ソ鍟婁粖澶╁ぉ姘
print:: 你好啊今天天气真不错
浣犲ソ鍟婁粖澶╁ぉ姘旂湡涓嶉敊
print:: 你好啊今天天气真不错 HELLO
浣犲ソ鍟婁粖澶╁ぉ姘旂湡涓嶉敊 HELLO
print:: 你好啊今天天气真不错 HELLO WHATS YOUR
浣犲ソ鍟婁粖澶╁ぉ姘旂湡涓嶉敊 HELLO WHATS YOUR
print:: 你好啊今天天气真不错 HELLO WHATS YOUR NAME
浣犲ソ鍟婁粖澶╁ぉ姘旂湡涓嶉敊 HELLO WHATS YOUR NAME

[Help wanted] Support CoreML for iOS

#151 adds CoreML support for macOS. We also need to support iOS.

The pre-built libs can be downloaded from
https://onnxruntimepackages.z14.web.core.windows.net/pod-archive-onnxruntime-c-1.14.0.zip

After unzipping, you will find the following files:

.
├── Headers
│   ├── coreml_provider_factory.h
│   ├── cpu_provider_factory.h
│   ├── onnxruntime_c_api.h
│   ├── onnxruntime_cxx_api.h
│   └── onnxruntime_cxx_inline.h
├── LICENSE
├── a.txt
└── onnxruntime.xcframework
    ├── Info.plist
    ├── ios-arm64
    │   └── onnxruntime.framework
    │       ├── Headers
    │       │   ├── coreml_provider_factory.h
    │       │   ├── cpu_provider_factory.h
    │       │   ├── onnxruntime_c_api.h
    │       │   ├── onnxruntime_cxx_api.h
    │       │   └── onnxruntime_cxx_inline.h
    │       ├── Info.plist
    │       └── onnxruntime
    └── ios-arm64_x86_64-simulator
        └── onnxruntime.framework
            ├── Headers
            │   ├── coreml_provider_factory.h
            │   ├── cpu_provider_factory.h
            │   ├── onnxruntime_c_api.h
            │   ├── onnxruntime_cxx_api.h
            │   └── onnxruntime_cxx_inline.h
            ├── Info.plist
            └── onnxruntime

8 directories, 22 files

TODOs

Support conformer models

It looks like sherpa-onnx only supports LSTM and Zipformer models. I wonder if anyone has code to support Conformer models. Thanks!

Put pre-compiled binaries into the wheel

When people use pip install sherpa-onnx to install sherpa-onnx from a pre-compiled wheel, we should also install all of the sherpa-onnx binaries, such as those shown in the screenshot below.
[Screenshot 2023-03-29 at 22:43:09]
