ssssssilver / sherpa-ncnn-unity


Real-time, accurate bilingual (Chinese and English) speech recognition in Unity, built on the sherpa-ncnn framework.

License: Apache License 2.0

C# 62.99% Shell 37.01%
bilingual offline-recognition speech-recognition speech-to-text next-gen-kaldi sherpa-ncnn

sherpa-ncnn-unity's Introduction

sherpa-ncnn-unity

Foreword

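Recently I tried many speech-to-text (STT) solutions. Each has its strengths, but none of them fully met my needs. Then I found sherpa-ncnn, a next-generation Kaldi speech recognition framework built on ncnn that supports multiple languages, platforms, and models. However, it had no Unity integration, so using it from Unity takes some effort. I decided to port it to Unity, for my own use and for anyone else who needs it. For reference, here are the STT solutions I have tried: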

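  • Online services (Baidu, iFlytek, Azure Speech, etc.): small free quotas, real-name verification required, high accuracy, but high latency because recognition runs online; supports mixed-language recognition.
  • Offline option 1: whisper asr service. Can be deployed locally with Docker and exposes an API; recognizes many languages; resource-hungry, moderate speed, mediocre Chinese accuracy; supports mixed-language recognition.
  • Offline option 2: Undertone. Embeds Whisper directly in Unity, so no deployment is needed, but it takes a lot of disk space; supports mixed-language recognition.
  • Offline option 3: Speech Recognition System. Based on Vosk; supports multiple languages, platforms, and models; fast, but recognizes only one language at a time.
  • Offline option 4: sherpa-ncnn. A next-generation Kaldi speech recognition framework built on ncnn; supports multiple languages, platforms, and models; supports mixed-language recognition and is very fast. However, there was nothing Unity could use directly, which is why this repo exists.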

Feature

Real-time, accurate bilingual (Chinese and English) speech recognition in Unity, built on the sherpa-ncnn framework.

Supported platforms

  • Windows
  • WebGL (in testing)
  • Android (planned)
  • iOS (planned)

Results

Demo

Usage

1. Download the model files you want to use and place them in the StreamingAssets folder.
2. Configure the model paths.
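The path setup in step 2 can be checked before any native code runs. The sketch below is a hypothetical helper (`ModelFileCheck` is not part of this repo or of sherpa-ncnn) that joins relative model paths onto a root folder with `Path.Combine` and lists the ones that do not exist; in Unity the root would be `Application.streamingAssetsPath`. It assumes StreamingAssets is a regular folder on disk, which holds in the Editor and Windows builds but not on Android (where StreamingAssets lives inside the APK) or WebGL (where it is a URL).

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of sherpa-ncnn-unity or sherpa-ncnn.
public static class ModelFileCheck
{
    // Joins each relative path onto root (e.g. Application.streamingAssetsPath in Unity).
    public static string[] Resolve(string root, params string[] relativePaths)
    {
        var resolved = new string[relativePaths.Length];
        for (int i = 0; i < relativePaths.Length; i++)
        {
            resolved[i] = Path.Combine(root, relativePaths[i]);
        }
        return resolved;
    }

    // Returns the paths that do not exist as regular files.
    // Only meaningful where StreamingAssets is a real folder (Editor, Windows builds).
    public static List<string> FindMissing(IEnumerable<string> paths)
    {
        var missing = new List<string>();
        foreach (string path in paths)
        {
            if (!File.Exists(path))
            {
                missing.Add(path);
            }
        }
        return missing;
    }
}
```

In a `Start()` method you might pass the result of `ModelFileCheck.Resolve(Application.streamingAssetsPath, tokensPath, encoderParamPath, ...)` to `FindMissing` and log the list instead of creating the recognizer when it is non-empty, so a bad path is reported as a readable message rather than surfacing inside native code.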

Model architecture overview

https://k2-fsa.github.io/sherpa/ncnn/index.html

Model downloads

https://k2-fsa.github.io/sherpa/ncnn/pretrained_models/index.html

Acknowledgements

https://github.com/k2-fsa/sherpa-ncnn

A side note

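Honestly, I found sherpa-ncnn purely by chance. I needed real-time speech recognition, and after looking around, every recommendation was Whisper or an online service such as Baidu or iFlytek. Their drawbacks are obvious, though: they are either not real-time enough, not accurate enough, or unable to recognize mixed languages. By coincidence I tried a digital-companion app on Steam (数字伙伴) and noticed that its speech recognition was very fast and accurate, worked offline, and handled Chinese and English at the same time. After digging through its files for a while, I found a folder that looked like model files, followed the trail, and arrived at sherpa-ncnn. Testing it myself was an eye-opener and gave me a much deeper understanding of this field. Beyond realizing the limits of my own knowledge, I am grateful to the sherpa-ncnn developers, who not only built such a great real-time speech tool but also share it with other developers free of charge. I sincerely hope I can help a little, so that more people learn about and use this project.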

sherpa-ncnn-unity's People

Contributors

aspdotnet-done, ssssssilver

sherpa-ncnn-unity's Issues

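Demo crashes while initializing the model

In testing, the script crashes when it reaches line 60, that is, when it loads the config and creates the recognizer and the online stream.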
```csharp
using System.IO;
using UnityEngine;
using UnityEngine.UI;

// Class name is a placeholder; the original post shows only the class body.
public class SherpaNcnnDemo : MonoBehaviour
{
    private SherpaNcnn.OnlineRecognizer recognizer;
    private SherpaNcnn.OnlineStream onlineStream;
    private int segmentIndex = 0;
    private string lastText = "";

    [SerializeField]
    private Text Text;
    [SerializeField]
    private Text buttonTxt;

    // These parameters can be set in the Unity Editor (paths are relative to StreamingAssets)
    public string tokensPath;
    public string encoderParamPath;
    public string encoderBinPath;
    public string decoderParamPath;
    public string decoderBinPath;
    public string joinerParamPath;
    public string joinerBinPath;
    public int numThreads = 1;
    public string decodingMethod = "greedy_search";

    void Start()
    {
        // Build the recognizer configuration
        SherpaNcnn.OnlineRecognizerConfig config = new SherpaNcnn.OnlineRecognizerConfig
        {
            FeatConfig = { SampleRate = 16000, FeatureDim = 80 },
            ModelConfig = {
                Tokens = Path.Combine(Application.streamingAssetsPath, tokensPath),
                EncoderParam = Path.Combine(Application.streamingAssetsPath, encoderParamPath),
                EncoderBin = Path.Combine(Application.streamingAssetsPath, encoderBinPath),
                DecoderParam = Path.Combine(Application.streamingAssetsPath, decoderParamPath),
                DecoderBin = Path.Combine(Application.streamingAssetsPath, decoderBinPath),
                JoinerParam = Path.Combine(Application.streamingAssetsPath, joinerParamPath),
                JoinerBin = Path.Combine(Application.streamingAssetsPath, joinerBinPath),
                UseVulkanCompute = 0,
                NumThreads = numThreads
            },
            DecoderConfig = {
                DecodingMethod = decodingMethod,
                NumActivePaths = 4
            },
            EnableEndpoint = 1,
            Rule1MinTrailingSilence = 2.4F,
            Rule2MinTrailingSilence = 1.2F,
            Rule3MinUtteranceLength = 20.0F
        };

        // Create the recognizer and the online stream (the crash happens here)
        recognizer = new SherpaNcnn.OnlineRecognizer(config);
        onlineStream = recognizer.CreateStream();
    }
}
```
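The three `Rule*` values in the config above control endpoint detection. In the Kaldi-style endpointing used by sherpa-ncnn (worth confirming against the version you ship), an endpoint is declared as soon as any one rule holds: rule 1 after `Rule1MinTrailingSilence` seconds of trailing silence even if nothing was recognized, rule 2 after `Rule2MinTrailingSilence` seconds of trailing silence once something was recognized, and rule 3 once the utterance reaches `Rule3MinUtteranceLength` seconds. The sketch below models that decision in plain C# for illustration only; `EndpointRule` and `EndpointSketch` are hypothetical names, and the real check runs inside the native library (the upstream C# examples query it with `recognizer.IsEndpoint(stream)`).

```csharp
// Illustrative model of the three endpoint rules; hypothetical names, not the sherpa-ncnn API.
// All quantities are in seconds.
public readonly struct EndpointRule
{
    public readonly bool MustContainNonSilence;
    public readonly float MinTrailingSilence;
    public readonly float MinUtteranceLength;

    public EndpointRule(bool mustContainNonSilence, float minTrailingSilence, float minUtteranceLength)
    {
        MustContainNonSilence = mustContainNonSilence;
        MinTrailingSilence = minTrailingSilence;
        MinUtteranceLength = minUtteranceLength;
    }

    public bool IsActivated(float trailingSilence, float utteranceLength)
    {
        // Something was recognized if the utterance is longer than its trailing silence.
        bool containsNonSilence = utteranceLength > trailingSilence;
        return (containsNonSilence || !MustContainNonSilence)
            && trailingSilence >= MinTrailingSilence
            && utteranceLength >= MinUtteranceLength;
    }
}

public static class EndpointSketch
{
    // Rules built from the values used in the config above.
    public static readonly EndpointRule[] Rules =
    {
        new EndpointRule(false, 2.4f, 0f),  // Rule1MinTrailingSilence = 2.4
        new EndpointRule(true, 1.2f, 0f),   // Rule2MinTrailingSilence = 1.2
        new EndpointRule(false, 0f, 20.0f), // Rule3MinUtteranceLength = 20.0
    };

    // An endpoint is detected when any rule is activated.
    public static bool IsEndpoint(float trailingSilence, float utteranceLength)
    {
        foreach (EndpointRule rule in Rules)
        {
            if (rule.IsActivated(trailingSilence, utteranceLength))
            {
                return true;
            }
        }
        return false;
    }
}
```

With these values, a speaker who pauses for 1.3 s after saying something triggers rule 2, while 1.5 s of pure silence with nothing recognized triggers nothing until it reaches 2.4 s.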

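I checked Unity's log and asked GPT about it. Its explanation: while sherpa-ncnn-core loads or initializes a deep-learning model inside the Unity app, the ncnn library hits a memory access violation when building the hardware-acceleration pipeline for a convolution layer. Possible causes include an incorrect model definition, a corrupted model file, a hardware compatibility problem, or a mismatch between the ncnn library and some configuration of the current runtime environment.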
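One of the suspected causes, a corrupted model file, is cheap to rule out before looking at hardware. A text-format ncnn `.param` file starts with the magic number 7767517 on its first line, and a model repository cloned without Git LFS installed contains small text pointer files beginning with `version https://git-lfs.github.com/spec/v1` instead of the real weights. The hypothetical helper below (`NcnnModelSanity` is not part of any library) checks for both, plus empty `.bin` files. It assumes text-format `.param` files, and passing it does not prove a model is valid or compatible with the bundled ncnn build; it only catches obviously wrong files.

```csharp
using System;
using System.IO;

// Hypothetical diagnostic helper: cheap checks on an ncnn model pair before
// it is handed to native code. Returns null when a file looks fine, otherwise a message.
public static class NcnnModelSanity
{
    // Text-format ncnn .param files start with this magic number.
    private const string ParamMagic = "7767517";
    // Git LFS pointer files start with this line instead of real data.
    private const string LfsPointerPrefix = "version https://git-lfs.github.com/spec/v1";

    public static string CheckParam(string paramPath)
    {
        if (!File.Exists(paramPath)) return "missing: " + paramPath;
        string firstLine;
        using (var reader = new StreamReader(paramPath))
        {
            firstLine = reader.ReadLine() ?? string.Empty;
        }
        if (firstLine.StartsWith(LfsPointerPrefix, StringComparison.Ordinal))
            return "git-lfs pointer, not the real file: " + paramPath;
        if (firstLine.Trim() != ParamMagic)
            return "unexpected .param header: " + paramPath;
        return null;
    }

    public static string CheckBin(string binPath)
    {
        if (!File.Exists(binPath)) return "missing: " + binPath;
        long length = new FileInfo(binPath).Length;
        if (length == 0) return "empty .bin: " + binPath;
        // LFS pointers are tiny text files, so only small files are inspected.
        if (length < 1024)
        {
            string head = File.ReadAllText(binPath);
            if (head.StartsWith(LfsPointerPrefix, StringComparison.Ordinal))
                return "git-lfs pointer, not the real file: " + binPath;
        }
        return null;
    }
}
```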

What changes do I need to make? (I'm new to Unity and would appreciate a reply from anyone experienced.)
