julius-speech / dictation-kit

Japanese dictation kit using Julius

License: Other

Python 82.56% Perl 7.65% Shell 6.08% Batchfile 3.71%

dictation-kit's Issues

failed to read model/dnn/binhmm.SID

I followed the tutorial in the README.md and ran run-osx-dnn.sh, but got the following errors:
STAT: include config: main.jconf
STAT: include config: am-dnn.jconf
STAT: parsing option string: "-htkconf model/dnn/config.lmfb -cvn -cmnload model/dnn/norm -cmnstatic"
Stat: para: parsing HTK Config file: model/dnn/config.lmfb
Warning: para: "SOURCEFORMAT" ignored (not supported, or irrelevant)
Warning: para: "SOURCEKIND" ignored (not supported, or irrelevant)
Stat: para: SOURCERATE=625
Warning: para: TARGETKIND skipped (will be determined by AM header)
Stat: para: TARGETRATE=100000.0
Warning: para: "SAVECOMPRESSED" ignored (not supported, or irrelevant)
Warning: para: "SAVEWITHCRC" ignored (not supported, or irrelevant)
Stat: para: WINDOWSIZE=250000.0
Stat: para: USEHAMMING=T
Stat: para: PREEMCOEF=0.97
Stat: para: NUMCHANS=40
Stat: para: ENORMALISE=F
Warning: para: "BYTEORDER" ignored (not supported, or irrelevant)
STAT: jconf successfully finalized
STAT: *** loading AM00 _default
Stat: init_phmm: Reading in HMM definition
Error: init_phmm: failed to read model/dnn/binhmm.SID
ERROR: m_fusion: failed to initialize AM
ERROR: Error in loading model
I also got the same error on Windows.
I installed git-lfs, but the clone came to only 348 MB in total. I'm not sure whether this is related to the error.
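
One possible cause, not confirmed anywhere in this thread, is that the model files were never downloaded and are still Git LFS pointer stubs: small text files that begin with the line `version https://git-lfs.github.com/spec/v1`. The sketch below scans a directory for such stubs. The function names are hypothetical, and the `model` directory name in the usage note is taken from the path in the log above.

```python
from pathlib import Path

# First line of every Git LFS pointer file (per the Git LFS pointer format).
LFS_SIGNATURE = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file starts with the Git LFS pointer signature."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_SIGNATURE))
    return head == LFS_SIGNATURE


def find_lfs_pointers(root):
    """Return the sorted paths of files under root that are still pointer stubs."""
    return sorted(
        str(p) for p in Path(root).rglob("*") if p.is_file() and is_lfs_pointer(p)
    )
```

Calling `find_lfs_pointers("model")` from the kit's top directory would list any files that still need `git lfs pull`. An empty list means the files are real data and the cause lies elsewhere.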

Error: init_phmm: failed to read model

I am on Ubuntu 16.04 and got:

$ bash run-linux-dnn.sh 
STAT: include config: main.jconf
STAT: include config: am-dnn.jconf
STAT: parsing option string: "-htkconf model/dnn/config.lmfb -cvn -cmnload model/dnn/norm -cmnstatic"
Stat: para: parsing HTK Config file: model/dnn/config.lmfb
Warning: para: "SOURCEFORMAT" ignored (not supported, or irrelevant)
Warning: para: "SOURCEKIND" ignored (not supported, or irrelevant)
Stat: para: SOURCERATE=625
Warning: para: TARGETKIND skipped (will be determined by AM header)
Stat: para: TARGETRATE=100000.0
Warning: para: "SAVECOMPRESSED" ignored (not supported, or irrelevant)
Warning: para: "SAVEWITHCRC" ignored (not supported, or irrelevant)
Stat: para: WINDOWSIZE=250000.0
Stat: para: USEHAMMING=T
Stat: para: PREEMCOEF=0.97
Stat: para: NUMCHANS=40
Stat: para: ENORMALISE=F
Warning: para: "BYTEORDER" ignored (not supported, or irrelevant)
STAT: jconf successfully finalized
STAT: *** loading AM00 _default
Stat: init_phmm: Reading in HMM definition
Error: init_phmm: failed to read model/dnn/binhmm.SID
ERROR: m_fusion: failed to initialize AM
ERROR: Error in loading model

Can someone give me a hint or workaround?

How to use a wav file as input?

Can you show me how to run speech-to-text on a 16 kHz, 16-bit WAV file?
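
Before passing a file to the recognizer, it can help to confirm that it really has the format named in the question. The sketch below uses Python's standard `wave` module to check sample rate and sample width. The mono requirement and the function name `check_wav` are assumptions made for illustration. As far as I recall, Julius reads audio files when started with `-input rawfile` (optionally with `-filelist`), but verify that against the Julius documentation.

```python
import wave


def check_wav(path, rate=16000, sample_bytes=2, channels=1):
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as w:
        if w.getframerate() != rate:
            problems.append(f"sample rate is {w.getframerate()} Hz, expected {rate}")
        if w.getsampwidth() != sample_bytes:
            problems.append(
                f"sample width is {8 * w.getsampwidth()} bits, expected {8 * sample_bytes}"
            )
        if w.getnchannels() != channels:
            problems.append(f"{w.getnchannels()} channels, expected {channels}")
    return problems
```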

git lfs pull failed

Lately, my CI has been doing a lot of git lfs pulls.
As a result, we can't download any more.
I promise not to use git lfs pull too much after this.
I really apologize for any inconvenience this may cause you...

$ git clone https://github.com/julius-speech/dictation-kit.git
$ git lfs pull 
Git LFS: (0 of 28 files) 0 B / 851.93 MB
batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.
error: failed to fetch some objects from 'https://github.com/julius-speech/dictation-kit.git/info/lfs'

Use socket to communicate with julius server

Description:

I would like to send a raw audio stream over a network socket and receive the recognition result from the network output, so here is how I set the Julius engine parameters:

Julius ... -input adinnet -adport 5532 -module

With this, I can send an audio stream to Julius on port 5532 and receive XML-format results from port 10500.

Using Method

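import struct

def send_audio(audiodata):
    # Prefix the chunk with its byte length as a 4-byte integer (native byte order)
    header = struct.pack('i', len(audiodata))
    julius_socket_sender.sendall(header + audiodata)

def recv_trans():
    # Print whatever the module port sends until the server closes the connection
    while True:
        data = julius_trans_receiver.recv(4096)
        if not data:
            break
        print(data.decode('utf-8'))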

Problem:

1. Does my setting match my purpose?

Julius is a huge system already, so I am not sure if there is any "correct" way to achieve my purpose.

Purpose: send raw audio stream by network socket and receive recognition result from network output

~~## 2. Couldn't receive the result on the first attempt~~

~~At the very beginning, I sent a chunk of the raw audio stream to port 5532, then tried to receive a result from port 10500:~~

send_audio(audio_stream)
recv_trans()

~~With this, I expected to receive the full result in XML format, but what I received was only part of it:~~

<STARTPROC>
.
<INPUT STATUS="LISTEN" TIME="132747392" />
.

~~However, after sending the audio stream again, I was able to receive the full result.~~
So my question is: is this expected behavior?
In fact, this is expected behavior, as mentioned in the connection message. Sorry about that.
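
In the output quoted above, each message from the module port is followed by a line containing only a period. A client can therefore buffer incoming bytes and split on that terminator instead of assuming that one `recv` call returns one complete message. The sketch below does this and also decodes incrementally, so a multi-byte character split across two reads is not corrupted. The class name is hypothetical, and treating `"\n.\n"` as the terminator is an assumption based only on the output shown here.

```python
import codecs


class ModuleMessageBuffer:
    """Collect bytes from the module port and split them into complete messages."""

    def __init__(self, encoding="utf-8"):
        # An incremental decoder keeps partial multi-byte characters between calls.
        self._decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
        self._pending = ""

    def feed(self, data):
        """Add received bytes; return the list of messages completed so far."""
        self._pending += self._decoder.decode(data)
        # Everything before the last terminator is complete; the rest stays buffered.
        *complete, self._pending = self._pending.split("\n.\n")
        return [m.strip() for m in complete if m.strip()]
```

Each chunk returned by `recv(4096)` would be passed to `feed`, and the returned messages handled one at a time.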

Need Help

I am trying to write a Python 3 client library. Is there any existing documentation for Julius's communication protocols, such as the data structures used by vecnet, adinnet, and module mode?

Receive input audio stream without adintool installation in client side

Hey there. I am trying to set up an API endpoint for the Julius system.

How can I let a client stream audio into the system without installing adintool?
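
One approach, sketched here under stated assumptions, is for the client to speak the adinnet framing directly over a plain TCP socket. Another issue on this page sends each audio chunk prefixed with a 4-byte length, and this sketch follows that. The little-endian byte order and the zero-length header used as an end-of-segment marker are my assumptions about the protocol and should be checked against the Julius source. The function names are hypothetical.

```python
import socket
import struct


def frame_chunks(pcm, chunk_bytes=4096):
    """Yield length-prefixed frames for the PCM bytes, then a zero-length marker."""
    for start in range(0, len(pcm), chunk_bytes):
        chunk = pcm[start:start + chunk_bytes]
        # Assumed framing: 4-byte little-endian byte count, then the raw samples.
        yield struct.pack("<i", len(chunk)) + chunk
    # Assumed end-of-segment marker: a header announcing zero bytes.
    yield struct.pack("<i", 0)


def stream_pcm(sock, pcm, chunk_bytes=4096):
    """Send all frames for the PCM bytes over an already connected socket."""
    for frame in frame_chunks(pcm, chunk_bytes):
        sock.sendall(frame)
```

With a server started using `-input adinnet -adport 5532`, a client would connect with `socket.create_connection((host, 5532))` and call `stream_pcm` with raw PCM bytes. Keeping `chunk_bytes` even avoids splitting a 16-bit sample across two frames.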

Julius v4.5 Japanese dictation kit. Error: init_phmm: failed to read binhmm.SID

I have downloaded the Japanese dictation Kit (Julius v4.5), and was trying to run:
./run-osx-dnn.sh

But there is an error:
Error: init_phmm: failed to read binhmm.SID
I am using the default jconf settings file as provided, so it should work, but it doesn't.
I am sure the binhmm.SID file and the logicalTri.bin file are in the right directory, so the program should be able to locate them.

What could cause the problem? And how should I fix it?

Thank you,
Xinlei

Output Charset Problem

I was trying dictation-kit with the default settings, but the output was garbled.
I don't know which charset to use. I have tried chcp 65001 (UTF-8), chcp 50220 (Japanese), and several other charsets recommended online. Which charset is correct?
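
If the garbling comes from the program and the console disagreeing about the encoding, one way to narrow it down is to capture the raw output bytes and decode them with several candidate encodings, then see which one reads as Japanese. The sketch below does that. The candidate list is an assumption, since this page does not say which encoding the kit emits. `cp932` is Python's name for Windows code page 932 (Shift_JIS).

```python
def decode_candidates(raw, encodings=("utf-8", "cp932", "euc_jp")):
    """Decode raw bytes with each encoding; map failures to None."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results
```

A successful decode is not proof of the right encoding, because some byte sequences are valid in more than one of them. The text still has to be read by eye.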

Failed to clone 2 GB file

I already used Git LFS to clone this repo, but I only got 870.6 MB in total (I tried twice and got the same size both times). Is something wrong with my Git LFS setup, or is this the actual size (you stated it's around 2 GB)?

Thank you.

How to use this kit in Java Web client?

I want to use this kit in my Java website to recognize users' voices in real time. I haven't got it working yet, and I have some questions:

Can I use it for voice recognition from many web clients?
- Get audio from the web client (always listening).
- Send the audio to a Julius server (on another computer).
- Get the text result back at the web client.
  I'm trying to use adinnet input, but nothing works. Maybe I have configured something wrong.
  Is there a tutorial for this?
Can I use this kit commercially?
I'm really grateful for your help. Thanks.