hcmlab / vadnet

Real-time Voice Activity Detection in Noisy Environments using Deep Neural Networks

Home Page: http://openssi.net

License: GNU Lesser General Public License v3.0



VadNet

VadNet is a real-time voice activity detector for noisy environments. It implements an end-to-end learning approach based on Deep Neural Networks. In the extended version, gender and laughter detection are added. To see a demonstration, click on the images below.

Platform

Windows

Dependencies

Visual Studio 2015 Redistributable (https://www.microsoft.com/en-us/download/details.aspx?id=52685)

Installation

do_bin.cmd - Installs embedded Python and downloads the SSI interpreter. During the installation, the script tries to detect whether a GPU is available and, if so, installs the GPU version of TensorFlow. This requires that an NVIDIA graphics card is detected and CUDA has been installed. Nevertheless, VadNet does fine on a CPU.
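
For illustration, a minimal Python sketch of that kind of check (the actual logic in do_bin.cmd may differ; probing nvidia-smi is an assumption here, not the installer's real mechanism):

import subprocess

# Hypothetical GPU probe, not the actual do_bin.cmd logic: if nvidia-smi
# runs successfully, an NVIDIA driver (and most likely a GPU) is present.
def has_nvidia_gpu() -> bool:
    try:
        subprocess.run(["nvidia-smi"], check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return True
    except (OSError, subprocess.CalledProcessError):
        return False

package = "tensorflow-gpu" if has_nvidia_gpu() else "tensorflow"
print("would install:", package)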

Quick Guide

do_vad[ex].cmd - Demo on pre-recorded files (requires 48k mono wav files)

do_vad[ex]_live.cmd - Capture from microphone and stream results to a socket

do_vad[ex]_loopback.cmd - Capture from the soundcard instead of a microphone (loopback mode, see comments below)

do_vad_extract.cmd - Separates audio file into noisy and voiced parts (supports any audio format)

train\do_all.cmd - Performs a fresh training (downloads media files, creates annotations and trains a new network)

Documentation

VadNet is implemented using the Social Signal Interpretation (SSI) framework. The processing pipeline is defined in vad[ex].pipeline and can be configured by editing vad[ex].pipeline-config. Available options are:

audio:live = false                   # $(bool) use live input (otherwise read from file)
audio:live:mic = true                # $(bool) if live input is selected use microphone (otherwise use soundcard)
model:path=models\vad                # path to model folder
send:do = false                      # $(bool) stream detection results to a socket
send:url = udp://localhost:1234      # socket address in format <protocol://host:port>
record:do = false                    # $(bool) capture screen and audio
record:path = capture                # capture path

If the option send:do is turned on, an XML string with the detection results is streamed to a socket (see send:url). You can change the format of the XML string by editing vad.xml. To run SSI in the background, click on the tray icon and select 'Hide windows'. For more information about SSI pipelines please consult the documentation of SSI.
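
If you want to consume these results in your own code, a minimal Python receiver might look like the sketch below (an illustration only, assuming the default send:url of udp://localhost:1234; this script is not part of VadNet):

import socket

# Listen on the address configured in send:url (udp://localhost:1234 by default)
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("localhost", 1234))
while True:
    data, _ = sock.recvfrom(4096)  # one XML string per detection event
    print(data.decode("utf-8", errors="replace"))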

The Python script vad_extract.py can be used to separate noisy and voiced parts of an audio file. For each input file <name>.<ext> two new files <name>.speech.wav and <name>.noise.wav will be generated. The script should handle all common audio formats. You can run the script from the command line by calling > bin\python.exe vad_extract.py <arguments>:

usage: vad_extract.py [-h] [--model MODEL] [--files FILES [FILES ...]] [--n_batch N_BATCH]

optional arguments:
  -h, --help            		show this help message and exit
  --model MODEL         		path to model
  --files FILES [FILES ...]		list of files
  --n_batch N_BATCH     		number of batches
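
For example, a typical call might look like this (the input file name is a placeholder; models\vad is the model folder shipped with VadNet):

> bin\python.exe vad_extract.py --model models\vad --files data\session.wav

Assuming the above, this would produce data\session.speech.wav and data\session.noise.wav next to the input file.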

Loopback Mode

In loopback mode, whatever you play back through your soundcard will be analysed. Before using it, please set the right output format for your soundcard. To do so, go to the Sound settings in the control panel, select your default playback device and click on Properties. Most devices will let you set a default format there. Choose 16 bit, 48000 Hz and press OK.

Insights

The model we are using has been trained with TensorFlow. It takes the raw audio as input and feeds it into a 3-layer Convolutional Network. The result of this filter operation is then processed by a 2-layer Recurrent Network containing 64 ReLU cells. The final bit is a fully-connected layer, which applies a softmax and maps the input to a tuple <noise, voice> in the range [0..1].
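
As an illustration only, the described topology could be sketched in Keras as follows. The 64 recurrent cells and the two-class softmax come from the description above; all kernel sizes, strides and filter counts are placeholder assumptions, not the trained model's actual hyper-parameters:

import tensorflow as tf

def build_vad_sketch(n_samples: int = 48000) -> tf.keras.Model:
    # 1 s of raw mono audio at 48 kHz
    inputs = tf.keras.Input(shape=(n_samples, 1))
    x = inputs
    # 3-layer Convolutional Network (filter counts/kernels are assumptions)
    for filters in (32, 64, 128):
        x = tf.keras.layers.Conv1D(filters, kernel_size=9,
                                   strides=4, activation="relu")(x)
        x = tf.keras.layers.MaxPooling1D(pool_size=4)(x)
    # 2-layer Recurrent Network with 64 ReLU cells each
    x = tf.keras.layers.SimpleRNN(64, activation="relu", return_sequences=True)(x)
    x = tf.keras.layers.SimpleRNN(64, activation="relu")(x)
    # fully-connected layer with softmax -> tuple <noise, voice> in [0..1]
    outputs = tf.keras.layers.Dense(2, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

build_vad_sketch().summary()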

Network architecture:

We trained the network on roughly 134 h of audio data (5.6 days) and ran training for 25 epochs (381,024 steps) using a batch size of 128.

Filter weights learned in the first CNN layer:

Some selected activations of the last RNN layer for an audio sample containing music and speech:

Activations for all cells in the last RNN layer for the same sample:

Credits

License

VadNet is released under LGPL (see LICENSE).

Publication

If you use VadNet in your work, please cite the following paper:

@InProceedings{Wagner2018,
  author =    {Johannes Wagner and Dominik Schiller and Andreas Seiderer and Elisabeth Andr\'e},
  title =     {Deep Learning in Paralinguistic Recognition Tasks: Are Hand-crafted Features Still Relevant?},
  booktitle = {Proceedings of Interspeech},
  address =   {Hyderabad, India},
  year =      {2018},
  pages =     {147--151}
}

Author

Johannes Wagner, Lab for Human Centered Multimedia, 2018

vadnet's People

Contributors: frankenjoe, tobiasbaur

vadnet's Issues

List index out of range

Help me

I tried to run python code/annotation.py but got:

Traceback (most recent call last):
File "code/annotation.py", line 84, in
sub_path = glob.glob(r'{}..'.format(path))[0]
IndexError: list index out of range

What happened?

I have some problems with this project

Hello, author!
First of all, thank you very much for providing me with the ideas I realized. Then I have some questions:

  1. I have noticed that the neural network makes a classification decision for each 1 second of audio, but one second may contain both speech and noise, such as 30% noise and 70% voice. How can they be distinguished?
  2. If a voice lasts for 1.2 seconds, the next 0.2 seconds of vocals may be classified as noise, resulting in incomplete speech segments. How can this problem be solved?
  3. I want to reduce the classification interval, e.g. to 500 ms or 250 ms. Should I split the training speech and noise into files of 500 ms or 250 ms and then retrain a new model, and will that lead to a decline in the recognition rate?

I am looking forward to your answer, thank you again.

Use as Music Detector

Hello,

first of all, congratulations for this project. The results I got using this library really surprised me. Thanks!

I was wondering if it would be easy to detect where music segments appear in the "noise file", because I realized that radio or TV tunes are always classified as noise (as they should be), but I am also interested in getting these parts.

Any suggestion would be appreciated! Thnx

how to train it with my data

Thanks for the project.
I found that the official training data comes from http://verteiler5.mediathekview.de/Filmliste-akt.xz in the download_list function of train/code/playlist.py.
I have downloaded the Filmliste-akt.xz, but I can't figure out what it is. So, can you give me some details about your data?
And how could I train it with my own data (not from YouTube)? My thinking is that I should prepare two kinds of audio (voice and noise); is that enough?

the model.py could not be loaded

Dear sir,
according to the attached image, the model.py could not be loaded, so when vadexlive is started the output is not normal due to the model not being loaded.

Make a decision each X second or ms

Hello again,

After some days off I have returned to this project and I have some new questions.
I have noticed that the neural network makes a classification decision for each 1 second of audio, so I was wondering if I could decrease this interval to a smaller value. For example, classifying each 250 ms of audio as Noise/Voice would give better "precision" when discriminating these classes.

The script I am using is vad_extract.py. I know the input layer receives the samples of each second, makes the prediction and then stores the probabilities in the labels variable. So, as I see it, the approach would be to change the size of the input and final layers to receive samples of, for example, 250 ms, make a prediction and store the probabilities for each 250 ms unit in labels. Am I on the right track? Since you did it per second, do you think discriminating noise/voice in smaller audio units is a good idea?

Btw, I don't know if 'issues' is the best place for this kind of question, because it's not actually a problem (in fact, your project was really friendly to install and use); I am sorry if you consider it isn't.

Regards,
Ana

Is it possible to use this on Linux?

Hello, I would like to know if it is possible to use this application on Linux by simply changing the cmd files?
Another question is: "Does the CPU work for the online VAD?"
Thanks for your time,

Comparison against other approaches?

Hi, I have just been trying this lib out and it seems to work very well, much better than WebRTC or other available frameworks that I have tried, and it is extremely fast (0.05xRT on average), but my tests have still been largely ad hoc. I went through your Interspeech paper:

  • Deep Learning in Paralinguistic Recognition Tasks: Are Hand-crafted Features Still Relevant?

but I could not find any direct comparison against other methods. Do you perhaps have any other publications or results that address this topic?

About frame size

Thanks for this project @frankenjoe
I have noticed that the frame size is set to 1 sec.
So in this file

[0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 1 1 1 1 0 0 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 0 0 1 1 0 0 0 1 0 1 0 0 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1
 1 0 0 1 1 0 0 0 0 0 0 1 1 1 0 0 1 1 1 1 1 0 1 1 1 1 1 0 1 0 1 1 0 0 1 0 0
 1 1 1 1 0 0 0 0 0]

I get 194 noise/speech labels since the file is 194 seconds long.
Is it possible to set the frame size to an arbitrary value, specifically below the reference frame of 1 sec? Like 0.25 sec?

Thank you.

No module named 'encodings'

Hello, after running the do_cmd I got the following error:
Current thread 0x000119dc (most recent call first):
Fatal Python error: initfsencoding: unable to load the file system codec
ModuleNotFoundError: No module named 'encodings'
Any suggestions?

Can't find playlist module

Hello,
I tried to use the download.py file in the code folder. I can't find the playlist module on my computer, and when I try to install it I get this message:

Collecting playlist
  Could not find a version that satisfies the requirement playlist (from versions: )
No matching distribution found for playlist

lzma problem

python3.7 code/download.py
download http://verteiler5.mediathekview.de/Filmliste-akt.xz
Traceback (most recent call last):
File "code/download.py", line 147, in
download_zdf_serien()
File "code/download.py", line 132, in download_zdf_serien
download_list()
File "/home/giuser/vadnet/train/code/playlist.py", line 28, in download_list
data = lzma.decompress(data)
File "/usr/local/lib/python3.7/lzma.py", line 334, in decompress
res = decomp.decompress(data)
_lzma.LZMAError: Input format not supported by decoder

An issue about vad_extract.py

Hello, thank you for your outstanding work.
I encountered the following issue when using the vad_extract.py provided in your code:

Traceback (most recent call last):
  File "/private/CPJ/vadnet-master/vad_extract.py", line 141, in <module>
    extract_voice(args.model, args.files, n_batch=args.n_batch)
  File "/private/CPJ/vadnet-master/vad_extract.py", line 98, in extract_voice
    sess.run(init, feed_dict = { x : input, y : labels, ph_n_shuffle : 1, ph_n_repeat : 1, ph_n_batch : n_batch})
  File "/private/CPJ/anaconda3/envs/vad/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 969, in run
    run_metadata_ptr)
  File "/private/CPJ/anaconda3/envs/vad/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1192, in _run
    feed_dict_tensor, options, run_metadata)
  File "/private/CPJ/anaconda3/envs/vad/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1372, in _do_run
    run_metadata)
  File "/private/CPJ/anaconda3/envs/vad/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1397, in _do_call
    raise type(e)(node_def, op, message)  # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.InternalError: Graph execution error:

Detected at node 'ds/initializer' defined at (most recent call last):
Node: 'ds/initializer'
AddDataset: Failed to build BatchDataset op with error More Input() calls than the 2 input_args while building NodeDef 'BatchDataset/_14' using Op<name=BatchDataset; signature=input_dataset:variant, batch_size:int64 -> handle:variant; attr=output_types:list(type),min=1; attr=output_shapes:list(shape),min=1; attr=metadata:string,default="">
	 [[{{node ds/initializer}}]]

Original stack trace for 'ds/initializer':

Here it reports more Input() calls than the 2 input_args, but I executed the script exactly the way the readme describes. What could be the reason for this? Looking forward to your reply.

Behaviour when classifying voice+noise audio

Hello,

What exactly is the behaviour of the algorithm when it tries to classify a chunk with voice over noise? Does it depend on the "volume" of the noise? Or does it depend on the class of the samples around the chunk? How can I know that a chunk is not purely voice or noise but a mix of both?

Regards!
Ana

black screen on VADEX window

I can get it running, but the screen goes black (see the attached screenshot). This is the log from the prompt:


Built with Social Signal Interpretation (SSI)

(c) 2007-19 University of Augsburg, Lab for Human Centered Multimedia
Johannes Wagner, Tobias Baur, Florian Lingenfelser, Andreas Seiderer, Simon Flutura, Dominik Schiller, Ionut Damian

website: http://openssi.net
contact: [email protected]

build version: v1.0.4

download source=https://github.com/hcmlab/ssi/raw/master/bin/x64/vc140
download target=C:\Users\shini\Documents\GitHub\vadnet\bin\

[factory___] found 'Console'
[factory___] create instance of 'Console'
[factory___] store instance of 'Console' as 'console'
[factory___] load 'ssiframe.dll'
[factory___] found 'TheFramework'
[factory___] found 'XMLPipeline'
[factory___] found 'Asynchronous'
[factory___] found 'EventConsumer'
[factory___] found 'Clone'
[factory___] found 'Chain'
[factory___] found 'Cast'
[factory___] found 'Selector'
[factory___] found 'Merge'
[factory___] found 'Inverter'
[factory___] found 'Decorator'
[factory___] load 'ssievent.dll'
[factory___] found 'TheEventBoard'
[factory___] found 'EventMonitor'
[factory___] found 'MapEventSender'
[factory___] found 'TupleEventSender'
[factory___] found 'StringEventSender'
[factory___] found 'ZeroEventSender'
[factory___] found 'ThresEventSender'
[factory___] found 'ThresTupleEventSender'
[factory___] found 'TriggerEventSender'
[factory___] found 'FixationEventSender'
[factory___] found 'ThresClassEventSender'
[factory___] found 'XMLEventSender'
[factory___] found 'ClockEventSender'
[factory___] found 'EventToStream'
[factory___] create instance of 'XMLPipeline'
[factory___] create instance of 'TheFramework'
[factory___] create instance of 'TheEventBoard'
[factory___] store instance of 'XMLPipeline' as 'xmlpipe'
[framexml__] load 'C:\Users\shini\Documents\GitHub\vadnet\vadex.pipeline' (local config=yes, global config=yes)
[framexml__] apply config from 'vadex.pipeline-config'
             live -> false
             live:mic -> true
             path -> data\group.wav
             frame -> 8000
             delta -> 40000
             send:do -> false
             send:url -> upd://localhost:1234
             record:do -> false
             record:path -> capture
[framexml__] apply config from 'C:\Users\shini\Documents\GitHub\vadnet\vadex.pipeline-config'
             live -> false
             live:mic -> true
             path -> data\group.wav
             frame -> 8000
             delta -> 40000
             send:do -> false
             send:url -> upd://localhost:1234
             record:do -> false
             record:path -> capture
[factory___] load 'ssigraphic.dll'
[factory___] found 'VideoPainter'
[factory___] found 'SignalPainter'
[factory___] found 'EventPainter'
[factory___] found 'PointsPainter'
[factory___] load 'ssiaudio.dll'
[factory___] found 'Audio'
[factory___] found 'AudioPlayer'
[factory___] found 'STKAudioMixer'
[factory___] found 'AudioLoopBack'
[factory___] found 'AudioActivity'
[factory___] found 'VoiceActivitySender'
[factory___] found 'VoiceActivityVerifier'
[factory___] found 'AudioIntensity'
[factory___] found 'AudioLpc'
[factory___] found 'AudioConvert'
[factory___] found 'SNRatio'
[factory___] found 'WavReader'
[factory___] found 'WavWriter'
[factory___] found 'WavProvider'
[factory___] found 'PreEmphasis'
[factory___] found 'AudioMono'
[factory___] found 'AudioNoiseGate'
[factory___] load 'ssiioput.dll'
[factory___] found 'MemoryWriter'
[factory___] found 'FileReader'
[factory___] found 'FileWriter'
[factory___] found 'FileEventWriter'
[factory___] found 'SocketReader'
[factory___] found 'SocketWriter'
[factory___] found 'SocketEventWriter'
[factory___] found 'SocketEventReader'
[factory___] found 'FileSampleWriter'
[factory___] found 'FileAnnotationWriter'
[factory___] found 'FakeSignal'
[factory___] found 'NotifyReceiver'
[factory___] found 'NotifySender'
[factory___] load 'ssisignal.dll'
[factory___] found 'MFCC'
[factory___] found 'Energy'
[factory___] found 'Intensity'
[factory___] found 'Functionals'
[factory___] found 'FunctionalsEventSender'
[factory___] found 'Derivative'
[factory___] found 'Integral'
[factory___] found 'Butfilt'
[factory___] found 'IIR'
[factory___] found 'Spectrogram'
[factory___] found 'DownSample'
[factory___] found 'Normalize'
[factory___] found 'MvgAvgVar'
[factory___] found 'MvgMinMax'
[factory___] found 'MvgNorm'
[factory___] found 'MvgPeakGate'
[factory___] found 'MvgDrvtv'
[factory___] found 'MvgConDiv'
[factory___] found 'MvgMedian'
[factory___] found 'Pulse'
[factory___] found 'Multiply'
[factory___] found 'Noise'
[factory___] found 'FFTfeat'
[factory___] found 'ConvPower'
[factory___] found 'Expression'
[factory___] found 'Limits'
[factory___] found 'Gate'
[factory___] found 'Bundle'
[factory___] found 'Statistics'
[factory___] found 'Sum'
[factory___] found 'Relative'
[factory___] found 'Mean'
[factory___] load 'ssipython36.dll'
[pymanager_] init
[factory___] found 'PythonTransformer'
[factory___] found 'PythonFeature'
[factory___] found 'PythonFilter'
[factory___] found 'PythonConsumer'
[factory___] found 'PythonObject'
[factory___] found 'PythonSensor'
[factory___] found 'PythonImageFilter'
[factory___] found 'PythonImageFeature'
[factory___] found 'PythonImageConsumer'
[factory___] found 'PythonModel'
[factory___] load 'ssicontrol.dll'
[factory___] found 'ControlSlider'
[factory___] found 'Controller'
[factory___] found 'ControlCheckBox'
[factory___] found 'ControlTextBox'
[factory___] found 'ControlButton'
[factory___] found 'ControlGrid'
[factory___] found 'ControlEvent'
[factory___] found 'WaitButton'
[factory___] create instance of 'Audio'
[factory___] store instance of 'Audio' as 'noname003'
[provider__] init 'Provider:audio'
             id         = 0
             rate[hz]   = 48000.00
             dim        = 1
             bytes      = 4
             type       = FLOAT
             buffer[s]  = 10.00
             watch      = 1.0s
             sync       = 5.0s
[factory___] create instance of 'PythonFeature'
[factory___] store instance of 'PythonFeature' as 'noname004'
[pyhelper__] new sys path '['C:\\Users\\shini\\Documents\\GitHub\\vadnet\\bin\\python36', 'C:\\Users\\shini\\Documents\\GitHub\\vadnet\\bin', 'C:\\Users\\shini\\Documents\\GitHub\\vadnet\\bin\\lib\\site-packages', 'C:\\Users\\shini\\Documents\\GitHub\\vadnet']'
[pyhelper__] loading script 'model.py'
C:\Users\shini\Documents\GitHub\vadnet\bin\lib\site-packages\tensorflow\python\framework\dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
C:\Users\shini\Documents\GitHub\vadnet\bin\lib\site-packages\tensorflow\python\framework\dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
C:\Users\shini\Documents\GitHub\vadnet\bin\lib\site-packages\tensorflow\python\framework\dtypes.py:521: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
C:\Users\shini\Documents\GitHub\vadnet\bin\lib\site-packages\tensorflow\python\framework\dtypes.py:522: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
C:\Users\shini\Documents\GitHub\vadnet\bin\lib\site-packages\tensorflow\python\framework\dtypes.py:523: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
C:\Users\shini\Documents\GitHub\vadnet\bin\lib\site-packages\tensorflow\python\framework\dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
[pyhelper__] found function 'getOptions'
[pyhelper__] found function 'getSampleDimensionOut'
[pyhelper__] found function 'getSampleTypeOut'
[pyhelper__] found function 'transform_enter'
[pyhelper__] found function 'transform'
[pyhelper__] found function 'transform_flush'
[eboard____] 'noname004' sends '(null)'
load model  models\vad
loading model models\vad\model.ckpt-200106
2021-03-11 00:27:00.315819: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2021-03-11 00:27:00.476991: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1356] Found device 0 with properties:
name: GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.695
pciBusID: 0000:01:00.0
totalMemory: 24.00GiB freeMemory: 19.97GiB
2021-03-11 00:27:00.486038: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1435] Adding visible gpu devices: 0
2021-03-11 00:29:43.391704: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-11 00:29:43.397844: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:929]      0
2021-03-11 00:29:43.402047: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:942] 0:   N
2021-03-11 00:29:43.405656: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 19374 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6)
[factory___] create instance of 'PythonFeature'
[factory___] store instance of 'PythonFeature' as 'noname005'
[pyhelper__] loading script 'model.py'
[pyhelper__] found function 'getOptions'
[pyhelper__] found function 'getSampleDimensionOut'
[pyhelper__] found function 'getSampleTypeOut'
[pyhelper__] found function 'transform_enter'
[pyhelper__] found function 'transform'
[pyhelper__] found function 'transform_flush'
[eboard____] 'noname005' sends '(null)'
load model  models\gender
loading model models\gender\model.ckpt-675023
2021-03-11 00:30:41.069534: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1435] Adding visible gpu devices: 0
2021-03-11 00:30:41.073864: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-11 00:30:41.080212: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:929]      0
2021-03-11 00:30:41.083646: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:942] 0:   N
2021-03-11 00:30:41.090013: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 19374 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6)
[factory___] create instance of 'PythonFeature'
[factory___] store instance of 'PythonFeature' as 'noname006'
[pyhelper__] loading script 'model.py'
[pyhelper__] found function 'getOptions'
[pyhelper__] found function 'getSampleDimensionOut'
[pyhelper__] found function 'getSampleTypeOut'
[pyhelper__] found function 'transform_enter'
[pyhelper__] found function 'transform'
[pyhelper__] found function 'transform_flush'
[eboard____] 'noname006' sends '(null)'
load model  models\laughter
loading model models\laughter\model.ckpt-830054
2021-03-11 00:30:41.771476: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1435] Adding visible gpu devices: 0
2021-03-11 00:30:41.774499: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-11 00:30:41.779262: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:929]      0
2021-03-11 00:30:41.782795: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:942] 0:   N
2021-03-11 00:30:41.785663: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 19374 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6)
[factory___] create instance of 'Merge'
[factory___] store instance of 'Merge' as 'noname007'
[factory___] create instance of 'MvgAvgVar'
[factory___] store instance of 'MvgAvgVar' as 'noname008'
[factory___] create instance of 'PythonFilter'
[factory___] store instance of 'PythonFilter' as 'noname009'
[pyhelper__] loading script 'vadex.py'
[pyhelper__] found function 'getOptions'
[pyhelper__] found function 'getSampleDimensionOut'
[pyhelper__] found function 'getSampleTypeOut'
[pyhelper__] found function 'transform_enter'
[pyhelper__] found function 'transform'
[pyhelper__] found function 'transform_flush'
[eboard____] 'noname009' sends '(null)'
[factory___] create instance of 'SignalPainter'
[factory___] store instance of 'SignalPainter' as 'plot'
[factory___] create instance of 'SignalPainter'
[factory___] store instance of 'SignalPainter' as 'plot1'
[factory___] create instance of 'XMLEventSender'
[factory___] store instance of 'XMLEventSender' as 'monitor'
[eboard____] 'monitor' sends 'final@xml'
[factory___] create instance of 'SocketEventWriter'
[factory___] store instance of 'SocketEventWriter' as 'noname013'
[eboard____] 'noname013' receives 'final@xml'
[factory___] create instance of 'Decorator'
[factory___] store instance of 'Decorator' as 'noname014'
[eboard____] start event board worker
[socketudp_] connect to '127.0.0.1:1234' [udp]
[esockwrite] start sending events to 'udp://127.0.0.1:1234'
[thread____] start 'EventBoardWorker'
[pipeline__] start 0 threads
[thread____] start 'audio'
[provider__] start 'Provider:audio'
[sensor____] connect 'Audio'
[sensor____] start 'Audio'
[thread____] start 'Audio@Microphone (NVIDIA Broadcast)'
[thread____] start 'PythonFeature'
[transform_] start 'PythonFeature:noname004'
             frame[s]   = 0.17
             delta[s]   = 0.83
             id         = 0 -> 1
             rate[hz]   = 48000.00 -> 6.00
             dim        = 1 -> 2
             bytes      = 4 -> 4
             type       = FLOAT -> FLOAT
             buffer[s]  = 10.00
[thread____] start 'PythonFeature'
[transform_] start 'PythonFeature:noname005'
             frame[s]   = 0.17
             delta[s]   = 0.83
             id         = 0 -> 2
             rate[hz]   = 48000.00 -> 6.00
             dim        = 1 -> 3
             bytes      = 4 -> 4
             type       = FLOAT -> FLOAT
             buffer[s]  = 10.00
[thread____] start 'PythonFeature'
[transform_] start 'PythonFeature:noname006'
             frame[s]   = 0.17
             delta[s]   = 0.83
             id         = 0 -> 3
             rate[hz]   = 48000.00 -> 6.00
             dim        = 1 -> 2
             bytes      = 4 -> 4
             type       = FLOAT -> FLOAT
             buffer[s]  = 10.00
[thread____] start 'Merge'
[transform_] start 'Merge:noname007'
             frame[s]   = 0.17
             delta[s]   = 0.00
             id         = 1 -> 4
             rate[hz]   = 6.00 -> 6.00
             dim        = 2 -> 7
             bytes      = 4 -> 4
             type       = FLOAT -> FLOAT
             buffer[s]  = 10.00
[thread____] start 'MvgAvgVar'
[transform_] start 'MvgAvgVar:noname008'
             frame[s]   = 0.17
             delta[s]   = 0.00
             id         = 4 -> 5
             rate[hz]   = 6.00 -> 6.00
             dim        = 7 -> 7
             bytes      = 4 -> 4
             type       = FLOAT -> FLOAT
             buffer[s]  = 10.00
[thread____] start 'PythonFilter'
[transform_] start 'PythonFilter:noname009'
             frame[s]   = 0.17
             delta[s]   = 0.00
             id         = 5 -> 6
             rate[hz]   = 6.00 -> 6.00
             dim        = 7 -> 5
             bytes      = 4 -> 4
             type       = FLOAT -> FLOAT
             buffer[s]  = 10.00
[thread____] start 'SignalPainter'
[thread____] start 'AUDIO' (single run)
[consume___] start 'SignalPainter:plot'
             frame[s]   = 0.10
             delta[s]   = 0.00
             stream#1
             id = 0
             rate[hz]   = 48000.00
             dim        = 1
             bytes      = 4
             type       = FLOAT
[thread____] start 'SignalPainter'
[thread____] start 'VADEX' (single run)
[consume___] start 'SignalPainter:plot1'
             frame[s]   = 0.17
             delta[s]   = 0.00
             stream#1
             id = 6
             rate[hz]   = 6.00
             dim        = 5
             bytes      = 4
             type       = FLOAT
[thread____] start 'XMLEventSender'
[thread____] start 'XML' (single run)
[thread____] start 'noname'
[consume___] start 'XMLEventSender:monitor'
             frame[s]   = 0.17
             delta[s]   = 0.00
             stream#1
             id = 6
             rate[hz]   = 6.00
             dim        = 5
             bytes      = 4
             type       = FLOAT

             seconds to start: ok

[pipeline__] start
[thread____] start 'Pipeline' (single run)

             press enter to stop


How to run this script on macOS?

I tried to run the script on macOS and encountered the following error.
Please help me, as I'm new to macOS.

Traceback (most recent call last):

File "vad_extract.py", line 138, in
extract_voice(args.model, args.files, n_batch=args.n_batch)
File "vad_extract.py", line 55, in extract_voice
if not all([os.path.exists(checkpoint_path + x) for x in ['.data-00000-of-00001', '.index', '.meta']]):
File "vad_extract.py", line 55, in
if not all([os.path.exists(checkpoint_path + x) for x in ['.data-00000-of-00001', '.index', '.meta']]):
NameError: free variable 'checkpoint_path' referenced before assignment in enclosing scope

Does it work for indoor voice detection?

I would like to distinguish the human voice from small indoor noises such as background talk from colleagues and footsteps. Will this code work for that, according to your experience?

Using do_vad_extract.cmd, is it possible to get just the timing on the command line?

Instead of getting two separate files of voice and noise, can I just get the timing of when there is noise and when there is voice, along with the values? I just want to get the time on the command line. How can I do that?

Like in do_vad_live.cmd we get XML output over UDP, but I don't want to use a socket. I just need the timing of when there is voice and when there is noise, along with the values. Please help me out.
