hcmlab / vadnet

Real-time Voice Activity Detection in Noisy Environments using Deep Neural Networks

Home Page: http://openssi.net

License: GNU Lesser General Public License v3.0



VadNet

VadNet is a real-time voice activity detector for noisy environments. It implements an end-to-end learning approach based on Deep Neural Networks. In the extended version, gender and laughter detection are added. To see a demonstration, click on the images below.

Platform

Windows

Dependencies

Visual Studio 2015 Redistributable (https://www.microsoft.com/en-us/download/details.aspx?id=52685)

Installation

do_bin.cmd - Installs embedded Python and downloads the SSI interpreter. During the installation, the script tries to detect whether a GPU is available and, if so, installs the GPU version of TensorFlow. This requires that an NVIDIA graphics card is detected and CUDA has been installed. Nevertheless, VadNet does fine on a CPU.
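
For illustration, a minimal Python sketch of that kind of check (the actual logic in do_bin.cmd may differ; probing nvidia-smi is an assumption here, not the installer's real mechanism):

import subprocess

# Hypothetical GPU probe, not the actual do_bin.cmd logic: if nvidia-smi
# runs successfully, an NVIDIA driver (and most likely a GPU) is present.
def has_nvidia_gpu() -> bool:
    try:
        subprocess.run(["nvidia-smi"], check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return True
    except (OSError, subprocess.CalledProcessError):
        return False

package = "tensorflow-gpu" if has_nvidia_gpu() else "tensorflow"
print("would install:", package)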

Quick Guide

do_vad[ex].cmd - Demo on pre-recorded files (requires 48k mono wav files)

do_vad[ex]_live.cmd - Capture from microphone and stream results to a socket

do_vad[ex]_loopback.cmd - Capture from the soundcard instead of a microphone (loopback mode, see comments below)

do_vad_extract.cmd - Separates audio file into noisy and voiced parts (supports any audio format)

train\do_all.cmd - Performs a fresh training (downloads media files, creates annotations and trains a new network)

Documentation

VadNet is implemented using the Social Signal Interpretation (SSI) framework. The processing pipeline is defined in vad[ex].pipeline and can be configured by editing vad[ex].pipeline-config. Available options are:

audio:live = false                   # $(bool) use live input (otherwise read from file)
audio:live:mic = true                # $(bool) if live input is selected use microphone (otherwise use soundcard)
model:path=models\vad                # path to model folder
send:do = false                      # $(bool) stream detection results to a socket
send:url = udp://localhost:1234      # socket address in format <protocol://host:port>
record:do = false                    # $(bool) capture screen and audio
record:path = capture                # capture path

If the option send:do is turned on, an XML string with the detection results is streamed to a socket (see send:url). You can change the format of the XML string by editing vad.xml. To run SSI in the background, click on the tray icon and select 'Hide windows'. For more information about SSI pipelines please consult the documentation of SSI.
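
If you want to consume these results in your own code, a minimal Python receiver might look like the sketch below (an illustration only, assuming the default send:url of udp://localhost:1234; this script is not part of VadNet):

import socket

# Listen on the address configured in send:url (udp://localhost:1234 by default)
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("localhost", 1234))
while True:
    data, _ = sock.recvfrom(4096)  # one XML string per detection event
    print(data.decode("utf-8", errors="replace"))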

The Python script vad_extract.py can be used to separate noisy and voiced parts of an audio file. For each input file <name>.<ext> two new files <name>.speech.wav and <name>.noise.wav will be generated. The script should handle all common audio formats. You can run the script from the command line by calling > bin\python.exe vad_extract.py <arguments>:

usage: vad_extract.py [-h] [--model MODEL] [--files FILES [FILES ...]] [--n_batch N_BATCH]

optional arguments:
  -h, --help            		show this help message and exit
  --model MODEL         		path to model
  --files FILES [FILES ...]		list of files
  --n_batch N_BATCH     		number of batches
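
For example, a typical call might look like this (the input file name is a placeholder; models\vad is the model folder shipped with VadNet):

> bin\python.exe vad_extract.py --model models\vad --files data\session.wav

Assuming the above, this would produce data\session.speech.wav and data\session.noise.wav next to the input file.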

Loopback Mode

In loopback mode, whatever you play back through your soundcard will be analysed. Before using it, please set the right output format for your soundcard. To do so, go to the Sound settings in the control panel, select your default playback device and click on Properties. Most devices will let you set a default format there. Choose 16 bit, 48000 Hz and press OK.

Insights

The model we are using has been trained with TensorFlow. It takes the raw audio as input and feeds it into a 3-layer Convolutional Network. The result of this filter operation is then processed by a 2-layer Recurrent Network containing 64 ReLU cells. The final bit is a fully-connected layer, which applies a softmax and maps the input to a tuple <noise, voice> in the range [0..1].
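
As an illustration only, the described topology could be sketched in Keras as follows. The 64 recurrent cells and the two-class softmax come from the description above; all kernel sizes, strides and filter counts are placeholder assumptions, not the trained model's actual hyper-parameters:

import tensorflow as tf

def build_vad_sketch(n_samples: int = 48000) -> tf.keras.Model:
    # 1 s of raw mono audio at 48 kHz
    inputs = tf.keras.Input(shape=(n_samples, 1))
    x = inputs
    # 3-layer Convolutional Network (filter counts/kernels are assumptions)
    for filters in (32, 64, 128):
        x = tf.keras.layers.Conv1D(filters, kernel_size=9,
                                   strides=4, activation="relu")(x)
        x = tf.keras.layers.MaxPooling1D(pool_size=4)(x)
    # 2-layer Recurrent Network with 64 ReLU cells each
    x = tf.keras.layers.SimpleRNN(64, activation="relu", return_sequences=True)(x)
    x = tf.keras.layers.SimpleRNN(64, activation="relu")(x)
    # fully-connected layer with softmax -> tuple <noise, voice> in [0..1]
    outputs = tf.keras.layers.Dense(2, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

build_vad_sketch().summary()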

Network architecture:

We trained the network on roughly 134 h of audio data (5.6 days) and ran training for 25 epochs (381,024 steps) using a batch size of 128.

Filter weights learned in the first CNN layer:

Some selected activations of the last RNN layer for an audio sample containing music and speech:

Activations for all cells in the last RNN layer for the same sample:

Credits

License

VadNet is released under LGPL (see LICENSE).

Publication

If you use VadNet in your work, please cite the following paper:

@InProceedings{Wagner2018,
  author =    {Johannes Wagner and Dominik Schiller and Andreas Seiderer and Elisabeth Andr\'e},
  title =     {Deep Learning in Paralinguistic Recognition Tasks: Are Hand-crafted Features Still Relevant?},
  booktitle = {Proceedings of Interspeech},
  address =   {Hyderabad, India},
  year =      {2018},
  pages =     {147--151}
}

Author

Johannes Wagner, Lab for Human Centered Multimedia, 2018

vadnet's People

Contributors: frankenjoe, tobiasbaur

vadnet's Issues

List index out of range

Help me

I tried to run python code/annotation.py but got:

Traceback (most recent call last):
File "code/annotation.py", line 84, in
sub_path = glob.glob(r'{}..'.format(path))[0]
IndexError: list index out of range

What happened?

I have some problems with this project

Hello, author!
First of all, thank you very much for providing me with the ideas I realized. Then I have some questions:

  1. I have noticed that the neural network makes a classification decision for each 1 second of audio, but one second may contain both speech and noise, such as 30% noise and 70% voice. How can they be distinguished?
  2. If a voice lasts for 1.2 seconds, the next 0.2 seconds of vocals may be classified as noise, resulting in incomplete speech segments. How can this problem be solved?
  3. I want to reduce the classification interval, e.g. to 500 ms or 250 ms. Should I split the training speech and noise into files of 500 ms or 250 ms and then retrain a new model, and will that lead to a decline in the recognition rate?

I am looking forward to your answer, thank you again.

Use as Music Detector

Hello,

first of all, congratulations for this project. The results I got using this library really surprised me. Thanks!

I was wondering if it would be easy to detect where music segments appear in the "noise file", because I realized that radio or TV tunes are always classified as noise (as they should be), but I am also interested in getting these parts.

Any suggestion would be appreciated! Thnx

how to train it with my data

Thanks for the project.
I found that the official training data comes from http://verteiler5.mediathekview.de/Filmliste-akt.xz in the download_list function of train/code/playlist.py.
I have downloaded the Filmliste-akt.xz, but I can't figure out what it is. So, can you give me some details about your data?
And how could I train it with my own data (not from YouTube)? My thinking is that I should prepare two kinds of audio (voice and noise); is that enough?

the model.py could not be loaded

Dear sir,
according to the attached image, the model.py could not be loaded, so when vadexlive is started the output is not normal due to the model not being loaded.

Make a decision each X second or ms

Hello again,

After some days off I have returned to this project and I have some new questions.
I have noticed that the neural network makes a classification decision for each 1 second of audio, so I was wondering if I could decrease this interval to a smaller value. For example, classifying each 250 ms of audio as Noise/Voice would give better "precision" when discriminating these classes.

The script I am using is vad_extract.py. I know the input layer receives the samples of each second, makes the prediction and then stores the probabilities in the labels variable. So, as I see it, the approach would be to change the size of the input and final layers to receive samples of, for example, 250 ms, make a prediction and store the probabilities for each 250 ms unit in labels. Am I on the right track? Since you did it per second, do you think discriminating noise/voice in smaller audio units is a good idea?

Btw, I don't know if 'issues' is the best place for this kind of question, because it's not actually a problem (in fact, your project was really friendly to install and use); I am sorry if you consider it isn't.

Regards,
Ana

Is it possible to use this on Linux?

Hello, I would like to know if it is possible to use this application on Linux by simply changing the cmd files?
Another question is: "Does the CPU work for the online VAD?"
Thanks for your time,

Comparison against other approaches?

Hi, I have just been trying this lib out and it seems to work very well, much better than WebRTC or other available frameworks that I have tried, and it is extremely fast (0.05xRT on average), but my tests have still been largely ad hoc. I went through your Interspeech paper:

  • Deep Learning in Paralinguistic Recognition Tasks: Are Hand-crafted Features Still Relevant?

but I could not find any direct comparison against other methods. Do you perhaps have any other publications or results that address this topic?

About frame size

Thanks for this project @frankenjoe
I have noticed that the frame size is set to 1 sec.
So in this file

[0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 1 1 1 1 0 0 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 0 0 1 1 0 0 0 1 0 1 0 0 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1
 1 0 0 1 1 0 0 0 0 0 0 1 1 1 0 0 1 1 1 1 1 0 1 1 1 1 1 0 1 0 1 1 0 0 1 0 0
 1 1 1 1 0 0 0 0 0]

I get 194 noise/speech labels since the file is 194 seconds long.
Is it possible to set the frame size to an arbitrary value, specifically below the reference frame of 1 sec? Like 0.25 sec?

Thank you.

No module named 'encodings'

Hello, after running the do_cmd I got the following error:
Current thread 0x000119dc (most recent call first):
Fatal Python error: initfsencoding: unable to load the file system codec
ModuleNotFoundError: No module named 'encodings'
Any suggestions?

Can't find playlist module

Hello,
I tried to use the download.py file in the code folder. I can't find the playlist module on my computer, and when I try to install it I get this message:

Collecting playlist
  Could not find a version that satisfies the requirement playlist (from versions: )
No matching distribution found for playlist

lzma problem

python3.7 code/download.py
download http://verteiler5.mediathekview.de/Filmliste-akt.xz
Traceback (most recent call last):
File "code/download.py", line 147, in
download_zdf_serien()
File "code/download.py", line 132, in download_zdf_serien
download_list()
File "/home/giuser/vadnet/train/code/playlist.py", line 28, in download_list
data = lzma.decompress(data)
File "/usr/local/lib/python3.7/lzma.py", line 334, in decompress
res = decomp.decompress(data)
_lzma.LZMAError: Input format not supported by decoder

An issue about vad_extract.py

Hello, thank you for your outstanding work.
I encountered the following issue when using the vad_extract.py provided in your code:

Traceback (most recent call last):
  File "/private/CPJ/vadnet-master/vad_extract.py", line 141, in <module>
    extract_voice(args.model, args.files, n_batch=args.n_batch)
  File "/private/CPJ/vadnet-master/vad_extract.py", line 98, in extract_voice
    sess.run(init, feed_dict = { x : input, y : labels, ph_n_shuffle : 1, ph_n_repeat : 1, ph_n_batch : n_batch})
  File "/private/CPJ/anaconda3/envs/vad/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 969, in run
    run_metadata_ptr)
  File "/private/CPJ/anaconda3/envs/vad/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1192, in _run
    feed_dict_tensor, options, run_metadata)
  File "/private/CPJ/anaconda3/envs/vad/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1372, in _do_run
    run_metadata)
  File "/private/CPJ/anaconda3/envs/vad/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1397, in _do_call
    raise type(e)(node_def, op, message)  # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.InternalError: Graph execution error:

Detected at node 'ds/initializer' defined at (most recent call last):
Node: 'ds/initializer'
AddDataset: Failed to build BatchDataset op with error More Input() calls than the 2 input_args while building NodeDef 'BatchDataset/_14' using Op<name=BatchDataset; signature=input_dataset:variant, batch_size:int64 -> handle:variant; attr=output_types:list(type),min=1; attr=output_shapes:list(shape),min=1; attr=metadata:string,default="">
	 [[{{node ds/initializer}}]]

Original stack trace for 'ds/initializer':

Here it reports more Input() calls than the 2 input_args, but I executed the script exactly the way the readme describes. What could be the reason for this? Looking forward to your reply.

Behaviour when classifying voice+noise audio

Hello,

What exactly is the behaviour of the algorithm when it tries to classify a chunk with voice over noise? Does it depend on the "volume" of the noise? Or does it depend on the class of the samples around the chunk? How can I know that a chunk is not purely voice or noise but a mix of both?

Regards!
Ana

black screen on VADEX window

I can get it running, but the screen goes black (see the attached screenshot). This is the log from the prompt:


Built with Social Signal Interpretation (SSI)

(c) 2007-19 University of Augsburg, Lab for Human Centered Multimedia
Johannes Wagner, Tobias Baur, Florian Lingenfelser, Andreas Seiderer, Simon Flutura, Dominik Schiller, Ionut Damian

website: http://openssi.net
contact: [email protected]

build version: v1.0.4

download source=https://github.com/hcmlab/ssi/raw/master/bin/x64/vc140
download target=C:\Users\shini\Documents\GitHub\vadnet\bin\

[factory___] found 'Console'
[factory___] create instance of 'Console'
[factory___] store instance of 'Console' as 'console'
[factory___] load 'ssiframe.dll'
[factory___] found 'TheFramework'
[factory___] found 'XMLPipeline'
[factory___] found 'Asynchronous'
[factory___] found 'EventConsumer'
[factory___] found 'Clone'
[factory___] found 'Chain'
[factory___] found 'Cast'
[factory___] found 'Selector'
[factory___] found 'Merge'
[factory___] found 'Inverter'
[factory___] found 'Decorator'
[factory___] load 'ssievent.dll'
[factory___] found 'TheEventBoard'
[factory___] found 'EventMonitor'
[factory___] found 'MapEventSender'
[factory___] found 'TupleEventSender'
[factory___] found 'StringEventSender'
[factory___] found 'ZeroEventSender'
[factory___] found 'ThresEventSender'
[factory___] found 'ThresTupleEventSender'
[factory___] found 'TriggerEventSender'
[factory___] found 'FixationEventSender'
[factory___] found 'ThresClassEventSender'
[factory___] found 'XMLEventSender'
[factory___] found 'ClockEventSender'
[factory___] found 'EventToStream'
[factory___] create instance of 'XMLPipeline'
[factory___] create instance of 'TheFramework'
[factory___] create instance of 'TheEventBoard'
[factory___] store instance of 'XMLPipeline' as 'xmlpipe'
[framexml__] load 'C:\Users\shini\Documents\GitHub\vadnet\vadex.pipeline' (local config=yes, global config=yes)
[framexml__] apply config from 'vadex.pipeline-config'
             live -> false
             live:mic -> true
             path -> data\group.wav
             frame -> 8000
             delta -> 40000
             send:do -> false
             send:url -> upd://localhost:1234
             record:do -> false
             record:path -> capture
[framexml__] apply config from 'C:\Users\shini\Documents\GitHub\vadnet\vadex.pipeline-config'
             live -> false
             live:mic -> true
             path -> data\group.wav
             frame -> 8000
             delta -> 40000
             send:do -> false
             send:url -> upd://localhost:1234
             record:do -> false
             record:path -> capture
[factory___] load 'ssigraphic.dll'
[factory___] found 'VideoPainter'
[factory___] found 'SignalPainter'
[factory___] found 'EventPainter'
[factory___] found 'PointsPainter'
[factory___] load 'ssiaudio.dll'
[factory___] found 'Audio'
[factory___] found 'AudioPlayer'
[factory___] found 'STKAudioMixer'
[factory___] found 'AudioLoopBack'
[factory___] found 'AudioActivity'
[factory___] found 'VoiceActivitySender'
[factory___] found 'VoiceActivityVerifier'
[factory___] found 'AudioIntensity'
[factory___] found 'AudioLpc'
[factory___] found 'AudioConvert'
[factory___] found 'SNRatio'
[factory___] found 'WavReader'
[factory___] found 'WavWriter'
[factory___] found 'WavProvider'
[factory___] found 'PreEmphasis'
[factory___] found 'AudioMono'
[factory___] found 'AudioNoiseGate'
[factory___] load 'ssiioput.dll'
[factory___] found 'MemoryWriter'
[factory___] found 'FileReader'
[factory___] found 'FileWriter'
[factory___] found 'FileEventWriter'
[factory___] found 'SocketReader'
[factory___] found 'SocketWriter'
[factory___] found 'SocketEventWriter'
[factory___] found 'SocketEventReader'
[factory___] found 'FileSampleWriter'
[factory___] found 'FileAnnotationWriter'
[factory___] found 'FakeSignal'
[factory___] found 'NotifyReceiver'
[factory___] found 'NotifySender'
[factory___] load 'ssisignal.dll'
[factory___] found 'MFCC'
[factory___] found 'Energy'
[factory___] found 'Intensity'
[factory___] found 'Functionals'
[factory___] found 'FunctionalsEventSender'
[factory___] found 'Derivative'
[factory___] found 'Integral'
[factory___] found 'Butfilt'
[factory___] found 'IIR'
[factory___] found 'Spectrogram'
[factory___] found 'DownSample'
[factory___] found 'Normalize'
[factory___] found 'MvgAvgVar'
[factory___] found 'MvgMinMax'
[factory___] found 'MvgNorm'
[factory___] found 'MvgPeakGate'
[factory___] found 'MvgDrvtv'
[factory___] found 'MvgConDiv'
[factory___] found 'MvgMedian'
[factory___] found 'Pulse'
[factory___] found 'Multiply'
[factory___] found 'Noise'
[factory___] found 'FFTfeat'
[factory___] found 'ConvPower'
[factory___] found 'Expression'
[factory___] found 'Limits'
[factory___] found 'Gate'
[factory___] found 'Bundle'
[factory___] found 'Statistics'
[factory___] found 'Sum'
[factory___] found 'Relative'
[factory___] found 'Mean'
[factory___] load 'ssipython36.dll'
[pymanager_] init
[factory___] found 'PythonTransformer'
[factory___] found 'PythonFeature'
[factory___] found 'PythonFilter'
[factory___] found 'PythonConsumer'
[factory___] found 'PythonObject'
[factory___] found 'PythonSensor'
[factory___] found 'PythonImageFilter'
[factory___] found 'PythonImageFeature'
[factory___] found 'PythonImageConsumer'
[factory___] found 'PythonModel'
[factory___] load 'ssicontrol.dll'
[factory___] found 'ControlSlider'
[factory___] found 'Controller'
[factory___] found 'ControlCheckBox'
[factory___] found 'ControlTextBox'
[factory___] found 'ControlButton'
[factory___] found 'ControlGrid'
[factory___] found 'ControlEvent'
[factory___] found 'WaitButton'
[factory___] create instance of 'Audio'
[factory___] store instance of 'Audio' as 'noname003'
[provider__] init 'Provider:audio'
             id         = 0
             rate[hz]   = 48000.00
             dim        = 1
             bytes      = 4
             type       = FLOAT
             buffer[s]  = 10.00
             watch      = 1.0s
             sync       = 5.0s
[factory___] create instance of 'PythonFeature'
[factory___] store instance of 'PythonFeature' as 'noname004'
[pyhelper__] new sys path '['C:\\Users\\shini\\Documents\\GitHub\\vadnet\\bin\\python36', 'C:\\Users\\shini\\Documents\\GitHub\\vadnet\\bin', 'C:\\Users\\shini\\Documents\\GitHub\\vadnet\\bin\\lib\\site-packages', 'C:\\Users\\shini\\Documents\\GitHub\\vadnet']'
[pyhelper__] loading script 'model.py'
C:\Users\shini\Documents\GitHub\vadnet\bin\lib\site-packages\tensorflow\python\framework\dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
C:\Users\shini\Documents\GitHub\vadnet\bin\lib\site-packages\tensorflow\python\framework\dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
C:\Users\shini\Documents\GitHub\vadnet\bin\lib\site-packages\tensorflow\python\framework\dtypes.py:521: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
C:\Users\shini\Documents\GitHub\vadnet\bin\lib\site-packages\tensorflow\python\framework\dtypes.py:522: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
C:\Users\shini\Documents\GitHub\vadnet\bin\lib\site-packages\tensorflow\python\framework\dtypes.py:523: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
C:\Users\shini\Documents\GitHub\vadnet\bin\lib\site-packages\tensorflow\python\framework\dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
[pyhelper__] found function 'getOptions'
[pyhelper__] found function 'getSampleDimensionOut'
[pyhelper__] found function 'getSampleTypeOut'
[pyhelper__] found function 'transform_enter'
[pyhelper__] found function 'transform'
[pyhelper__] found function 'transform_flush'
[eboard____] 'noname004' sends '(null)'
load model  models\vad
loading model models\vad\model.ckpt-200106
2021-03-11 00:27:00.315819: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2021-03-11 00:27:00.476991: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1356] Found device 0 with properties:
name: GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.695
pciBusID: 0000:01:00.0
totalMemory: 24.00GiB freeMemory: 19.97GiB
2021-03-11 00:27:00.486038: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1435] Adding visible gpu devices: 0
2021-03-11 00:29:43.391704: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-11 00:29:43.397844: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:929]      0
2021-03-11 00:29:43.402047: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:942] 0:   N
2021-03-11 00:29:43.405656: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 19374 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6)
[factory___] create instance of 'PythonFeature'
[factory___] store instance of 'PythonFeature' as 'noname005'
[pyhelper__] loading script 'model.py'
[pyhelper__] found function 'getOptions'
[pyhelper__] found function 'getSampleDimensionOut'
[pyhelper__] found function 'getSampleTypeOut'
[pyhelper__] found function 'transform_enter'
[pyhelper__] found function 'transform'
[pyhelper__] found function 'transform_flush'
[eboard____] 'noname005' sends '(null)'
load model  models\gender
loading model models\gender\model.ckpt-675023
2021-03-11 00:30:41.069534: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1435] Adding visible gpu devices: 0
2021-03-11 00:30:41.073864: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-11 00:30:41.080212: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:929]      0
2021-03-11 00:30:41.083646: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:942] 0:   N
2021-03-11 00:30:41.090013: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 19374 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6)
[factory___] create instance of 'PythonFeature'
[factory___] store instance of 'PythonFeature' as 'noname006'
[pyhelper__] loading script 'model.py'
[pyhelper__] found function 'getOptions'
[pyhelper__] found function 'getSampleDimensionOut'
[pyhelper__] found function 'getSampleTypeOut'
[pyhelper__] found function 'transform_enter'
[pyhelper__] found function 'transform'
[pyhelper__] found function 'transform_flush'
[eboard____] 'noname006' sends '(null)'
load model  models\laughter
loading model models\laughter\model.ckpt-830054
2021-03-11 00:30:41.771476: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1435] Adding visible gpu devices: 0
2021-03-11 00:30:41.774499: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-11 00:30:41.779262: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:929]      0
2021-03-11 00:30:41.782795: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:942] 0:   N
2021-03-11 00:30:41.785663: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 19374 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6)
[factory___] create instance of 'Merge'
[factory___] store instance of 'Merge' as 'noname007'
[factory___] create instance of 'MvgAvgVar'
[factory___] store instance of 'MvgAvgVar' as 'noname008'
[factory___] create instance of 'PythonFilter'
[factory___] store instance of 'PythonFilter' as 'noname009'
[pyhelper__] loading script 'vadex.py'
[pyhelper__] found function 'getOptions'
[pyhelper__] found function 'getSampleDimensionOut'
[pyhelper__] found function 'getSampleTypeOut'
[pyhelper__] found function 'transform_enter'
[pyhelper__] found function 'transform'
[pyhelper__] found function 'transform_flush'
[eboard____] 'noname009' sends '(null)'
[factory___] create instance of 'SignalPainter'
[factory___] store instance of 'SignalPainter' as 'plot'
[factory___] create instance of 'SignalPainter'
[factory___] store instance of 'SignalPainter' as 'plot1'
[factory___] create instance of 'XMLEventSender'
[factory___] store instance of 'XMLEventSender' as 'monitor'
[eboard____] 'monitor' sends 'final@xml'
[factory___] create instance of 'SocketEventWriter'
[factory___] store instance of 'SocketEventWriter' as 'noname013'
[eboard____] 'noname013' receives 'final@xml'
[factory___] create instance of 'Decorator'
[factory___] store instance of 'Decorator' as 'noname014'
[eboard____] start event board worker
[socketudp_] connect to '127.0.0.1:1234' [udp]
[esockwrite] start sending events to 'udp://127.0.0.1:1234'
[thread____] start 'EventBoardWorker'
[pipeline__] start 0 threads
[thread____] start 'audio'
[provider__] start 'Provider:audio'
[sensor____] connect 'Audio'
[sensor____] start 'Audio'
[thread____] start 'Audio@Microphone (NVIDIA Broadcast)'
[thread____] start 'PythonFeature'
[transform_] start 'PythonFeature:noname004'
             frame[s]   = 0.17
             delta[s]   = 0.83
             id         = 0 -> 1
             rate[hz]   = 48000.00 -> 6.00
             dim        = 1 -> 2
             bytes      = 4 -> 4
             type       = FLOAT -> FLOAT
             buffer[s]  = 10.00
[thread____] start 'PythonFeature'
[transform_] start 'PythonFeature:noname005'
             frame[s]   = 0.17
             delta[s]   = 0.83
             id         = 0 -> 2
             rate[hz]   = 48000.00 -> 6.00
             dim        = 1 -> 3
             bytes      = 4 -> 4
             type       = FLOAT -> FLOAT
             buffer[s]  = 10.00
[thread____] start 'PythonFeature'
[transform_] start 'PythonFeature:noname006'
             frame[s]   = 0.17
             delta[s]   = 0.83
             id         = 0 -> 3
             rate[hz]   = 48000.00 -> 6.00
             dim        = 1 -> 2
             bytes      = 4 -> 4
             type       = FLOAT -> FLOAT
             buffer[s]  = 10.00
[thread____] start 'Merge'
[transform_] start 'Merge:noname007'
             frame[s]   = 0.17
             delta[s]   = 0.00
             id         = 1 -> 4
             rate[hz]   = 6.00 -> 6.00
             dim        = 2 -> 7
             bytes      = 4 -> 4
             type       = FLOAT -> FLOAT
             buffer[s]  = 10.00
[thread____] start 'MvgAvgVar'
[transform_] start 'MvgAvgVar:noname008'
             frame[s]   = 0.17
             delta[s]   = 0.00
             id         = 4 -> 5
             rate[hz]   = 6.00 -> 6.00
             dim        = 7 -> 7
             bytes      = 4 -> 4
             type       = FLOAT -> FLOAT
             buffer[s]  = 10.00
[thread____] start 'PythonFilter'
[transform_] start 'PythonFilter:noname009'
             frame[s]   = 0.17
             delta[s]   = 0.00
             id         = 5 -> 6
             rate[hz]   = 6.00 -> 6.00
             dim        = 7 -> 5
             bytes      = 4 -> 4
             type       = FLOAT -> FLOAT
             buffer[s]  = 10.00
[thread____] start 'SignalPainter'
[thread____] start 'AUDIO' (single run)
[consume___] start 'SignalPainter:plot'
             frame[s]   = 0.10
             delta[s]   = 0.00
             stream#1
             id = 0
             rate[hz]   = 48000.00
             dim        = 1
             bytes      = 4
             type       = FLOAT
[thread____] start 'SignalPainter'
[thread____] start 'VADEX' (single run)
[consume___] start 'SignalPainter:plot1'
             frame[s]   = 0.17
             delta[s]   = 0.00
             stream#1
             id = 6
             rate[hz]   = 6.00
             dim        = 5
             bytes      = 4
             type       = FLOAT
[thread____] start 'XMLEventSender'
[thread____] start 'XML' (single run)
[thread____] start 'noname'
[consume___] start 'XMLEventSender:monitor'
             frame[s]   = 0.17
             delta[s]   = 0.00
             stream#1
             id = 6
             rate[hz]   = 6.00
             dim        = 5
             bytes      = 4
             type       = FLOAT

             seconds to start: ok

[pipeline__] start
[thread____] start 'Pipeline' (single run)

             press enter to stop


How to run this script on macOS?

I tried to run the script on macOS and encountered the following error.
Please help me, as I'm new to macOS.

Traceback (most recent call last):

File "vad_extract.py", line 138, in
extract_voice(args.model, args.files, n_batch=args.n_batch)
File "vad_extract.py", line 55, in extract_voice
if not all([os.path.exists(checkpoint_path + x) for x in ['.data-00000-of-00001', '.index', '.meta']]):
File "vad_extract.py", line 55, in
if not all([os.path.exists(checkpoint_path + x) for x in ['.data-00000-of-00001', '.index', '.meta']]):
NameError: free variable 'checkpoint_path' referenced before assignment in enclosing scope

Does it work for indoor voice detection?

I would like to distinguish the human voice from small indoor noises such as background talk from colleagues and footsteps. Will this code work for that, according to your experience?

Using do_vad_extract.cmd, is it possible to get just the timing on the command line?

Instead of getting two separate files of voice and noise, can I just get the timing of when there is noise and when there is voice, along with the values? I just want to get the time on the command line. How can I do that?

Like in do_vad_live.cmd we get XML output over UDP, but I don't want to use a socket. I just need the timing of when there is voice and when there is noise, along with the values. Please help me out.
