Use live audio stream instead of static audio files
Yes, it's possible, and I had plans to implement it, but I have no time to allocate to it (it's not trivial). It might be something I do in the future, but no promises.
See the offline (i.e. non-streaming) code here. To make it online (i.e. streaming), you'll need the following:
- A threading or async mechanism to continuously record audio and derive embeddings at the same time. From an implementation point of view, this is the hardest bit. You'll want to keep recording audio without interruption while periodically generating partial embeddings. Your implementation will depend on whatever library you use for recording audio and on the mechanism it provides to tell you how much audio has been recorded so far. I can recommend sounddevice, although I am unsure of the specifics for this problem yet.
- A rewrite of compute_partial_slices that yields partial segments in an online fashion. The good news is that your function should be simpler than mine, because you don't have to care about the coverage of the last partial. You probably won't need to return the wav partials either, unless you want them for debugging or visualization.
- Finally, you can use the remainder of embed_utterance to compute your embeds. Your batch size should be proportional to the maximum latency you can afford. If your application can tolerate up to a 1s delay, then your batch size can be the number of partials in one second. A batch size of 1 gives the lowest latency, but it is more compute-intensive. You should also decide whether to run inference on CPU or GPU. The voice encoder is a fairly light model, so CPU inference may even be faster than GPU inference for small batch sizes (due to the time it takes to move the data), but running inference on the CPU may be problematic if you're recording audio at the same time.
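The three pieces above could fit together roughly as in the sketch below. It is not part of Resemblyzer: OnlinePartialSlicer, simulated_microphone, fake_embed_batch, consume and run_demo are hypothetical names, and the 16 kHz rate, 1.6 s partial length and 0.5 s hop are assumed values that you should check against the encoder you actually use. A producer thread stands in for the recording library, and a placeholder stands in for the model forward pass.

```python
import queue
import threading

import numpy as np

SAMPLING_RATE = 16000      # assumed sample rate in Hz
PARTIAL_N_SAMPLES = 25600  # assumed partial length (1.6 s at 16 kHz)
STEP_N_SAMPLES = 8000      # assumed hop between partials (0.5 s)


class OnlinePartialSlicer:
    """Accumulates audio chunks and emits fixed-length partials once complete."""

    def __init__(self, partial_n_samples=PARTIAL_N_SAMPLES, step_n_samples=STEP_N_SAMPLES):
        self.partial_n_samples = partial_n_samples
        self.step_n_samples = step_n_samples
        self._buffer = np.zeros(0, dtype=np.float32)
        self._offset = 0  # absolute sample index of self._buffer[0]

    def push(self, chunk):
        """Append a chunk and return a list of (start_sample, partial) pairs."""
        chunk = np.asarray(chunk, dtype=np.float32)
        self._buffer = np.concatenate([self._buffer, chunk])
        partials = []
        # No coverage check is needed: incomplete partials simply wait for more audio.
        while len(self._buffer) >= self.partial_n_samples:
            partials.append((self._offset, self._buffer[:self.partial_n_samples].copy()))
            self._buffer = self._buffer[self.step_n_samples:]
            self._offset += self.step_n_samples
        return partials


def simulated_microphone(audio_queue, n_chunks=10, chunk_n_samples=4000):
    """Stand-in for a recording callback. With sounddevice, an InputStream
    callback could instead put a copy of each incoming block on the queue."""
    rng = np.random.default_rng(0)
    for _ in range(n_chunks):
        audio_queue.put(rng.standard_normal(chunk_n_samples).astype(np.float32))
    audio_queue.put(None)  # sentinel: end of stream


def fake_embed_batch(batch):
    """Hypothetical placeholder for the model forward pass on a batch of partials."""
    return list(batch.mean(axis=1, keepdims=True))


def consume(audio_queue, slicer, embed_batch, batch_size):
    """Read chunks until the sentinel arrives, embedding partials in batches."""
    embeds, pending = [], []
    while True:
        chunk = audio_queue.get()
        if chunk is None:
            break
        for _, partial in slicer.push(chunk):
            pending.append(partial)
            if len(pending) == batch_size:
                embeds.extend(embed_batch(np.stack(pending)))
                pending.clear()
    if pending:  # flush whatever is left when the stream ends
        embeds.extend(embed_batch(np.stack(pending)))
    return embeds


def run_demo(max_latency_s=1.0):
    # Batch size = number of partials produced within the tolerated delay.
    batch_size = max(1, int(max_latency_s * SAMPLING_RATE / STEP_N_SAMPLES))
    audio_queue = queue.Queue()
    producer = threading.Thread(target=simulated_microphone, args=(audio_queue,))
    producer.start()
    embeds = consume(audio_queue, OnlinePartialSlicer(), fake_embed_batch, batch_size)
    producer.join()
    return batch_size, embeds
```

The queue decouples recording from inference, so a slow forward pass delays embeddings but does not drop audio. In a real setup, fake_embed_batch would be replaced by the mel spectrogram computation and encoder forward pass that embed_utterance performs on its partials.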
Number of speakers unknown in the beginning of the audio stream.
See #10, a very similar thread.
Pre-trained English model for diarization.
Unfortunately, diarization in general is not in my scope, so I don't know what the SOTA is.