Comments (6)
Hey @ENDERFUN2,
Assume you have a box full of Legos and want to build a spaceship. To be usable, all of the Legos should be loose in the box. Occasionally, though, the Legos come wrapped in a second box nested inside the first. That extra layer of box adds nothing; it is just in the way.
The sounddevice.rec function in this code is like being handed the Lego box, and it can come with that extra layer (the extra dimension). Calling squeeze is like opening the box and throwing the extra layer away, so that all you have left to work with is the audio data, or Legos.
This matters because the WhisperModel.transcribe method, which you use to build the spaceship, only works with loose Legos and not with boxes inside boxes. Squeezing removes the extra box so that everything works as intended.
This is how I understand squeeze to work.
from faster-whisper.
@ENDERFUN2, hello. To handle silence in recorded audio, you can try using the vad_filter option.
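A minimal sketch of what turning on vad_filter could look like. The helper name transcribe_skipping_silence and the 500 ms silence threshold are illustrative assumptions, not something from this thread; vad_filter and vad_parameters are keyword arguments of WhisperModel.transcribe.

```python
import numpy as np

# Illustrative value: silences shorter than this (in ms) do not split the speech.
VAD_PARAMETERS = {"min_silence_duration_ms": 500}


def transcribe_skipping_silence(audio, model_size="tiny"):
    """Hypothetical helper: transcribe 16 kHz mono float32 audio with VAD enabled."""
    # Imported here so the module loads even where faster-whisper is not installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu")
    segments, info = model.transcribe(
        audio,
        vad_filter=True,                # drop non-speech parts before decoding
        vad_parameters=VAD_PARAMETERS,  # optional tuning of the VAD
    )
    # segments is a generator; consuming it is what actually runs the transcription.
    return [(segment.start, segment.end, segment.text) for segment in segments]


if __name__ == "__main__":
    # Ten seconds of silence as a stand-in for a real recording.
    silence = np.zeros(16000 * 10, dtype=np.float32)
    print(transcribe_skipping_silence(silence))
```

With the filter on, stretches of audio without speech are removed before the model sees them, which tends to cut down on text being hallucinated during silence.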
To avoid saving the audio to a temporary file, pass the audio data to the FW model as a NumPy ndarray. Below is my example:
import numpy as np
import sounddevice as sd
from faster_whisper import WhisperModel

print("Recording started")
duration = 10  # seconds
sample_rate = 16000  # Hz
# sd.rec() returns immediately; sd.wait() blocks until the recording is done
audio_data = sd.rec(
    int(sample_rate * duration), samplerate=sample_rate, channels=1, dtype=np.float32
)
sd.wait()
# Drop the channel axis: (n_samples, 1) -> (n_samples,)
audio_data = audio_data.squeeze()
print("Recording stopped")

model = WhisperModel("tiny", device="cpu")
segments, info = model.transcribe(audio_data, word_timestamps=True)
# segments is a generator; transcription runs as it is iterated
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
from faster-whisper.
Also, @ENDERFUN2, make sure you use a sampling rate of 16000 and nothing else, since FW doesn't support any other sample rate for this kind of input, as @trungkienbkhn said.
from faster-whisper.
Okay, so vad_filter works perfectly. I don't know why I hadn't found it before...
Also, that sample code from @trungkienbkhn turned out to be a game changer. But, as a curious man, why is squeeze required? It's my first serious project in Python, 'cause all my previous ones were in Java or C++, so I don't really understand it. And why does the sample rate have to be set to 16000? When I pass an audio file with a 48000 sample rate, it transcribes the audio 100% perfectly. I would be so grateful for an explanation.
from faster-whisper.
@ENDERFUN2, FYI, you can see this comment to better understand why you should use sample_rate=16000. If I use sr=48000 in my example here, it doesn't work.
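As far as I understand it, the difference is that a file path is decoded and resampled by faster-whisper itself, while a NumPy array is taken as already being 16 kHz. Below is a rough sketch of what goes wrong with a raw 48 kHz array and one way to convert it. The test tone and the choice of scipy.signal.resample_poly are my own assumptions for illustration, not something taken from faster-whisper.

```python
import numpy as np
from scipy.signal import resample_poly

source_rate = 48000  # rate the audio was recorded at
target_rate = 16000  # rate the model expects for array input

# One second of a 440 Hz tone standing in for a 48 kHz recording.
t = np.arange(source_rate) / source_rate
audio_48k = np.sin(2 * np.pi * 440 * t).astype(np.float32)

# Read as 16 kHz samples, the same one-second clip looks three seconds long,
# i.e. the model would hear it slowed down and pitched down.
apparent_seconds = len(audio_48k) / target_rate
print(apparent_seconds)

# 48000 / 16000 = 3, so downsample by 3 (resample_poly low-pass filters first).
audio_16k = resample_poly(
    audio_48k, up=1, down=source_rate // target_rate
).astype(np.float32)
print(audio_16k.shape)
```

If the device allows it, recording at 16000 in the first place, as in the example above, avoids the extra step.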
As for why squeeze() is needed: sd.rec() returns an array with shape (duration * sample_rate, 1) because it records mono audio, so the result is a 2D array whose second dimension has size 1. However, FW requires a 1D array as input, so we call squeeze() to drop that extra dimension.
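The shape change can be checked without a microphone. This is a small sketch in which a zero-filled array stands in for what sd.rec(..., channels=1) returns; the variable names are only illustrative.

```python
import numpy as np

sample_rate = 16000
duration = 10

# Stand-in for the recording: one row per sample, one column per channel.
recorded = np.zeros((sample_rate * duration, 1), dtype=np.float32)
print(recorded.shape)  # (160000, 1)

# squeeze() removes every axis of length 1, leaving a flat 1D array.
mono = recorded.squeeze()
print(mono.shape)  # (160000,)

# Two equivalent, more explicit ways to get the same result.
assert np.array_equal(mono, recorded[:, 0])
assert np.array_equal(mono, recorded.squeeze(axis=1))
```

Note that squeeze(axis=1) raises an error if that axis is not of length 1, so it fails loudly on a stereo recording instead of passing a 2D array along unnoticed.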
from faster-whisper.
Well, it now makes a lot of sense. Thank you both, @trungkienbkhn and @arunman1kandan, for your service. Abstracting it to Legos wasn't necessary, though; I just didn't understand ndarrays. Also, after some refactoring I noticed that my code is one big pile of garbage and I should reformat it ASAP. Your answers gave me an important insight.
from faster-whisper.