Comments (3)
Hi @liushenme, the speech samples do not need to be resampled. As @yellowyi9527 pointed out, the length of the FFT window is adjusted according to the sampling frequency and then zero-padded to a fixed length of 4096 samples. This is done in line 2301: win_length = int(sr * win_length), where the initial win_length is 0.02 (20 ms). Because the window is defined in seconds rather than in samples, it covers the same 20 ms at any sampling rate, so resampling of the signals is not needed.
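To make the window arithmetic concrete, here is a minimal sketch. It is not the NISQA code: the function name stft_window and its parameter names are made up for illustration, and only the expression int(sr * win_length), the 20 ms window and the 4096-sample FFT length come from the comment above.

```python
def stft_window(sr, win_seconds=0.02, n_fft=4096):
    """Return (window length in samples, zero-padding in samples).

    Hypothetical helper: only the int(sr * win_seconds) expression and the
    default values (20 ms, 4096) are taken from the thread.
    """
    win_length = int(sr * win_seconds)
    if win_length > n_fft:
        raise ValueError("window is longer than the FFT size")
    return win_length, n_fft - win_length


for sr in (8000, 16000, 48000):
    win, pad = stft_window(sr)
    print(f"{sr} Hz: window of {win} samples, zero-padded by {pad} to 4096")
```

The window always spans 20 ms of audio; only the number of samples inside it changes with the sampling rate.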
You are right that many of the Mel bands will be empty for a wideband or narrowband signal. However, the NISQA model predicts quality in a fullband context, so it will predict a lower score when a speech signal is missing high frequencies. For a wideband signal (up to 8 kHz), the score will on average only be slightly lower than a fullband signal. The score of a clean narrowband signal (up to 4 kHz) should on average be around a MOS of 3.8. The figure below shows the average predicted MOS vs the cutoff frequency of a lowpass.
If Librosa prints out a warning message about the empty Mel bands you can just ignore it. Basically, the model needs to know that these frequencies are empty since it expects a fullband signal.
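As a rough illustration of how many Mel bands end up empty for a band-limited signal, the sketch below counts band centers above a lowpass cutoff. The band count (48), the upper frequency (20 kHz) and the HTK-style Mel formula are assumptions chosen for the example, not values taken from this thread, and Librosa's default Mel scale may differ.

```python
import math


def hz_to_mel(f_hz):
    # HTK-style Mel formula (an assumption; Librosa's default scale may differ).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def empty_band_count(cutoff_hz, n_mels=48, fmax=20000.0):
    """Count Mel bands whose center lies above cutoff_hz.

    n_mels and fmax are illustrative assumptions, not NISQA settings.
    """
    step = hz_to_mel(fmax) / (n_mels + 1)  # centers are equally spaced in Mel
    centers = [step * i for i in range(1, n_mels + 1)]
    cutoff_mel = hz_to_mel(cutoff_hz)
    return sum(1 for c in centers if c > cutoff_mel)


for cutoff in (4000, 8000, 20000):
    print(f"cutoff {cutoff} Hz: {empty_band_count(cutoff)} band centers above it")
```

A narrowband signal leaves more bands empty than a wideband one, which matches the lower predicted score described above.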
from nisqa.
He's using Librosa to read the data, and it automatically resamples to the specified frequency; see line 2283 of NISQA_lib.py (y, sr = lb.load(file_path, sr=sr)). @liushenme
Thanks for your reply. I understand that the author wants to resample all the speech samples to the same frequency, but after the resampling operation the speech samples may gain many frequency dimensions with a value of 0. Is that reasonable?
And for the NISQA Corpus, the speech samples are sampled at 48 kHz, but their actual frequency content only covers 0-8 kHz (for some, 0-4 kHz). They already have many frequency dimensions with a value of 0. I don't think it is reasonable to process these speech samples directly. @yellowyi9527