Some questions about the sample rate of speech samples about nisqa HOT 3 CLOSED

liushenme commented on July 21, 2024

Some questions about the sample rate of speech samples

from nisqa.

Comments (3)

gabrielmittag commented on July 21, 2024

Hi @liushenme, the speech samples do not need to be resampled since, as @yellowyi9527 pointed out, the length of the FFT window is adjusted according to the sampling frequency and then zero-padded to a fixed length of 4096 samples. This is done in line 2301: win_length = int(sr * win_length), where the initial win_length is 0.02 (20 ms). Because of this approach, resampling of the signals is not needed.
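To make the window-length adjustment concrete, here is a minimal NumPy sketch (not the NISQA code itself). The constant and function names are hypothetical; only the 20 ms window, the int(sr * win_length) expression, and the 4096-sample FFT length come from the reply above. It uses a plain Hann window and np.fft.rfft, which pads at the end of the frame, so treat it as an illustration of the idea rather than a reproduction of what Librosa does internally.

```python
import numpy as np

N_FFT = 4096        # fixed FFT length mentioned in the reply
WIN_SECONDS = 0.02  # initial win_length of 20 ms mentioned in the reply


def window_length_in_samples(sr, win_seconds=WIN_SECONDS):
    """Hypothetical helper: window length in samples for a sampling rate."""
    return int(sr * win_seconds)


def padded_frame_spectrum(frame, n_fft=N_FFT):
    """Magnitude spectrum of one windowed frame, zero-padded to n_fft samples."""
    window = np.hanning(len(frame))
    # rfft zero-pads the input up to n_fft when the frame is shorter
    return np.abs(np.fft.rfft(frame * window, n=n_fft))


results = {}
for sr in (8000, 16000, 48000):
    win = window_length_in_samples(sr)
    t = np.arange(win) / sr
    frame = np.sin(2 * np.pi * 1000 * t)  # synthetic 1 kHz tone, 20 ms long
    spec = padded_frame_spectrum(frame)
    peak_hz = np.argmax(spec) * sr / N_FFT  # bin index -> frequency in Hz
    results[sr] = (win, spec.shape[0], peak_hz)
    print(sr, win, spec.shape[0], round(peak_hz))
```

Every sampling rate ends up with the same 20 ms of audio per frame and the same number of frequency bins. What changes is the frequency range those bins cover (0 to sr/2), which is why a file with a lower sampling rate can be passed in without resampling it first.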

You are right that many of the Mel bands will be empty for a wideband or narrowband signal. However, the NISQA model predicts quality in a fullband context, so it will predict a lower score when a speech signal is missing high frequencies. For a wideband signal (up to 8 kHz), the score will on average be only slightly lower than that of a fullband signal. The score of a clean narrowband signal (up to 4 kHz) should on average be around a MOS of 3.8. The figure below shows the average predicted MOS vs. the cutoff frequency of a lowpass filter.

If Librosa prints a warning message about the empty Mel bands, you can just ignore it. Basically, the model needs to know that these frequencies are empty since it expects a fullband signal.
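If you want to check whether a file is effectively narrowband, wideband, or fullband before interpreting its score, something like the following rough sketch may help. It is not part of NISQA; estimate_bandwidth_hz and the 99.9% energy threshold are assumptions made for illustration, and the input is a synthetic stand-in for a 48 kHz file whose content stops below 4 kHz.

```python
import numpy as np


def estimate_bandwidth_hz(y, sr, energy_fraction=0.999):
    """Hypothetical helper: frequency below which `energy_fraction`
    of the signal's spectral energy lies."""
    power = np.abs(np.fft.rfft(y)) ** 2
    freqs = np.fft.rfftfreq(len(y), d=1.0 / sr)
    cumulative = np.cumsum(power) / np.sum(power)
    # first bin at which the cumulative energy reaches the threshold
    idx = int(np.searchsorted(cumulative, energy_fraction))
    idx = min(idx, len(freqs) - 1)
    return float(freqs[idx])


# Synthetic stand-in: 1 s at 48 kHz with tones only up to 3.4 kHz
sr = 48000
t = np.arange(sr) / sr
y = sum(np.sin(2 * np.pi * f * t) for f in (300, 1000, 2500, 3400))

bandwidth = estimate_bandwidth_hz(y, sr)
print(f"estimated bandwidth: {bandwidth:.0f} Hz of {sr / 2:.0f} Hz available")
```

On real speech the threshold would need tuning, since noise and codec artifacts can leave a little energy above the nominal cutoff.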

yellowyi9527 commented on July 21, 2024

He's using Librosa to read the data, and Librosa automatically resamples it to the specified sampling frequency. See line 2283 in NISQA_lib.py (y, sr = lb.load(file_path, sr=sr)). @liushenme

liushenme commented on July 21, 2024

Thanks for your reply. I understand that the author wants to resample all the speech samples to the same frequency, but after the resampling operation, the speech samples may gain many frequency dimensions with a value of 0. Is that reasonable?

As for the NISQA Corpus, the speech samples are sampled at 48 kHz, but their true frequency content only covers 0-8 kHz (for some, 0-4 kHz). They already have many frequency dimensions with a value of 0. I don't think it is reasonable to process these speech samples directly. @yellowyi9527
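The effect described here can be reproduced with a small NumPy sketch. It uses simple Fourier-domain zero-padding to upsample, which is a stand-in chosen for brevity and not the resampler Librosa uses; fourier_upsample and the test tones are hypothetical.

```python
import numpy as np


def fourier_upsample(y, sr_in, sr_out):
    """Hypothetical helper: upsample by zero-padding the spectrum.
    Ignores the special handling of the Nyquist bin, which does not
    matter for the test signal below."""
    if sr_out < sr_in:
        raise ValueError("this sketch only supports upsampling")
    n_in = len(y)
    n_out = int(round(n_in * sr_out / sr_in))
    spectrum = np.fft.rfft(y)
    padded = np.zeros(n_out // 2 + 1, dtype=complex)
    padded[: len(spectrum)] = spectrum  # new high-frequency bins stay at 0
    # rescale so the time-domain amplitude is preserved
    return np.fft.irfft(padded, n=n_out) * (n_out / n_in)


sr_in, sr_out = 8000, 48000
t = np.arange(sr_in) / sr_in
y = np.sin(2 * np.pi * 1000 * t) + 0.5 * np.sin(2 * np.pi * 3000 * t)

y_up = fourier_upsample(y, sr_in, sr_out)
power = np.abs(np.fft.rfft(y_up)) ** 2
freqs = np.fft.rfftfreq(len(y_up), d=1.0 / sr_out)

above = freqs > sr_in / 2  # bins beyond the original Nyquist frequency
empty_fraction = float(np.mean(above))
energy_above = float(power[above].sum() / power.sum())
print(f"{empty_fraction:.0%} of the bins hold {energy_above:.1e} of the energy")
```

Upsampling adds no information: the bins above the original Nyquist frequency exist in the 48 kHz representation but carry essentially no energy, which is the situation being asked about.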
