
Comments (3)

gabrielmittag commented on July 21, 2024

Hi @liushenme, the speech samples do not need to be resampled. As @yellowyi9527 pointed out, the length of the FFT window is adjusted according to the sampling frequency, and each window is then zero-padded to a fixed FFT length of 4096 samples. This is done in line 2301: win_length = int(sr * win_length), where the initial win_length is 0.02 (20 ms), so the window always covers 20 ms of audio regardless of the sampling rate.
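A rough sketch of that windowing step, written with plain NumPy rather than taken from NISQA_lib.py or librosa. The 0.02 s window and the FFT length of 4096 come from the comment above; the function names, the Hann window, and the 1 kHz test tone are assumptions made for illustration. The padding is appended at the end of the frame for simplicity, which does not change the magnitude spectrum.

```python
import numpy as np

N_FFT = 4096        # fixed FFT length mentioned above
WIN_SECONDS = 0.02  # initial win_length (20 ms)

def window_length_in_samples(sr, win_seconds=WIN_SECONDS):
    """Convert the window duration to samples, as in int(sr * win_length)."""
    return int(sr * win_seconds)

def padded_frame_spectrum(frame, n_fft=N_FFT):
    """Apply a Hann window, zero-pad the frame to n_fft samples, return magnitudes."""
    windowed = frame * np.hanning(len(frame))
    padded = np.zeros(n_fft)
    padded[:len(windowed)] = windowed
    return np.abs(np.fft.rfft(padded))

results = {}
for sr in (8000, 16000, 48000):
    win = window_length_in_samples(sr)
    t = np.arange(win) / sr
    frame = np.sin(2 * np.pi * 1000.0 * t)  # hypothetical 1 kHz test tone
    spec = padded_frame_spectrum(frame)
    results[sr] = (win, spec)
    peak_hz = np.argmax(spec) * sr / N_FFT  # bin index converted back to Hz
    print(f"sr={sr}: window={win} samples, bins={spec.shape[0]}, peak near {peak_hz:.0f} Hz")
```

The number of frequency bins is the same for every sampling rate, but the spacing between bins (sr / 4096 Hz) is not, which is why the frequency axis still has to be interpreted together with sr.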

You are right that many of the Mel bands will be empty for a wideband or narrowband signal. However, the NISQA model predicts quality in a fullband context, so it will predict a lower score when a speech signal is missing high frequencies. For a wideband signal (up to 8 kHz), the score will on average be only slightly lower than for a fullband signal. The score of a clean narrowband signal (up to 4 kHz) should on average be around a MOS of 3.8. The figure below shows the average predicted MOS versus the cutoff frequency of a lowpass filter.

If librosa prints a warning message about the empty Mel bands, you can just ignore it. The model needs to know that these frequencies are empty, since it expects a fullband signal.

[Figure: mos_vs_lowpass (average predicted MOS vs. lowpass cutoff frequency)]
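To make the point about empty bands concrete, here is a small NumPy-only sketch. It is not the NISQA feature extraction: the 48 bands, the 20 kHz upper limit, the rectangular (instead of triangular) bands, the brickwall filter, and all function names are assumptions chosen for illustration. It band-limits white noise sampled at 48 kHz to 4 kHz and checks which mel-spaced bands carry no energy.

```python
import numpy as np

def hz_to_mel(f):
    """HTK-style Hz to mel conversion."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """HTK-style mel to Hz conversion."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def band_energies(signal, sr, n_bands=48, fmax=20000.0, n_fft=4096):
    """Sum spectral power in n_bands rectangular bands spaced evenly on the mel scale."""
    power = np.abs(np.fft.rfft(signal[:n_fft], n=n_fft)) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(fmax), n_bands + 1))
    energies = np.array([power[(freqs >= lo) & (freqs < hi)].sum()
                         for lo, hi in zip(edges[:-1], edges[1:])])
    return energies, edges

def brickwall_lowpass(signal, sr, cutoff_hz):
    """Zero every FFT bin above cutoff_hz (an idealized lowpass filter)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

rng = np.random.default_rng(0)
sr = 48000
noise = rng.standard_normal(4096)                 # stand-in for a fullband signal
narrowband = brickwall_lowpass(noise, sr, 4000.0)  # content limited to 0-4 kHz

energies, edges = band_energies(narrowband, sr)
empty = energies < 1e-6 * energies.max()
print(f"{empty.sum()} of {len(energies)} bands are empty")
```

The bands above the cutoff stay in the feature vector with (near) zero energy instead of being dropped, which is how a model trained in a fullband context can see that the high frequencies are missing.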


yellowyi9527 commented on July 21, 2024

@liushenme The author uses librosa to read the data, and librosa automatically resamples it to the specified sampling rate. See line 2283 of NISQA_lib.py: y, sr = lb.load(file_path, sr=sr)


liushenme commented on July 21, 2024

Thanks for your reply. I understand that the author wants to resample all the speech samples to the same sampling frequency, but after resampling, a speech sample may contain many frequency bins with a value of 0. Is that reasonable?

Also, the speech samples in the NISQA Corpus are sampled at 48 kHz, but their actual frequency content only covers 0-8 kHz (for some, only 0-4 kHz). They already have many frequency bins with a value of 0. I don't think it is reasonable to process these speech samples directly. @yellowyi9527

