Questions about sample rate and data normalization (s3prl issue, closed)

s3prl commented on July 24, 2024

from s3prl.

Comments (4)

FastGeCo commented on July 24, 2024

Hi, thanks for your amazing open source project! How can I use a different sample rate (e.g. 44.1 kHz), especially with the online config? The current version seems less flexible for the on-the-fly settings. In addition, is any data normalization applied in the preprocessing stage?
Thanks a lot.

Sometimes the audio files within a dataset are not all sampled at the same rate, so I need to resample them, which the on-the-fly pipeline does not currently support.
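To make the resampling step concrete, here is a minimal sketch that brings clips recorded at different rates to one common rate. It is not s3prl code: `resample_linear` is a hypothetical helper, and it uses plain linear interpolation with no anti-aliasing filter, so it only illustrates the idea. A real pipeline would more likely use a band-limited resampler such as `torchaudio.transforms.Resample`.

```python
def resample_linear(samples, orig_sr, target_sr):
    """Resample a mono signal by linear interpolation.

    Illustration only: no low-pass filter is applied, so downsampling
    with this function can alias.
    """
    if orig_sr <= 0 or target_sr <= 0:
        raise ValueError("sample rates must be positive")
    if orig_sr == target_sr or len(samples) < 2:
        return list(samples)

    # Number of output samples that cover the same duration.
    n_out = max(1, round(len(samples) * target_sr / orig_sr))
    step = orig_sr / target_sr  # input samples advanced per output sample

    out = []
    last = len(samples) - 1
    for i in range(n_out):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        # Weighted average of the two neighbouring input samples.
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


# One second of silence at 44.1 kHz becomes 16000 samples at 16 kHz.
clip_44k = [0.0] * 44100
clip_16k = resample_linear(clip_44k, 44100, 16000)
print(len(clip_16k))
```

Applying such a function to each clip as it is loaded means every example reaches the feature extractor at the same rate, whatever rate it was stored at.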

leo19941227 commented on July 24, 2024

Hi, we are glad to help!
May I first ask whether you are pretraining on LibriSpeech? The online preprocessor on master currently supports only LibriSpeech, which is why we fix sample_rate to 16000 Hz (all FLAC files in LibriSpeech are sampled at 16000 Hz).
If you wish to pretrain with other datasets, consider checking out the extension branch, which now supports many more features for online extraction, including resampling, data normalization (e.g. normalization in decibels, or across the time dimension, a.k.a. CMVN), and even adding noise to the input waveform (to pretrain a denoising Speech BERT).
Please let me know your situation and I can provide more details, e.g. how to use the extension branch (it is still under rapid development).
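For readers unfamiliar with the two normalization styles mentioned above, the following is a rough sketch of what they usually mean. It is not taken from the extension branch, whose implementation is not shown in this thread. The function names `cmvn` and `normalize_db`, the RMS-based reading of "normalization in decibels", and the default target level of -25 dB are all assumptions made for illustration.

```python
import math


def cmvn(features, eps=1e-8):
    """Cepstral mean and variance normalization across the time axis.

    features: list of frames, each a list of floats of equal length.
    Each feature dimension is shifted to zero mean and scaled to unit
    variance over all frames of the utterance.
    """
    n_frames = len(features)
    if n_frames == 0:
        return []
    n_dims = len(features[0])
    means = [sum(f[d] for f in features) / n_frames for d in range(n_dims)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in features) / n_frames)
        for d in range(n_dims)
    ]
    return [
        [(f[d] - means[d]) / (stds[d] + eps) for d in range(n_dims)]
        for f in features
    ]


def normalize_db(samples, target_db=-25.0, eps=1e-8):
    """Scale a waveform so its RMS level is target_db (dB re. full scale)."""
    rms = math.sqrt(sum(s * s for s in samples) / max(len(samples), 1))
    gain = 10.0 ** (target_db / 20.0) / (rms + eps)
    return [s * gain for s in samples]


frames = [[1.0, 10.0], [3.0, 30.0]]
print(cmvn(frames))                        # each column now has mean ~0, std ~1
print(normalize_db([0.5, -0.5], -20.0))    # RMS scaled from 0.5 to ~0.1
```

CMVN operates on extracted features (one statistic per feature dimension), while the decibel variant operates on the raw waveform before feature extraction, so the two can be combined.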

FastGeCo commented on July 24, 2024

Hi! Thanks for your reply. I am currently running experiments on AudioSet, where the recordings are not sampled at 16000 Hz. How can I use resampling for online extraction? Best regards, Helin Wang


andi611 commented on July 24, 2024

Hi! Thanks for your reply. I am currently running experiments on AudioSet, where the recordings are not sampled at 16000 Hz. How can I use resampling for online extraction? Best regards, Helin Wang

Hi @wanghl15,
In config/online.yaml, simply change sample_rate: 16000 to the sample rate of your AudioSet audio.

@leo19941227 please verify if this is correct.
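Since a single sample_rate value only works if every file actually has that rate, it can help to check the dataset before changing the config. The sketch below is one possible way to do that and is not part of s3prl: `find_rate_mismatches` is a hypothetical helper, and it uses Python's standard `wave` module, so it assumes the audio has been stored as WAV files (FLAC or other formats would need a different reader).

```python
import wave


def find_rate_mismatches(wav_paths, expected_sr):
    """Return (path, actual_rate) pairs for WAV files not at expected_sr."""
    mismatches = []
    for path in wav_paths:
        with wave.open(path, "rb") as wav_file:
            actual_sr = wav_file.getframerate()
        if actual_sr != expected_sr:
            mismatches.append((path, actual_sr))
    return mismatches
```

Calling, for example, `find_rate_mismatches(paths, 44100)` with the value you plan to write into sample_rate returns an empty list when the dataset is consistent. Any files it reports would need resampling first.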
