The torch-speech-dataloader from zabir-nabil

torch-speech-dataloader

A ready-to-use pytorch dataloader for audio classification, speech classification, speaker recognition, etc. with in-GPU augmentations.

PyTorch speech dataloader with 5 (or less) lines of code. get_torch_speech_dataloader_from_config(config)
Batch augmentation in GPU, powered by torch-audiomentations
RIRs augmentation with any set of IR file(s) [cpu]
MUSAN-like augmentation with any set of source files. Customizable. [cpu]
Written in one night, may contain bugs!

Install

pip install -U git+https://github.com/zabir-nabil/torch-speech-dataloader.git@main

Use

from torch_speech_dataloader import get_torch_speech_dataloader, get_torch_speech_dataloader_from_config
from torch_speech_dataloader.augmentation_utils import placeholder_gpu_augmentation

config_1 = {
    "filenames" : ["../test.wav"] * 5 + ["../test_hindi.wav"] * 5,
    "speech_labels" : ["test"] * 5 + ["test2"] * 5,
    "batch_size" : 3,
    "num_workers" : 5,
    "device" : torch.device('cuda:1'),
    "sanity_check_path" : "../sanity_test",
    "sanity_check_samples" : 2,
    "batch_audio_augmentation": placeholder_gpu_augmentation,
    "rirs_reverb" : {"apply": True},
    "musan_augmentation" : {"apply": True, "mix_multiples_max_count": -1, "musan_max_len": 1.},
    "verbose" : 0
}

dummy_tsdl = get_torch_speech_dataloader_from_config(config_1)
for d, l in dummy_tsdl.get_batch():
    print(d.shape)
    print(l)

Others

`config` parameters

filenames: A list of filepaths for the audio / speech files (usually wav).
speech_labels: Corresponding labels for filenames / list of audio files.
batch_size: Batch size of the dataloader.
num_workers: Dataloader workers.
device: torch device [default: cpu].
sanity_check_path: If you want to look at the sample audio files generated, specify a path where the sample augmented audio files will be saved.
sanity_check_samples: Number of sample audio files to store in the sanity check folder.
batch_audio_augmentation: Usually, it will run on the GPU batch if gpu device is specified, else on the CPU batch. Any transform (compose) / augmentation, that takes a tensor of dimension [B x C x N].
rirs_reverb:
- apply: If apply is true, only then this augmentation will be applied to each audio individually.
- reverb_source_files_path: A list of IR filepaths.
musan_augmentation:
- apply: If apply is true, only then this augmentation will be applied to each audio individually.
- musan_config: { "music": ([list of music file paths], range_for_num_music_files_to_use, range_for_noise_snr), "speech": ([list of speech file paths], range_for_num_speech_files_to_use, range_for_noise_snr), } [example: augmentation_utils.placeholder_musan_config]
- mix_multiples_max_count: Multiple noise types should be mixed (music + noise + ...). Number of noise types that should be mixed at most.
- musan_max_len: <= 0: take the musan noise and crop it with equal length (same as input audio); > 0: maximum length of the cropped musan noise (in secs.).
audio_augmentation: List of funcs that can be applied to a single audio with shape [N,].
features: Feature extraction. [N,] -> [T,F].
feature_augmentation: List of funcs that can be applied to a single feature with shape [T,F].

zabir-nabil / torch-speech-dataloader Goto Github PK

torch-speech-dataloader's Introduction

torch-speech-dataloader

Install

Use

Others

`config` parameters

torch-speech-dataloader's People

Contributors

Stargazers

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

zabir-nabil / torch-speech-dataloader Goto Github PK

torch-speech-dataloader's Introduction

torch-speech-dataloader

Install

Use

Others

config parameters

torch-speech-dataloader's People

Contributors

Stargazers

Watchers

Recommend Projects

Recommend Topics

Recommend Org

`config` parameters