Coder Social home page Coder Social logo

zabir-nabil / torch-speech-dataloader Goto Github PK

View Code? Open in Web Editor NEW
8.0 3.0 0.0 439 KB

A ready-to-use pytorch dataloader for audio classification, speech classification, speaker recognition, etc. with in-GPU augmentations

License: MIT License

Python 100.00%
audio-augmentation speech torch gpu-augmentation pytorch-speech-dataloader speech-augmentation-gpu torch-dataloader

torch-speech-dataloader's Introduction

torch-speech-dataloader

A ready-to-use pytorch dataloader for audio classification, speech classification, speaker recognition, etc. with in-GPU augmentations.

  • PyTorch speech dataloader with 5 (or less) lines of code. get_torch_speech_dataloader_from_config(config)
  • Batch augmentation in GPU, powered by torch-audiomentations
  • RIRs augmentation with any set of IR file(s) [cpu]
  • MUSAN-like augmentation with any set of source files. Customizable. [cpu]
  • Written in one night, may contain bugs!

Install

pip install -U git+https://github.com/zabir-nabil/torch-speech-dataloader.git@main

Use

from torch_speech_dataloader import get_torch_speech_dataloader, get_torch_speech_dataloader_from_config
from torch_speech_dataloader.augmentation_utils import placeholder_gpu_augmentation

config_1 = {
    "filenames" : ["../test.wav"] * 5 + ["../test_hindi.wav"] * 5,
    "speech_labels" : ["test"] * 5 + ["test2"] * 5,
    "batch_size" : 3,
    "num_workers" : 5,
    "device" : torch.device('cuda:1'),
    "sanity_check_path" : "../sanity_test",
    "sanity_check_samples" : 2,
    "batch_audio_augmentation": placeholder_gpu_augmentation,
    "rirs_reverb" : {"apply": True},
    "musan_augmentation" : {"apply": True, "mix_multiples_max_count": -1, "musan_max_len": 1.},
    "verbose" : 0
}

dummy_tsdl = get_torch_speech_dataloader_from_config(config_1)
for d, l in dummy_tsdl.get_batch():
    print(d.shape)
    print(l)

Others

config parameters

  • filenames: A list of filepaths for the audio / speech files (usually wav).
  • speech_labels: Corresponding labels for filenames / list of audio files.
  • batch_size: Batch size of the dataloader.
  • num_workers: Dataloader workers.
  • device: torch device [default: cpu].
  • sanity_check_path: If you want to look at the sample audio files generated, specify a path where the sample augmented audio files will be saved.
  • sanity_check_samples: Number of sample audio files to store in the sanity check folder.
  • batch_audio_augmentation: Usually, it will run on the GPU batch if gpu device is specified, else on the CPU batch. Any transform (compose) / augmentation, that takes a tensor of dimension [B x C x N].
  • rirs_reverb:
    • apply: If apply is true, only then this augmentation will be applied to each audio individually.
    • reverb_source_files_path: A list of IR filepaths.
  • musan_augmentation:
    • apply: If apply is true, only then this augmentation will be applied to each audio individually.
    • musan_config: { "music": ([list of music file paths], range_for_num_music_files_to_use, range_for_noise_snr), "speech": ([list of speech file paths], range_for_num_speech_files_to_use, range_for_noise_snr), } [example: augmentation_utils.placeholder_musan_config]
    • mix_multiples_max_count: Multiple noise types should be mixed (music + noise + ...). Number of noise types that should be mixed at most.
    • musan_max_len: <= 0: take the musan noise and crop it with equal length (same as input audio); > 0: maximum length of the cropped musan noise (in secs.).
  • audio_augmentation: List of funcs that can be applied to a single audio with shape [N,].
  • features: Feature extraction. [N,] -> [T,F].
  • feature_augmentation: List of funcs that can be applied to a single feature with shape [T,F].

torch-speech-dataloader's People

Contributors

zabir-nabil avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.