iver56 / audiomentations

A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning.

Home Page: https://iver56.github.io/audiomentations/

License: MIT License

Python 100.00%
audio sound data-augmentation augmentation sound-processing python machine-learning music deep-learning audio-effects

audiomentations's Introduction

Audiomentations

A Python library for audio data augmentation. Inspired by albumentations. Useful for deep learning. Runs on CPU. Supports mono audio and multichannel audio. Can be integrated in training pipelines in e.g. TensorFlow/Keras or PyTorch. Has helped people get world-class results in Kaggle competitions. Is used by companies making next-generation audio products.

Need a PyTorch-specific alternative with GPU support? Check out torch-audiomentations!

Setup

pip install audiomentations

Usage example

from audiomentations import Compose, AddGaussianNoise, TimeStretch, PitchShift, Shift
import numpy as np

augment = Compose([
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
    TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),
    PitchShift(min_semitones=-4, max_semitones=4, p=0.5),
    Shift(p=0.5),
])

# Generate 2 seconds of dummy audio for the sake of example
samples = np.random.uniform(low=-0.2, high=0.2, size=(32000,)).astype(np.float32)

# Augment/transform/perturb the audio data
augmented_samples = augment(samples=samples, sample_rate=16000)

Documentation

The API documentation, along with guides, example code, illustrations and example sounds, is available at https://iver56.github.io/audiomentations/

Transforms

  • AddBackgroundNoise: Mixes in another sound to add background noise
  • AddColorNoise: Adds noise with specific color
  • AddGaussianNoise: Adds Gaussian noise to the audio samples
  • AddGaussianSNR: Injects Gaussian noise using a randomly chosen signal-to-noise ratio
  • AddShortNoises: Mixes in various short noise sounds
  • AdjustDuration: Trims or pads the audio to fit a target duration
  • AirAbsorption: Applies frequency-dependent attenuation simulating air absorption
  • Aliasing: Produces aliasing artifacts by downsampling without low-pass filtering and then upsampling
  • ApplyImpulseResponse: Convolves the audio with a randomly chosen impulse response
  • BandPassFilter: Applies band-pass filtering within randomized parameters
  • BandStopFilter: Applies band-stop (notch) filtering within randomized parameters
  • BitCrush: Applies bit reduction without dithering
  • Clip: Clips audio samples to specified minimum and maximum values
  • ClippingDistortion: Distorts the signal by clipping a random percentage of samples
  • Gain: Multiplies the audio by a random gain factor
  • GainTransition: Gradually changes the gain over a random time span
  • HighPassFilter: Applies high-pass filtering within randomized parameters
  • HighShelfFilter: Applies a high shelf filter with randomized parameters
  • Lambda: Applies a user-defined transform
  • Limiter: Applies dynamic range compression limiting the audio signal
  • LoudnessNormalization: Applies gain to match a target loudness
  • LowPassFilter: Applies low-pass filtering within randomized parameters
  • LowShelfFilter: Applies a low shelf filter with randomized parameters
  • Mp3Compression: Compresses the audio to lower the quality
  • Normalize: Applies gain so that the highest signal level becomes 0 dBFS
  • Padding: Replaces a random part of the beginning or end with padding
  • PeakingFilter: Applies a peaking filter with randomized parameters
  • PitchShift: Shifts the pitch up or down without changing the tempo
  • PolarityInversion: Flips the audio samples upside down, reversing their polarity
  • RepeatPart: Repeats a subsection of the audio a number of times
  • Resample: Resamples the signal to a randomly chosen sampling rate
  • Reverse: Reverses the audio along its time axis
  • RoomSimulator: Simulates the effect of a room on an audio source
  • SevenBandParametricEQ: Adjusts the volume of 7 frequency bands
  • Shift: Shifts the samples forwards or backwards
  • SpecChannelShuffle: Shuffles channels in the spectrogram
  • SpecFrequencyMask: Applies a frequency mask to the spectrogram
  • TanhDistortion: Applies tanh distortion to distort the signal
  • TimeMask: Makes a random part of the audio silent
  • TimeStretch: Changes the speed without changing the pitch
  • Trim: Trims leading and trailing silence from the audio
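Several of the transforms above add noise either at a given amplitude or at a given signal-to-noise ratio. For the SNR variant, the relation SNR_dB = 20 * log10(signal_rms / noise_rms) fixes how loud the noise must be. The sketch below is a minimal NumPy illustration of that idea, not audiomentations' AddGaussianSNR implementation; the function name and its min_snr_db/max_snr_db arguments are illustrative assumptions.

```python
import numpy as np

def add_gaussian_noise_at_snr(samples, min_snr_db, max_snr_db, rng=None):
    """Hypothetical sketch: add white Gaussian noise at a random SNR (in dB)."""
    rng = np.random.default_rng() if rng is None else rng
    snr_db = rng.uniform(min_snr_db, max_snr_db)
    # RMS level of the clean input
    signal_rms = np.sqrt(np.mean(samples ** 2))
    # SNR_dB = 20 * log10(signal_rms / noise_rms)  =>  noise_rms = signal_rms / 10**(SNR_dB / 20)
    noise_rms = signal_rms / (10 ** (snr_db / 20))
    # For zero-mean Gaussian noise the RMS equals the standard deviation
    noise = rng.normal(0.0, noise_rms, size=samples.shape).astype(samples.dtype)
    return samples + noise
```

Picking the SNR rather than an absolute amplitude keeps the noise level proportional to the input, so quiet and loud clips are degraded by a similar relative amount.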

Changelog

[0.35.0] - 2024-03-15

Added

  • Add new transforms: AddColorNoise, Aliasing and BitCrush

For the full changelog, including older versions, see https://iver56.github.io/audiomentations/changelog/

Acknowledgements

Thanks to Nomono for backing audiomentations.

Thanks to all contributors who help improve audiomentations.

audiomentations's People

Contributors

alumae, askskro, atamazian, bakerbunker, cangonin, crlandsc, dependabot[bot], enisberk, fmirus, iver56, jeongyoonlee, juice500ml, karpnv, kvilouras, marvinlvn, mmxgn, omerferhatt, qwaker00, solomidhero, thanatoz-1

audiomentations's Issues

AddBackgroundNoise has no effect at all

I am trying to use AddBackgroundNoise without success. The result after applying the transform is identical to the original clean file.

This is my code:

from audiomentations import Compose, AddBackgroundNoise
import librosa

# Define transforms
augmenter = Compose([
    AddBackgroundNoise(sounds_path='/path/to/noise/wav', p=1.)
])

# Load original (clean) audio
y, sr = librosa.load("/path/to/clean/file.wav",
                     sr=32000,
                     res_type="kaiser_fast",
                     mono=True)

# Apply augmentations
y_aug = augmenter(samples=y, sample_rate=sr)

The resulting y_aug is identical to the original y.

Any help is appreciated!
Thanks
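One basic thing to rule out is that sounds_path really points to a folder containing noise files that can be picked up. The helper below is a hypothetical diagnostic, not part of audiomentations, and the set of extensions it accepts is an assumption; adjust it to match your files.

```python
from pathlib import Path

# Hypothetical set of extensions treated as audio; adjust to your noise files
AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3", ".ogg"}

def find_audio_files(folder):
    """Recursively list files under `folder` whose suffix looks like audio."""
    root = Path(folder)
    if not root.exists():
        raise FileNotFoundError(f"sounds_path does not exist: {root}")
    # Case-insensitive suffix match, directories skipped
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )
```

If find_audio_files("/path/to/noise/wav") returns an empty list, that folder is the first thing to fix. To confirm whether the augmentation changed anything at all, np.array_equal(y, y_aug) is a quick check.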

Make transformations that apply to spectrograms, not waveforms

First, it is wise to design an API. Make a clear distinction between waveform transforms and spectrogram transforms.

  • Time mask à la SpecAugment
  • Frequency mask à la SpecAugment
  • Time warp à la SpecAugment
  • Pitch shift
  • Time stretch
  • Shift (with and without rollover)
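A minimal sketch of the distinction the issue asks for, assuming spectrograms are NumPy arrays shaped (freq_bins, time_frames) with linear magnitudes, so zero means silence. The class names WaveformTransform, SpectrogramTransform and SpecTimeShift are hypothetical and are not audiomentations' API. The concrete example implements the last item on the list, a time shift with and without rollover.

```python
from abc import ABC, abstractmethod
import numpy as np

class WaveformTransform(ABC):
    """Hypothetical base class: operates on 1-D time-domain samples."""
    @abstractmethod
    def __call__(self, samples: np.ndarray, sample_rate: int) -> np.ndarray: ...

class SpectrogramTransform(ABC):
    """Hypothetical base class: operates on a (freq_bins, time_frames) array."""
    @abstractmethod
    def __call__(self, spectrogram: np.ndarray) -> np.ndarray: ...

class SpecTimeShift(SpectrogramTransform):
    """Shift a spectrogram along its time axis, with or without rollover."""
    def __init__(self, max_frames: int, rollover: bool = True, rng=None):
        self.max_frames = max_frames
        self.rollover = rollover
        self.rng = np.random.default_rng() if rng is None else rng

    def __call__(self, spectrogram):
        shift = int(self.rng.integers(-self.max_frames, self.max_frames + 1))
        return self.shift(spectrogram, shift)

    def shift(self, spectrogram, shift):
        if self.rollover:
            # Frames pushed off one end reappear at the other end
            return np.roll(spectrogram, shift, axis=1)
        # Without rollover, vacated frames are zero-filled (silence for linear magnitudes)
        out = np.zeros_like(spectrogram)
        if shift > 0:
            out[:, shift:] = spectrogram[:, :-shift]
        elif shift < 0:
            out[:, :shift] = spectrogram[:, -shift:]
        else:
            out[:] = spectrogram
        return out
```

Keeping the two base classes separate means a pipeline can check, before running, that it never feeds a waveform into a spectrogram transform or vice versa.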

How to deal with augmentations which exceed max/min amplitude?

Thanks for the great library! I noticed that some of the transforms can push sample values beyond +1 and -1; how do you suggest dealing with transforms that do this? If we scale the entire signal down to fit, then in some cases we could end up accidentally making the mean volume too small to be audible.

A related issue is that my data is in 16-bit signed integers (loaded with the PyDub library), so is it correct that for this library I need to convert to 32-bit floats in the range -1 to +1? To convert back and forth I am using the following functions, but I'd be grateful for any advice if they are incorrect:

import numpy as np

def int16_samples_to_float32(y):
    """Convert int16 numpy array of audio samples to float32."""
    if y.dtype != np.int16:
        raise ValueError('input samples not int16')
    return y.astype(np.float32) / np.iinfo(np.int16).max

def float_samples_to_int16(y):
    """Convert floating-point numpy array of audio samples to int16."""
    if not issubclass(y.dtype.type, np.floating):
        raise ValueError('input samples not floating-point')
    # astype() never raises for out-of-range values: the result is platform-dependent
    # garbage, and at most NumPy emits a RuntimeWarning, which `except Warning` would
    # not catch. Clipping to [-1, 1] first keeps the conversion well-defined.
    return (np.clip(y, -1.0, 1.0) * np.iinfo(np.int16).max).astype(np.int16)
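For the first question, one option that avoids making quiet clips quieter is to attenuate only when the peak actually exceeds full scale, and to leave everything else untouched. The transform list above also includes Clip, Limiter and Normalize, which address the same problem inside a pipeline. The helper below is a hypothetical NumPy sketch, not part of audiomentations.

```python
import numpy as np

def rescale_if_clipping(samples: np.ndarray, limit: float = 1.0) -> np.ndarray:
    """Scale down only when the peak exceeds `limit`; in-range audio is returned as-is."""
    # Guard against empty input, where np.max would raise
    peak = float(np.max(np.abs(samples))) if samples.size else 0.0
    if peak > limit:
        # Uniform gain so the loudest sample lands exactly on the limit
        return samples * (limit / peak)
    return samples
```

Because the gain is applied per clip and only when needed, the average loudness of in-range clips is preserved, while a clip that overshoots is brought back just far enough to avoid clipping on conversion to int16.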

Can this repo be used from multiple threads?

I encountered this error, which is caused by TimeMask:

time mask: t0 138974  t: 21416  m.shpae: (21416,)  newshpae: (93184,)  rawshpae: (93184,)  slice shape: (0,)
err:  operands could not be broadcast together with shapes (0,) (21416,) (0,) 
import numpy as np
from audiomentations import (AddGaussianNoise, Compose, FrequencyMask, PitchShift,
                             Shift, TimeMask, TimeStretch)

augmenter = Compose([
    TimeStretch(min_rate=0.8, max_rate=1.25, p=1.0),
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.2),
    PitchShift(min_semitones=-4, max_semitones=4, p=0.2),
    Shift(min_fraction=-0.5, max_fraction=0.5, p=0.2),
    FrequencyMask(min_frequency_band=0.0, max_frequency_band=0.5, p=0.2),
    TimeMask(min_band_part=0.0, max_band_part=0.5, fade=False, p=0.2)
], p=1.0, shuffle=False)

def aug_sample(samples, sample_rate):
    assert samples.dtype == np.int16
    # Map int16 to float32 in [-1, 1); multiplying by the peak here would overshoot that range
    samples = samples.astype(np.float32) / 32768.0
    samples = augmenter(samples=samples, sample_rate=sample_rate)
    # Peak-normalize back to the full int16 range
    samples *= 32767 / max(0.01, np.max(np.abs(samples)))
    return samples.astype(np.int16)

I call aug_sample from several threading.Thread workers, map-reduce style, to load and augment WAV files.
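The log hints at what went wrong, though this is an inference rather than a confirmed diagnosis: TimeMask picked t0 = 138974 for a 93184-sample array, a start position that only makes sense for a different, longer input. That is what you would expect if transforms keep their randomized parameters as instance state and several threads share one augmenter. A common workaround is to give each thread its own augmenter; the sketch below does that with the standard library's threading.local. The function name get_thread_augmenter and its factory argument are hypothetical.

```python
import threading

# Each thread sees its own independent attributes on this object
_local = threading.local()

def get_thread_augmenter(factory):
    """Return this thread's private augmenter, building it with `factory` on first use.

    Sharing one stateful augmenter across threads can let one thread's randomized
    parameters leak into another thread's call; one instance per thread avoids that.
    """
    augmenter = getattr(_local, "augmenter", None)
    if augmenter is None:
        augmenter = factory()
        _local.augmenter = augmenter
    return augmenter
```

In aug_sample you would then call get_thread_augmenter with a function that builds the same Compose pipeline, instead of using the module-level augmenter. Using separate processes, for example via multiprocessing, also avoids shared state.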

Can't open .wav file correctly

After running:
python -m demo.demo

This generates some WAV files in the output folder, but I can't open them; I get this error message:
Windows Media Player encountered a problem while playing the file.
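The report does not say why Windows Media Player rejects the files. One assumption worth testing is that the player does not accept the sample format that was written (for example 32-bit float); saving the same audio as plain 16-bit PCM is a quick way to rule that out. The sketch below uses only Python's standard wave and struct modules; write_pcm16_wav is a hypothetical helper that handles mono audio only.

```python
import struct
import wave

def write_pcm16_wav(file, samples, sample_rate):
    """Write mono float samples in [-1, 1] as a 16-bit PCM WAV file.

    `file` may be a path or a binary file-like object.
    """
    # Clip to [-1, 1], scale to the int16 range and pack as little-endian 16-bit ints
    ints = [int(round(max(-1.0, min(1.0, float(s))) * 32767)) for s in samples]
    frames = struct.pack("<%dh" % len(ints), *ints)
    with wave.open(file, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sample_rate)
        wf.writeframes(frames)
```

If a file written this way plays fine while the original output does not, the sample format is the likely culprit.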

Spectral version of transforms

Hi, is it possible to apply the transforms to spectral audio representations? Say I have mel spectrograms stored on disk and want to run on-the-fly augmentations. Augmenting directly in the spectral domain would be useful, since I could store the features instead of the audio and save disk space. Many of the augmentations probably require the spectral domain internally anyway. Are there any plans to support this scenario?
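As a sketch of augmenting stored features directly, here is SpecAugment-style frequency and time masking on a NumPy array shaped (freq_bins, time_frames), such as a mel spectrogram loaded from disk. This is an illustration under those assumptions, not audiomentations' SpecFrequencyMask; the function name, its arguments and the default of filling masks with the spectrogram mean are all hypothetical choices (zero is another common fill value).

```python
import numpy as np

def spec_augment_masks(spec, max_freq_width, max_time_width, num_masks=1,
                       fill_value=None, rng=None):
    """Return a copy of a (freq_bins, time_frames) array with random
    frequency and time masks applied, SpecAugment style."""
    rng = np.random.default_rng() if rng is None else rng
    out = spec.copy()  # leave the caller's array untouched
    # Default fill: the spectrogram mean, so masks don't look like extreme values
    value = out.mean() if fill_value is None else fill_value
    n_freq, n_time = out.shape
    for _ in range(num_masks):
        # Frequency mask: up to max_freq_width consecutive bins (width may be 0)
        f = int(rng.integers(0, min(max_freq_width, n_freq) + 1))
        f0 = int(rng.integers(0, n_freq - f + 1))
        out[f0:f0 + f, :] = value
        # Time mask: up to max_time_width consecutive frames (width may be 0)
        t = int(rng.integers(0, min(max_time_width, n_time) + 1))
        t0 = int(rng.integers(0, n_time - t + 1))
        out[:, t0:t0 + t] = value
    return out
```

Since the masks only need the stored feature matrix, this kind of augmentation can run on the fly without the original audio.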

Add Compose Transform

A Compose transform could combine multiple BaseTransform instances, for example to apply several transforms of the same type, or of different types, in sequence.
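A minimal sketch of what such a transform could look like. It takes p and shuffle arguments like the Compose call used earlier on this page, but SimpleCompose itself is hypothetical and simplified: each transform is any callable taking (samples, sample_rate), and the same transform may appear more than once.

```python
import random

class SimpleCompose:
    """Hypothetical minimal Compose: runs its transforms in sequence, as a unit, with probability p."""

    def __init__(self, transforms, p=1.0, shuffle=False, rng=None):
        self.transforms = list(transforms)
        self.p = p
        self.shuffle = shuffle
        self.rng = rng or random.Random()

    def __call__(self, samples, sample_rate):
        # Skip the whole chain with probability 1 - p
        if self.rng.random() >= self.p:
            return samples
        order = list(self.transforms)
        if self.shuffle:
            # Randomize the order for this call only
            self.rng.shuffle(order)
        for transform in order:
            samples = transform(samples, sample_rate)
        return samples
```

Any per-transform probability stays the responsibility of each transform; the composite's p decides only whether the chain runs at all.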

PyTorch audiomentations?

Hey

I haven't read all the code, but this project looks great!

How would you feel about participating in a similar project in PyTorch? That would be great!

Configurable cache size

Today, the LRU cache for loading sounds is hardcoded to max 64 sounds in AddShortNoises. Let's make it configurable, because the caching needs vary from case to case.
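One way to make the size configurable is to wrap the loading function in functools.lru_cache per instance, so each transform gets its own cache of the requested size. NoiseLoader and its lru_cache_size argument are hypothetical names for this sketch; passing lru_cache_size=None gives an unbounded cache.

```python
from functools import lru_cache

class NoiseLoader:
    """Hypothetical loader whose LRU cache size is chosen per instance."""

    def __init__(self, load_fn, lru_cache_size=64):
        # Wrap per instance so each loader has its own cache with its own maxsize
        self._cached_load = lru_cache(maxsize=lru_cache_size)(load_fn)

    def load(self, path):
        return self._cached_load(path)

    def cache_info(self):
        # Hits, misses, maxsize and current size, as reported by functools
        return self._cached_load.cache_info()
```

Wrapping a plain function rather than decorating a method keeps caches from being shared between instances, which is what allows different transforms to use different sizes.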
