iver56 / audiomentations

A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning.

Home Page: https://iver56.github.io/audiomentations/

License: MIT License

Python 100.00%
audio sound data-augmentation augmentation sound-processing python machine-learning music deep-learning audio-effects

audiomentations's Issues

Spectral version of transforms

Hi, is it possible to apply the transforms to spectral audio representations? Say I have mel spectrograms stored on disk and want to run on-the-fly augmentations. Augmentations applied directly in the spectral domain would be useful, since I could store the features instead of the audio files and save disk space. Many of the augmentations probably work in the spectral domain internally anyway. Are there any plans to support this scenario?

AddBackgroundNoise has no effect at all

I am trying to use AddBackgroundNoise without success. The result after applying the transform is identical to the original clean file.

This is my code:

from audiomentations import Compose, AddBackgroundNoise
import librosa

# Define transforms
augmenter = Compose([
    AddBackgroundNoise(sounds_path='/path/to/noise/wav', p=1.)
])

# Load original (clean) audio
y, sr = librosa.load('/path/to/clean/file.wav',
                     sr=32000,
                     res_type="kaiser_fast",
                     mono=True)

# Apply augmentations
y_aug = augmenter(samples=y, sample_rate=sr)

The resulting y_aug is identical to the original y.

Any help is appreciated!
Thanks
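
A quick way to confirm whether a noise transform changed the signal at all is to compare the input and output arrays and measure the signal-to-noise ratio of the difference. The sketch below uses plain NumPy and is not the library's implementation; `rms` and `mix_at_snr` are hypothetical helper names that only illustrate what mixing background noise at a target SNR should do.

```python
import numpy as np

def rms(x):
    """Root-mean-square level of a signal."""
    return float(np.sqrt(np.mean(np.square(x))))

def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean so that the result has roughly the requested SNR (in dB)."""
    # Repeat or trim the noise so it has the same length as the clean signal.
    noise = np.resize(noise, clean.shape)
    clean_rms, noise_rms = rms(clean), rms(noise)
    if clean_rms == 0.0 or noise_rms == 0.0:
        return clean.copy()  # nothing meaningful to mix
    target_noise_rms = clean_rms / (10 ** (snr_db / 20))
    return clean + noise * (target_noise_rms / noise_rms)
```

If `np.allclose(y, y_aug)` is true after a transform that ran with p=1.0, the noise was effectively not mixed in; one thing worth checking then is whether the noise files found under `sounds_path` are silent or could not be loaded.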

Can this repo be used in multi thread?

I encountered this error, which was caused by TimeMask:

time mask: t0 138974  t: 21416  m.shpae: (21416,)  newshpae: (93184,)  rawshpae: (93184,)  slice shape: (0,)
err:  operands could not be broadcast together with shapes (0,) (21416,) (0,) 
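import numpy as np
from audiomentations import (
    Compose, TimeStretch, AddGaussianNoise, PitchShift,
    Shift, FrequencyMask, TimeMask,
)

augmenter = Compose([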
        TimeStretch(min_rate=0.8, max_rate=1.25, p=1.0),
        AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.2),
        PitchShift(min_semitones=-4, max_semitones=4, p=0.2),
        Shift(min_fraction=-0.5, max_fraction=0.5, p=0.2),
        FrequencyMask(min_frequency_band=0.0, max_frequency_band=0.5, p=0.2),
        TimeMask(min_band_part=0.0, max_band_part=0.5, fade=False, p=0.2)
], p=1.0, shuffle=False)

def aug_sample(samples, sample_rate):
  assert samples.dtype == np.int16 
  samples =  samples * max(0.01, np.max(np.abs(samples))) / 32768.0
  samples = samples.astype(np.float32)
  samples = augmenter(samples=samples, sample_rate=sample_rate)
  samples *= 32767 / max(0.01, np.max(np.abs(samples)))
  samples = samples.astype(np.int16)
  return samples

I call aug_sample from several threading.Thread workers, map-reduce style, to load the WAV files.
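
In the log above, t0 (138974) is larger than the array being masked (93184 samples). That would be consistent with parameters randomized for one clip being applied to a different clip in another thread, although this is a guess and not a confirmed diagnosis. Under that assumption, one workaround is to give each thread its own augmenter instance. The wrapper below is a hypothetical sketch (the class name `PerThreadAugmenter` is not part of audiomentations) that builds one instance per thread from a factory function.

```python
import threading

class PerThreadAugmenter:
    """Lazily builds one augmenter per thread so that state is never shared."""

    def __init__(self, factory):
        self._factory = factory          # zero-argument callable returning a fresh augmenter
        self._local = threading.local()  # attributes set here are visible to one thread only

    def __call__(self, samples, sample_rate):
        augmenter = getattr(self._local, "augmenter", None)
        if augmenter is None:
            # First call in this thread: create a private augmenter.
            augmenter = self._factory()
            self._local.augmenter = augmenter
        return augmenter(samples=samples, sample_rate=sample_rate)
```

To use it with the code above, wrap the Compose construction in a function (for example, a function that returns the `Compose([...])` shown earlier) and pass that function as `factory`; `aug_sample` can then call the wrapper instead of the shared module-level `augmenter`.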

Can't open .wav file correctly

After running:
python -m demo.demo

Some WAV files are generated in the output folder, but I can't open them. I get this error message:
Windows Media Player encountered a problem while playing the file.
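
One possible cause, stated here as an assumption because the issue does not say how the demo writes its files, is that the output is stored as 32-bit floating-point WAV, which some players do not handle. A minimal sketch of writing 16-bit PCM instead, using Python's standard `wave` module and NumPy (the function name `write_int16_wav` is hypothetical):

```python
import wave
import numpy as np

def write_int16_wav(path, samples, sample_rate):
    """Write mono float samples in [-1, 1] as a 16-bit PCM WAV file."""
    # Clip first so out-of-range values cannot wrap around, then scale to int16.
    pcm = (np.clip(samples, -1.0, 1.0) * 32767.0).astype("<i2")
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)           # mono
        wav_file.setsampwidth(2)           # 2 bytes per sample = 16 bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())
```

16-bit PCM is the most widely supported WAV encoding, so a file written this way is a useful test of whether the player or the file format is the problem.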

Configurable cache size

Today, the LRU cache for loading sounds is hardcoded to max 64 sounds in AddShortNoises. Let's make it configurable, because the caching needs vary from case to case.
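
As a rough sketch of what a configurable cache could look like, the loader below wraps a loading function with `functools.lru_cache` and takes the cache size as a constructor argument. The names `CachedSoundLoader` and `lru_cache_size` are assumptions for illustration and are not taken from the library's code.

```python
import functools

class CachedSoundLoader:
    """Wraps a sound-loading function with an LRU cache of configurable size."""

    def __init__(self, load_fn, lru_cache_size=64):
        # lru_cache(maxsize=...) returns a decorator; applying it per instance
        # gives every loader its own cache with its own size limit.
        self.load = functools.lru_cache(maxsize=lru_cache_size)(load_fn)

    def cache_info(self):
        """Expose hit/miss statistics, handy when tuning the cache size."""
        return self.load.cache_info()
```

A transform could then accept `lru_cache_size` in its constructor and pass it through, keeping 64 as the default so existing behavior does not change.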

Make transformations that apply to spectrograms, not waveforms

First, it is wise to design an API. Make a clear distinction between waveform transforms and spectrogram transforms.

  • Time mask a la spec augment
  • Frequency mask a la spec augment
  • Time warp a la spec augment
  • Pitch shift
  • Time stretch
  • Shift (with and without rollover)

PyTorch audiomentations?

Hey

I haven't read all the code, but this project looks great!

How would you feel about participating in a similar project in PyTorch? That would be great!

Add Compose Transform

A Compose transform could chain multiple BaseTransform instances. It could be used to apply the same transform several times, together with other transforms.
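
A minimal sketch of the idea, assuming each transform is a callable that takes `(samples, sample_rate)` and returns the processed samples. The class name `SimpleCompose` is hypothetical, and this is not the library's implementation.

```python
import random

class SimpleCompose:
    """Applies a list of transforms in order, with overall probability p."""

    def __init__(self, transforms, p=1.0):
        self.transforms = transforms
        self.p = p

    def __call__(self, samples, sample_rate):
        # Either run the whole chain or return the input untouched.
        if random.random() < self.p:
            for transform in self.transforms:
                samples = transform(samples, sample_rate)
        return samples
```

The same transform object can appear more than once in the list, which covers the case of applying one transform several times.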

How to deal with augmentations which exceed max/min amplitude?

Thanks for the great library! I noticed that some of the transforms seem to push the amplitude beyond +1 and -1. How do you suggest dealing with transforms that do this? If we scale the entire signal down to fit, then in some cases we could accidentally make the mean volume too low to be audible.

A related issue is that my data is in 16-bit signed integers (loaded with the PyDub library), so is it correct that for this library I need to convert to 32-bit floats in the range -1 to +1? To go back and forth between the two, I am using the following functions, but I'd be grateful for any advice if this is incorrect:

import numpy as np

def int16_samples_to_float32(y):
    """Convert int16 numpy array of audio samples to float32."""
    if y.dtype != np.int16:
        raise ValueError('input samples not int16')
    return y.astype(np.float32) / np.iinfo(np.int16).max

def float_samples_to_int16(y):
    """Convert floating-point numpy array of audio samples to int16."""
    if not issubclass(y.dtype.type, np.floating):
        raise ValueError('input samples not floating-point')
    try:
        result = (y * np.iinfo(np.int16).max).astype(np.int16)
    except Warning:
        print(y, np.iinfo(np.int16).max) # sometimes catch warnings that limit has been exceeded
        raise
    return result
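
Two common ways to handle samples that exceed the [-1, 1] range are hard clipping and peak limiting that only scales when needed. The sketch below shows both with plain NumPy; the function names are hypothetical, and which option sounds better depends on the material, so treat this as a starting point and not as a recommendation from the library.

```python
import numpy as np

def limit_peak(y):
    """Scale the signal down only if its peak exceeds 1.0; otherwise return it unchanged."""
    peak = float(np.max(np.abs(y))) if y.size else 0.0
    if peak > 1.0:
        return y / peak
    return y

def float_to_int16_clipped(y):
    """Convert float samples to int16, clipping out-of-range values instead of wrapping."""
    clipped = np.clip(y, -1.0, 1.0)
    return np.round(clipped * 32767.0).astype(np.int16)
```

Clipping keeps the overall loudness but distorts the peaks, while `limit_peak` avoids distortion at the cost of lowering the level of that one clip; signals that already fit in range are left alone in both cases.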
