Discussions in iver56's audiomentations repository

Add shuffle parameter in Compose

To let the transformations happen in random order
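A minimal sketch of how a `shuffle` flag could behave. It assumes that each transform is a plain callable taking `(samples, sample_rate)`. `ShuffleCompose` is a hypothetical name, not the library's `Compose`.

```python
import random
from typing import Callable, List, Optional

import numpy as np

Transform = Callable[[np.ndarray, int], np.ndarray]


class ShuffleCompose:
    """Hypothetical Compose-like container with an optional shuffle flag."""

    def __init__(self, transforms: List[Transform], shuffle: bool = False,
                 rng: Optional[random.Random] = None):
        self.transforms = list(transforms)
        self.shuffle = shuffle
        self.rng = rng or random.Random()

    def __call__(self, samples: np.ndarray, sample_rate: int) -> np.ndarray:
        order = list(self.transforms)
        if self.shuffle:
            # Draw a fresh order on every call; self.transforms keeps its order
            self.rng.shuffle(order)
        for transform in order:
            samples = transform(samples, sample_rate)
        return samples
```

Shuffling a copy means the configured order is still available when `shuffle=False` is used later.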

Document all available transforms

Add band-pass filter transformation
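One way to illustrate a band-pass is a brick-wall FFT mask that zeroes every frequency bin outside the pass band. This is a deliberately simple stand-in, not a Butterworth or other IIR design, and it can cause ringing on transients. `fft_band_pass` is a hypothetical helper.

```python
import numpy as np


def fft_band_pass(samples: np.ndarray, sample_rate: int,
                  low_hz: float, high_hz: float) -> np.ndarray:
    """Zero every FFT bin outside [low_hz, high_hz] (a brick-wall band-pass)."""
    spectrum = np.fft.rfft(samples)
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    # n=len(samples) restores odd lengths exactly
    return np.fft.irfft(spectrum, n=len(samples)).astype(samples.dtype)
```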

Spectral version of transforms

Hi, is it possible to apply the transforms to spectral audio representations? Say I have mel spectrograms stored on disk and want to run on-the-fly augmentations. Augmentations applied directly in the spectral domain would be useful, since I could store the features instead of the audio files and save disk space. Many of the augmentations probably work in the spectral domain internally anyway. Are there any plans to support this scenario?

Add clipping transform that can clip values outside an absolute amplitude (not percentile)

E.g. to clip all values that are outside the range [-1.0, 1.0].
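A minimal sketch of the requested absolute clipping, shown next to a percentile-based variant for contrast. Both function names are hypothetical, and the percentile defaults are assumptions.

```python
import numpy as np


def clip_absolute(samples: np.ndarray, min_value: float = -1.0,
                  max_value: float = 1.0) -> np.ndarray:
    """Hard-clip every sample to the fixed range [min_value, max_value]."""
    return np.clip(samples, min_value, max_value)


def clip_percentile(samples: np.ndarray, lower: float = 1.0,
                    upper: float = 99.0) -> np.ndarray:
    """For contrast: the thresholds depend on the signal's own distribution."""
    low, high = np.percentile(samples, [lower, upper])
    return np.clip(samples, low, high)
```

The absolute version leaves a signal that is already inside [-1.0, 1.0] untouched, whereas the percentile version always clips some samples.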

This relates to #25

Warn when resampling in load_sound_file

Because resampling slows down execution significantly.
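A sketch of the warning, using Python's standard `warnings` module. `resample_with_warning` and its `resample_fn` argument are hypothetical; in practice `resample_fn` would wrap whatever resampler the loader uses.

```python
import warnings


def resample_with_warning(samples, native_sample_rate, target_sample_rate, resample_fn):
    """Hypothetical helper: warn, then resample, only when the rates differ."""
    if native_sample_rate == target_sample_rate:
        return samples
    warnings.warn(
        f"Resampling from {native_sample_rate} Hz to {target_sample_rate} Hz; "
        "this can slow down loading considerably. Consider storing the files "
        "at the target sample rate.",
        UserWarning,
        stacklevel=2,  # point the warning at the caller, not at this helper
    )
    return resample_fn(samples, native_sample_rate, target_sample_rate)
```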

Implement SpecCompose

Add distortion transformation

Add equalizer transformation

Add high pass filter transformation

Add a wow and flutter transform

Inspired by https://arxiv.org/ftp/arxiv/papers/1912/1912.05472.pdf
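A rough sketch of the idea, not the method of the linked paper: the signal is read at a position that drifts around the nominal time by a slow (wow) and a fast (flutter) sinusoidal offset, with linear interpolation via `np.interp`. The function name and all default rates and depths are assumptions.

```python
import numpy as np


def wow_and_flutter(samples, sample_rate, wow_rate_hz=0.5, wow_depth_s=0.002,
                    flutter_rate_hz=8.0, flutter_depth_s=0.0002, rng=None):
    """Simulate tape speed fluctuation by reading the signal at a wobbling position."""
    rng = rng or np.random.default_rng()
    n = len(samples)
    t = np.arange(n) / sample_rate
    phase_wow, phase_flutter = rng.uniform(0, 2 * np.pi, size=2)
    # Time offset in seconds: slow wow plus fast flutter, each with a random phase
    offset = (wow_depth_s * np.sin(2 * np.pi * wow_rate_hz * t + phase_wow)
              + flutter_depth_s * np.sin(2 * np.pi * flutter_rate_hz * t + phase_flutter))
    read_positions = np.clip((t + offset) * sample_rate, 0, n - 1)
    return np.interp(read_positions, np.arange(n), samples).astype(samples.dtype)
```

Because the modulation is of the read position, the output keeps the input length.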

Support adding multiple random short noises to one WAV file

Add support for more time stretching methods

https://github.com/KAIST-MACLab/PyTSMod

AddBackgroundNoise has no effect at all

I am trying to use AddBackgroundNoise without success. The result after applying the transform is identical to the original clean file.

This is my code:

from audiomentations import Compose, AddBackgroundNoise
import librosa

# Define transforms
augmenter = Compose([
    AddBackgroundNoise(sounds_path='/path/to/noise/wav', p=1.)
])

# Load original (clean) audio
y, sr = librosa.load("/path/to/clean/file.wav",
                     sr=32000,
                     res_type="kaiser_fast",
                     mono=True)

# Apply augmentations
y_aug = augmenter(samples=y, sample_rate=sr)

The resulting y_aug is identical to the original y.

Any help is appreciated!
Thanks

Can this repo be used in multi thread?

I encountered this error, which is caused by TimeMask:

time mask: t0 138974  t: 21416  m.shpae: (21416,)  newshpae: (93184,)  rawshpae: (93184,)  slice shape: (0,)
err:  operands could not be broadcast together with shapes (0,) (21416,) (0,)

import numpy as np
from audiomentations import (Compose, TimeStretch, AddGaussianNoise, PitchShift,
                             Shift, FrequencyMask, TimeMask)

augmenter = Compose([
    TimeStretch(min_rate=0.8, max_rate=1.25, p=1.0),
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.2),
    PitchShift(min_semitones=-4, max_semitones=4, p=0.2),
    Shift(min_fraction=-0.5, max_fraction=0.5, p=0.2),
    FrequencyMask(min_frequency_band=0.0, max_frequency_band=0.5, p=0.2),
    TimeMask(min_band_part=0.0, max_band_part=0.5, fade=False, p=0.2)
], p=1.0, shuffle=False)

def aug_sample(samples, sample_rate):
    assert samples.dtype == np.int16
    # int16 -> float32 in [-1, 1)
    samples = samples.astype(np.float32) / 32768.0
    samples = augmenter(samples=samples, sample_rate=sample_rate)
    # Peak-normalize back to the int16 range (floor of 0.01 avoids dividing by ~0)
    samples = samples * (32767 / max(0.01, np.max(np.abs(samples))))
    return samples.astype(np.int16)

I call aug_sample from several threading.Thread workers, map-reduce style, to load and augment WAV files.
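A plausible explanation, not confirmed in this thread: the transforms draw their random parameters and store them on the transform instance before applying them, so when several threads share one `augmenter`, one thread can apply parameters that another thread drew for a signal of a different length. The log is consistent with this, since TimeMask picked t0=138974 for an array of only 93184 samples. A sketch of one workaround gives each thread its own pipeline via `threading.local`. `make_augmenter` is a hypothetical factory; the lambda inside it only stands in for building the `Compose([...])` shown above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

import numpy as np

_thread_state = threading.local()


def make_augmenter():
    """Hypothetical factory; in real code, build and return the Compose([...]) here."""
    return lambda samples, sample_rate: samples * 0.5


def get_augmenter():
    # Each thread lazily builds, then reuses, its own pipeline instance
    if not hasattr(_thread_state, "augmenter"):
        _thread_state.augmenter = make_augmenter()
    return _thread_state.augmenter


def aug_sample_threadsafe(samples, sample_rate):
    return get_augmenter()(samples=samples, sample_rate=sample_rate)
```

A `threading.Lock` around the shared `augmenter` call would also work, but it serializes augmentation and removes most of the benefit of threading.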

Cache librosa.load in AddImpulseResponse.__apply_ir

functools.lru_cache would be apt here
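A sketch of the caching pattern with `functools.lru_cache`. To stay self-contained it reads 16-bit PCM mono WAV files with the standard `wave` module instead of calling `librosa.load`, and it does not resample. `load_ir_cached` is a hypothetical name. `lru_cache` keys on all arguments, so a real version taking `(file_path, sample_rate)` would cache each combination separately.

```python
import functools
import wave

import numpy as np


@functools.lru_cache(maxsize=64)
def load_ir_cached(file_path: str):
    """Read a 16-bit PCM mono WAV once; later calls with the same path hit the cache.

    Returns (samples, sample_rate).
    """
    with wave.open(file_path, "rb") as wav:
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0
    samples.flags.writeable = False  # the cached array is shared between callers
    return samples, sample_rate
```

Making the cached array read-only prevents one caller from silently corrupting the impulse response seen by later callers.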

Add gain transform

For lowering the volume
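A minimal sketch of a gain transform in decibels, where an amplitude factor is 10^(gain_db / 20). Both function names and the default range are hypothetical.

```python
import numpy as np


def apply_gain_db(samples: np.ndarray, gain_db: float) -> np.ndarray:
    """Scale amplitude by gain_db decibels; negative values lower the volume."""
    return (samples * 10.0 ** (gain_db / 20.0)).astype(samples.dtype, copy=False)


def random_gain(samples: np.ndarray, min_gain_db: float = -12.0,
                max_gain_db: float = 0.0, rng=None) -> np.ndarray:
    """Apply a gain drawn uniformly from [min_gain_db, max_gain_db)."""
    rng = rng or np.random.default_rng()
    return apply_gain_db(samples, rng.uniform(min_gain_db, max_gain_db))
```

For example, -20 dB corresponds to a factor of 0.1 and about +6 dB to a factor of 2.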

Don't let TimeStretch change the length of the data array
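One simple way to keep the length fixed is to trim or zero-pad the stretched output back to the input length. A sketch with a hypothetical `fix_length` helper (librosa also provides `librosa.util.fix_length` for the same purpose):

```python
import numpy as np


def fix_length(samples: np.ndarray, target_length: int) -> np.ndarray:
    """Trim or zero-pad the end of samples so len(result) == target_length."""
    if len(samples) >= target_length:
        return samples[:target_length]  # note: a view, not a copy
    return np.pad(samples, (0, target_length - len(samples)))
```

Usage: `stretched = fix_length(stretched, len(original))`. Trimming the end is only one choice; a centered or random crop would keep content from the middle of a sped-down clip.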

Add background noise transformation

Environmental noise, audio to be specified by the client
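One building block such a transform needs is fitting a client-provided noise recording to the length of the input signal. A sketch, assuming a random window for long noise and tiling for short noise; `fit_noise_to_length` is a hypothetical name.

```python
import numpy as np


def fit_noise_to_length(noise: np.ndarray, target_length: int, rng=None) -> np.ndarray:
    """Return a noise segment exactly target_length samples long.

    Longer noise: take a random window. Shorter noise: tile it, then crop.
    """
    if len(noise) == 0:
        raise ValueError("noise must contain at least one sample")
    rng = rng or np.random.default_rng()
    if len(noise) < target_length:
        repeats = -(-target_length // len(noise))  # ceiling division
        noise = np.tile(noise, repeats)
    start = rng.integers(0, len(noise) - target_length + 1)
    return noise[start:start + target_length]
```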

Need to speed up the transforms, since many small files are read and processed

Can't open .wav file correctly

After running:
python -m demo.demo

some WAV files are generated in the output folder, but I can't open them. I get this error message:
Windows Media Player encountered a problem while playing the file.
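One possible cause, not confirmed in the issue: if the demo writes 32-bit floating-point WAV files, some players may refuse them, while 16-bit PCM is almost universally supported. A minimal sketch of writing mono float samples as 16-bit PCM with Python's standard `wave` module; `write_wav_int16` is a hypothetical helper, not part of audiomentations.

```python
import wave

import numpy as np


def write_wav_int16(file_path: str, samples: np.ndarray, sample_rate: int) -> None:
    """Write mono float samples in [-1, 1] as a 16-bit PCM WAV file."""
    clipped = np.clip(samples, -1.0, 1.0)  # avoid int16 overflow
    pcm = (clipped * 32767).astype("<i2")  # little-endian int16, as WAV expects
    with wave.open(file_path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16 bits
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())
```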

Configurable cache size

Today, the LRU cache for loading sounds is hardcoded to max 64 sounds in AddShortNoises. Let's make it configurable, because the caching needs vary from case to case.
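A sketch of making the cache size a constructor argument: wrapping the loader with `functools.lru_cache` per instance lets each instance choose its own `maxsize`. `ShortNoiseLoader`, `load_fn` and `lru_cache_size` are hypothetical names.

```python
import functools


class ShortNoiseLoader:
    """Hypothetical helper: each instance gets its own LRU cache of a chosen size."""

    def __init__(self, load_fn, lru_cache_size: int = 64):
        # Wrapping per instance (not with a decorator) lets maxsize differ per instance
        self.load = functools.lru_cache(maxsize=lru_cache_size)(load_fn)
```

With `functools.lru_cache`, `maxsize=None` gives an unbounded cache and `maxsize=0` effectively disables caching, which covers the "needs vary from case to case" range.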

Gain, PolarityInversion cannot be used

Installing from source solved this issue.

Add reverb transformation

Upload to PyPI

Make SNR unit consistent across transforms

dB vs not dB. Perhaps make a class that represents SNR?

https://stackoverflow.com/questions/21708753/python-convention-for-variable-naming-to-indicate-units
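A sketch of the "class that represents SNR" idea: store the value in dB and derive linear ratios on demand, using amplitude ratio = 10^(dB/20) and power ratio = 10^(dB/10). The class and its method names are hypothetical. The lighter alternative from the linked question is to put the unit in parameter names (for example `snr_db`).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SNR:
    """Signal-to-noise ratio stored in dB, with conversions to linear ratios."""

    db: float

    @classmethod
    def from_amplitude_ratio(cls, ratio: float) -> "SNR":
        return cls(20.0 * math.log10(ratio))

    @classmethod
    def from_power_ratio(cls, ratio: float) -> "SNR":
        return cls(10.0 * math.log10(ratio))

    @property
    def amplitude_ratio(self) -> float:
        return 10.0 ** (self.db / 20.0)

    @property
    def power_ratio(self) -> float:
        return 10.0 ** (self.db / 10.0)
```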

Divide by zero problem in AddBackgroundNoise

If the noise sound is empty (silent), a division-by-zero problem occurs.

audiomentations/audiomentations/core/audio_loading_utils.py, line 77 in 3cf5f92:

def load_wav_file(file_path, sample_rate, mono=True, resample_type="kaiser_best"):
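A sketch of mixing noise at a target SNR with a guard for empty or all-zero noise, so that the noise RMS is never used as a divisor when it is zero. `mix_at_snr` is a hypothetical helper, not the library's AddBackgroundNoise code, and it assumes `signal` and `noise` have the same length.

```python
import numpy as np


def mix_at_snr(signal: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Add noise scaled so that RMS(signal) / RMS(noise) matches snr_db."""
    if not np.any(noise):
        # Empty or silent noise: nothing to add, and its RMS would be 0 (division by zero)
        return signal.copy()
    signal_rms = np.sqrt(np.mean(signal ** 2))
    noise_rms = np.sqrt(np.mean(noise ** 2))
    target_noise_rms = signal_rms / (10.0 ** (snr_db / 20.0))
    return signal + noise * (target_noise_rms / noise_rms)
```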

Make transformations that apply to spectrograms, not waveforms

First, it is wise to design an API. Make a clear distinction between waveform transforms and spectrogram transforms.

Time mask a la spec augment
Frequency mask a la spec augment
Time warp a la spec augment
Pitch shift
Time stretch
Shift (with and without rollover)
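A sketch of the first two items, SpecAugment-style frequency and time masking, operating on a spectrogram array of shape (n_freq, n_time) rather than on a waveform. The function name, the single mask per axis, the maximum widths and the mean fill value are assumptions; time warping and the other items are not covered.

```python
import numpy as np


def spec_augment_masks(spec: np.ndarray, max_freq_width: int = 8,
                       max_time_width: int = 20, fill_value=None, rng=None) -> np.ndarray:
    """Apply one frequency mask and one time mask to a (n_freq, n_time) spectrogram."""
    rng = rng or np.random.default_rng()
    out = spec.copy()  # never modify the caller's (possibly cached) features
    fill = out.mean() if fill_value is None else fill_value
    n_freq, n_time = out.shape
    # Frequency mask: f consecutive bins starting at f0 (f may be 0)
    f = rng.integers(0, min(max_freq_width, n_freq) + 1)
    f0 = rng.integers(0, n_freq - f + 1)
    out[f0:f0 + f, :] = fill
    # Time mask: t consecutive frames starting at t0 (t may be 0)
    t = rng.integers(0, min(max_time_width, n_time) + 1)
    t0 = rng.integers(0, n_time - t + 1)
    out[:, t0:t0 + t] = fill
    return out
```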

Add support for using FFT to speed up convolution

https://docs.scipy.org/doc/numpy/reference/generated/numpy.convolve.html

vs.

https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.convolve.html
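The speed-up comes from the convolution theorem: zero-pad both inputs to at least len(a) + len(b) - 1, multiply their FFTs, and invert. A numpy-only sketch (`fft_convolve` is a hypothetical name; SciPy's `scipy.signal.fftconvolve`, or `scipy.signal.convolve` with `method="fft"` or `"auto"`, do this for you):

```python
import numpy as np


def fft_convolve(signal: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Full linear convolution via the FFT; matches np.convolve(signal, kernel)."""
    n = len(signal) + len(kernel) - 1
    n_fft = 1 << (n - 1).bit_length()  # next power of two >= n
    spectrum = np.fft.rfft(signal, n_fft) * np.fft.rfft(kernel, n_fft)
    return np.fft.irfft(spectrum, n_fft)[:n]
```

Direct convolution costs O(N·M), while the FFT route costs O((N+M) log(N+M)), which matters for long impulse responses.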

Add tanh distortion transform
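A minimal sketch of tanh soft clipping: amplify, pass through `np.tanh`, then rescale so the output peak matches the input peak. The `drive` parameter name and its default are hypothetical.

```python
import numpy as np


def tanh_distortion(samples: np.ndarray, drive: float = 5.0) -> np.ndarray:
    """Soft-clip with tanh, then rescale so the output peak matches the input peak."""
    input_peak = np.max(np.abs(samples))
    if input_peak == 0:
        return samples.copy()  # silence stays silence
    distorted = np.tanh(drive * samples)
    return distorted * (input_peak / np.max(np.abs(distorted)))
```

Higher `drive` pushes more of the waveform into the flat part of tanh, which adds more harmonics.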

Add support for multichannel audio data

shape=(num_samples, num_channels) should be supported as well as shape=(num_samples,)
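A sketch of accepting both shapes by applying a mono transform to each column of a (num_samples, num_channels) array. `apply_per_channel` is hypothetical. For real transforms, the random parameters should be drawn once and reused for every channel; otherwise the channels drift out of sync.

```python
import numpy as np


def apply_per_channel(transform, samples: np.ndarray, sample_rate: int) -> np.ndarray:
    """Run a mono transform on 1-D input or on each column of (num_samples, num_channels)."""
    if samples.ndim == 1:
        return transform(samples, sample_rate)
    if samples.ndim != 2:
        raise ValueError(f"Expected 1-D or 2-D samples, got shape {samples.shape}")
    channels = [transform(samples[:, c], sample_rate) for c in range(samples.shape[1])]
    return np.stack(channels, axis=1)
```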

Rename AddImpulseResponse to ApplyImpulseResponse?

Any thoughts on this?

Should be done in a backwards-compatible manner of course.
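One common backwards-compatible pattern is to keep the old name as a subclass that emits a `DeprecationWarning`. In this sketch, `ApplyImpulseResponse` is only a stand-in with a hypothetical constructor, not the library's real signature.

```python
import warnings


class ApplyImpulseResponse:
    """Stand-in for the renamed transform (hypothetical constructor)."""

    def __init__(self, ir_path: str, p: float = 0.5):
        self.ir_path = ir_path
        self.p = p


class AddImpulseResponse(ApplyImpulseResponse):
    """Deprecated alias kept so that existing code keeps working."""

    def __init__(self, *args, **kwargs):
        warnings.warn(
            "AddImpulseResponse is deprecated; use ApplyImpulseResponse instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)
```

Subclassing (rather than a plain `AddImpulseResponse = ApplyImpulseResponse` assignment) is what makes the warning possible, while `isinstance` checks against the new name still pass.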

Fix bug: FrequencyMask sometimes outputs NaN values

The bug can be reproduced with, e.g., these parameters:

should_apply=True,
bandwidth=600,
freq_start=172,

Disregard non-audio files like "license.txt" when gathering the list of impulse response files
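A sketch of filtering by file extension with `pathlib`. The function name and the extension set are assumptions; adjust the set to the formats the loader actually supports.

```python
from pathlib import Path

AUDIO_EXTENSIONS = {".aiff", ".flac", ".m4a", ".mp3", ".ogg", ".opus", ".wav"}


def find_audio_files(root) -> list:
    """Recursively list files under root whose extension looks like audio."""
    return sorted(
        path for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS
    )
```

Lower-casing the suffix also accepts files such as `IR.WAV`, while `license.txt` is skipped.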

Add dynamic range compression transformation

Add low pass filter transformation

Implement a transform that varies gain over time
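A sketch of one possible design: draw random gains (in dB) at a few evenly spaced control points and interpolate linearly between them to get a per-sample gain curve. The function name, the number of control points and the dB range are assumptions.

```python
import numpy as np


def random_gain_envelope(samples, min_gain_db=-12.0, max_gain_db=0.0,
                         num_points=5, rng=None):
    """Multiply by a gain curve that moves linearly between random dB control points."""
    rng = rng or np.random.default_rng()
    n = len(samples)
    control_positions = np.linspace(0, n - 1, num_points)
    control_gains_db = rng.uniform(min_gain_db, max_gain_db, size=num_points)
    # Interpolate in dB so that fades sound even, then convert to amplitude factors
    gains_db = np.interp(np.arange(n), control_positions, control_gains_db)
    return samples * 10.0 ** (gains_db / 20.0)
```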

PyTorch audiomentations?

Hey

I haven't read all the code, but this project looks great!

How would you feel about participating in a similar project in PyTorch? That would be great!

Add downsampling transformation

Add Compose Transform

A Compose transform could combine multiple BaseTransform instances, e.g. to apply several transforms in sequence, including several of the same kind.

How to deal with augmentations which exceed max/min amplitude?

Thanks for the great library! I noticed that some of the transforms push the amplitude beyond +1 and -1; how do you suggest dealing with transforms that do this? If we scale the entire signal down to fit, then in some cases we could accidentally make the mean volume too low to be audible.

A related issue is that my data is 16-bit signed integers (loaded with the PyDub library), so is it correct that I need to convert it to 32-bit floats in the range -1 to +1 for this library? To go back and forth between the two formats I am using the following functions, but I'd be grateful for any advice if this is incorrect:

import numpy as np

def int16_samples_to_float32(y):
    """Convert int16 numpy array of audio samples to float32."""
    if y.dtype != np.int16:
        raise ValueError('input samples not int16')
    return y.astype(np.float32) / np.iinfo(np.int16).max

def float_samples_to_int16(y):
    """Convert floating-point numpy array of audio samples to int16."""
    if not issubclass(y.dtype.type, np.floating):
        raise ValueError('input samples not floating-point')
    # Clip first: casting out-of-range floats to int16 does not raise an exception
    # (so `except Warning` never fires); it silently yields wrapped or
    # platform-dependent values
    y = np.clip(y, -1.0, 1.0)
    return (y * np.iinfo(np.int16).max).astype(np.int16)
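For the first question, one option is to rescale only when the peak actually exceeds the limit, so outputs that already fit are left at their original level and quiet clips are never made quieter. A minimal sketch; `normalize_if_clipping` is a hypothetical helper.

```python
import numpy as np


def normalize_if_clipping(samples: np.ndarray, limit: float = 1.0) -> np.ndarray:
    """Scale down only when the peak exceeds limit; quieter audio is left as-is."""
    peak = np.max(np.abs(samples)) if samples.size else 0.0
    if peak <= limit:
        return samples
    return samples * (limit / peak)
```

The alternative, hard clipping with `np.clip(samples, -1.0, 1.0)`, keeps the level but adds distortion, so the choice depends on whether loudness or waveform shape matters more for the task.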

https://github.com/aleju/imgaug/blob/0101108d4fed06bc5056c4a03e2bcb0216dac326/imgaug/augmenters/meta.py#L3188
