qiuqiangkong / torchlibrosa

License: MIT License

Python 100.00%

torchlibrosa's Introduction

TorchLibrosa: PyTorch implementation of Librosa

This codebase provides PyTorch implementations of some librosa functions. If you previously trained on features extracted on the CPU with librosa and want to add GPU acceleration during training and evaluation, TorchLibrosa provides almost identical features to the standard librosa functions (numerical difference less than 1e-5).

Install

$ pip install torchlibrosa

Example 1

Extract a mel spectrogram with TorchLibrosa. The example sets is_log=False; leave is_log at its default (True) to get a log mel spectrogram.

import torch
import torchlibrosa as tl

batch_size = 16
sample_rate = 22050
win_length = 2048
hop_length = 512
n_mels = 128

batch_audio = torch.empty(batch_size, sample_rate).uniform_(-1, 1)  # (batch_size, sample_rate)

# TorchLibrosa feature extractor, equivalent to librosa.feature.melspectrogram()
feature_extractor = torch.nn.Sequential(
    tl.Spectrogram(
        hop_length=hop_length,
        win_length=win_length,
    ), tl.LogmelFilterBank(
        sr=sample_rate,
        n_mels=n_mels,
        is_log=False, # Default is true
    ))
batch_feature = feature_extractor(batch_audio) # (batch_size, 1, time_steps, mel_bins)

Example 2

Extract a spectrogram, then a log mel spectrogram, and compute the STFT and ISTFT with TorchLibrosa.

import torch
import torchlibrosa as tl

batch_size = 16
sample_rate = 22050
win_length = 2048
hop_length = 512
n_mels = 128

batch_audio = torch.empty(batch_size, sample_rate).uniform_(-1, 1)  # (batch_size, sample_rate)

# Spectrogram
spectrogram_extractor = tl.Spectrogram(n_fft=win_length, hop_length=hop_length)
sp = spectrogram_extractor.forward(batch_audio)   # (batch_size, 1, time_steps, freq_bins)

# Log mel spectrogram
logmel_extractor = tl.LogmelFilterBank(sr=sample_rate, n_fft=win_length, n_mels=n_mels)
logmel = logmel_extractor.forward(sp)   # (batch_size, 1, time_steps, mel_bins)

# STFT
stft_extractor = tl.STFT(n_fft=win_length, hop_length=hop_length)
(real, imag) = stft_extractor.forward(batch_audio)
# real: (batch_size, 1, time_steps, freq_bins), imag: (batch_size, 1, time_steps, freq_bins)

# ISTFT
istft_extractor = tl.ISTFT(n_fft=win_length, hop_length=hop_length)
y = istft_extractor.forward(real, imag, length=batch_audio.shape[-1])    # (batch_size, samples_num)
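
The shape comments in the examples can be sanity-checked with a little arithmetic. The following is a minimal sketch, not part of TorchLibrosa: expected_shapes is a hypothetical helper, and it assumes centered (librosa-style) framing, where the number of frames is samples // hop_length + 1 and the one-sided spectrum has n_fft // 2 + 1 bins.

```python
# Back-of-the-envelope shape check; pure Python, no torch required.
# Assumption: centered framing (librosa-style), so frames = samples // hop + 1.

def expected_shapes(batch_size, samples, n_fft, hop_length, n_mels):
    """Hypothetical helper: predict output shapes for the examples above."""
    time_steps = samples // hop_length + 1   # frames under centered padding
    freq_bins = n_fft // 2 + 1               # one-sided spectrum of a real signal
    return {
        "spectrogram": (batch_size, 1, time_steps, freq_bins),
        "logmel": (batch_size, 1, time_steps, n_mels),
    }

shapes = expected_shapes(batch_size=16, samples=22050, n_fft=2048,
                         hop_length=512, n_mels=128)
print(shapes["spectrogram"])  # (16, 1, 44, 1025)
print(shapes["logmel"])       # (16, 1, 44, 128)
```

If the extractors are configured with center=False, the frame count would differ, so treat these numbers as a check for the default-style setup only.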

Example 3

Check the compatibility of TorchLibrosa with librosa. The numerical difference should be less than 1e-5.

python3 torchlibrosa/stft.py --device='cuda'    # --device='cpu' | 'cuda'

Contact

Qiuqiang Kong, [email protected]

Cite

[1] Qiuqiang Kong, Yin Cao, Turab Iqbal, Yuxuan Wang, Wenwu Wang, and Mark D. Plumbley. "PANNs: Large-scale pretrained audio neural networks for audio pattern recognition." IEEE/ACM Transactions on Audio, Speech, and Language Processing 28 (2020): 2880-2894.

External links

Other related repos include:

torchaudio: https://github.com/pytorch/audio

Asteroid-filterbanks: https://github.com/asteroid-team/asteroid-filterbanks

Kapre: https://github.com/keunwoochoi/kapre

torchlibrosa's Issues

Contribute to torchaudio

This seems an ambitious project. Why not consider adding the functions as contributions to torchaudio?

Some doubts about args

class ISTFT(DFTBase):
    def __init__(self, n_fft=2048, hop_length=None, win_length=None,
        window='hann', center=True, pad_mode='reflect', freeze_parameters=True,
        onnx=False, frames_num=None, device=None):
        """PyTorch implementation of ISTFT with Conv1d. The function has the
        same output as librosa.istft.

        Args:
            n_fft: int, fft window size, e.g., 2048
            hop_length: int, hop length samples, e.g., 441
            win_length: int, window length e.g., 2048
            window: str, window function name, e.g., 'hann'
            center: bool
            pad_mode: str, e.g., 'reflect'
            freeze_parameters: bool, set to True to freeze all parameters. Set
                to False to finetune all parameters.
            onnx: bool, set to True when exporting trained model to ONNX. This
                will replace several operations to operators supported by ONNX.
            frames_num: None | int, number of frames of audio clips to be
                inferenced. Only useable when onnx=True.
            device: None | str, device of ONNX. Only useable when onnx=True.
        """

Hi, I have a few questions:

onnx arg. Can I set it to True when I'm training?
frames_num arg. What does it mean? For example, with samplerate=44100, segment=3s, n_fft=1024, is frames_num=int(44100*3/1024)=129?

I ran into some problems with the STFT when using NNI pruning and speedup, for example with ola_window. After fine-tuning, I tried to save the model as ONNX, but an error was reported, and it is related to the onnx parameter.
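
For the frames_num question, a rough sketch of the arithmetic may help. It rests on assumptions the thread does not confirm: that frames_num means the number of STFT frames, that framing is centered (center=True), and that hop_length falls back to n_fft // 4 when left as None, as in librosa. Under those assumptions the frame count depends on hop_length rather than n_fft, so dividing by 1024 would undercount.

```python
# Frame-count arithmetic for a centered, librosa-style STFT.
# centered_frame_count is a hypothetical helper, not a torchlibrosa function.

def centered_frame_count(num_samples, hop_length):
    """Number of frames when the signal is padded by n_fft // 2 on both sides."""
    return num_samples // hop_length + 1

sample_rate = 44100
segment_seconds = 3
n_fft = 1024
hop_length = n_fft // 4  # assumption: default used when hop_length is None

num_samples = sample_rate * segment_seconds
frames_num = centered_frame_count(num_samples, hop_length)
print(num_samples, hop_length, frames_num)  # 132300 256 517
```

If hop_length were set explicitly to 1024, the same formula would give 130 frames rather than 129.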

Torch / Torchlibrosa dependency issue

I have a container that contains the following packages:

python = "3.9.6"
librosa = "0.8.0"
torch = "1.9.0+cu111"
torchaudio = "0.9.0"
torchlibrosa = "0.0.9"

When I change the versions of torch and torchaudio to:

python = "3.9.6"
librosa = "0.8.0"
torch = "1.10.0+cu111"
torchaudio = "0.10.0"
torchlibrosa = "0.0.9"

I receive the following error:

UNAVAILABLE: Internal: TypeError: pad_center() takes 1 positional argument but 2 were given
At:
/usr/local/lib/python3.8/dist-packages/torchlibrosa/stft.py(193): __init__
/usr/local/lib/python3.8/dist-packages/torchlibrosa/stft.py(645): __init__

It seems that torchlibrosa=0.0.9 has a problem with torch=1.10.0+cu111.
Another problem is that, according to the documentation, torchlibrosa=0.0.9 (or torchlibrosa in general) does not depend on any torch version.
According to the setup.py file (https://github.com/qiuqiangkong/torchlibrosa/blob/master/setup.py), these are the required packages:

install_requires=[
        'numpy',
        'librosa>=0.9.0'
    ]

No torch version is required.
When I create a new environment that only has torchlibrosa=0.0.9 installed and run the example from the README file:

import torch
import torchlibrosa as tl

batch_size = 16
sample_rate = 22050
win_length = 2048
hop_length = 512
n_mels = 128

batch_audio = torch.empty(batch_size, sample_rate).uniform_(-1, 1)  # (batch_size, sample_rate)

# TorchLibrosa feature extractor the same as librosa.feature.melspectrogram()
feature_extractor = torch.nn.Sequential(
    tl.Spectrogram(
        hop_length=hop_length,
        win_length=win_length,
    ), tl.LogmelFilterBank(
        sr=sample_rate,
        n_mels=n_mels,
        is_log=False, # Default is true
    ))
batch_feature = feature_extractor(batch_audio) # (batch_size, 1, time_steps, mel_bins)

The following error is raised:

Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevconsole.py", line 364, in runcode
    coro = func()
  File "<input>", line 1, in <module>
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
ModuleNotFoundError: No module named 'torch'
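
Because install_requires does not list torch, pip will not install it alongside torchlibrosa. A minimal sketch of a pre-flight check that uses only the standard library is shown here; missing_modules is a hypothetical helper name, not something torchlibrosa provides.

```python
import importlib.util

def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# torch is not declared in install_requires, so it has to be checked separately.
missing = missing_modules(["torch", "torchlibrosa"])
if missing:
    print("Install first: pip install " + " ".join(missing))
else:
    print("torch and torchlibrosa are both importable")
```

Running this before the README example makes the missing dependency explicit instead of surfacing it as a ModuleNotFoundError at import time.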

To conclude, it seems there are two problems:

The documentation of torchlibrosa doesn't mention torch as a required dependency, even though it should be
torchlibrosa=0.0.9 has a problem when working with torch=1.10.0+cu111

fft_window = librosa.util.pad_center(fft_window, n_fft)

How can I solve it?
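
The wording of the TypeError matches what Python raises when a keyword-only parameter is passed positionally. The sketch below reproduces that with a hypothetical stand-in for pad_center; it assumes the installed librosa declares size as keyword-only, which this thread does not confirm. If that is the cause, passing size=n_fft by keyword in the failing call would be the analogous change.

```python
# Hypothetical stand-in with a keyword-only `size`, NOT the real librosa function.
def pad_center(data, *, size):
    """Centre `data` inside a zero-padded list of length `size`."""
    left = (size - len(data)) // 2
    right = size - len(data) - left
    return [0.0] * left + list(data) + [0.0] * right

fft_window = [1.0, 1.0]
n_fft = 4

try:
    pad_center(fft_window, n_fft)   # positional call, as in the failing line
except TypeError as err:
    message = str(err)
    print(message)  # pad_center() takes 1 positional argument but 2 were given

padded = pad_center(fft_window, size=n_fft)  # keyword call is accepted
print(padded)  # [0.0, 1.0, 1.0, 0.0]
```

If the error does come from the librosa version rather than torch, it would also explain why changing only torch and torchaudio appeared to trigger it: rebuilding the environment may have pulled in a different librosa.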