
torchlibrosa's Introduction

TorchLibrosa: PyTorch implementation of Librosa

This codebase provides PyTorch implementations of some librosa functions. If you previously trained on features extracted on the CPU with librosa and now want GPU acceleration during training and evaluation, TorchLibrosa provides almost identical features to the standard librosa functions (numerical difference less than 1e-5).

Install

$ pip install torchlibrosa

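Example 1

Extract a mel spectrogram with TorchLibrosa. The example passes is_log=False to match librosa.feature.melspectrogram(); leave is_log at its default (True) to get a log mel spectrogram.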

import torch
import torchlibrosa as tl

batch_size = 16
sample_rate = 22050
win_length = 2048
hop_length = 512
n_mels = 128

batch_audio = torch.empty(batch_size, sample_rate).uniform_(-1, 1)  # (batch_size, sample_rate)

# TorchLibrosa feature extractor equivalent to librosa.feature.melspectrogram()
feature_extractor = torch.nn.Sequential(
    tl.Spectrogram(
        hop_length=hop_length,
        win_length=win_length,
    ), tl.LogmelFilterBank(
        sr=sample_rate,
        n_mels=n_mels,
        is_log=False, # Default is true
    ))
batch_feature = feature_extractor(batch_audio) # (batch_size, 1, time_steps, mel_bins)

Example 2

Extract a spectrogram, a log mel spectrogram, an STFT, and an ISTFT with TorchLibrosa.

import torch
import torchlibrosa as tl

batch_size = 16
sample_rate = 22050
win_length = 2048
hop_length = 512
n_mels = 128

batch_audio = torch.empty(batch_size, sample_rate).uniform_(-1, 1)  # (batch_size, sample_rate)

# Spectrogram
spectrogram_extractor = tl.Spectrogram(n_fft=win_length, hop_length=hop_length)
sp = spectrogram_extractor.forward(batch_audio)   # (batch_size, 1, time_steps, freq_bins)

# Log mel spectrogram
logmel_extractor = tl.LogmelFilterBank(sr=sample_rate, n_fft=win_length, n_mels=n_mels)
logmel = logmel_extractor.forward(sp)   # (batch_size, 1, time_steps, mel_bins)

# STFT
stft_extractor = tl.STFT(n_fft=win_length, hop_length=hop_length)
(real, imag) = stft_extractor.forward(batch_audio)
# real: (batch_size, 1, time_steps, freq_bins), imag: (batch_size, 1, time_steps, freq_bins)

# ISTFT
istft_extractor = tl.ISTFT(n_fft=win_length, hop_length=hop_length)
y = istft_extractor.forward(real, imag, length=batch_audio.shape[-1])    # (batch_size, samples_num)

Example 3

Check the compatibility of TorchLibrosa with librosa. The numerical difference should be less than 1e-5.

python3 torchlibrosa/stft.py --device='cuda'    # --device='cpu' | 'cuda'

Contact

Qiuqiang Kong, [email protected]

Cite

[1] Qiuqiang Kong, Yin Cao, Turab Iqbal, Yuxuan Wang, Wenwu Wang, and Mark D. Plumbley. "PANNs: Large-scale pretrained audio neural networks for audio pattern recognition." IEEE/ACM Transactions on Audio, Speech, and Language Processing 28 (2020): 2880-2894.

External links

Other related repos include:

torchaudio: https://github.com/pytorch/audio

Asteroid-filterbanks: https://github.com/asteroid-team/asteroid-filterbanks

Kapre: https://github.com/keunwoochoi/kapre

torchlibrosa's People

Contributors

azuwis, diggerdu, harukama, qiuqiangkong, richermans


torchlibrosa's Issues

Contribute to torchaudio

This seems an ambitious project. Why not consider adding the functions as contributions to torchaudio?

Some doubts about args

class ISTFT(DFTBase):
    def __init__(self, n_fft=2048, hop_length=None, win_length=None,
        window='hann', center=True, pad_mode='reflect', freeze_parameters=True,
        onnx=False, frames_num=None, device=None):
        """PyTorch implementation of ISTFT with Conv1d. The function has the
        same output as librosa.istft.

        Args:
            n_fft: int, fft window size, e.g., 2048
            hop_length: int, hop length samples, e.g., 441
            win_length: int, window length e.g., 2048
            window: str, window function name, e.g., 'hann'
            center: bool
            pad_mode: str, e.g., 'reflect'
            freeze_parameters: bool, set to True to freeze all parameters. Set
                to False to finetune all parameters.
            onnx: bool, set to True when exporting trained model to ONNX. This
                will replace several operations to operators supported by ONNX.
            frames_num: None | int, number of frames of audio clips to be
                inferenced. Only useable when onnx=True.
            device: None | str, device of ONNX. Only useable when onnx=True.
        """

Hi, I have a few questions:

  1. The onnx arg: can I set it to True during training?
  2. The frames_num arg: what does it mean? E.g., with samplerate=44100, segment=3s, n_fft=1024, is frames_num=int(44100*3/1024)=129?

I ran into problems with the STFT when using NNI pruning and speedup, for example with the ola_window. After fine-tuning I tried to save the model as ONNX, but an error was reported, and this error is related to the onnx parameter.
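On the frames_num question, a minimal sketch of how the frame count of an STFT is usually computed may help. It assumes that frames_num means the number of STFT frames, that center=True pads the signal by n_fft // 2 on each side (as librosa does), and that hop_length falls back to n_fft // 4 when left as None; none of these is confirmed in this thread. The helper name stft_frames_num is hypothetical.

```python
def stft_frames_num(samples_num: int, hop_length: int, n_fft: int, center: bool = True) -> int:
    """Number of STFT frames for a signal with samples_num samples."""
    if center:
        # With centered frames the signal is padded by n_fft // 2 on both
        # sides, so the count depends only on the hop length.
        return 1 + samples_num // hop_length
    # Without padding, only frames that fit entirely in the signal are kept.
    return 1 + (samples_num - n_fft) // hop_length


sample_rate = 44100
segment_seconds = 3
n_fft = 1024
hop_length = n_fft // 4  # assumption: the default used when hop_length=None

samples_num = sample_rate * segment_seconds
frames_num = stft_frames_num(samples_num, hop_length, n_fft)
print(frames_num)
```

Under these assumptions the frame count is driven by hop_length rather than n_fft, so dividing the number of samples by n_fft only matches when hop_length equals n_fft, and even then the centered count is one higher.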

Torch / Torchlibrosa dependency issue

I have a container that contains the following packages:

python = "3.9.6"
librosa = "0.8.0"
torch = "1.9.0+cu111"
torchaudio = "0.9.0"
torchlibrosa = "0.0.9"

When I change the versions of torch and torchaudio to:

python = "3.9.6"
librosa = "0.8.0"
torch = "1.10.0+cu111"
torchaudio = "0.10.0"
torchlibrosa = "0.0.9"

I receive the following error:

UNAVAILABLE: Internal: TypeError: pad_center() takes 1 positional argument but 2 were given
At:
/usr/local/lib/python3.8/dist-packages/torchlibrosa/stft.py(193): __init__
/usr/local/lib/python3.8/dist-packages/torchlibrosa/stft.py(645): __init__

It seems that torchlibrosa=0.0.9 has a problem with torch=1.10.0+cu111.
Another problem is that, according to the documentation, torchlibrosa=0.0.9 (or torchlibrosa in general) does not depend on any torch version.
According to the setup.py file (https://github.com/qiuqiangkong/torchlibrosa/blob/master/setup.py), these are the required packages:

install_requires=[
        'numpy',
        'librosa>=0.9.0'
    ]

No torch version is required.
When I open a new environment that has only torchlibrosa=0.0.9 installed and run the example from the README file:

import torch
import torchlibrosa as tl

batch_size = 16
sample_rate = 22050
win_length = 2048
hop_length = 512
n_mels = 128

batch_audio = torch.empty(batch_size, sample_rate).uniform_(-1, 1)  # (batch_size, sample_rate)

# TorchLibrosa feature extractor equivalent to librosa.feature.melspectrogram()
feature_extractor = torch.nn.Sequential(
    tl.Spectrogram(
        hop_length=hop_length,
        win_length=win_length,
    ), tl.LogmelFilterBank(
        sr=sample_rate,
        n_mels=n_mels,
        is_log=False, # Default is true
    ))
batch_feature = feature_extractor(batch_audio) # (batch_size, 1, time_steps, mel_bins)

The following error is raised:

Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevconsole.py", line 364, in runcode
    coro = func()
  File "<input>", line 1, in <module>
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
ModuleNotFoundError: No module named 'torch'

To conclude, it seems there are two problems:

  1. The documentation of torchlibrosa doesn't list torch as a requirement, even though it is one
  2. torchlibrosa=0.0.9 has a problem when used with torch=1.10.0+cu111
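Because the setup.py quoted above does not list torch, pip will not pull it in together with torchlibrosa. A minimal sketch for checking which of the packages from this report are actually present in an environment, using only the standard library (the helper name installed_versions is hypothetical):

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


report = installed_versions(["torch", "torchaudio", "librosa", "torchlibrosa"])
for name, version in report.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Running this in both the working and the failing container would also show whether the librosa version changed along with torch, which the version lists above do not rule out.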

FutureWarning

Hi, I tried to run Example 1, but got the following warning:

FutureWarning: Pass size=2048 as keyword args. From version 0.10 passing these as positional arguments will result in an error

fft_window = librosa.util.pad_center(fft_window, n_fft)

How can I solve it?
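The warning says that size must be passed by keyword, and that from librosa 0.10 onwards a positional size becomes an error. The TypeError in the dependency issue above points at the same pad_center call, which suggests the installed librosa version as the trigger. The usual remedies are to call librosa.util.pad_center(fft_window, size=n_fft) or to stay on a librosa release below 0.10; whether a newer torchlibrosa release already does this is not stated here. The sketch below uses a hypothetical stand-in for pad_center (not librosa's implementation) with a keyword-only size parameter to show both behaviours:

```python
def pad_center(data, *, size):
    """Hypothetical stand-in: zero-pad a list on both sides to length `size`.

    `size` is keyword-only, mirroring what the FutureWarning announces.
    """
    pad_total = size - len(data)
    if pad_total < 0:
        raise ValueError("size must be at least len(data)")
    left = pad_total // 2
    return [0.0] * left + list(data) + [0.0] * (pad_total - left)


window = [1.0, 1.0, 1.0, 1.0]

try:
    pad_center(window, 8)  # positional size: rejected with a TypeError
    positional_error = None
except TypeError as exc:
    positional_error = str(exc)

padded = pad_center(window, size=8)  # keyword size: accepted

print(positional_error)
print(padded)
```

The message captured in positional_error has the same form as the one in the dependency issue, since Python raises it for any positional argument passed to a keyword-only parameter.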
