Coder Social home page Coder Social logo

davyxx3 / audiosep Goto Github PK

View Code? Open in Web Editor NEW

This project forked from audio-agi/audiosep

0.0 0.0 0.0 16.52 MB

Official implementation of "Separate Anything You Describe"

Home Page: https://audio-agi.github.io/Separate-Anything-You-Describe/

License: MIT License

Python 99.23% Jupyter Notebook 0.77%

audiosep's Introduction

Separate Anything You Describe

arXiv GitHub Stars githubio Open In Colab Hugging Face Spaces Replicate

This repository contains the official implementation of "Separate Anything You Describe".

We introduce AudioSep, a foundation model for open-domain sound separation with natural language queries. AudioSep demonstrates strong separation performance and impressive zero-shot generalization ability on numerous tasks, such as audio event separation, musical instrument separation, and speech enhancement. Check out the separated audio examples on the Demo Page!


TODO

  • AudioSep training & fine-tuning code release.
  • AudioSep base model checkpoint release.
  • Evaluation benchmark release.

Setup

Clone the repository and setup the conda environment:

git clone https://github.com/Audio-AGI/AudioSep.git && \
cd AudioSep && \ 
conda env create -f environment.yml && \
conda activate AudioSep

Download model weights at checkpoint/.


Inference

from pipeline import build_audiosep, inference
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = build_audiosep(
      config_yaml='config/audiosep_base.yaml', 
      checkpoint_path='checkpoint/audiosep_base_4M_steps.ckpt', 
      device=device)

audio_file = 'path_to_audio_file'
text = 'textual_description'
output_file='separated_audio.wav'

# AudioSep processes the audio at 32 kHz sampling rate  
inference(model, audio_file, text, output_file, device)

To load directly from Hugging Face, you can do the following:

from models.audiosep import AudioSep
from utils import get_ss_model
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

ss_model = get_ss_model('config/audiosep_base.yaml')

model = AudioSep.from_pretrained("nielsr/audiosep-demo", ss_model=ss_model)

audio_file = 'path_to_audio_file'
text = 'textual_description'
output_file='separated_audio.wav'

# AudioSep processes the audio at 32 kHz sampling rate  
inference(model, audio_file, text, output_file, device)

Use chunk-based inference to save memory:

inference(model, audio_file, text, output_file, device, use_chunk=True)

Training

To utilize your audio-text paired dataset:

  1. Format your dataset to match our JSON structure. Refer to the provided template at datafiles/template.json.

  2. Update the config/audiosep_base.yaml file by listing your formatted JSON data files under datafiles. For example:

data:
    datafiles:
        - 'datafiles/your_datafile_1.json'
        - 'datafiles/your_datafile_2.json'
        ...

Train AudioSep from scratch:

python train.py --workspace workspace/AudioSep --config_yaml config/audiosep_base.yaml --resume_checkpoint_path checkpoint/ ''

Finetune AudioSep from pretrained checkpoint:

python train.py --workspace workspace/AudioSep --config_yaml config/audiosep_base.yaml --resume_checkpoint_path path_to_checkpoint

Benchmark Evaluation

Download the evaluation data under the evaluation/data folder. The data should be organized as:

evaluation:
    data:
        - audioset/
        - audiocaps/
        - vggsound/
        - music/
        - clotho/
        - esc50/

Run benchmark inference script, the results will be saved at eval_logs/

python benchmark.py --checkpoint_path audiosep_base_4M_steps.ckpt

"""
Evaluation Results:

VGGSound Avg SDRi: 9.144, SISDR: 9.043
MUSIC Avg SDRi: 10.508, SISDR: 9.425
ESC-50 Avg SDRi: 10.040, SISDR: 8.810
AudioSet Avg SDRi: 7.739, SISDR: 6.903
AudioCaps Avg SDRi: 8.220, SISDR: 7.189
Clotho Avg SDRi: 6.850, SISDR: 5.242
"""

Cite this work

If you found this tool useful, please consider citing

@article{liu2023separate,
  title={Separate Anything You Describe},
  author={Liu, Xubo and Kong, Qiuqiang and Zhao, Yan and Liu, Haohe and Yuan, Yi and Liu, Yuzhuo and Xia, Rui and Wang, Yuxuan and Plumbley, Mark D and Wang, Wenwu},
  journal={arXiv preprint arXiv:2308.05037},
  year={2023}
}
@inproceedings{liu22w_interspeech,
  title={Separate What You Describe: Language-Queried Audio Source Separation},
  author={Liu, Xubo and Liu, Haohe and Kong, Qiuqiang and Mei, Xinhao and Zhao, Jinzheng and Huang, Qiushi, and Plumbley, Mark D and Wang, Wenwu},
  year=2022,
  booktitle={Proc. Interspeech},
  pages={1801--1805},
}

audiosep's People

Contributors

badayvedat avatar chenxwh avatar eltociear avatar farookhnitap avatar liuxubo717 avatar nielsrogge avatar rs-labhub avatar shivam250702 avatar shresthasurav avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.