Dual-input neural networks (DI-NNs)

This repository contains the code for the neural networks and training scripts related to the paper "Dual input neural networks for positional sound source localization" by Eric Grinstein, Vincent W. Neo and Patrick A. Naylor.

Link: https://arxiv.org/abs/2308.04169

Overview

  • The di_nn directory contains the neural networks and their training code.
  • The file di_nn/utils/di_crnn.py contains a base Torch model that is independent of our sound source localization application. It is therefore the recommended entry point for those who want to use DI-NNs in their own domains.
  • In turn, the file di_nn/di_ssl_net.py contains the network adapted for the task of sound source localization.
  • The file di_nn/trainer.py contains a PyTorch Lightning model for training.
  • The pysoundloc submodule contains the Least Squares baseline.
  • The sydra submodule contains the code for generating the synthetic datasets used in the paper. It is included here for convenience, but is a separate project available here.
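
For readers who want a feel for the dual-input idea before opening di_nn/utils/di_crnn.py, here is a minimal sketch in PyTorch. It is not the model from this repository: the class name DualInputNet, the layer sizes and the choice of metadata are all assumptions made for illustration. It only shows one common way to combine a high-dimensional input (such as multichannel audio) with a low-dimensional metadata vector, by concatenating the metadata with the extracted features.

```python
import torch
from torch import nn


class DualInputNet(nn.Module):
    """Hypothetical dual-input model (illustration only, not di_crnn.py)."""

    def __init__(self, n_channels=4, n_metadata=6, n_hidden=32, n_outputs=2):
        super().__init__()
        # Feature extractor applied to the high-dimensional input.
        self.feature_extractor = nn.Sequential(
            nn.Conv1d(n_channels, n_hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # collapse the time axis
        )
        # Fusion network applied to the concatenation [features, metadata].
        self.fusion = nn.Sequential(
            nn.Linear(n_hidden + n_metadata, n_hidden),
            nn.ReLU(),
            nn.Linear(n_hidden, n_outputs),
        )

    def forward(self, signals, metadata):
        # signals: (batch, n_channels, n_samples)
        # metadata: (batch, n_metadata)
        features = self.feature_extractor(signals).squeeze(-1)
        fused = torch.cat([features, metadata], dim=1)
        return self.fusion(fused)


model = DualInputNet()
signals = torch.randn(8, 4, 1600)  # 8 examples, 4 channels, 1600 samples
metadata = torch.randn(8, 6)       # assumed metadata, e.g. flattened coordinates
output = model(signals, metadata)  # one 2D estimate per example
```

The second input is what makes the network "dual-input": the same signals can lead to different outputs when the metadata changes.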

Installation

Clone the repository using the following command: git clone https://github.com/egrinstein/di_nn --recurse-submodules

The requirements of this project are listed in the file requirements.txt. Use the command pip install -r requirements.txt to install them.

You can also train the model using the Kaggle notebook available here. Note that you'll need a phone-verified Kaggle account to use the GPU.

Testing the model

Under the directory demo/, you will find a Jupyter notebook as well as the model's pretrained weights and a small testing dataset.

Generating the datasets

Synthetic data was generated using a package created by the authors called SYDRA (SYnthetic Datasets for Room Acoustics). This package is included here for convenience under the sydra directory. The configuration of each generated dataset is governed by Hydra.

Synthetic datasets

To generate a synthetic dataset, first edit the configuration under sydra/config/config.yaml to describe the desired dataset. Then run the command python main.py dataset_dir=path/to/dataset num_samples=X, where X is the number of samples to generate.

Recorded datasets

To generate a dataset using the LibriAdhoc40 recorded dataset, you must first download it. Then, change directory to sydra/adhoc40_dataset and run the command python generate_dataset.py input_dir=/path/to/libri_adhoc40_dataset output_dir=/output/path mode='train|validation|test' to generate the training, validation or testing dataset. You can alternatively alter the configuration under sydra/config/adhoc40_dataset.yaml.
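
The mode argument takes one of the three values per run, so generating all three splits means running the script three times. The sketch below builds one command per split using the key=value override syntax shown above. The helper name generation_commands is hypothetical, and writing each split to its own subdirectory of output_dir is an assumption made here to keep the outputs apart; the script may already handle this differently.

```python
import shlex

# Placeholder paths copied from the command above; replace them with real ones.
INPUT_DIR = "/path/to/libri_adhoc40_dataset"
OUTPUT_DIR = "/output/path"


def generation_commands(input_dir, output_dir,
                        modes=("train", "validation", "test")):
    """Return one generate_dataset.py command (as an argument list) per split."""
    commands = []
    for mode in modes:
        commands.append([
            "python", "generate_dataset.py",
            f"input_dir={input_dir}",
            # Assumption: one output subdirectory per split.
            f"output_dir={output_dir}/{mode}",
            f"mode={mode}",
        ])
    return commands


# Print the commands so they can be reviewed before running anything.
for command in generation_commands(INPUT_DIR, OUTPUT_DIR):
    print(shlex.join(command))
```

To execute a command instead of printing it, each argument list can be passed to subprocess.run(command, cwd="sydra/adhoc40_dataset", check=True).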

Training

The training procedure, the datasets and the model are also configured using Hydra. You can alter these configs at di_nn/config. Once the datasets are available, you can train the models by running python train.py.

Evaluating the Least Squares Sound Source Localization (LS-SSL) baseline

The code for the baseline is located under the pysoundloc/ directory. To run the tests, run python test_ls_ssl_baseline.py. The choice of which dataset to evaluate the baseline on is governed by the same .yaml files as above.
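
As background on what a least squares localizer does, the following is a small self-contained sketch. It is not the pysoundloc implementation. It assumes a 2D room, a constant speed of sound, noise-free time differences of arrival (TDOAs) and a plain grid search; the function names, room size and microphone layout are all made up for illustration.

```python
import itertools
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed constant


def tdoa(source, mic_a, mic_b):
    """Time difference of arrival (seconds) between two microphones."""
    return (math.dist(source, mic_a) - math.dist(source, mic_b)) / SPEED_OF_SOUND


def ls_localize(mics, measured_tdoas, room=(5.0, 4.0), step=0.1):
    """Grid search for the 2D point minimising the squared TDOA error."""
    pairs = list(itertools.combinations(range(len(mics)), 2))
    nx = round(room[0] / step) + 1
    ny = round(room[1] / step) + 1
    best_point, best_cost = None, math.inf
    for ix, iy in itertools.product(range(nx), range(ny)):
        candidate = (ix * step, iy * step)
        # Sum of squared differences between predicted and measured TDOAs.
        cost = sum(
            (tdoa(candidate, mics[i], mics[j]) - measured) ** 2
            for (i, j), measured in zip(pairs, measured_tdoas)
        )
        if cost < best_cost:
            best_point, best_cost = candidate, cost
    return best_point


# Hypothetical setup: four microphones near the corners of a 5 m x 4 m room.
mics = [(0.5, 0.5), (4.5, 0.5), (4.5, 3.5), (0.5, 3.5)]
true_source = (2.0, 3.0)
# Noise-free TDOAs simulated from the known source position.
measured = [
    tdoa(true_source, mics[i], mics[j])
    for i, j in itertools.combinations(range(len(mics)), 2)
]
estimate = ls_localize(mics, measured)
```

With noise-free measurements the estimate should land on the grid point nearest the simulated source. With real recordings the TDOAs must first be estimated from the signals, which is where most of the error comes from.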
