ddsp-piano's Introduction

DDSP-Piano: Differentiable Piano model for MIDI-to-Audio Performance Synthesis

| Audio Samples 🔈 | DAFx Conference Paper 📄 | JAES Article 📄 |

DDSP-Piano is a piano sound synthesizer from MIDI based on DDSP.

v2.0 Architecture

Installation

This code relies on the official TensorFlow implementation of DDSP (tested on v3.2.0 and v3.7.0); no additional packages are required.

pip install --upgrade ddsp==3.7.0

Audio Synthesis from MIDI

Single MIDI file Synthesis

A piano MIDI file can be synthesized using the command:

python synthesize_midi_file.py <input_midi_file.mid> <output_file.wav>

Additional arguments for the inference script include:

  • -c, --config: a .gin configuration file of a DDSP-Piano model architecture. You can choose one of the configs in the ddsp_piano/configs/ folder.
  • --ckpt: a checkpoint folder with your own model weights.
  • --piano_type: the desired model among the 10 piano years learned from the MAESTRO dataset (0 to 9).
  • -d, --duration: the maximum duration of the synthesized file. It is set by default to None, which will synthesize the whole file.
  • -wu, --warm_up: duration of recurrent layers warm-up (to avoid undesirable noise at the beginning of the synthesized audio).
  • -u, --unreverbed: toggle it to also get the dry piano sound, without reverb applied.
  • -n, --normalize: set the loudness of the output file to this amount of dBFS. Set by default to None, which does not apply any gain modification.
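To make the dBFS target of -n concrete, here is a minimal sketch of gain-based normalization. It assumes a peak-level measure and float samples in [-1.0, 1.0]; the measure actually used by the script is not stated here, and the function names are hypothetical.

```python
import math


def peak_dbfs(samples):
    """Peak level in dBFS of float samples in [-1.0, 1.0] (assumed measure)."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return float("-inf")  # digital silence has no finite level
    return 20.0 * math.log10(peak)


def normalize_to_dbfs(samples, target_dbfs):
    """Scale samples with a single gain so that their peak reaches target_dbfs."""
    current = peak_dbfs(samples)
    if math.isinf(current):
        return list(samples)  # nothing to scale
    gain = 10.0 ** ((target_dbfs - current) / 20.0)
    return [s * gain for s in samples]


# Hypothetical audio excerpt, only for illustration.
audio = [0.5, -0.25, 0.1]
normalized = normalize_to_dbfs(audio, target_dbfs=-3.0)
print(peak_dbfs(normalized))
```

Leaving the option at None corresponds to skipping this step entirely, so the synthesized level is kept as is.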

The default arguments synthesize audio with the most recent version of DDSP-Piano. To use the default model presented in the published papers instead, the command should look like:

python synthesize_midi_file.py \
    --config ddsp_piano/configs/dafx22.gin \
    --ckpt ddsp_piano/model_weights/dafx22/ \
    <input_midi_file.mid> <output_file.wav>
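When the same command has to be run on several files, it can help to assemble it programmatically. The sketch below only builds the argument list from the flags documented above; the helper name build_synthesis_command is hypothetical and is not part of the repository.

```python
import shlex


def build_synthesis_command(midi_path, wav_path, config=None, ckpt=None,
                            piano_type=None, unreverbed=False):
    """Assemble a synthesize_midi_file.py call from the documented flags."""
    cmd = ["python", "synthesize_midi_file.py"]
    if config is not None:
        cmd += ["--config", str(config)]
    if ckpt is not None:
        cmd += ["--ckpt", str(ckpt)]
    if piano_type is not None:
        cmd += ["--piano_type", str(piano_type)]
    if unreverbed:
        cmd.append("--unreverbed")
    # Positional arguments come last: input MIDI file, then output WAV file.
    cmd += [str(midi_path), str(wav_path)]
    return cmd


cmd = build_synthesis_command(
    "performance.mid", "performance.wav",
    config="ddsp_piano/configs/dafx22.gin",
    ckpt="ddsp_piano/model_weights/dafx22/",
)
print(shlex.join(cmd))
# To actually run it from the repository root:
#   import subprocess; subprocess.run(cmd, check=True)
```

Returning a list rather than a single string avoids shell quoting problems with paths that contain spaces.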

Synthesize multiple performances in batch

If you want to synthesize multiple performances from MAESTRO at once, you can gather their information into a .csv file (see assets/tracks_listening_test.csv for an example) and use this script:

python synthesize_from_csv.py <path/to/maestro-v3.0.0/> <your/file.csv> <output/directory/>

It accepts the same additional arguments as the synthesize_midi_file.py script, except that -dc replaces the -u flag: it outputs the dry audio as well as the isolated outputs of the filtered noise and additive synthesizers.
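One way to prepare such a .csv file is to filter the MAESTRO metadata file. The sketch below assumes that the track list uses the same columns as the MAESTRO v3.0.0 metadata; check assets/tracks_listening_test.csv for the columns the script actually expects. The helper names and the sample rows are placeholders, not real MAESTRO entries.

```python
import csv
import io

# Placeholder rows in the column layout of the MAESTRO v3.0.0 metadata file.
# The values are made up for illustration only.
SAMPLE_METADATA = """canonical_composer,canonical_title,split,year,midi_filename,audio_filename,duration
Composer A,Piece 1,train,2004,2004/piece_1.midi,2004/piece_1.wav,120.5
Composer B,Piece 2,test,2006,2006/piece_2.midi,2006/piece_2.wav,95.0
Composer C,Piece 3,validation,2008,2008/piece_3.midi,2008/piece_3.wav,60.25
"""


def select_tracks(metadata_file, split="test", max_tracks=None):
    """Keep the rows of one dataset split, optionally truncated."""
    rows = [row for row in csv.DictReader(metadata_file) if row["split"] == split]
    return rows if max_tracks is None else rows[:max_tracks]


def write_tracks(rows, output_file):
    """Write the selected rows with the same header as the input."""
    if not rows:
        return
    writer = csv.DictWriter(output_file, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


selected = select_tracks(io.StringIO(SAMPLE_METADATA), split="test")
buffer = io.StringIO()
write_tracks(selected, buffer)
print(buffer.getvalue())
```

With real data, the in-memory buffers would be replaced by files opened with open(path, newline="").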

Evaluation script

Evaluation of the model can be conducted on the full MAESTRO test set with the corresponding script:

python evaluate_model.py <path/to/maestro-v3.0.0/> <output-directory/>

Additional arguments include:

  • -c, --config: the .gin model config file.
  • --ckpt: checkpoint to load weights from.
  • -wu, --warm_up: the warm-up duration.
  • -w, --get_wav: if toggled, will also save the audio of all synthesis examples.

Model training

The paper model is trained and evaluated on the MAESTRO dataset (v3.0.0). After following the instructions for downloading it, a DDSP-Piano model can be trained using one of the scripts presented below.

Dataset preprocessing (optional)

The model uses a particular encoding for handling MIDI data. During training, conversion on the fly can take some time, on top of resampling audio data.

The following script can be used to preprocess the MIDI and audio data of MAESTRO, and store them in TFRecord format for faster data pipeline processing:

python preprocess_maestro.py <path/to/maestro-v3.0.0/> <store/tfrecords/in/this/folder/>

Additional arguments include:

  • -sr: the desired audio sampling rate, which should match the model configuration. Set by default to 24 kHz.
  • -fr: the MIDI control frame rate, set by default to 250 Hz.
  • -p: the polyphonic capacity of the model, i.e. the maximum number of simultaneous notes it can handle.
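The two default rates above are linked by simple arithmetic: at 24 kHz audio and 250 Hz control frames, each control frame spans 96 audio samples. The sketch below works this out; the function names are hypothetical, and the requirement of an integer number of samples per frame is an assumption of this sketch.

```python
import math


def samples_per_frame(sample_rate=24000, frame_rate=250):
    """Number of audio samples covered by one MIDI control frame."""
    if sample_rate % frame_rate != 0:
        # Assumption of this sketch: the hop size must be an integer.
        raise ValueError("sample_rate should be a multiple of frame_rate")
    return sample_rate // frame_rate


def n_control_frames(duration_s, frame_rate=250):
    """Number of control frames needed to cover duration_s seconds."""
    return math.ceil(duration_s * frame_rate)


print(samples_per_frame())        # 24000 / 250 = 96
print(n_control_frames(10))       # 10 s * 250 Hz = 2500
print(samples_per_frame(16000))   # 16000 / 250 = 64
```

Changing -sr without adjusting the model configuration would change this ratio, which is why the two settings should be kept consistent.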

Single Phase Training

According to our listening test, decent synthesis quality can be achieved with only a single training phase, using the following script:

python train_single_phase.py <path/to/maestro-v3.0.0/> <experiment-directory/>

Additional arguments include:

  • -c, --config: a .gin model configuration file.
  • --val_path: optional path to the .tfrecord file containing the preprocessed validation data (see above).
  • --batch_size, --steps_per_epoch, --epochs, --lr: your usual training hyper-parameters.
  • -p, --phase: the current training phase (which toggles the trainability of the corresponding layers).
  • -r, --restore: a checkpoint folder to restore weights from.

Note that <path/to/maestro-v3.0.0/> can be either the extracted MAESTRO dataset as is, or the preprocessed maestro_training.tfrecord file obtained in the previous section.

During training, the TensorBoard logs are saved under <experiment-directory>/logs/.

Full Training Procedure (legacy)

This script reproduces the full training of the default model presented in the papers:

source train_ddsp_piano.sh <path-to-maestro-v3.0.0/> <experiment-directory/>

It alternates between 2 training phases (one for the layers involved in the partial frequencies computation and the other for the remaining layers). The final model checkpoint should be located in <experiment-directory>/phase_3/last_iter/.

However, as frequency estimation with differentiable oscillators is still an unsolved issue (see here and here), the second training phase does not improve the model quality, and we recommend simply using the single-phase training script above.

TODO

  • Format code for FDN-based reverb.
  • Use filtered noise synth with dynamic size on all model configs + adapt all model buildings.
  • Release script for extracting single note partials estimation.
  • Remove training phase related code.

Bibtex

If you use this code for your research, please cite it as:

@article{renault2023ddsp_piano,
  title={DDSP-Piano: A Neural Sound Synthesizer Informed by Instrument Knowledge},
  author={Renault, Lenny and Mignot, Rémi and Roebel, Axel},
  journal={Journal of the Audio Engineering Society},
  volume={71},
  number={9},
  pages={552--565},
  year={2023},
  month={September}
}

or

@inproceedings{renault2022diffpiano,
  title={Differentiable Piano Model for MIDI-to-Audio Performance Synthesis},
  author={Renault, Lenny and Mignot, Rémi and Roebel, Axel},
  booktitle={Proceedings of the 25th International Conference on Digital Audio Effects},
  year={2022}
}

Acknowledgments

This project is conducted at IRCAM and has been funded by the European Project AI4Media (grant number 951911).

Thanks to @phvial for implementing the FDN-based reverb, in the context of the AQUA-RIUS ANR project.

         

ddsp-piano's Issues

ModuleNotFoundError: No module named 'priv_ddfx'

Thank you for your research and sharing. I tried with the main repo and there was no problem, but when I tried with the dev repo I got the error: ModuleNotFoundError: No module named 'priv_ddfx'
I think your repo is missing module priv_ddfx.py?

Help please. ValueError: operands could not be broadcast together with remapped shapes

Thanks for sharing your codes and models! I tried to run 'synthesize_midi_file.py' as you said, but encountered the below error:

(ddsp_test) F:\code\ddsp-piano-dev>python synthesize_midi_file.py "test01.mid" "test01.wav"
Loading midi file...
Traceback (most recent call last):
  File "synthesize_midi_file.py", line 63, in <module>
    main(process_args())
  File "synthesize_midi_file.py", line 32, in main
    inputs = load_midi_as_conditioning(args.midi_file, duration=args.duration)
  File "F:\code\ddsp-piano-dev\ddsp_piano\utils\io_utils.py", line 102, in load_midi_as_conditioning
    conditioning = ensure_sequence_length(conditioning, target_n_frames)
  File "F:\code\ddsp-piano-dev\ddsp_piano\utils\io_utils.py", line 61, in ensure_sequence_length
    return np.pad(sequence, pad_width=pad_width)
  File "<array_function internals>", line 6, in pad
  File "F:\Envs\ddsp_test\lib\site-packages\numpy\lib\arraypad.py", line 743, in pad
    pad_width = _as_pairs(pad_width, array.ndim, as_index=True)
  File "F:\Envs\ddsp_test\lib\site-packages\numpy\lib\arraypad.py", line 518, in _as_pairs
    return np.broadcast_to(x, (ndim, 2)).tolist()
  File "<array_function internals>", line 6, in broadcast_to
  File "F:\Envs\ddsp_test\lib\site-packages\numpy\lib\stride_tricks.py", line 411, in broadcast_to
    return _broadcast_to(array, shape, subok=subok, readonly=True)
  File "F:\Envs\ddsp_test\lib\site-packages\numpy\lib\stride_tricks.py", line 350, in _broadcast_to
    op_flags=['readonly'], itershape=shape, order='C')
ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (6,) and requested shape (3,2)
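
The last line of the traceback suggests that np.pad received a flat pad_width of 6 values for a 3-dimensional array, whereas it needs something that broadcasts to shape (3, 2), i.e. one (before, after) pair per dimension. The sketch below shows the nested form in plain Python. It is a guess at the cause rather than the repository's fix, and the helper name and the array shape are hypothetical.

```python
def pad_width_to_length(shape, target_len, axis=0):
    """Build one (before, after) pair per dimension, padding only the end of `axis`."""
    if target_len < shape[axis]:
        raise ValueError("target_len is shorter than the current length")
    return [
        (0, target_len - size) if dim == axis else (0, 0)
        for dim, size in enumerate(shape)
    ]


# Hypothetical conditioning array shape: (frames, polyphony, features).
pairs = pad_width_to_length((100, 16, 2), target_len=250)
print(pairs)  # nested pairs, shape (3, 2): the form np.pad accepts

# Flattening the pairs gives 6 values, i.e. shape (6,), as in the error message.
flat = [value for pair in pairs for value in pair]
print(len(flat))
```

If this reading is right, passing the nested list (or a tuple of tuples) as pad_width should avoid the broadcast error.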
