
yujia-yan / skipping-the-frame-level

74 stars · 7 watchers · 8 forks · 35.19 MB

A simple yet effective Audio-to-MIDI Automatic Piano Transcription system

License: MIT License

Python 100.00%
audio-to-midi automatic-transcription crf music piano-transcription pytorch sound-processing music-transcription audio midi

skipping-the-frame-level's People

Contributors

yujia-yan



skipping-the-frame-level's Issues

Sharing pretrained weights

Hi,
Thanks for sharing your work.

Can you share your trained model checkpoint? It seems that without training, I can't reproduce your results.

Note Timing Issue in Transcribed MIDI Files

Hi there. I'm reaching out to report an issue with the latest version of Transkun. Although I'm not an expert in machine learning, I've noticed that when using Transkun to transcribe audio into MIDI files, the notes appear too close together. This results in a short and abrupt sound, regardless of the input audio or MIDI synthesizer used.

As a user, I rely on Transkun to generate accurate and usable MIDI files, so I wanted to bring this problem to your attention. I greatly appreciate your efforts in developing Transkun, and I kindly request your assistance in resolving this note timing issue.

Please let me know if you need any further information from me to address this matter. Thank you for your attention, and I look forward to your prompt response.
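Until the underlying cause is identified, one hedged workaround is to post-process the transcribed notes and stretch durations that are too short. A sketch over simplified (pitch, onset, offset) tuples; reading and writing the actual MIDI file would need a library such as pretty_midi, and the scaling factor and minimum duration below are illustrative:

```python
def lengthen_notes(notes, factor=2.0, min_dur=0.1):
    """Stretch note durations that sound too short/abrupt (sketch).

    `notes` is a list of (pitch, onset_sec, offset_sec) tuples -- a
    simplified stand-in for transkun's actual MIDI output. Each duration
    is scaled by `factor` and clamped to at least `min_dur` seconds.
    """
    out = []
    for pitch, onset, offset in notes:
        duration = max((offset - onset) * factor, min_dur)
        out.append((pitch, onset, onset + duration))
    return out
```

Whether this sounds natural depends on the material; overlapping stretched notes may need additional handling.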

Running transkun, it only says "Killed"

Hi, I was looking for transcription software and found your page on GitHub. I've installed it and ran:

transkun mp3_file.mp3 midi_file.mid

but the only output is the text "Killed" on the screen. Any suggestions for finding the issue?

Thank you!
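A bare "Killed" message on Linux usually means the kernel's out-of-memory killer terminated the process; transcribing a long recording can need several gigabytes of RAM. A small sketch for checking available memory before running (the helper name is hypothetical, and /proc/meminfo is Linux-specific):

```python
def mem_available_mb(meminfo_text):
    """Parse MemAvailable from the content of Linux /proc/meminfo (sketch).

    Returns available memory in megabytes, or None if the field is absent.
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) // 1024  # /proc/meminfo reports kB
    return None

# Typical use on Linux:
# with open("/proc/meminfo") as f:
#     print(mem_available_mb(f.read()))
```

If memory is tight, trying a shorter audio file or a machine with more RAM may help.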

Metadata conflict when installing the transkun package

Hi,

I get the following error on W10 when installing transkun package:

Collecting transkun
  Downloading transkun-0.1.2a-py3-none-any.whl (36.7 MB)
     ---------------------------------------- 36.7/36.7 MB 40.9 MB/s eta 0:00:00
  Discarding https://files.pythonhosted.org/packages/2a/af/db364b427a7a8acc2b7a51d0f1ef9a2ebff203add5d3a971472cde6897f7/transkun-0.1.2a-py3-none-any.whl (from https://pypi.org/simple/transkun/) (requires-python:>=3.6): Requested transkun from https://files.pythonhosted.org/packages/2a/af/db364b427a7a8acc2b7a51d0f1ef9a2ebff203add5d3a971472cde6897f7/transkun-0.1.2a-py3-none-any.whl has inconsistent version: filename has '0.1.2a0', but metadata has '0.1.2'
ERROR: Could not find a version that satisfies the requirement transkun (from versions: 0.1.2a0)
ERROR: No matching distribution found for transkun

Received this error message trying to run this script

I'm using macOS Monterey with an M1 chip.

[W NNPACK.cpp:53] Could not initialize NNPACK! Reason: Unsupported hardware.
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.8/bin/transkun", line 8, in <module>
    sys.exit(main())
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/transkun/transcribe.py", line 57, in main
    fs, audio = readAudio(audioPath)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/transkun/transcribe.py", line 11, in readAudio
    audio = pydub.AudioSegment.from_mp3(path)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pydub/audio_segment.py", line 796, in from_mp3
    return cls.from_file(file, 'mp3', parameters=parameters)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pydub/audio_segment.py", line 651, in from_file
    file, close_file = _fd_or_path_or_tempfile(file, 'rb', tempfile=False)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pydub/utils.py", line 60, in _fd_or_path_or_tempfile
    fd = open(fd, mode=mode)
FileNotFoundError: [Errno 2] No such file or directory: 'input.mp3'
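The final error above is a plain missing-file error: the path given on the command line ('input.mp3') does not exist in the working directory. The NNPACK warning is unrelated, though the missing-ffmpeg warning may matter once the file is found. A minimal early check (resolve_input is a hypothetical helper, not part of transkun):

```python
import os

def resolve_input(path):
    """Fail early with a readable message when the audio file is absent
    (sketch; transkun itself surfaces this as pydub's FileNotFoundError)."""
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"Audio file not found: {path!r} (cwd: {os.getcwd()})")
    return os.path.abspath(path)
```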

Noise filter

Is there any way to filter out noise? For example, some low-probability events.
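The thread does not confirm a transkun option for this, but a generic post-processing sketch that drops low-confidence events could look like the following; the 'prob' field and the 0.5 threshold are assumptions, not transkun's actual API:

```python
def drop_low_probability_notes(notes, min_prob=0.5):
    """Filter out low-confidence note events (sketch).

    Assumes each event is a dict carrying a 'prob' score in [0, 1];
    events without a score are kept.
    """
    return [n for n in notes if n.get("prob", 1.0) >= min_prob]
```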

The default model parameters for training differ from the pretrained checkpoint

Hello, thank you for the valuable code sharing!

I have several questions about the code.

  1. The default parameters for training differ from the pre-trained model in the repo.
    The default setting uses 229 mel bins (the same as the paper), but the pre-trained model uses 300. The f_min and f_max values are also different, and I found that the pre-trained model has one more conv layer in the PreConvSpec. Do these changes have a meaningful effect on performance?

  2. Also, when I tried training (once with the default parameters, and once with the pre-trained model's parameters), both cases showed much lower performance than the pre-trained model (0.7403 valid F1) and the scores reported in the paper. I think the only difference is the batch size, which is 12 in the paper and 2 in the default parameters. Have you ever trained the model with batch size 2, or with the default parameters in this repo?
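If limited memory forces a small batch, gradient accumulation can approximate the paper's batch size of 12: sum gradients over several micro-batches, then take one optimizer step. A framework-agnostic sketch of the averaging step (not the repo's training loop):

```python
def accumulated_gradient(micro_batch_grads):
    """Average per-parameter gradients over k micro-batches (sketch).

    `micro_batch_grads` is a list of k gradient lists, one float per
    parameter. When the loss is a mean over the batch, averaging k
    micro-batch gradients approximates one step with a k-times-larger
    batch (ignoring batch-statistics layers).
    """
    k = len(micro_batch_grads)
    n_params = len(micro_batch_grads[0])
    return [sum(g[i] for g in micro_batch_grads) / k for i in range(n_params)]
```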


Again, thank you very much for sharing your code!

Reverberation simulated by midi notes

The output MIDI files often simulate reverberation in the audio recording by rapidly replaying the same notes.
Are the developers aware of this?
Are there any fixes?
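Until there is an upstream fix, one workaround is to merge rapid same-pitch re-strikes in post-processing. A sketch over simplified (pitch, onset, offset) tuples (not transkun's actual output format; the 50 ms gap is illustrative):

```python
def merge_rapid_repeats(notes, max_gap=0.05):
    """Merge consecutive same-pitch notes separated by < max_gap seconds
    (sketch). `notes` are (pitch, onset_sec, offset_sec) tuples."""
    merged = []
    # Sorting by pitch then onset makes same-pitch notes adjacent.
    for note in sorted(notes, key=lambda n: (n[0], n[1])):
        if merged and merged[-1][0] == note[0] and note[1] - merged[-1][2] < max_gap:
            prev = merged[-1]
            merged[-1] = (prev[0], prev[1], max(prev[2], note[2]))
        else:
            merged.append(note)
    return merged
```

A trade-off to be aware of: genuine fast repetitions (trills, tremolos) would also be merged at too large a gap.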

For runSeed

I encountered a problem while running the project: I am not sure how to set runSeed in the train function. I hope someone can help, thank you!
def train(workerId, nWorker, filename, runSeed, args):

if num_processes == 1: train(0, 1, saved_filename)
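For reference, the traceback quoted in a later issue on this page shows train.py itself calling train(0, 1, saved_filename, int(time.time()), args), i.e. the wall-clock time serves as runSeed. A minimal sketch of the same pattern (make_run_seed is a hypothetical helper):

```python
import time

def make_run_seed():
    """Return an integer seed derived from wall-clock time (sketch),
    mirroring the int(time.time()) argument passed to train() in train.py."""
    return int(time.time())
```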

Some params have grad=None during training

Hi,

Thank you very much for this repo - I'm trying to train this model from scratch on some Saxophone recordings.

Firstly, I was getting weird errors for

  • mono instead of stereo wav files
  • 24 bit instead of 16 bit wav files
  • midi with overlapping notes at the same pitch

It might be worth mentioning these in the README for people who want to train on something other than Maestro.

The error I'm now encountering is during the first epoch

epoch:0 progress:0.000 step:0  loss:5907.2900 gradNorm:12.11 clipValue:28.85 time:0.39
epoch:0 progress:0.000 step:0  loss:5911.5234 gradNorm:12.17 clipValue:23.27 time:0.38
Warning: detected parameter with no gradient that requires gradient:
torch.Size([90, 256])
pitchEmbedding.weight
Warning: detected parameter with no gradient that requires gradient:
torch.Size([512, 1792])
velocityPredictor.0.weight
Warning: detected parameter with no gradient that requires gradient:
torch.Size([512])
velocityPredictor.0.bias
Warning: detected parameter with no gradient that requires gradient:
torch.Size([512, 512])
velocityPredictor.3.weight
Warning: detected parameter with no gradient that requires gradient:
torch.Size([512])
velocityPredictor.3.bias
Warning: detected parameter with no gradient that requires gradient:
torch.Size([128, 512])
velocityPredictor.6.weight
Warning: detected parameter with no gradient that requires gradient:
torch.Size([128])
velocityPredictor.6.bias
Warning: detected parameter with no gradient that requires gradient:
torch.Size([512, 1792])
refinedOFPredictor.0.weight
Warning: detected parameter with no gradient that requires gradient:
torch.Size([512])
refinedOFPredictor.0.bias
Warning: detected parameter with no gradient that requires gradient:
torch.Size([128, 512])
refinedOFPredictor.3.weight
Warning: detected parameter with no gradient that requires gradient:
torch.Size([128])
refinedOFPredictor.3.bias
Warning: detected parameter with no gradient that requires gradient:
torch.Size([2, 128])
refinedOFPredictor.6.weight
Warning: detected parameter with no gradient that requires gradient:
torch.Size([2])
refinedOFPredictor.6.bias
Traceback (most recent call last):
  File "/import/linux/python/3.8.2/lib/python3.8/runpy.py", line 193, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/import/linux/python/3.8.2/lib/python3.8/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/import/research_c4dm/jxr01/Skipping-The-Frame-Level/transkun/train.py", line 364, in <module>
    train(0, 1, saved_filename, int(time.time()), args)
  File "/import/research_c4dm/jxr01/Skipping-The-Frame-Level/transkun/train.py", line 199, in train
    average_gradients(model, totalLen, parallel)
  File "/import/research_c4dm/jxr01/Skipping-The-Frame-Level/transkun/TrainUtil.py", line 45, in average_gradients
    param.grad.data /= c
AttributeError: 'NoneType' object has no attribute 'data'

It looks like many of the parameters don't have their gradients initialised. This is strange because, at this point in the run, a backward pass has already completed, so I thought all the gradients should have been set. I'm using the following settings to train:

python3 -m transkun.train --nProcess 1 --batchSize 1 --hopSize 5 --chunkSize 10 --datasetPath "/import/research_c4dm/jxr01/bytedance_piano_transcription/filosax_train/" --datasetMetaFile_train "filosax_data/train.pickle" --datasetMetaFile_val "filosax_data/val.pickle" --augment checkpoint/filosax_model

Can you give me any tips on what to try next?
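One pragmatic patch, assuming those heads (velocityPredictor, refinedOFPredictor, pitchEmbedding) simply receive no loss term in this configuration (an assumption, not a confirmed diagnosis), is to skip None gradients in the averaging step. A sketch mirroring the failing loop in TrainUtil.average_gradients, not the repo's actual fix:

```python
def average_gradients_safe(parameters, c):
    """Divide each parameter's gradient by c, skipping parameters whose
    grad is None (e.g. heads that contributed nothing to this step's loss)."""
    for p in parameters:
        if p.grad is not None:
            p.grad.data /= c
```

Note that silently skipping parameters is only safe if those heads are genuinely unused; otherwise it would mask a real bug in the loss computation.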
