
yujia-yan / skipping-the-frame-level

74 stars · 7 watchers · 8 forks · 35.19 MB

A simple yet effective Audio-to-MIDI Automatic Piano Transcription system

License: MIT License

Python 100.00%
audio-to-midi automatic-transcription crf music piano-transcription pytorch sound-processing music-transcription audio midi

skipping-the-frame-level's People

Contributors

yujia-yan



skipping-the-frame-level's Issues

Sharing pretrained weights

Hi,
Thanks for sharing your work.

Can you share your trained model checkpoint? It seems that without training, I can't reproduce your results.

Note Timing Issue in Transcribed MIDI Files

Hi there. I'm reaching out to report an issue with the latest version of Transkun. Although I'm not an expert in machine learning, I've noticed that when using Transkun to transcribe audio into MIDI files, the notes appear too close together. This results in a short and abrupt sound, regardless of the input audio or MIDI synthesizer used.

As a user, I rely on Transkun to generate accurate and usable MIDI files, so I wanted to bring this problem to your attention. I greatly appreciate your efforts in developing Transkun, and I kindly request your assistance in resolving this note timing issue.

Please let me know if you need any further information from me to address this matter. Thank you for your attention, and I look forward to your prompt response.
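Until the underlying cause is identified, one hedged workaround is to post-process the transcribed notes and stretch durations that are too short. A sketch over simplified (pitch, onset, offset) tuples; reading and writing the actual MIDI file would need a library such as pretty_midi, and the scaling factor and minimum duration below are illustrative:

```python
def lengthen_notes(notes, factor=2.0, min_dur=0.1):
    """Stretch note durations that sound too short/abrupt (sketch).

    `notes` is a list of (pitch, onset_sec, offset_sec) tuples -- a
    simplified stand-in for transkun's actual MIDI output. Each duration
    is scaled by `factor` and clamped to at least `min_dur` seconds.
    """
    out = []
    for pitch, onset, offset in notes:
        duration = max((offset - onset) * factor, min_dur)
        out.append((pitch, onset, onset + duration))
    return out
```

Whether this sounds natural depends on the material; overlapping stretched notes may need additional handling.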

Running transkun, it only says "Killed"

Hi, I was looking for transcription software and found your page on GitHub. I've installed it and ran:

transkun mp3_file.mp3 midi_file.mid

but the only output is the text "Killed" on the screen. Any suggestions for finding the issue?

Thank you!
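A bare "Killed" message on Linux usually means the kernel's out-of-memory killer terminated the process; transcribing a long recording can need several gigabytes of RAM. A small sketch for checking available memory before running (the helper name is hypothetical, and /proc/meminfo is Linux-specific):

```python
def mem_available_mb(meminfo_text):
    """Parse MemAvailable from the content of Linux /proc/meminfo (sketch).

    Returns available memory in megabytes, or None if the field is absent.
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) // 1024  # /proc/meminfo reports kB
    return None

# Typical use on Linux:
# with open("/proc/meminfo") as f:
#     print(mem_available_mb(f.read()))
```

If memory is tight, trying a shorter audio file or a machine with more RAM may help.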

Metadata conflict when installing the transkun package

Hi,

I get the following error on W10 when installing transkun package:

Collecting transkun
  Downloading transkun-0.1.2a-py3-none-any.whl (36.7 MB)
     ---------------------------------------- 36.7/36.7 MB 40.9 MB/s eta 0:00:00
  Discarding https://files.pythonhosted.org/packages/2a/af/db364b427a7a8acc2b7a51d0f1ef9a2ebff203add5d3a971472cde6897f7/transkun-0.1.2a-py3-none-any.whl (from https://pypi.org/simple/transkun/) (requires-python:>=3.6): Requested transkun from https://files.pythonhosted.org/packages/2a/af/db364b427a7a8acc2b7a51d0f1ef9a2ebff203add5d3a971472cde6897f7/transkun-0.1.2a-py3-none-any.whl has inconsistent version: filename has '0.1.2a0', but metadata has '0.1.2'
ERROR: Could not find a version that satisfies the requirement transkun (from versions: 0.1.2a0)
ERROR: No matching distribution found for transkun

Received this error message trying to run this script

I'm using macOS Monterey with an M1 chip.

[W NNPACK.cpp:53] Could not initialize NNPACK! Reason: Unsupported hardware.
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.8/bin/transkun", line 8, in <module>
    sys.exit(main())
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/transkun/transcribe.py", line 57, in main
    fs, audio = readAudio(audioPath)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/transkun/transcribe.py", line 11, in readAudio
    audio = pydub.AudioSegment.from_mp3(path)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pydub/audio_segment.py", line 796, in from_mp3
    return cls.from_file(file, 'mp3', parameters=parameters)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pydub/audio_segment.py", line 651, in from_file
    file, close_file = _fd_or_path_or_tempfile(file, 'rb', tempfile=False)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pydub/utils.py", line 60, in _fd_or_path_or_tempfile
    fd = open(fd, mode=mode)
FileNotFoundError: [Errno 2] No such file or directory: 'input.mp3'
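The final error above is a plain missing-file error: the path given on the command line ('input.mp3') does not exist in the working directory. The NNPACK warning is unrelated, though the missing-ffmpeg warning may matter once the file is found. A minimal early check (resolve_input is a hypothetical helper, not part of transkun):

```python
import os

def resolve_input(path):
    """Fail early with a readable message when the audio file is absent
    (sketch; transkun itself surfaces this as pydub's FileNotFoundError)."""
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"Audio file not found: {path!r} (cwd: {os.getcwd()})")
    return os.path.abspath(path)
```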

Noise filter

Is there any way to filter out noise? For example, some low-probability events.
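The thread does not confirm a transkun option for this, but a generic post-processing sketch that drops low-confidence events could look like the following; the 'prob' field and the 0.5 threshold are assumptions, not transkun's actual API:

```python
def drop_low_probability_notes(notes, min_prob=0.5):
    """Filter out low-confidence note events (sketch).

    Assumes each event is a dict carrying a 'prob' score in [0, 1];
    events without a score are kept.
    """
    return [n for n in notes if n.get("prob", 1.0) >= min_prob]
```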

The default model parameters for training differ from the pretrained checkpoint

Hello, thank you for the valuable code sharing!

I have several questions about the code.

  1. The default parameters for training differ from the pre-trained model in the repo.
    The default setting uses 229 mel bins (the same as the paper), but the pre-trained model uses 300. The f_min and f_max values are also different, and I found that the pre-trained model has one more conv layer in the PreConvSpec. Do these changes have a meaningful effect on performance?

  2. Also, when I tried training (once with the default parameters, and once with the pre-trained model's parameters), both cases showed much lower performance than the pre-trained model (0.7403 valid F1) and the scores reported in the paper. I think the only difference is the batch size, which is 12 in the paper and 2 in the default parameters. Have you ever trained the model with batch size 2, or with the default parameters in this repo?
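If limited memory forces a small batch, gradient accumulation can approximate the paper's batch size of 12: sum gradients over several micro-batches, then take one optimizer step. A framework-agnostic sketch of the averaging step (not the repo's training loop):

```python
def accumulated_gradient(micro_batch_grads):
    """Average per-parameter gradients over k micro-batches (sketch).

    `micro_batch_grads` is a list of k gradient lists, one float per
    parameter. When the loss is a mean over the batch, averaging k
    micro-batch gradients approximates one step with a k-times-larger
    batch (ignoring batch-statistics layers).
    """
    k = len(micro_batch_grads)
    n_params = len(micro_batch_grads[0])
    return [sum(g[i] for g in micro_batch_grads) / k for i in range(n_params)]
```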


Again, thank you very much for sharing your code!

Reverberation simulated by midi notes

The output MIDI files often simulate reverberation in the audio recording by rapidly replaying the same notes.
Are the developers aware of this?
Are there any fixes?
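Until there is an upstream fix, one workaround is to merge rapid same-pitch re-strikes in post-processing. A sketch over simplified (pitch, onset, offset) tuples (not transkun's actual output format; the 50 ms gap is illustrative):

```python
def merge_rapid_repeats(notes, max_gap=0.05):
    """Merge consecutive same-pitch notes separated by < max_gap seconds
    (sketch). `notes` are (pitch, onset_sec, offset_sec) tuples."""
    merged = []
    # Sorting by pitch then onset makes same-pitch notes adjacent.
    for note in sorted(notes, key=lambda n: (n[0], n[1])):
        if merged and merged[-1][0] == note[0] and note[1] - merged[-1][2] < max_gap:
            prev = merged[-1]
            merged[-1] = (prev[0], prev[1], max(prev[2], note[2]))
        else:
            merged.append(note)
    return merged
```

A trade-off to be aware of: genuine fast repetitions (trills, tremolos) would also be merged at too large a gap.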

For runSeed

I encountered a problem while running the project: I am not sure how to set runSeed in the train function. I hope someone can help, thank you!
def train(workerId, nWorker, filename, runSeed, args):

if num_processes == 1: train(0, 1, saved_filename)
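For reference, the traceback quoted in a later issue on this page shows train.py itself calling train(0, 1, saved_filename, int(time.time()), args), i.e. the wall-clock time serves as runSeed. A minimal sketch of the same pattern (make_run_seed is a hypothetical helper):

```python
import time

def make_run_seed():
    """Return an integer seed derived from wall-clock time (sketch),
    mirroring the int(time.time()) argument passed to train() in train.py."""
    return int(time.time())
```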

Some params have grad=None during training

Hi,

Thank you very much for this repo - I'm trying to train this model from scratch on some Saxophone recordings.

Firstly, I was getting weird errors for

  • mono instead of stereo wav files
  • 24 bit instead of 16 bit wav files
  • midi with overlapping notes at the same pitch

It might be worth mentioning these in the README for people who want to train on something other than Maestro.

The error I'm now encountering is during the first epoch

epoch:0 progress:0.000 step:0  loss:5907.2900 gradNorm:12.11 clipValue:28.85 time:0.39
epoch:0 progress:0.000 step:0  loss:5911.5234 gradNorm:12.17 clipValue:23.27 time:0.38
Warning: detected parameter with no gradient that requires gradient:
torch.Size([90, 256])
pitchEmbedding.weight
Warning: detected parameter with no gradient that requires gradient:
torch.Size([512, 1792])
velocityPredictor.0.weight
Warning: detected parameter with no gradient that requires gradient:
torch.Size([512])
velocityPredictor.0.bias
Warning: detected parameter with no gradient that requires gradient:
torch.Size([512, 512])
velocityPredictor.3.weight
Warning: detected parameter with no gradient that requires gradient:
torch.Size([512])
velocityPredictor.3.bias
Warning: detected parameter with no gradient that requires gradient:
torch.Size([128, 512])
velocityPredictor.6.weight
Warning: detected parameter with no gradient that requires gradient:
torch.Size([128])
velocityPredictor.6.bias
Warning: detected parameter with no gradient that requires gradient:
torch.Size([512, 1792])
refinedOFPredictor.0.weight
Warning: detected parameter with no gradient that requires gradient:
torch.Size([512])
refinedOFPredictor.0.bias
Warning: detected parameter with no gradient that requires gradient:
torch.Size([128, 512])
refinedOFPredictor.3.weight
Warning: detected parameter with no gradient that requires gradient:
torch.Size([128])
refinedOFPredictor.3.bias
Warning: detected parameter with no gradient that requires gradient:
torch.Size([2, 128])
refinedOFPredictor.6.weight
Warning: detected parameter with no gradient that requires gradient:
torch.Size([2])
refinedOFPredictor.6.bias
Traceback (most recent call last):
  File "/import/linux/python/3.8.2/lib/python3.8/runpy.py", line 193, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/import/linux/python/3.8.2/lib/python3.8/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/import/research_c4dm/jxr01/Skipping-The-Frame-Level/transkun/train.py", line 364, in <module>
    train(0, 1, saved_filename, int(time.time()), args)
  File "/import/research_c4dm/jxr01/Skipping-The-Frame-Level/transkun/train.py", line 199, in train
    average_gradients(model, totalLen, parallel)
  File "/import/research_c4dm/jxr01/Skipping-The-Frame-Level/transkun/TrainUtil.py", line 45, in average_gradients
    param.grad.data /= c
AttributeError: 'NoneType' object has no attribute 'data'

It looks like many of the parameters don't have their gradients initialised. This is strange because, at this point in the run, a backward pass has already completed, so I thought all the gradients should have been set. I'm using the following settings to train:

python3 -m transkun.train --nProcess 1 --batchSize 1 --hopSize 5 --chunkSize 10 --datasetPath "/import/research_c4dm/jxr01/bytedance_piano_transcription/filosax_train/" --datasetMetaFile_train "filosax_data/train.pickle" --datasetMetaFile_val "filosax_data/val.pickle" --augment checkpoint/filosax_model

Can you give me any tips on what to try next?
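One pragmatic patch, assuming those heads (velocityPredictor, refinedOFPredictor, pitchEmbedding) simply receive no loss term in this configuration (an assumption, not a confirmed diagnosis), is to skip None gradients in the averaging step. A sketch mirroring the failing loop in TrainUtil.average_gradients, not the repo's actual fix:

```python
def average_gradients_safe(parameters, c):
    """Divide each parameter's gradient by c, skipping parameters whose
    grad is None (e.g. heads that contributed nothing to this step's loss)."""
    for p in parameters:
        if p.grad is not None:
            p.grad.data /= c
```

Note that silently skipping parameters is only safe if those heads are genuinely unused; otherwise it would mask a real bug in the loss computation.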
