
piano_transcription's Introduction

Piano transcription

Piano transcription is the task of transcribing piano recordings into MIDI files. This repo is the PyTorch implementation of our proposed high-resolution piano transcription system [1].

Demos

Here is a demo of our piano transcription system: https://www.youtube.com/watch?v=5U-WL0QvKCg

Demo and Docker image on Replicate

Environments

This codebase is developed with Python 3.7 and PyTorch 1.4.0 (it should work with other versions, but this has not been fully tested).

Install dependencies:

pip install -r requirements.txt

Piano transcription using pretrained model

The easiest way to transcribe a new piano recording is to install the piano_transcription_inference package (https://github.com/qiuqiangkong/piano_transcription_inference) with pip as follows:

pip install piano_transcription_inference

Then, execute the following Python code to transcribe the example audio:

from piano_transcription_inference import PianoTranscription, sample_rate, load_audio

# Load audio
(audio, _) = load_audio('resources/cut_liszt.mp3', sr=sample_rate, mono=True)

# Transcriptor
transcriptor = PianoTranscription(device='cuda')    # 'cuda' | 'cpu'

# Transcribe and write out to MIDI file
transcribed_dict = transcriptor.transcribe(audio, 'cut_liszt.mid')
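
Running this writes the transcribed MIDI to cut_liszt.mid. The returned dictionary typically also carries the estimated events; a minimal sketch for inspecting it (the exact key names depend on the package version, so the sketch prints them rather than assuming any):

# Inspect what the transcription returned; the available keys (e.g. the
# estimated note and pedal events) may vary with the package version.
print(transcribed_dict.keys())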

Train a piano transcription system from scratch

This section provides instructions for users who would like to train a piano transcription system from scratch.

0. Prepare data

We use the MAESTRO dataset V2.0.0 [1] to train the piano transcription system. MAESTRO consists of over 200 hours of virtuosic piano performances captured with fine alignment (~3 ms) between note labels and audio waveforms. The MAESTRO dataset can be downloaded from https://magenta.tensorflow.org/datasets/maestro.

Statistics of MAESTRO V2.0.0 [ref]:

Split        Performances   Duration (hours)   Size (GB)   Notes (millions)
Train        967            161.3              97.7        5.73
Validation   137            19.4               11.8        0.64
Test         178            20.5               12.4        0.76
Total        1282           201.2              121.8       7.13

After downloading, the dataset looks like:

dataset_root
├── 2004
│    └── (264 files)
├── 2006
│    └── (230 files)
├── 2008
│    └── (294 files)
├── 2009
│    └── (250 files) 
├── 2011
│    └── (326 files)
├── 2013
│    └── (254 files)
├── 2014
│    └── (210 files)
├── 2015
│    └── (258 files)
├── 2017
│    └── (280 files)
├── 2018
│    └── (198 files)
├── LICENSE
├── maestro-v2.0.0.csv
├── maestro-v2.0.0.json
└── README

1. Train

Execute the commands line by line in runme.sh, including:

  1. Config dataset path and your workspace.
  2. Pack audio recordings to hdf5 files.
  3. Train piano note transcription system.
  4. Train piano pedal transcription system.
  5. Combine piano note and piano pedal transcription systems.
  6. Evaluate.

All training steps are described in runme.sh. It is worth looking into runme.sh to see how the piano transcription system is trained. In total, 29 GB of GPU memory is required with a batch size of 12. Users may consider reducing the batch size or using multiple GPU cards to train this system. A sketch of the note-transcription training command is shown below.
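
The following sketch reconstructs the note-transcription training command from the argument dump shown under Results; treat the script path and flag spellings as assumptions and consult runme.sh for the authoritative commands.

WORKSPACE="./workspaces/piano_transcription"

# Train the piano note transcription system (hyper-parameters copied from the
# Namespace printout in the Results section; verify against runme.sh).
python3 pytorch/main.py train \
    --workspace=$WORKSPACE \
    --model_type='Regress_onset_offset_frame_velocity_CRNN' \
    --loss_type='regress_onset_offset_frame_velocity_bce' \
    --augmentation='none' \
    --max_note_shift=0 \
    --batch_size=12 \
    --learning_rate=5e-4 \
    --reduce_iteration=10000 \
    --resume_iteration=0 \
    --early_stop=300000 \
    --cuda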

Results

The training uses a single Tesla-V100-PCIE-32GB card. The system is trained for 300k iterations, which takes about one week. The training log looks like:

Namespace(augmentation='none', batch_size=12, cuda=True, early_stop=300000, filename='main', learning_rate=0.0005, loss_type='regress_onset_offset_frame_velocity_bce', max_note_shift=0, mini_data=False, mode='train', model_type='Regress_onset_offset_frame_velocity_CRNN', reduce_iteration=10000, resume_iteration=0, workspace='.../workspaces/piano_transcription')
Using GPU.
train segments: 571589
Evaluate train segments: 571589
Evaluate validation segments: 68646
Evaluate test segments: 71959
------------------------------------
Iteration: 0
    Train statistics: {'frame_ap': 0.0613, 'reg_onset_mae': 0.514, 'reg_offset_mae': 0.482, 'velocity_mae': 0.1362}
    Validation statistics: {'frame_ap': 0.0605, 'reg_onset_mae': 0.5143, 'reg_offset_mae': 0.4819, 'velocity_mae': 0.133}
    Test statistics: {'frame_ap': 0.0601, 'reg_onset_mae': 0.5139, 'reg_offset_mae': 0.4821, 'velocity_mae': 0.1283}
    Dump statistics to .../workspaces/piano_transcription/statistics/main/Regress_onset_offset_frame_velocity_CRNN/loss_type=regress_onset_offset_frame_velocity_bce/augmentation=none/batch_size=12/statistics.pkl
    Dump statistics to .../workspaces/piano_transcription/statistics/main/Regress_onset_offset_frame_velocity_CRNN/loss_type=regress_onset_offset_frame_velocity_bce/augmentation=none/batch_size=12/statistics_2020-04-28_00-22-33.pickle
Train time: 5.498 s, validate time: 92.863 s
Model saved to .../workspaces/piano_transcription/checkpoints/main/Regress_onset_offset_frame_velocity_CRNN/loss_type=regress_onset_offset_frame_velocity_bce/augmentation=none/batch_size=12/0_iterations.pth
------------------------------------
...
------------------------------------
Iteration: 300000
    Train statistics: {'frame_ap': 0.9439, 'reg_onset_mae': 0.091, 'reg_offset_mae': 0.127, 'velocity_mae': 0.0241}
    Validation statistics: {'frame_ap': 0.9245, 'reg_onset_mae': 0.0985, 'reg_offset_mae': 0.1327, 'velocity_mae': 0.0265}
    Test statistics: {'frame_ap': 0.9285, 'reg_onset_mae': 0.097, 'reg_offset_mae': 0.1353, 'velocity_mae': 0.027}
    Dump statistics to .../workspaces/piano_transcription/statistics/main/Regress_onset_offset_frame_velocity_CRNN/loss_type=regress_onset_offset_frame_velocity_bce/augmentation=none/batch_size=12/statistics.pkl
    Dump statistics to .../workspaces/piano_transcription/statistics/main/Regress_onset_offset_frame_velocity_CRNN/loss_type=regress_onset_offset_frame_velocity_bce/augmentation=none/batch_size=12/statistics_2020-04-28_00-22-33.pickle
Train time: 8953.815 s, validate time: 93.683 s
Model saved to .../workspaces/piano_transcription/checkpoints/main/Regress_onset_offset_frame_velocity_CRNN/loss_type=regress_onset_offset_frame_velocity_bce/augmentation=none/batch_size=12/300000_iterations.pth

Visualization of piano transcription

Demo 1. Lang Lang: Franz Liszt - Love Dream (Liebestraum) [audio] [transcribed_midi]

Demo 2. Andras Schiff: J.S.Bach - French Suites [audio] [transcribed_midi]

FAQs

If you encounter an out-of-GPU-memory error, try reducing the batch size.

LICENSE

Apache 2.0

Applications

We have built a large-scale classical piano MIDI dataset using our piano transcription system. See https://github.com/bytedance/GiantMIDI-Piano for details.

Contact

Qiuqiang Kong, [email protected]

Cite

[1] Qiuqiang Kong, Bochen Li, Xuchen Song, Yuan Wan, and Yuxuan Wang. "High-resolution Piano Transcription with Pedals by Regressing Onset and Offset Times." arXiv preprint arXiv:2010.01815 (2020). [pdf]

piano_transcription's People

Contributors

jxmorris12, qiuqiangkong


piano_transcription's Issues

A problem when training the note transcription system

Hello! Thank you very much for taking the time to look at my question!
I am a Python beginner and have recently been using your piano_transcription to train a model. I ran into the following problem: when executing the train step, this error is raised:
root : INFO <class 'main.training'>
root : INFO Using GPU.
root : INFO train segments: 670
root : INFO Evaluate train segments: 670
root : INFO Evaluate validation segments: 0
root : INFO Evaluate test segments: 0
GPU number: 1
root : INFO ------------------------------------
root : INFO Iteration: 0
Traceback (most recent call last):
File "D:/piano_transcription-master/paper/train 1.py", line 18, in
train(training)
File "D:\piano_transcription-master\pytorch\main.py", line 208, in train
validate_statistics = evaluator.evaluate(validate_loader)
File "D:\piano_transcription-master\pytorch\evaluate.py", line 51, in evaluate
output_dict = forward_dataloader(self.model, dataloader, self.batch_size)
File "D:\piano_transcription-master\pytorch\pytorch_utils.py", line 53, in forward_dataloader
for n, batch_data_dict in enumerate(dataloader):
File "C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 279, in iter
return _MultiProcessingDataLoaderIter(self)
File "C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 746, in init
self._try_put_index()
File "C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 861, in _try_put_index
index = self._next_index()
File "C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 339, in _next_index
return next(self._sampler_iter) # may raise StopIteration
File "D:\piano_transcription-master\paper../utils\data_generator.py", line 302, in iter
index = self.segment_indexes[pointer]
IndexError: index 0 is out of bounds for axis 0 with size 0
Is this a problem with the array defined at the beginning, or something else? Thank you!

How to convert to a Core ML model

like this?

import torch
import onnx
import coremltools as ct

# Build the PyTorch model
model = ...

# Define an input sample; assume the input shape is (1, 1, 44100),
# i.e. single-channel audio at a 44100 Hz sample rate
input_shape = (1, 1, 44100)
input_sample = torch.randn(*input_shape)

# Convert the model to an ONNX model
onnx_model_path = 'model.onnx'
torch.onnx.export(model, input_sample, onnx_model_path, input_names=['input'],
                  output_names=['output'], opset_version=12)

# Convert the ONNX model to a Core ML model
coreml_model_path = 'model.mlmodel'
coreml_model = ct.converters.onnx.convert(onnx_model_path, minimum_ios_deployment_target='13')
coreml_model.save(coreml_model_path)
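
For reference, newer versions of coremltools (4.0 and later) can also convert a traced PyTorch model directly, skipping the ONNX step; a minimal sketch under the same assumed (1, 1, 44100) input shape:

import torch
import coremltools as ct

model = ...  # the loaded PyTorch transcription model, as above

# Trace the model with a dummy input, then convert the trace directly.
traced_model = torch.jit.trace(model, torch.randn(1, 1, 44100))
coreml_model = ct.convert(traced_model, inputs=[ct.TensorType(shape=(1, 1, 44100))])
coreml_model.save('model.mlmodel')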

An idea for replacing the MAESTRO training set

Consider building a dataset by taking only the information of interest from the MIDI files, playing it back, re-recording the audio, and then training on that. As far as I can tell, the MAESTRO MIDI files contain at least multiple pedal signals, which introduces extra degrees of freedom.
I think this approach should yield better results.

Path error in `MaestroDataset`

I'm guessing this is the same problem as #33

Within __getitem__ of the MaestroDataset class, hdf5_name is
workspace\\hdf5s\\maestro\\2018\\MIDI-Unprocessed_Recital17-19_MID--AUDIO_17_R1_2018_wav--4.h5

so hdf5_path becomes this:
'./workspace\\hdf5s\\maestro\\2018\\workspace\\hdf5s\\maestro\\2018\\MIDI-Unprocessed_Recital17-19_MID--AUDIO_17_R1_2018_wav--4.h5'

Thus, when we start loading the batches in main.py, we get a FileNotFoundError because the path is incorrect.
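
A minimal sketch of one possible workaround inside __getitem__, assuming hdf5_name may already contain a workspace-relative path; the guard below is my own addition, not the repo's code:

import os

# Hypothetical guard: only prepend the directory components when hdf5_name is
# a bare file name; otherwise the prefix gets duplicated as shown above.
if os.path.dirname(hdf5_name):
    hdf5_path = hdf5_name
else:
    hdf5_path = os.path.join(self.hdf5s_dir, year, hdf5_name)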

What am I missing in my Python 3 Ubuntu 20 env?

root@1-ubuntu-s-8vcpu-16gb-nyc1-01:~/piano_transcription# pip install -r requirements.txt
Collecting h5py==2.10.0
  Using cached h5py-2.10.0-cp38-cp38-manylinux1_x86_64.whl (2.9 MB)
Collecting pandas==1.1.2
  Downloading pandas-1.1.2-cp38-cp38-manylinux1_x86_64.whl (10.4 MB)
     |████████████████████████████████| 10.4 MB 14.8 MB/s
Collecting librosa==0.6.0
  Downloading librosa-0.6.0.tar.gz (1.5 MB)
     |████████████████████████████████| 1.5 MB 69.6 MB/s
Collecting numba==0.48
  Downloading numba-0.48.0-1-cp38-cp38-manylinux2014_x86_64.whl (3.6 MB)
     |████████████████████████████████| 3.6 MB 67.2 MB/s
Collecting mido==1.2.9
  Downloading mido-1.2.9-py2.py3-none-any.whl (52 kB)
     |████████████████████████████████| 52 kB 2.6 MB/s
Collecting mir_eval==0.5
  Downloading mir_eval-0.5.tar.gz (86 kB)
     |████████████████████████████████| 86 kB 11.1 MB/s
Collecting matplotlib==3.0.3
  Downloading matplotlib-3.0.3.tar.gz (36.6 MB)
     |████████████████████████████████| 36.6 MB 36.4 MB/s
    ERROR: Command errored out with exit status 1:
     command: /usr/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-m3wp3r8i/matplotlib/setup.py'"'"'; __file__='"'"'/tmp/pip-install-m3wp3r8i/matplotlib/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-install-m3wp3r8i/matplotlib/pip-egg-info
         cwd: /tmp/pip-install-m3wp3r8i/matplotlib/
    Complete output (51 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-m3wp3r8i/matplotlib/setup.py", line 225, in <module>
        msg = pkg.install_help_msg()
      File "/tmp/pip-install-m3wp3r8i/matplotlib/setupext.py", line 650, in install_help_msg
        release = platform.linux_distribution()[0].lower()
    AttributeError: module 'platform' has no attribute 'linux_distribution'
    IMPORTANT WARNING:
        pkg-config is not installed.
        matplotlib may not be able to find some of its dependencies
    ============================================================================
    Edit setup.cfg to change the build options

    BUILDING MATPLOTLIB
                matplotlib: yes [3.0.3]
                    python: yes [3.8.5 (default, May 27 2021, 13:30:53)  [GCC
                            9.3.0]]
                  platform: yes [linux]

    REQUIRED DEPENDENCIES AND EXTENSIONS
                     numpy: yes [version 1.18.5]
          install_requires: yes [handled by setuptools]
                    libagg: yes [pkg-config information for 'libagg' could not
                            be found. Using local copy.]
                  freetype: no  [The C/C++ header for freetype2 (ft2build.h)
                            could not be found.  You may need to install the
                            development package.]
                       png: no  [pkg-config information for 'libpng' could not
                            be found.]
                     qhull: yes [pkg-config information for 'libqhull' could not
                            be found. Using local copy.]

    OPTIONAL SUBPACKAGES
               sample_data: yes [installing]
                  toolkits: yes [installing]
                     tests: no  [skipping due to configuration]
            toolkits_tests: no  [skipping due to configuration]

    OPTIONAL BACKEND EXTENSIONS
                       agg: yes [installing]
                     tkagg: yes [installing; run-time loading from Python Tcl /
                            Tk]
                    macosx: no  [Mac OS-X only]
                 windowing: no  [Microsoft Windows only]

    OPTIONAL PACKAGE DATA
                      dlls: no  [skipping due to configuration]

    ============================================================================
                            * The following required packages can not be built:
                            * freetype, png
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

I used a Mac M1, with VS Code and a Conda environment

I had 2 problems:

  1. If you see the following error (screenshot omitted):
    Please see Issue 24 (just set librosa==0.8.0). Thanks Yuqing for the generous help!

  2. If you see the following error (screenshot omitted):
    You can run conda install python.app, then run your script with pythonw your_script.py instead of python your_script.py.

Hope my failure helps!

Logical bug in MaestroDataset

In utils/data_generator.py, line 86, we ensure we grab a segment that is contained by the waveform:

        # Load hdf5
        with h5py.File(hdf5_path, 'r') as hf:
            start_sample = int(start_time * self.sample_rate)
            end_sample = start_sample + self.segment_samples

            if end_sample >= hf['waveform'].shape[0]:
                start_sample -= self.segment_samples 
                end_sample -= self.segment_samples

However, you fail to update start_time, so when you later grab the target_dict, it will be off by self.segment_seconds.

            # Process MIDI events to target
            (target_dict, note_events, pedal_events) = \
                self.target_processor.process(start_time, midi_events_time, 
                    midi_events, extend_pedal=True, note_shift=note_shift)

I don't think this is an issue, because your Sampler logic only constructs meta for valid segments:

while (start_time + self.segment_seconds < hf.attrs['duration'])

but it is still a logical error so I thought I would report and offer a fix.
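
A minimal sketch of that fix, shifting start_time together with the sample indices (variable names follow the snippet above):

            if end_sample >= hf['waveform'].shape[0]:
                start_sample -= self.segment_samples
                end_sample -= self.segment_samples
                # Keep the MIDI target window aligned with the shifted audio
                # segment; otherwise target_dict is off by segment_seconds.
                start_time -= self.segment_seconds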

Incorporate Beat Detection

First and foremost, I'd like to express my admiration and commend the incredible work you've accomplished with the piano_transcription project. As a user, I am consistently impressed by its performance and accuracy in transcribing piano music to MIDI format. Your dedication to advancing music technology is evident and greatly appreciated by the community.

That being said, I believe there is an opportunity to further enhance the functionality and utility of piano_transcription by implementing a beat detection feature. Currently, the transcribed MIDI output defaults to 120 BPM, which is practical but doesn't account for the dynamic tempo variations that musicians often employ to convey emotion and emphasis in their performances.

Feature Suggestion: Dynamic Tempo Detection and Transcription

  • The goal of this feature is to analyze the audio input and accurately detect the tempo, translating these findings to the MIDI output.
  • This proposed feature would enable the MIDI to reflect the actual tempo throughout the piece, accommodating accelerandos, ritardandos, and other tempo changes.
  • The resulting MIDI files would be musically correct, reflecting the true intent of the original performance.
  • This enhancement would significantly improve the experience for users looking to create sheet music from audio files, as it would provide a more accurate representation of the timing and pacing of the music.

Implementing dynamic beat detection could be a game-changer for composers, arrangers, and anyone interested in creating faithful transcriptions of piano recordings. Understanding that this may be a complex feature to develop, I'm curious to learn about potential plans or considerations you might have regarding the capture of tempo variations in your transcriptions.

Thank you again for your exceptional work on this project—the impact it has had on the music tech community is remarkable. I look forward to any thoughts or discussions this suggestion might inspire.

For a better user experience, it is recommended to use the `requests` library to download the model, as many computers do not have the `wget` tool.

I wrote some code to demonstrate it. It can easily be integrated into your library code. I used some f-strings for convenience.

import sys

import requests
from pathlib import Path
import hashlib


class ProgressBar:
    def __init__(self, title, total, running_str='Running', completed=0):
        self.title = title
        self.total = total
        self.total_str = self.convert(total)
        self.completed = completed
        self.status = running_str

    @staticmethod
    def convert(size: int):
        units = ['B', 'KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB', 'BB', 'NB', 'DB', 'CB']

        for unit in units:
            if size >= 1024:
                size /= 1024
            else:
                return f'{size:.2f} {unit}'

        return f'{size} B'

    def __str__(self):
        return f'[{self.status}] {self.title} {self.convert(self.completed)} of {self.total_str}, {self.completed * 100 / self.total:.2f}% Completed'

    def update(self, completed=1):
        self.completed += completed
        print(f'\r{self}', end='')


response = requests.get(
    'https://zenodo.org/record/4034264/files/CRNN_note_F1%3D0.9677_pedal_F1%3D0.9186.pth?download=1',
    stream=True)
response.raise_for_status()
chunk_size = 1024 * 1024
content_size = int(response.headers['content-length'])
progress = ProgressBar('Model', total=content_size, running_str="Downloading")
sha1 = hashlib.sha1()

# Make sure the target directory exists before writing the checkpoint.
model_dir = Path.home() / 'piano_transcription_inference_data'
model_dir.mkdir(parents=True, exist_ok=True)
with open(model_dir / 'note_F1=0.9677_pedal_F1=0.9186.pth', "wb") as file:
    for data in response.iter_content(chunk_size=chunk_size):
        file.write(data)
        sha1.update(data)
        progress.update(completed=len(data))
if sha1.hexdigest() == 'b06d5ab55ff57beae8ab2a76205d5847add01fec':
    print('\nSucceeded in downloading model :)')
else:
    print('\nModel Corrupted :( Please download again!', file=sys.stderr)
    sys.exit(-1)

A numerical-precision issue affecting output post-processing

In the PyTorch environment, the output after resampling with librosa is correct.
After switching to ONNX, or to a different resampling method, the output changes considerably.
This change affects the MIDI file output the most.
reg_onset_output, reg_offset_output, reg_pedal_onset_output and reg_pedal_offset_output show fairly large errors (the other three outputs are probably affected less, so they are not listed).
Because of the errors in reg_onset_output, the number of values above the threshold is higher than expected (compared with the torch + original resampling output), so the resulting MIDI contains "unexpected notes".
I therefore have two questions:
If only frame_output is used to decide whether a note is present (the approach I currently use, which does improve the ONNX output), will it perform poorly on some unusual inputs? And what are the advantages of using reg_onset_output over frame_output for detecting NoteOn events?

Python is not installed as a framework? What does this mean?

I got the following error when I tried to use the package. Can you help resolve it? Thanks!

Python 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 05:52:31)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

>>> from piano_transcription_inference import PianoTranscription, sample_rate, load_audio
Traceback (most recent call last):
File "", line 1, in
File "/Users/dli/projects/personal/piano_transcription/lib/python3.6/site-packages/piano_transcription_inference/init.py", line 1, in
from .inference import PianoTranscription
File "/Users/dli/projects/personal/piano_transcription/lib/python3.6/site-packages/piano_transcription_inference/inference.py", line 11, in
from .models import Regress_onset_offset_frame_velocity_CRNN, Note_pedal
File "/Users/dli/projects/personal/piano_transcription/lib/python3.6/site-packages/piano_transcription_inference/models.py", line 6, in
import matplotlib.pyplot as plt
File "/Users/dli/projects/personal/piano_transcription/lib/python3.6/site-packages/matplotlib/pyplot.py", line 2372, in
switch_backend(rcParams["backend"])
File "/Users/dli/projects/personal/piano_transcription/lib/python3.6/site-packages/matplotlib/pyplot.py", line 207, in switch_backend
backend_mod = importlib.import_module(backend_name)
File "/Users/dli/projects/personal/piano_transcription/lib/python3.6/importlib/init.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "/Users/dli/projects/personal/piano_transcription/lib/python3.6/site-packages/matplotlib/backends/backend_macosx.py", line 14, in
from matplotlib.backends import _macosx
ImportError: Python is not installed as a framework. The Mac OS X backend will not be able to function correctly if Python is not installed as a framework. See the Python documentation for more information on installing Python as a framework on Mac OS X. Please either reinstall Python as a framework, or try one of the other backends. If you are using (Ana)Conda please install python.app and replace the use of 'python' with 'pythonw'. See 'Working with Matplotlib on OSX' in the Matplotlib FAQ for more information.

Evaluation Performance

Hi,

Thanks for sharing your work.

When I use the pre-trained model checkpoint from your repo (https://github.com/qiuqiangkong/piano_transcription_inference),
I can't reproduce the scores reported in the original paper (screenshot omitted).

I set the arguments as shown below (screenshot omitted).

Much lower performance results were obtained.

Can I reproduce your paper's scores with this source code, or do I need to edit something?
(https://github.com/bytedance/piano_transcription/blob/master/pytorch/calculate_score_for_paper.py)

Thank you.

Piano transcription on Windows 10

Hello,
I am a complete Python novice. But I have managed to get piano_transcription_inference to work on my Windows 10 system, with pretty impressive results. I'm attaching a slightly modified set of commands. I normally need to hit return after the third line if the system appears to hang. I'm sure someone can come up with a more elegant solution, but this works for me. I hope this is useful.
Best wishes, and congratulations on what looks to be a very useful program.
Mick Hamer

piano transcription commands.txt

Pre-trained models of ablation study

Hi,
Thanks for sharing your work.

In the paper, there are two ablations where the frame head does not depend on the outputs of velocity and offset heads (Table I). Could you please share those checkpoints as well?

Thank you.

Sustain compensation?

I just got this repo to work on Windows 11. Had to switch to Python 3.7, install torch manually and had to provide a Windows binary of wget as well as ffmpeg in my path.

While everything seems to work really awesome, I have a question though:
I tried to transcribe a rather simple song and noticed that the original tune makes heavy use of the sustain pedal. This results in a transcription that repeats single notes over and over, so the sheet looks as if plenty of notes are being played at once when, for instance, there are actually only two at a time.

Is there any way to prevent this from happening?

Failed to reproduce the paper's performance with calculate_score_for_paper.py

Dear author, good afternoon:
I am a developer from Zhongguancun, Beijing, and I have just started working in the field of piano transcription.
I downloaded the MAESTRO dataset, fetched the pretrained model with wget, and ran the whole pipeline following calculate_score_for_paper.py,
but the final test note_f1 is only 0.81. Am I doing something wrong?

A small question about computing the xxset regression

In utils/utilities.py, line 550:
My understanding of the logic here is that, for two adjacent xxsets, the frame at the midpoint between them, together with the frames after it, should measure its distance to the later xxset. So
output[t] = step * (t - locts[i + 1]) - input[locts[i]]
should perhaps be changed to
output[t] = step * (t - locts[i + 1]) - input[locts[i+1]]
Is that right?

Found a bug, path-related

At line 74 of data_generator, hdf5_path = os.path.join(self.hdf5s_dir, year, hdf5_name); training like this raises the following error:

Traceback (most recent call last):
File "pytorch/main.py", line 297, in <module>
train(args)
File "pytorch/main.py", line 201, in train
for batch_data_dict in train_loader:
File "C:\Users\ASUS\anaconda3\envs\byte\lib\site-packages\torch\utils\data\dataloader.py", line 345, in __next__
data = self._next_data()
File "C:\Users\ASUS\anaconda3\envs\byte\lib\site-packages\torch\utils\data\dataloader.py", line 856, in _next_data
return self._process_data(data)
File "C:\Users\ASUS\anaconda3\envs\byte\lib\site-packages\torch\utils\data\dataloader.py", line 881, in _process_data
data.reraise()
File "C:\Users\ASUS\anaconda3\envs\byte\lib\site-packages\torch\_utils.py", line 394, in reraise
raise self.exc_type(msg)
OSError: Caught OSError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "C:\Users\ASUS\anaconda3\envs\byte\lib\site-packages\torch\utils\data\_utils\worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "C:\Users\ASUS\anaconda3\envs\byte\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "C:\Users\ASUS\anaconda3\envs\byte\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "F:\pythonProject2\piano_transcription\pytorch../utils\data_generator.py", line 82, in __getitem__
with h5py.File(hdf5_path, 'r') as hf:
File "C:\Users\ASUS\anaconda3\envs\byte\lib\site-packages\h5py\_hl\files.py", line 408, in __init__
swmr=swmr)
File "C:\Users\ASUS\anaconda3\envs\byte\lib\site-packages\h5py\_hl\files.py", line 173, in make_fid
fid = h5f.open(name, flags, fapl=fapl)
File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py\h5f.pyx", line 88, in h5py.h5f.open
OSError: Unable to open file (unable to open file: name = 'WORKSPACE\hdf5s\maestro\2015\WORKSPACE\hdf5s\maestro\2015\MIDI-Unprocessed_R1_D1-1-8_mid--AUDIO-from_mp3_06_R1_2015_wav--1.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

Changing it to hdf5_path = os.path.join(hdf5_name) makes the error go away.
I don't know the cause of the bug and hope someone can take a look. It might be a library version mismatch, something wrong in my setup, or possibly a Windows-only bug.
