bytedance / piano_transcription

Python 98.53% Shell 1.47%

piano_transcription's Issues

My problem is just like this

calculate_score_for_paper.py fails to reproduce the paper's performance

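Dear author, good afternoon.
I am a developer from Zhongguancun, Beijing, and I have only just started working on piano transcription.
I downloaded the MAESTRO dataset, fetched the pretrained model with wget, and ran the whole pipeline following calculate_score_for_paper.py,
but the final test note_f1 is only 0.81. Am I using something incorrectly?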
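One thing worth checking is which F1 is being compared. As far as I know, calculate_score_for_paper.py scores notes with mir_eval's `transcription.precision_recall_f1_overlap`, where `offset_ratio=None` scores onsets only, while the default additionally requires offsets to match, which usually gives a much lower number. The sketch below is a simplified pure-Python stand-in, not mir_eval itself: it uses greedy matching instead of mir_eval's optimal bipartite matching, with mir_eval's default tolerances (50 ms onset; offset within max(20% of the reference duration, 50 ms)). The note tuples are made up for illustration.

```python
def note_f1(ref_notes, est_notes, onset_tol=0.05, use_offset=False,
            offset_ratio=0.2, offset_min_tol=0.05):
    """F1 for notes given as (onset_sec, offset_sec, midi_pitch) tuples.

    Greedy one-to-one matching: simpler than mir_eval's optimal matching,
    so scores can differ slightly on dense passages.
    """
    used = set()  # indices of estimated notes already matched
    tp = 0
    for r_on, r_off, r_pitch in ref_notes:
        for j, (e_on, e_off, e_pitch) in enumerate(est_notes):
            if j in used or e_pitch != r_pitch or abs(e_on - r_on) > onset_tol:
                continue
            if use_offset:
                # Offset must fall within max(20% of reference duration, 50 ms)
                tol = max(offset_ratio * (r_off - r_on), offset_min_tol)
                if abs(e_off - r_off) > tol:
                    continue
            used.add(j)
            tp += 1
            break
    if tp == 0:
        return 0.0
    precision = tp / len(est_notes)
    recall = tp / len(ref_notes)
    return 2 * precision * recall / (precision + recall)


ref = [(0.00, 1.00, 60), (1.00, 1.50, 64)]
est = [(0.01, 0.50, 60), (1.02, 1.49, 64)]  # first note released 0.5 s early
print(note_f1(ref, est))                   # → 1.0 (onset only)
print(note_f1(ref, est, use_offset=True))  # → 0.5 (onset + offset)
```

The same onsets give a perfect onset-only score but only 0.5 once offsets must match, so comparing an onset+offset score against an onset-only figure would look like a large regression.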

Problem training the note transcription system

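Hello, sir! Thank you very much for taking the time to look at my question!
I am a Python beginner and have recently been using your piano_transcription to train a model. I ran into this error when running train: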
root : INFO <class '__main__.training'>
root : INFO Using GPU.
root : INFO train segments: 670
root : INFO Evaluate train segments: 670
root : INFO Evaluate validation segments: 0
root : INFO Evaluate test segments: 0
GPU number: 1
root : INFO ------------------------------------
root : INFO Iteration: 0
Traceback (most recent call last):
File "D:/piano_transcription-master/paper/train 1.py", line 18, in <module>
train(training)
File "D:\piano_transcription-master\pytorch\main.py", line 208, in train
validate_statistics = evaluator.evaluate(validate_loader)
File "D:\piano_transcription-master\pytorch\evaluate.py", line 51, in evaluate
output_dict = forward_dataloader(self.model, dataloader, self.batch_size)
File "D:\piano_transcription-master\pytorch\pytorch_utils.py", line 53, in forward_dataloader
for n, batch_data_dict in enumerate(dataloader):
File "C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 279, in __iter__
return _MultiProcessingDataLoaderIter(self)
File "C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 746, in __init__
self._try_put_index()
File "C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 861, in _try_put_index
index = self._next_index()
File "C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\utils\data\dataloader.py", line 339, in _next_index
return next(self._sampler_iter) # may raise StopIteration
File "D:\piano_transcription-master\paper\../utils\data_generator.py", line 302, in __iter__
index = self.segment_indexes[pointer]
IndexError: index 0 is out of bounds for axis 0 with size 0
Is it a problem with the array defined at the beginning, or something else? Thanks!
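The log already points at the cause: `Evaluate validation segments: 0` (and 0 test segments) means the evaluation sampler for the validation split has nothing to draw from, and indexing an empty NumPy array at position 0 raises exactly this IndexError. A likely reason, not confirmed here, is that the packed hdf5 files only contain training-split recordings. A minimal sketch of a guard (the helper name is hypothetical, not part of the repository) that turns the crash into an explicit message:

```python
import numpy as np


def first_segment_index(segment_indexes, split_name):
    """Hypothetical guard: fail with a clear message instead of an IndexError."""
    segment_indexes = np.asarray(segment_indexes)
    if segment_indexes.size == 0:
        raise ValueError(
            f"No segments found for split '{split_name}'. Check that the packed "
            "hdf5 files include audio from this split before building its DataLoader."
        )
    return int(segment_indexes[0])


try:
    np.arange(0)[0]  # what the sampler effectively does when the split is empty
except IndexError as err:
    print(err)  # index 0 is out of bounds for axis 0 with size 0
```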

Python is not installed as a framework? What does this mean?

I got the following error when I tried to use the package. Can you help resolve it? Thanks!

Python 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 05:52:31)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

>>> from piano_transcription_inference import PianoTranscription, sample_rate, load_audio
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/dli/projects/personal/piano_transcription/lib/python3.6/site-packages/piano_transcription_inference/__init__.py", line 1, in <module>
from .inference import PianoTranscription
File "/Users/dli/projects/personal/piano_transcription/lib/python3.6/site-packages/piano_transcription_inference/inference.py", line 11, in <module>
from .models import Regress_onset_offset_frame_velocity_CRNN, Note_pedal
File "/Users/dli/projects/personal/piano_transcription/lib/python3.6/site-packages/piano_transcription_inference/models.py", line 6, in <module>
import matplotlib.pyplot as plt
File "/Users/dli/projects/personal/piano_transcription/lib/python3.6/site-packages/matplotlib/pyplot.py", line 2372, in <module>
switch_backend(rcParams["backend"])
File "/Users/dli/projects/personal/piano_transcription/lib/python3.6/site-packages/matplotlib/pyplot.py", line 207, in switch_backend
backend_mod = importlib.import_module(backend_name)
File "/Users/dli/projects/personal/piano_transcription/lib/python3.6/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "/Users/dli/projects/personal/piano_transcription/lib/python3.6/site-packages/matplotlib/backends/backend_macosx.py", line 14, in <module>
from matplotlib.backends import _macosx
ImportError: Python is not installed as a framework. The Mac OS X backend will not be able to function correctly if Python is not installed as a framework. See the Python documentation for more information on installing Python as a framework on Mac OS X. Please either reinstall Python as a framework, or try one of the other backends. If you are using (Ana)Conda please install python.app and replace the use of 'python' with 'pythonw'. See 'Working with Matplotlib on OSX' in the Matplotlib FAQ for more information.
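The traceback shows the error comes from matplotlib rather than the transcription code: models.py imports matplotlib.pyplot, which loads the default macOS backend. Assuming no plots need to be shown on screen, one workaround is to select matplotlib's non-interactive Agg backend before the package is imported; setting the environment variable `MPLBACKEND=Agg` has the same effect. The package import is left commented out so the snippet runs on its own.

```python
import matplotlib

# Select the non-GUI Agg backend BEFORE anything imports matplotlib.pyplot;
# piano_transcription_inference/models.py imports pyplot at import time.
matplotlib.use("Agg")

import matplotlib.pyplot as plt  # noqa: E402  now backed by Agg, no framework build needed

# Only now import the package:
# from piano_transcription_inference import PianoTranscription, sample_rate, load_audio

print(matplotlib.get_backend())
```

The alternative suggested in the error message itself (conda's python.app plus `pythonw`) keeps the macOS backend, which only matters if you want interactive windows.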

How to output as .xml so we can edit it in Sibelius

Does anyone know how to output .xml so that we can edit it in Sibelius?

Many thanks!
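The transcriber writes MIDI, so one route (an assumption, not a feature of this repository) is to convert that MIDI to MusicXML with the third-party music21 library and open the resulting .musicxml file in Sibelius. Expect some cleanup: the MIDI carries performance timing rather than a beat grid, so music21's quantization can only approximate the notated rhythm. music21 is imported inside the function so the helper module loads without it.

```python
from pathlib import Path


def default_xml_path(midi_path):
    """song.mid -> song.musicxml in the same folder."""
    return Path(midi_path).with_suffix(".musicxml")


def midi_to_musicxml(midi_path, xml_path=None):
    """Convert a transcribed MIDI file to MusicXML using music21 (pip install music21)."""
    import music21  # third-party; imported lazily

    xml_path = Path(xml_path) if xml_path else default_xml_path(midi_path)
    score = music21.converter.parse(str(midi_path))  # quantizes note timings into a score
    score.write("musicxml", fp=str(xml_path))
    return xml_path


# Example with a hypothetical file name:
# midi_to_musicxml("transcription.mid")  # writes transcription.musicxml
```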

Hello, can you help me? Can you upload a video file to produce your project?

I can't really find a solution to the problem
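The transcriber works on audio, so one way to use a video (an assumption, not a documented feature) is to extract its soundtrack with ffmpeg first and then call the inference package as its README shows: `load_audio(..., sr=sample_rate, mono=True)` followed by `PianoTranscription(...).transcribe(audio, midi_path)`. ffmpeg must be on the PATH, and the file names below are placeholders.

```python
import subprocess
from pathlib import Path


def ffmpeg_extract_audio_cmd(video_path, wav_path, sample_rate=16000):
    """Build an ffmpeg command that drops the video stream and writes mono audio."""
    return ["ffmpeg", "-y", "-i", str(video_path),
            "-vn",                    # discard the video stream
            "-ac", "1",               # downmix to mono
            "-ar", str(sample_rate),  # resample
            str(wav_path)]


def transcribe_video(video_path, midi_path, device="cpu"):
    """Extract the audio track with ffmpeg, then transcribe it to MIDI."""
    from piano_transcription_inference import PianoTranscription, sample_rate, load_audio

    video_path = Path(video_path)
    wav_path = video_path.with_name(video_path.stem + "_audio.wav")
    subprocess.run(ffmpeg_extract_audio_cmd(video_path, wav_path, sample_rate), check=True)
    audio, _ = load_audio(str(wav_path), sr=sample_rate, mono=True)
    transcriptor = PianoTranscription(device=device, checkpoint_path=None)
    return transcriptor.transcribe(audio, str(midi_path))


# transcribe_video("performance.mp4", "performance.mid")  # hypothetical file names
```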

How to convert to a Core ML model

like this?

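import torch
import coremltools as ct  # ct.converters.onnx is deprecated and absent from recent coremltools releases

# Build the PyTorch model
model = ...  # placeholder, left unspecified in the question
model.eval()  # export in inference mode

# Define an example input, assuming shape (1, 1, 44100), i.e. one second of mono audio sampled at 44100 Hz
input_shape = (1, 1, 44100)
input_sample = torch.randn(*input_shape)

# Convert the PyTorch model to ONNX
onnx_model_path = 'model.onnx'
torch.onnx.export(model, input_sample, onnx_model_path, input_names=['input'], output_names=['output'], opset_version=12)

# Convert the ONNX model to Core ML (the onnx package must be installed)
coreml_model_path = 'model.mlmodel'
coreml_model = ct.converters.onnx.convert(onnx_model_path, minimum_ios_deployment_target='13')
coreml_model.save(coreml_model_path)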
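With current coremltools, a route I believe is supported is to trace the model with TorchScript and pass the traced module straight to `ct.convert`, skipping ONNX. This is a sketch under several unverified assumptions: that the model's forward takes a (batch, samples) waveform rather than (1, 1, 44100); that it expects 10-second segments at 16 kHz, which is my understanding of the piano_transcription_inference defaults; and that every layer, including the spectrogram front end, is supported by the Core ML converter. The wrapper turns a dict output into a tuple with a fixed key order, since tracing and conversion handle tensor tuples more reliably.

```python
def example_input_shape(segment_seconds=10.0, sample_rate=16000, batch_size=1):
    """(batch, samples) waveform shape; the defaults are assumptions, see above."""
    return (batch_size, int(round(segment_seconds * sample_rate)))


def convert_to_coreml(model, out_path="PianoTranscription.mlpackage",
                      segment_seconds=10.0, sample_rate=16000):
    import torch
    import coremltools as ct

    class TupleOutput(torch.nn.Module):
        """Wrap a model that returns a dict so tracing yields a fixed-order tuple."""

        def __init__(self, inner):
            super().__init__()
            self.inner = inner

        def forward(self, waveform):
            out = self.inner(waveform)
            if isinstance(out, dict):
                return tuple(out[k] for k in sorted(out))
            return out

    wrapped = TupleOutput(model).eval()  # eval() also switches the inner model
    example = torch.randn(*example_input_shape(segment_seconds, sample_rate))
    with torch.no_grad():
        traced = torch.jit.trace(wrapped, example)
    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(name="waveform", shape=example.shape)],
        convert_to="mlprogram",  # ML Program models are saved as .mlpackage
    )
    mlmodel.save(out_path)
    return mlmodel
```

If conversion fails on the spectrogram layers, one option is to compute the log-mel features outside Core ML and convert only the network that consumes them.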

During training, changing segment_seconds to 0.1, hop_seconds to 0.01 and frames_per_second to 20 gives the result tensor(nan, grad_fn=<AddBackward0>)
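A bit of arithmetic suggests why such short segments can break training: 0.1 s × 20 frames/s leaves only about 2 frames per segment. If a loss term is a masked mean, sum(loss × mask) / sum(mask), a 2-frame segment can easily have an all-zero mask, and 0/0 evaluates to NaN. I have not traced which loss term produced the NaN here; the NumPy sketch below only illustrates that failure mode, with a zero-mask guard that is my addition rather than the repository's code.

```python
import numpy as np


def masked_bce(output, target, mask, eps=1e-7):
    """Binary cross-entropy averaged over frames where mask == 1."""
    output = np.clip(output, eps, 1.0 - eps)  # avoid log(0)
    loss = -target * np.log(output) - (1.0 - target) * np.log(1.0 - output)
    denom = np.sum(mask)
    if denom == 0:
        # No valid frames in this segment: the plain ratio would be 0/0 = NaN
        return 0.0
    return float(np.sum(loss * mask) / denom)


frames = 2  # 0.1 s * 20 frames/s
output = np.full(frames, 0.5)
target = np.zeros(frames)
print(masked_bce(output, target, mask=np.zeros(frames)))  # 0.0 instead of nan
print(masked_bce(output, target, mask=np.ones(frames)))   # log(2) ≈ 0.693
```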

Logical bug in MaestroDataset

In utils/data_generator.py, line 86, we ensure we grab a segment that is contained by the waveform:

        # Load hdf5
        with h5py.File(hdf5_path, 'r') as hf:
            start_sample = int(start_time * self.sample_rate)
            end_sample = start_sample + self.segment_samples

            if end_sample >= hf['waveform'].shape[0]:
                start_sample -= self.segment_samples 
                end_sample -= self.segment_samples

However, you fail to update start_time, so when you later grab the target_dict, it will be off by self.segment_seconds.

            # Process MIDI events to target
            (target_dict, note_events, pedal_events) = \
                self.target_processor.process(start_time, midi_events_time, 
                    midi_events, extend_pedal=True, note_shift=note_shift)

I don't think this is an issue, because your Sampler logic only constructs meta for valid segments:

while (start_time + self.segment_seconds < hf.attrs['duration'])

but it is still a logical error so I thought I would report and offer a fix.
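A minimal sketch of the fix the report describes: after shifting the segment back, recompute start_time from start_sample so the MIDI targets built from start_time cover the same span as the audio. It is written as a standalone function with a hypothetical name so the arithmetic can be checked in isolation; the numbers in the example (16 kHz, 10-second segments, a 25-second recording) are illustrative.

```python
def segment_bounds(start_time, sample_rate, segment_samples, total_samples):
    """Return (start_sample, end_sample, start_time) for one training segment.

    Mirrors the shift-back logic quoted above, plus the reporter's fix.
    """
    start_sample = int(start_time * sample_rate)
    end_sample = start_sample + segment_samples
    if end_sample >= total_samples:
        start_sample -= segment_samples
        end_sample -= segment_samples
        # The fix: keep start_time consistent with the shifted audio window
        start_time = start_sample / sample_rate
    return start_sample, end_sample, start_time


print(segment_bounds(20.0, 16000, 160000, 400000))  # (160000, 320000, 10.0)
print(segment_bounds(5.0, 16000, 160000, 400000))   # (80000, 240000, 5.0)
```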

Is there an official Docker image?

Hello, thank you for this project. Is there an official Docker image?

Accuracy of real-time transcription: can audio be transcribed in 2-second slices?

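I tried cutting the audio into 2-second slices for transcription and often got index-out-of-bounds errors, though not on every slice. When I changed the slicing parameters, the results were poor. Could the author or anyone here tell me how to handle audio shorter than 5 seconds?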
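A workaround worth trying, which is an assumption rather than something the repository documents, is to zero-pad every short chunk to a fixed length before transcribing (for example the 10-second segment that I believe piano_transcription_inference uses at 16 kHz), then discard any notes that start inside the padding. The note dictionaries with an `onset_time` key are an assumed format; adapt them to whatever the transcriber returns. Padding gives the model a consistent input size, but 2-second chunks still lose context at their edges, so notes crossing a chunk boundary may be split or missed.

```python
import numpy as np


def pad_chunk(audio, min_samples):
    """Zero-pad a 1-D chunk to at least min_samples; also return its original length."""
    audio = np.asarray(audio, dtype=np.float32)
    n = audio.shape[0]
    if n >= min_samples:
        return audio, n
    return np.pad(audio, (0, min_samples - n)), n


def drop_padded_notes(note_events, original_samples, sample_rate):
    """Keep only notes whose onset falls inside the real (unpadded) audio."""
    cutoff = original_samples / sample_rate
    return [note for note in note_events if note["onset_time"] < cutoff]


sample_rate = 16000  # assumed model sample rate
chunk = np.zeros(2 * sample_rate)  # a 2-second slice
padded, n = pad_chunk(chunk, 10 * sample_rate)
print(padded.shape, n)  # (160000,) 32000
```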

A small issue in computing the xxset regression

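utils/utilities.py, line 550
As I understand it, the logic here is: for two adjacent xxsets, the frame containing the midpoint between them, and the frames after it, should be assigned their distance to the latter xxset. So
output[t] = step * (t - locts[i + 1]) - input[locts[i]]
should perhaps be changed to
output[t] = step * (t - locts[i + 1]) - input[locts[i + 1]]
Is that right?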
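The reporter's reading can be checked numerically. The sketch below is a standalone reimplementation of the piecewise distance as the issue describes it, not the repository's function; it assumes that frames containing an event store the exact event time minus that frame's time (in seconds) and that other frames hold 1, and it returns the raw signed distances without any further mapping. With the fix, the distance from frame 6 to the next event equals that frame's time minus the event's true time, 0.06 − (0.08 − 0.004) = −0.016; with the original index it would be −0.023.

```python
import numpy as np


def signed_distances(shifts, frames_per_second):
    """Signed distance (s) from each frame to its nearest onset/offset, with the fix applied.

    Assumption: shifts[t] < 0.5 marks a frame containing an event and stores
    (exact event time - frame time); all other frames hold 1.
    """
    step = 1.0 / frames_per_second
    output = np.ones_like(shifts, dtype=float)
    locts = np.where(shifts < 0.5)[0]
    if len(locts) > 0:
        for t in range(0, locts[0]):
            output[t] = step * (t - locts[0]) - shifts[locts[0]]
        for i in range(len(locts) - 1):
            mid = (locts[i] + locts[i + 1]) // 2
            for t in range(locts[i], mid):
                output[t] = step * (t - locts[i]) - shifts[locts[i]]
            for t in range(mid, locts[i + 1]):
                # Suggested fix: the next event's own shift, shifts[locts[i + 1]]
                output[t] = step * (t - locts[i + 1]) - shifts[locts[i + 1]]
        for t in range(locts[-1], len(shifts)):
            output[t] = step * (t - locts[-1]) - shifts[locts[-1]]
    return output


shifts = np.ones(10)
shifts[2] = 0.003   # event 3 ms after frame 2
shifts[8] = -0.004  # event 4 ms before frame 8
d = signed_distances(shifts, frames_per_second=100)
print(d[6])
```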

Sustain compensation?

I just got this repo working on Windows 11. I had to switch to Python 3.7, install torch manually, and put Windows binaries of wget and ffmpeg on my PATH.

Everything seems to work really well, but I have a question:
I tried to transcribe a rather simple song and noticed that the original recording uses the sustain pedal a lot. The transcription repeats single notes over and over, so the resulting sheet looks as if many notes are played at once, when there are actually only two at a time, for instance.

Is there any way to prevent this from happening?

Evaluation Performance

Hi,

Thanks for sharing your work.

When I use the pre-trained model checkpoint from your repo (https://github.com/qiuqiangkong/piano_transcription_inference),
I can't reproduce the scores reported in the original paper.

I set the arguments as below.

I obtained much lower results.

Can I reproduce your paper's scores with this source code, or do I need to edit something?
(https://github.com/bytedance/piano_transcription/blob/master/pytorch/calculate_score_for_paper.py)

Thank you.

An output-processing problem caused by numerical precision

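In the PyTorch environment, the output after resampling with librosa is fine.
When using ONNX, or when switching to a different resampling method, the output changes considerably.
This change affects the MIDI output the most.
reg_onset_output, reg_offset_output, reg_pedal_onset_output and reg_pedal_offset_output show fairly large errors (the other three outputs are probably not affected much, so I have not listed them).
Because of the errors in reg_onset_output, more values exceed the threshold than expected (compared with the output of torch plus the original resampling algorithm), which ultimately produces "unexpected notes" in the output MIDI.
So I would like to ask two questions:
If I use only frame_output to decide whether a note is present (this is what I do now, and it does improve the ONNX output), could it perform poorly on some unusual inputs? And what are the advantages of using reg_onset_output rather than frame_output to detect NoteOn events?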
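Before changing the decoding logic, it may help to measure the discrepancy directly: feed the same resampled waveform to both the PyTorch model and the ONNX model, then compare each output key. The hypothetical helper below reports the maximum absolute difference and how many values cross a threshold in each version, which is the quantity that turns into extra notes; set `threshold` to whatever onset threshold your post-processing uses. Using identical input isolates ONNX numerics from the resampling change. The arrays in the example are made up.

```python
import numpy as np


def compare_outputs(reference, candidate, threshold):
    """Per-output max abs difference and count of values above threshold."""
    report = {}
    for name, ref in reference.items():
        ref = np.asarray(ref, dtype=np.float64)
        cand = np.asarray(candidate[name], dtype=np.float64)
        report[name] = {
            "max_abs_diff": float(np.max(np.abs(ref - cand))),
            "ref_above": int(np.sum(ref > threshold)),
            "cand_above": int(np.sum(cand > threshold)),
        }
    return report


torch_out = {"reg_onset_output": np.array([[0.10, 0.62, 0.20]])}
onnx_out = {"reg_onset_output": np.array([[0.35, 0.60, 0.20]])}
print(compare_outputs(torch_out, onnx_out, threshold=0.3))
```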

Will it also transcribe the sounds of instruments such as the pipa and guzheng?

Is there a way to distinguish these Chinese traditional instruments?

For a better user experience, it is recommended to use the `requests` library to download the model, as many computers do not have the `wget` tool.

I wrote some code to demonstrate it. It can easily be integrated into your library code. I used some f-strings for convenience.

import hashlib
import sys
from pathlib import Path

import requests


class ProgressBar:
    """Minimal single-line console progress bar."""

    def __init__(self, title, total, running_str='Running', completed=0):
        self.title = title
        self.total = total  # total size in bytes (0 if unknown)
        self.total_str = self.convert(total)
        self.completed = completed
        self.status = running_str

    @staticmethod
    def convert(size: float) -> str:
        # Human-readable byte count using binary (1024) steps
        for unit in ['B', 'KB', 'MB', 'GB', 'TB']:
            if size < 1024:
                return f'{size:.2f} {unit}'
            size /= 1024
        return f'{size:.2f} PB'

    def __str__(self):
        percent = self.completed * 100 / self.total if self.total else 0.0
        return (f'[{self.status}] {self.title} {self.convert(self.completed)} '
                f'of {self.total_str}, {percent:.2f}% completed')

    def update(self, completed=1):
        self.completed += completed
        print(f'\r{self}', end='', flush=True)


url = 'https://zenodo.org/record/4034264/files/CRNN_note_F1%3D0.9677_pedal_F1%3D0.9186.pth?download=1'
# Target path used by this snippet; create the folder if it does not exist yet
checkpoint_dir = Path.home() / 'piano_transcription_inference_data'
checkpoint_dir.mkdir(parents=True, exist_ok=True)
checkpoint_path = checkpoint_dir / 'note_F1=0.9677_pedal_F1=0.9186.pth'

response = requests.get(url, stream=True, timeout=60)
response.raise_for_status()
chunk_size = 1024 * 1024
content_size = int(response.headers.get('content-length', 0))  # header may be absent
progress = ProgressBar('Model', total=content_size, running_str='Downloading')
sha1 = hashlib.sha1()
with open(checkpoint_path, 'wb') as file:
    for data in response.iter_content(chunk_size=chunk_size):
        file.write(data)
        sha1.update(data)
        progress.update(completed=len(data))

if sha1.hexdigest() == 'b06d5ab55ff57beae8ab2a76205d5847add01fec':
    print('\nSucceeded in downloading model :)')
else:
    print('\nModel corrupted :( Please download again!', file=sys.stderr)
    checkpoint_path.unlink()  # remove the corrupted file so it is not loaded later
    sys.exit(1)

Add soundfile to requirements.txt

I noticed I had to install soundfile manually -- I think it should be added to the requirements.txt file?

I used a Mac M1 with VS Code and a Conda environment.

I had 2 problems:

If you see the following:

Please see Issue 24 (just pin librosa==0.8.0). Thanks to Yuqing for the generous help!
If you see the following:

You can run conda install python.app, put your code in a script, and then run it with pythonw your_script.py instead of python your_script.py.

Hope my failure helps!

What am I missing in my Python 3 / Ubuntu 20 environment?

root@1-ubuntu-s-8vcpu-16gb-nyc1-01:~/piano_transcription# pip install -r requirements.txt
Collecting h5py==2.10.0
  Using cached h5py-2.10.0-cp38-cp38-manylinux1_x86_64.whl (2.9 MB)
Collecting pandas==1.1.2
  Downloading pandas-1.1.2-cp38-cp38-manylinux1_x86_64.whl (10.4 MB)
     |████████████████████████████████| 10.4 MB 14.8 MB/s
Collecting librosa==0.6.0
  Downloading librosa-0.6.0.tar.gz (1.5 MB)
     |████████████████████████████████| 1.5 MB 69.6 MB/s
Collecting numba==0.48
  Downloading numba-0.48.0-1-cp38-cp38-manylinux2014_x86_64.whl (3.6 MB)
     |████████████████████████████████| 3.6 MB 67.2 MB/s
Collecting mido==1.2.9
  Downloading mido-1.2.9-py2.py3-none-any.whl (52 kB)
     |████████████████████████████████| 52 kB 2.6 MB/s
Collecting mir_eval==0.5
  Downloading mir_eval-0.5.tar.gz (86 kB)
     |████████████████████████████████| 86 kB 11.1 MB/s
Collecting matplotlib==3.0.3
  Downloading matplotlib-3.0.3.tar.gz (36.6 MB)
     |████████████████████████████████| 36.6 MB 36.4 MB/s
    ERROR: Command errored out with exit status 1:
     command: /usr/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-m3wp3r8i/matplotlib/setup.py'"'"'; __file__='"'"'/tmp/pip-install-m3wp3r8i/matplotlib/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-install-m3wp3r8i/matplotlib/pip-egg-info
         cwd: /tmp/pip-install-m3wp3r8i/matplotlib/
    Complete output (51 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-m3wp3r8i/matplotlib/setup.py", line 225, in <module>
        msg = pkg.install_help_msg()
      File "/tmp/pip-install-m3wp3r8i/matplotlib/setupext.py", line 650, in install_help_msg
        release = platform.linux_distribution()[0].lower()
    AttributeError: module 'platform' has no attribute 'linux_distribution'
    IMPORTANT WARNING:
        pkg-config is not installed.
        matplotlib may not be able to find some of its dependencies
    ============================================================================
    Edit setup.cfg to change the build options

    BUILDING MATPLOTLIB
                matplotlib: yes [3.0.3]
                    python: yes [3.8.5 (default, May 27 2021, 13:30:53)  [GCC
                            9.3.0]]
                  platform: yes [linux]

    REQUIRED DEPENDENCIES AND EXTENSIONS
                     numpy: yes [version 1.18.5]
          install_requires: yes [handled by setuptools]
                    libagg: yes [pkg-config information for 'libagg' could not
                            be found. Using local copy.]
                  freetype: no  [The C/C++ header for freetype2 (ft2build.h)
                            could not be found.  You may need to install the
                            development package.]
                       png: no  [pkg-config information for 'libpng' could not
                            be found.]
                     qhull: yes [pkg-config information for 'libqhull' could not
                            be found. Using local copy.]

    OPTIONAL SUBPACKAGES
               sample_data: yes [installing]
                  toolkits: yes [installing]
                     tests: no  [skipping due to configuration]
            toolkits_tests: no  [skipping due to configuration]

    OPTIONAL BACKEND EXTENSIONS
                       agg: yes [installing]
                     tkagg: yes [installing; run-time loading from Python Tcl /
                            Tk]
                    macosx: no  [Mac OS-X only]
                 windowing: no  [Microsoft Windows only]

    OPTIONAL PACKAGE DATA
                      dlls: no  [skipping due to configuration]

    ============================================================================
                            * The following required packages can not be built:
                            * freetype, png
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
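Reading the log: pip found no matplotlib 3.0.3 wheel for Python 3.8.5 and fell back to building the source tarball. The build needs the freetype and libpng development headers; because they are missing, setupext.py tries to print an install hint via `platform.linux_distribution()`, which was removed in Python 3.8, hence the AttributeError. Likely ways out (untested here): use Python 3.7, as another reporter did on Windows; relax the matplotlib pin in requirements.txt to a release that ships wheels for your Python; or install pkg-config plus the freetype/libpng development packages so the source build may get further. A quick check for the Python-side half of the problem:

```python
import platform
import sys


def pinned_matplotlib_build_will_fail():
    """True if matplotlib 3.0.3's setupext.py would hit the AttributeError above."""
    # platform.linux_distribution() was removed in Python 3.8; matplotlib 3.0.3's
    # setupext.py calls it from install_help_msg() when a dependency is missing.
    return not hasattr(platform, "linux_distribution")


print(sys.version.split()[0], pinned_matplotlib_build_will_fail())
```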

If I want to train from scratch, how should I handle those two environment variables?

That is, workspace and dataset_dir in runme: should I create the directories ./datasets/maestro/dataset_root and ./workspaces/piano_transcription inside the piano_transcription-master folder and then set them?

Pre-trained models of ablation study

Hi,
Thanks for sharing your work.

In the paper, there are two ablations where the frame head does not depend on the outputs of velocity and offset heads (Table I). Could you please share those checkpoints as well?

Thank you.

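An idea for replacing the MAESTRO training set

Play back MIDI that keeps only the information of interest, record the playback to build a new dataset, and train on that. From what I can see, MAESTRO's MIDI contains information for more than one pedal, which introduces additional degrees of freedom.
I think this should give better results.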

Incorporate Beat Detection

First and foremost, I'd like to express my admiration and commend the incredible work you've accomplished with the piano_transcription project. As a user, I am consistently impressed by its performance and accuracy in transcribing piano music to MIDI format. Your dedication to advancing music technology is evident and greatly appreciated by the community.

That being said, I believe there is an opportunity to further enhance the functionality and utility of piano_transcription by implementing a beat detection feature. Currently, the transcribed MIDI output defaults to 120 BPM, which is practical but doesn't account for the dynamic tempo variations that musicians often employ to convey emotion and emphasis in their performances.

Feature Suggestion: Dynamic Tempo Detection and Transcription

The goal of this feature is to analyze the audio input and accurately detect the tempo, translating these findings to the MIDI output.
This proposed feature would enable the MIDI to reflect the actual tempo throughout the piece, accommodating accelerandos, ritardandos, and other tempo changes.
The resulting MIDI files would be musically correct, reflecting the true intent of the original performance.
This enhancement would significantly improve the experience for users looking to create sheet music from audio files, as it would provide a more accurate representation of the timing and pacing of the music.

Implementing dynamic beat detection could be a game-changer for composers, arrangers, and anyone interested in creating faithful transcriptions of piano recordings. Understanding that this may be a complex feature to develop, I'm curious to learn about potential plans or considerations you might have regarding the capture of tempo variations in your transcriptions.

Thank you again for your exceptional work on this project—the impact it has had on the music tech community is remarkable. I look forward to any thoughts or discussions this suggestion might inspire.

Found a bug, path-related

Line 74 of data_generator has hdf5_path = os.path.join(self.hdf5s_dir, year, hdf5_name); training with it as-is produces this error:

Traceback (most recent call last):
File "pytorch/main.py", line 297, in <module>
train(args)
File "pytorch/main.py", line 201, in train
for batch_data_dict in train_loader:
File "C:\Users\ASUS\anaconda3\envs\byte\lib\site-packages\torch\utils\data\dataloader.py", line 345, in __next__
data = self._next_data()
File "C:\Users\ASUS\anaconda3\envs\byte\lib\site-packages\torch\utils\data\dataloader.py", line 856, in _next_data
return self._process_data(data)
File "C:\Users\ASUS\anaconda3\envs\byte\lib\site-packages\torch\utils\data\dataloader.py", line 881, in _process_data
data.reraise()
File "C:\Users\ASUS\anaconda3\envs\byte\lib\site-packages\torch\_utils.py", line 394, in reraise
raise self.exc_type(msg)
OSError: Caught OSError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "C:\Users\ASUS\anaconda3\envs\byte\lib\site-packages\torch\utils\data\_utils\worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "C:\Users\ASUS\anaconda3\envs\byte\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "C:\Users\ASUS\anaconda3\envs\byte\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "F:\pythonProject2\piano_transcription\pytorch\../utils\data_generator.py", line 82, in __getitem__
with h5py.File(hdf5_path, 'r') as hf:
File "C:\Users\ASUS\anaconda3\envs\byte\lib\site-packages\h5py\_hl\files.py", line 408, in __init__
swmr=swmr)
File "C:\Users\ASUS\anaconda3\envs\byte\lib\site-packages\h5py\_hl\files.py", line 173, in make_fid
fid = h5f.open(name, flags, fapl=fapl)
File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py\h5f.pyx", line 88, in h5py.h5f.open
OSError: Unable to open file (unable to open file: name = 'WORKSPACE\hdf5s\maestro\2015\WORKSPACE\hdf5s\maestro\2015\MIDI-Unprocessed_R1_D1-1-8_mid--AUDIO-from_mp3_06_R1_2015_wav--1.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

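Changing it to hdf5_path = os.path.join(hdf5_name) makes the error go away.
I don't know the cause of the bug and hope someone can take a look. It could be a library-version issue, a mismatch between components, a mistake in how I ran it, or possibly a Windows-only bug.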
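One plausible explanation, not verified against the repository source: the error shows hdf5_name already holding the full relative path (WORKSPACE\hdf5s\maestro\2015\...), so os.path.join(self.hdf5s_dir, year, hdf5_name) prepends the directory a second time. That is exactly what happens if the file name is extracted with something like `hdf5_path.split('/')[-1]`, which works on Linux but returns the whole string for a Windows path that uses backslashes. If this is the cause, storing `os.path.basename(hdf5_path)` instead would fix it portably; the `os.path.join(hdf5_name)` workaround only works because the stored name is already a complete path. The demonstration uses the path from the error above and the standard-library `ntpath` module, which is what `os.path` is on Windows.

```python
import ntpath

stored = r"WORKSPACE\hdf5s\maestro\2015\MIDI-Unprocessed_R1_D1-1-8_mid--AUDIO-from_mp3_06_R1_2015_wav--1.h5"

# Splitting on '/' finds no separator in a Windows path, so the "name" is the whole path
print(stored.split("/")[-1] == stored)  # True

# ntpath.basename splits on both '\\' and '/'
name = ntpath.basename(stored)
print(name)  # MIDI-Unprocessed_R1_D1-1-8_mid--AUDIO-from_mp3_06_R1_2015_wav--1.h5

# Joining the bare name with the directory parts yields a single, correct path
print(ntpath.join("WORKSPACE", "hdf5s", "maestro", "2015", name))
```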

Suggestion: raise an error when CUDA is unavailable instead of returning None

A tentative suggestion: when CUDA is not available but the device argument of PianoTranscription is set to cuda, raise an error instead of returning None.
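A minimal sketch of the suggested behaviour, with a hypothetical helper name: validate the requested device up front and raise a descriptive error instead of continuing silently. The availability flag is passed in so the logic can be tested without a GPU; in practice it would come from `torch.cuda.is_available()`.

```python
def resolve_device(requested, cuda_available):
    """Return the device string to use, or raise if CUDA was requested but is missing."""
    if requested.startswith("cuda") and not cuda_available:
        raise RuntimeError(
            f"device='{requested}' was requested but CUDA is not available; "
            "pass device='cpu' or install a CUDA-enabled PyTorch build."
        )
    return requested


# Typical call site (assumes torch is installed):
# import torch
# device = resolve_device("cuda", torch.cuda.is_available())
print(resolve_device("cpu", cuda_available=False))  # cpu
```

Raising early keeps the failure next to its cause, instead of surfacing later as an unrelated error on a None value.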

Piano transcription on Windows 10

Hello,
I am a complete Python novice, but I have managed to get piano_transcription_inference to work on my Windows 10 system, with pretty impressive results. I'm attaching a slightly modified set of commands. I normally need to hit return after the third line if the system appears to hang. I'm sure someone can come up with a more elegant solution, but this works for me. I hope this is useful.
Best wishes, and congratulations on what looks to be a very useful program.
Mick Hamer

piano transcription commands.txt

Path error in `MaestroDataset`

I'm guessing this is the same problem as #33

Within __getitem__ of the MaestroDataset class, hdf5_name is
workspace\\hdf5s\\maestro\\2018\\MIDI-Unprocessed_Recital17-19_MID--AUDIO_17_R1_2018_wav--4.h5

so hdf5_path becomes this:
'./workspace\\hdf5s\\maestro\\2018\\workspace\\hdf5s\\maestro\\2018\\MIDI-Unprocessed_Recital17-19_MID--AUDIO_17_R1_2018_wav--4.h5'

Thus, when we start loading the batches in main.py, we get a FileNotFoundError because the path is incorrect.
