microsoft / muzic

Muzic: Music Understanding and Generation with Artificial Intelligence

License: MIT License

Languages: Python 98.29%, Shell 1.45%, C++ 0.06%, CUDA 0.20%
Topics: music, music-composition, ai-music, deep-learning

muzic's Introduction




Muzic is a research project on AI music that empowers music understanding and generation with deep learning and artificial intelligence. Muzic is pronounced as [ˈmjuːzeik]. Besides the logo in image form (see above), Muzic also has a logo in video form (you can click here to watch). Muzic was started by researchers from Microsoft Research Asia and has also received contributions from outside collaborators.


We summarize the scope of our Muzic project in the following figure:


The current work in Muzic includes MusicBERT, PDAugment, CLaMP, DeepRapper, SongMASS, TeleMelody, ReLyMe, Re-creation of Creations (ROC), MeloForm, Museformer, PopMAG, HiFiSinger, GETMusic, MuseCoco, and MusicAgent (see the reference list below for the corresponding papers).

For more speech-related research, see https://speechresearch.github.io/ and https://github.com/microsoft/NeuralSpeech.

We are hiring!

We are hiring both research FTEs and research interns working on Speech/Audio/Music/Video and LLMs. Please get in touch with Xu Tan ([email protected]) if you are interested.

What is New?

  • CLaMP has won the Best Student Paper Award at ISMIR 2023!
  • We release MusicAgent, an AI agent for versatile music processing using large language models.
  • We release MuseCoco, a music composition copilot to generate symbolic music from text.
  • We release GETMusic, a versatile music copilot with a universal representation and diffusion framework to generate any music tracks.
  • We release the first model for cross-modal symbolic MIR: CLaMP.
  • We release two new research works on music structure modeling: MeloForm and Museformer.
  • We gave a tutorial on AI Music Composition at ACM Multimedia 2021.

Requirements

The operating system is Linux. We have tested on Ubuntu 16.04.6 LTS with CUDA 10 and Python 3.6.12. The requirements for running Muzic are listed in requirements.txt. To install them, run:

pip install -r requirements.txt
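
As a quick sanity check after installation (a minimal sketch, assuming PyTorch and fairseq are among the installed requirements), you can verify the environment from Python:

import torch
import fairseq

# Print the installed versions and check that CUDA is visible to PyTorch.
print('torch', torch.__version__, '| fairseq', fairseq.__version__)
print('CUDA available:', torch.cuda.is_available())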

We release the code of several research works: MusicBERT, PDAugment, CLaMP, DeepRapper, SongMASS, TeleMelody, ReLyMe, Re-creation of Creations (ROC), MeloForm, Museformer, GETMusic, MuseCoco, and MusicAgent. See the README in each corresponding folder for detailed usage instructions.

Reference

If you find the Muzic project useful in your work, you can cite the papers as follows:

  • [1] MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training, Mingliang Zeng, Xu Tan, Rui Wang, Zeqian Ju, Tao Qin, Tie-Yan Liu, ACL 2021.
  • [2] PDAugment: Data Augmentation by Pitch and Duration Adjustments for Automatic Lyrics Transcription, Chen Zhang, Jiaxing Yu, Luchin Chang, Xu Tan, Jiawei Chen, Tao Qin, Kejun Zhang, ISMIR 2022.
  • [3] DeepRapper: Neural Rap Generation with Rhyme and Rhythm Modeling, Lanqing Xue, Kaitao Song, Duocai Wu, Xu Tan, Nevin L. Zhang, Tao Qin, Wei-Qiang Zhang, Tie-Yan Liu, ACL 2021.
  • [4] SongMASS: Automatic Song Writing with Pre-training and Alignment Constraint, Zhonghao Sheng, Kaitao Song, Xu Tan, Yi Ren, Wei Ye, Shikun Zhang, Tao Qin, AAAI 2021.
  • [5] TeleMelody: Lyric-to-Melody Generation with a Template-Based Two-Stage Method, Zeqian Ju, Peiling Lu, Xu Tan, Rui Wang, Chen Zhang, Songruoyao Wu, Kejun Zhang, Xiangyang Li, Tao Qin, Tie-Yan Liu, EMNLP 2022.
  • [6] ReLyMe: Improving Lyric-to-Melody Generation by Incorporating Lyric-Melody Relationships, Chen Zhang, LuChin Chang, Songruoyao Wu, Xu Tan, Tao Qin, Tie-Yan Liu, Kejun Zhang, ACM Multimedia 2022.
  • [7] Re-creation of Creations: A New Paradigm for Lyric-to-Melody Generation, Ang Lv, Xu Tan, Tao Qin, Tie-Yan Liu, Rui Yan, arXiv 2022.
  • [8] MeloForm: Generating Melody with Musical Form based on Expert Systems and Neural Networks, Peiling Lu, Xu Tan, Botao Yu, Tao Qin, Sheng Zhao, Tie-Yan Liu, ISMIR 2022.
  • [9] Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation, Botao Yu, Peiling Lu, Rui Wang, Wei Hu, Xu Tan, Wei Ye, Shikun Zhang, Tao Qin, Tie-Yan Liu, NeurIPS 2022.
  • [10] PopMAG: Pop Music Accompaniment Generation, Yi Ren, Jinzheng He, Xu Tan, Tao Qin, Zhou Zhao, Tie-Yan Liu, ACM Multimedia 2020.
  • [11] HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis, Jiawei Chen, Xu Tan, Jian Luan, Tao Qin, Tie-Yan Liu, arXiv 2020.
  • [12] CLaMP: Contrastive Language-Music Pre-training for Cross-Modal Symbolic Music Information Retrieval, Shangda Wu, Dingyao Yu, Xu Tan, Maosong Sun, ISMIR 2023, Best Student Paper Award.
  • [13] GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework, Ang Lv, Xu Tan, Peiling Lu, Wei Ye, Shikun Zhang, Jiang Bian, Rui Yan, arXiv 2023.
  • [14] MuseCoco: Generating Symbolic Music from Text, Peiling Lu, Xin Xu, Chenfei Kang, Botao Yu, Chengyi Xing, Xu Tan, Jiang Bian, arXiv 2023.
  • [15] MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models, Dingyao Yu, Kaitao Song, Peiling Lu, Tianyu He, Xu Tan, Wei Ye, Shikun Zhang, Jiang Bian, EMNLP 2023 Demo.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

muzic's People

Contributors

1252187392, actuy, btyu, cclauss, jzq2000, lxueaa, meloform, mlzeng, peillu, peilnlu, sander-wood, stillkeeptry, tan-xu, tanujdhiman, tobyoup, trestad, uranusyu, xxupiano


muzic's Issues

telemelody: how can I get dict.trend.txt?

When running inference with the pretrained models, I got an error about a missing file:

FileNotFoundError: [Errno 2] No such file or directory: 'data-bin/template2melody_zh/dict.trend.txt'

I have dict.beat.txt and dict.lyric.txt copied from the training directory; do I need to run training first to get the trend file?

Undefined name: tokenizer

$ flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics

./muzic/songmass/mass/xmasked_seq2seq.py:124:60: F821 undefined name 'tokenizer'
            Dictionary.add_file_to_dictionary(filename, d, tokenizer.tokenize_line, workers)
                                                           ^
1     F821 undefined name 'tokenizer'
1

[MusicBERT]: How to fill masked tokens in an input sequence after training?

Hello again,

I have fine-tuned MusicBERT on masked language modeling using a custom dataset. I have loaded the fine-tuned checkpoint using:

roberta = RobertaModel.from_pretrained( # MusicBERTModel.from_pretrained also works
    '.',
    checkpoint_file=sys.argv[1],
    data_name_or_path=sys.argv[2],
    user_dir='musicbert'
)

What I want to do is give the model an input sequence, mask one or more tokens, and have it predict them: essentially masked language modeling, but with control over which tokens are masked and predicted.

What I cannot understand is what format the input sequence should be in before it is passed to the model, and how to make the model predict the masked tokens in the input.
I have tried to replicate this by looking at fairseq's training code, since I want to do something similar, but it's too complicated.
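
Roughly, this is what I am attempting (a sketch only; it assumes the fairseq masked-LM head can be called directly and that the OctupleMIDI token string below is in the right format, which is exactly what I am unsure about):

import torch

# Continues from the loading snippet above; `roberta` is the loaded hub interface.
# The token string is a hypothetical single note in '<field-value>' form.
octuple_str = '<0-0> <1-0> <2-0> <3-60> <4-8> <5-64> <6-4> <7-24>'

src_dict = roberta.task.source_dictionary
tokens = src_dict.encode_line(
    octuple_str, append_eos=False, add_if_not_exist=False
).long()

# Mask one field (here the 4th token) and ask the masked-LM head for a prediction.
mask_idx = src_dict.index('<mask>')
tokens[3] = mask_idx

with torch.no_grad():
    logits, _ = roberta.model(tokens.unsqueeze(0))  # (1, seq_len, vocab_size)
predicted = logits[0, 3].argmax().item()
print(src_dict[predicted])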

Thanks in advance.

How to get the ground truth files for SongMASS?

Thanks for open sourcing!

When I want to compute the pitch/duration similarity and melody distance, I don't know where these files are:

LYRIC=lyric.gt # The lyric file of ground truth
MELODY=melody.gt # The melody file of ground truth
HYPOS=hypo.txt # The generated result in fairseq format
SONG_ID=song_id_test.txt # The song id file

I would be very grateful if someone could tell me how to find or generate these files!

[SongMASS] Questions about generating lyrics from a melody

First of all, thank you very much for open-sourcing such a great tool!
I tried SongMASS's melody-to-lyric generation, but I could not find a way to directly feed in a melody and have lyrics generated. My current workflow is:
I prepare a melody file and a lyric file, treat them as the validation set, and obtain results the same way validation results are produced.
The exact commands are:

fairseq-preprocess \
  --user-dir mass \
  --task xmasked_seq2seq \
  --source-lang lyric --target-lang melody \
  --trainpref $para_data_dir/train --validpref $para_data_dir/t3 \
  --destdir $save_dir \
  --srcdict $para_data_dir/dict.lyric.txt \
  --tgtdict $para_data_dir/dict.melody.txt

bash infer_lyric.sh $data_dir mass $model

My questions are:

  1. Is there a way to directly generate lyrics from a given melody?
  2. When I run inference on data from the test and validation sets, the results are perfect and exactly match the references. Were the test and validation sets also used during training?
  3. I created a song myself following the data format, but the inference result is very poor.
    My own lyrics and melody are as follows:

Melody:
65 129 65 129 [align] 65 129 72 129 [align] 72 129 67 130 [align] [sep] 67 129 [align] 67 129 67 129 [align] 67 129 65 129 [align] 64 129 65 129 [align] [sep] 65 129 65 129 [align] 65 129 65 129 [align] 65 129 [align] 65 129 65 129 [align] [sep] 65 129 [align] 65 129 [align] 65 129 [align] [sep] 72 130 [align] 65 129 [align] 65 129 [align] [sep]

Lyrics:
sky [align] is [align] raining [align] [sep] i [align] want [align] stay [align] alone [align] [sep] stay [align] with [align] my [align] mama [align] [sep] one [align] two [align] three [align] [sep] let [align] is [align] go [align] [sep]

[TeleMelody] Inference: How to get the dictionary placed in data-bin?

Hi, thanks for providing checkpoints of the models.
I'm trying to run inference, but I got stuck on this line:

data_name_or_path=f'data-bin/{lyric2beat_prefix}'

I put the checkpoints in telemelody/inference like this:

checkpoints
├── lyric2rhythm
│   └── checkpoint_best.pt
├── template2melody
│   └── checkpoint_best.pt
...

I'm not sure what files should be placed in data-bin/{model_prefix}.
I will be grateful for any help you can provide.

Missing argument

When I run bash generate.sh, the function get_sentence_pinyin_finals() is called with only "raw_text", but it is defined with two parameters:

TypeError: get_sentence_pinyin_finals() missing 1 required positional argument: 'invalids_finals'

Cannot install dependencies: No matching distribution found for fairseq==0.10.2

Cannot install dependencies when I run pip install -r requirements.txt. The main error message is as follows:

...
ERROR: Could not find a version that satisfies the requirement fairseq==0.10.2 (from versions: 0.6.1, 0.6.2, 0.7.1, 0.7.2, 0.8.0, 0.9.0, 0.10.0, 0.10.1, 0.10.2)
ERROR: No matching distribution found for fairseq==0.10.2

My development environment:

  • python 3.9.7
  • pip 21.2.4
  • macos 11.5.2

[MusicBERT]: Need help understanding loop in preprocess.F method

Hello!

I'm trying to fine-tune the pretrained model using another dataset, but I'm stuck at the loop block below.
I understand the final format of output_str_list, but I cannot grasp what this code does, so I was hoping you could provide an explanation.

output_str_list = []
# Slide windows of up to sample_len_max notes over the note sequence e;
# a larger sample_overlap_rate gives a smaller step and thus more overlap
# between consecutive windows.
sample_step = max(round(sample_len_max / sample_overlap_rate), 1)
# Start from a random negative offset so the window boundaries vary between calls.
for p in range(0 - random.randint(0, sample_len_max - 1), len(e), sample_step):
    L = max(p, 0)                            # clip the window start to the sequence
    R = min(p + sample_len_max, len(e)) - 1  # clip the window end to the sequence
    # Bar indices (field 0 of each octuple) of the notes inside this window.
    bar_index_list = [e[i][0] for i in range(L, R + 1) if e[i][0] is not None]
    bar_index_min = 0
    bar_index_max = 0
    if len(bar_index_list) > 0:
        bar_index_min = min(bar_index_list)
        bar_index_max = max(bar_index_list)
    offset_lower_bound = -bar_index_min
    offset_upper_bound = bar_max - 1 - bar_index_max
    # to make bar index distribute in [0, bar_max):
    # pick a random shift so the shifted bar indices of this segment fit the vocabulary
    bar_index_offset = random.randint(
        offset_lower_bound, offset_upper_bound) if offset_lower_bound <= offset_upper_bound else offset_lower_bound
    # Keep notes from the window until one would fall outside [0, bar_max) after shifting.
    e_segment = []
    for i in e[L: R + 1]:
        if i[0] is None or i[0] + bar_index_offset < bar_max:
            e_segment.append(i)
        else:
            break
    tokens_per_note = 8
    # Serialize the segment: 8 <s> tokens, then one '<field-value>' token per field of
    # each note (the bar field, j == 0, gets bar_index_offset added), then 7 </s> tokens.
    output_words = (['<s>'] * tokens_per_note) \
        + [('<{}-{}>'.format(j, k if j > 0 else k + bar_index_offset) if k is not None else '<unk>') for i in e_segment for j, k in enumerate(i)] \
        + (['</s>'] * (tokens_per_note - 1)
           )  # tokens_per_note - 1 for append_eos functionality of binarizer in fairseq
    output_str_list.append(' '.join(output_words))

Also, in gen_genre.py, why do we want to sample the train set multiple times? Why do we need output_str_list four times?

Thanks in advance!

[TeleMelody] Inference (EN): pattern is an empty list, thus ValueError: min() arg is an empty sequence

Hi, I ran inference/infer_en.py with the test data (data/en/test). Here is the output of each variable:

beats:

0 0 0 1 1 1 2 [sep] 1 2 2 0 1 1 2 [sep]

beats_label:

[[0], [0], [0], [1], [1], [1], [2], [2], [1], [2], [2], [0], [1], [1], [2]]

clean(syllables):

.............

tmp:

['.']

pattern:

[]

The following error occurred. Does anyone know how to fix it?

min_bar = min([i[0] for i in e])
ValueError: min() arg is an empty sequence.

--

Update

Maybe the clean() function is not needed when running infer_en.py?

[PDAugment] What does the phoneme level alignment pickle file contain?

Hi PDAugment team, just a quick question about the pickle file generated during the phoneme-level alignment stage. In the example in the README:

{
    "174-168635-0000.wav" : [0, 12, 18, 20...],
    "174-168635-0001.wav" : [0, 12, 27, 35...],
    "174-168635-0002.wav" : [0, 13, 26, 33...],
    ...
}

what do lists such as [0, 12, 18, 20, ...] represent? Are they the split positions between adjacent phonemes, in milliseconds?
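
To make the question concrete, this is how I am reading the file (a sketch; the file name is just a placeholder):

import pickle

# Load the phoneme-level alignment pickle and look at a few entries.
with open('phoneme_alignment.pkl', 'rb') as f:  # placeholder path
    alignment = pickle.load(f)

for wav_name, positions in list(alignment.items())[:3]:
    print(wav_name, positions[:5])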

[MusicBERT] Does encoding_to_MIDI have a problem?

Hello, I am trying to use the wonderful OctupleMIDI representation.
First, I get an encoding with MIDI_to_encoding, then restore it to MIDI with encoding_to_MIDI. However, there is a problem with the duration of some notes, as shown below.

I find that the restored MIDI file is correct if I filter out the notes whose duration is 0 in the encoding_to_MIDI function.
Maybe there is a bug in miditoolkit (0.1.14, 0.1.15)?

[screenshot]

This is my solution (in preprocess.py):
[screenshot]
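
In code, the workaround looks roughly like this (a sketch; it assumes the duration field is at index 4 of each octuple and that encoding_to_MIDI returns a miditoolkit MidiFile, which may not match preprocess.py exactly):

import miditoolkit
# MIDI_to_encoding / encoding_to_MIDI are the functions in musicbert/preprocess.py;
# the import path assumes this is run from the musicbert directory.
from preprocess import MIDI_to_encoding, encoding_to_MIDI

def drop_zero_duration_notes(encoding, duration_index=4):
    # Keep only notes with a non-zero duration before decoding back to MIDI;
    # duration_index=4 is my assumption about the octuple field layout.
    return [note for note in encoding if note[duration_index] != 0]

midi_obj = miditoolkit.midi.parser.MidiFile('song.mid')  # placeholder input file
e = MIDI_to_encoding(midi_obj)
restored = encoding_to_MIDI(drop_zero_duration_notes(e))
restored.dump('song_restored.mid')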

Number of epochs

Hi, thanks for this research! I'm trying to train MusicBERT using your code and it seems to work fine, but there doesn't seem to be any limit on the number of epochs. Will it keep training forever (until I stop it)? How many epochs did you use in your paper? Thanks!

HiFiSinger code release?

Hi, I was wondering if you were planning on releasing the code for HiFiSinger anytime soon? I would be very much interested in experimenting with it.

Thank you!

Dictionary Error for PDAugment while aligning

I've seen the previous issue that says we should manually align the phonemes via the MFA alignment tool.
However, when I try the official example with the provided pretrained model, I get the following error:

dictionary phones: {'th', 'b', 'v', 'ch', 'sh', 'p', 'ey', 's', 'k', 'jh', 'ow', 'hh', 'ah', 'oy', 'd', 'uh', 'm', 'dh', 'ax', 'iy', 'ih', 'zh', 'ae', 'er', 'y', 'l', 'w', 'ng', 'aw', 'n', 'aa', 'g', 'ao', 'r', 't', 'eh', 'uw', 'f', 'ay', 'z'}
model phones: {'IY0', 'AO2', 'AY1', 'IY1', 'AE2', 'UH2', 'M', 'AA0', 'UW1', 'F', 'N', 'EH0', 'AO1', 'G', 'OY0', 'AE1', 'UW0', 'Z', 'ER2', 'CH', 'NG', 'AH2', 'ER1', 'JH', 'OW0', 'UH0', 'OY1', 'P', 'SH', 'EY2', 'DH', 'AH1', 'ZH', 'AY0', 'AW0', 'AE0', 'AA2', 'T', 'TH', 'EY0', 'UW2', 'V', 'AA1', 'OW2', 'S', 'EY1', 'IY2', 'R', 'AY2', 'EH1', 'AH0', 'EH2', 'IH2', 'L', 'K', 'UH1', 'AW1', 'AO0', 'AW2', 'D', 'OY2', 'W', 'B', 'Y', 'IH1', 'HH', 'OW1', 'IH0', 'ER0'}
There were phones in the dictionary that do not have acoustic models: aa, ae, ah, ao, aw, ax, ay, b, ch, d, dh, eh, er, ey, f, g, hh, ih, iy, jh, k, l, m, n, ng, ow, oy, p, r, s, sh, t, th, uh, uw, v, w, y, z, zh

Is there a way to convert between these phones? Or is there another model that takes dictionary phones as input? Thanks.

P.S. It seems that 'ax' does not exist in the model phones.
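
A workaround I am considering (a sketch only; I am not sure the mapping is phonetically safe, and 'ax' still has no counterpart) is to lowercase the model's ARPAbet phones and strip the stress digits so they line up with the dictionary phones:

def model_phone_to_dict_phone(phone):
    # 'AH0' -> 'ah', 'EY2' -> 'ey': lowercase and drop the trailing stress digit.
    return phone.lower().rstrip('012')

print([model_phone_to_dict_phone(p) for p in ['AH0', 'EY2', 'HH', 'NG']])
# -> ['ah', 'ey', 'hh', 'ng']; note that nothing here maps to the dictionary's 'ax'.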

Sharing models through the Hugging Face Hub

Hi Muzic team!

Would you be interested in sharing your pretrained models on the Hugging Face Hub? The Hub offers free hosting of over 25K models, and it would make your work more accessible and visible to the rest of the ML community. There's an existing Microsoft organization where your pretrained models could be hosted.

Some of the benefits of sharing your models through the Hub would be:

  • wider reach of your work to the ecosystem
  • potentially interactive widgets to demo your work
  • repos provide useful metadata about their tasks, languages, metrics, etc that make them discoverable
  • versioning, commit history and diffs
  • multiple features from TensorBoard visualizations, PapersWithCode integration, and more

Creating the repos and adding new models should be a relatively straightforward process if you've used Git before. This is a step-by-step guide explaining the process in case you're interested. Please let us know if you would be interested and if you have any questions.

Happy to hear your thoughts,
Omar and the Hugging Face team

[MusicBERT] Question about the requirements

Could you provide a separate set of requirements just for running MusicBERT? With Python 3.6, installing the top-level requirements gives me many errors, as shown in the screenshot below.
[screenshot]

missing data folder in muzic/telemelody

Hello!

in Training -> Lyric-to-Rhythm, the first step is:

(1) Prepare lyric-to-rhythm dataset. (An example is available in directory data/example.)

However, it looks like the data folder is not included, so we have no examples to look at when attempting to replicate this work. Could your team either provide an example dataset or a specification of how to download and prepare it?

How to encode pop music?

Nice to see the MusicBERT research! I have a question: the OctupleMIDI encoding method seems to apply only to instrumental music. Is there any way to encode pop music with a singer's voice? Thanks.

Is there accompaniment in the DALI or DSing30 dataset?

Hi!
I tried PDAugment on my accompanied Mandarin song dataset, with a ratio of 1.3:1 between training data and augmented data, but the test accuracy drops slightly compared to the setup without augmented data.

Could it be that the accompaniment influences the result?

thanks for replying!

Problems at the inference stage

I succeeded in the data and training stages but ran into some problems at inference:
[screenshot]
I met a similar problem at the training stage and worked around it by using a single GPU. However, that didn't work for the inference stage.
My environment is Ubuntu 16.04, CUDA 10.2, Python 3.6.12, and everything else follows requirements.txt.
Looking forward to your reply.

[PDAugment] How can I get the pickle file?

[screenshot]

Excuse me, I didn't understand the meaning of the list; could you explain the 'split position'?
Does it represent the phone length, or the phone's MFCC mapping value?

Import PDAugment into ESPnet

Very cool work and thanks for open-sourcing!

I saw in your arXiv paper on PDAugment that you used ESPnet for modeling and evaluation. Would you consider adding a corresponding recipe to the ESPnet repo as well?

[MusicBERT] Conflicts in tokenizing OctupleMIDI string to indices

Hi!

With reference to musicbert/eval_genre.py and the fairseq examples, there appear to be two ways to tokenize an OctupleMIDI string into indices:

  • By using label_dict = RobertaModel.task.label_dictionary as hinted in musicbert/eval_genre.py
  • By using RobertaModel.encode as hinted in fairseq examples

[screenshot]

However, RobertaModel.encode seems to be completely dysfunctional, considering that it should work like the snippets below:

[screenshots]

Can you confirm the correct way to encode an OctupleMIDI string into token indices to feed to MusicBERT (RobertaModel)?
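
For reference, the two paths I am comparing look roughly like this (a sketch, not a working repro; it assumes roberta has been loaded via RobertaModel.from_pretrained(..., user_dir='musicbert') as in eval_genre.py, and the OctupleMIDI string is a made-up single note):

# Path 1: dictionary-based encoding, as hinted in musicbert/eval_genre.py
octuple_str = '<0-0> <1-0> <2-0> <3-60> <4-8> <5-64> <6-4> <7-24>'
label_dict = roberta.task.label_dictionary
ids_a = label_dict.encode_line(octuple_str, append_eos=False, add_if_not_exist=False)

# Path 2: the generic hub-interface encoder from the fairseq examples,
# which runs the configured BPE before looking up the dictionary
ids_b = roberta.encode(octuple_str)

print(ids_a)
print(ids_b)

Path 2 is what the fairseq examples suggest, but since OctupleMIDI tokens are not natural-language text, I suspect the BPE step is what breaks it.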

[MusicBERT] Conflicts in encoding OctupleMIDI tuples to OctupleMIDI string

Hi!

In order to encode a MIDI file to OctupleMIDI, I followed the script musicbert/preprocess.py.

The OctupleMIDI tuples were obtained as e by running the following snippet:
[screenshot]

However, converting e to an OctupleMIDI string can be done in two ways:

  • Using lines 370-398 of musicbert/preprocess.py and taking output_str_list[0] as the OctupleMIDI string.
  • Using the encoding_to_str method in musicbert/preprocess.py (line 429).

But as seen in the circled text below, there is some difference between the two in the Measure field:
[screenshot]

Which approach was used during the pretraining of models?
If I choose the wrong approach, the pretrained model would most likely give wrong embeddings.

Thanks in advance!

new_writer() missing 1 required positional argument: 'output_str_list'

I'm trying to preprocess LMD for the genre classification task using gen_genre.py, but I'm getting this error for every file: new_writer() missing 1 required positional argument: 'output_str_list'

Indeed, looking at preprocess.py, writer() is called with only one argument, output_str_list, but new_writer() has two arguments: file_name, output_str_list.

Is the fix to simply add the file_name argument in preprocess.py at the following places? Won't that break something else?

def writer(output_str_list):

writer(output_str_list)
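
Concretely, the change I have in mind looks like this (a sketch with a simplified, hypothetical new_writer; I have not checked whether other callers would break):

def new_writer(file_name, output_str_list):
    # hypothetical stand-in for the real new_writer in preprocess.py
    with open(file_name, 'w') as f:
        f.write('\n'.join(output_str_list) + '\n')

def writer(file_name, output_str_list):      # was: def writer(output_str_list):
    new_writer(file_name, output_str_list)   # forward file_name as well

writer('genre_sample.txt', ['<s> <0-0> ... </s>'])  # was: writer(output_str_list)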

Thanks!

MusicBERT small model error

When I run $ bash train_mask.sh lmd_full small with the provided small model, fairseq gives the following error:

-- Process 3 terminated with the following error:
Traceback (most recent call last):
File "/usr/local/anaconda3/envs/pyg/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/mmu_ssd/chenjunmin/apps/fairseq-0.10.0/fairseq/distributed_utils.py", line 270, in distributed_main
main(args, **kwargs)
File "/mmu_ssd/chenjunmin/apps/fairseq-0.10.0/fairseq_cli/train.py", line 114, in main
disable_iterator_cache=task.has_sharded_data("train"),
File "/mmu_ssd/chenjunmin/apps/fairseq-0.10.0/fairseq/checkpoint_utils.py", line 193, in load_checkpoint
reset_meters=reset_meters,
File "/mmu_ssd/chenjunmin/apps/fairseq-0.10.0/fairseq/trainer.py", line 279, in load_checkpoint
state = checkpoint_utils.load_checkpoint_to_cpu(filename)
File "/mmu_ssd/chenjunmin/apps/fairseq-0.10.0/fairseq/checkpoint_utils.py", line 232, in load_checkpoint_to_cpu
state = _upgrade_state_dict(state)
File "/mmu_ssd/chenjunmin/apps/fairseq-0.10.0/fairseq/checkpoint_utils.py", line 436, in _upgrade_state_dict
registry.set_defaults(state["args"], tasks.TASK_REGISTRY[state["args"].task])
AttributeError: 'NoneType' object has no attribute 'task'

PopMAG code release

Hi Muzic team,
Thanks a lot for sharing! I am interested in your PopMAG implementation. Do you plan to release the code for PopMAG, and, furthermore, a pretrained PopMAG model for accompaniment generation?

[MusicBERT]: Could not infer model type from Namespace (eval_genre.py)

Hello!

I'm trying to run the evaluation script for the genre classification task using the command python -u eval_genre.py checkpoints/checkpoint_last_musicbert_small.pt topmagd_data_bin/x, and I'm getting the error below when running RobertaModel.from_pretrained:

Traceback (most recent call last):
  File "eval_genre.py", line 39, in <module>
    user_dir='musicbert'
  File "/home/aspil/muzic/musicbert/fairseq/fairseq/models/roberta/model.py", line 251, in from_pretrained
    **kwargs,
  File "/home/aspil/muzic/musicbert/fairseq/fairseq/hub_utils.py", line 75, in from_pretrained
    arg_overrides=kwargs,
  File "/home/aspil/muzic/musicbert/fairseq/fairseq/checkpoint_utils.py", line 353, in load_model_ensemble_and_task
    model = task.build_model(cfg.model)
  File "/home/aspil/muzic/musicbert/fairseq/fairseq/tasks/fairseq_task.py", line 567, in build_model
    model = models.build_model(args, self)
  File "/home/aspil/muzic/musicbert/fairseq/fairseq/models/__init__.py", line 93, in build_model
    + model_type
AssertionError: Could not infer model type from Namespace(_name='roberta_small', activation_dropout=0.0, activation_fn='gelu', adam_betas='(0.9,0.98)', adam_eps=1e-06, all_gather_list_size=16384, arch='roberta_small', attention_dropout=0.1, azureml_logging=False, batch_size=8, batch_size_valid=8, best_checkpoint_metric='loss', bf16=False, bpe='gpt2', broadcast_buffers=False, bucket_cap_mb=25, checkpoint_shard_count=1, checkpoint_suffix='_bar_roberta_small', clip_norm=0.0, cpu=False, criterion='masked_lm', curriculum=0, data='topmagd_data_bin/0/input0', data_buffer_size=10, dataset_impl=None, ddp_backend='c10d', device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=8, distributed_wrapper='DDP', dropout=0.1, empty_cache_freq=0, encoder_attention_heads=8, encoder_embed_dim=512, encoder_ffn_embed_dim=2048, encoder_layerdrop=0, encoder_layers=4, encoder_layers_to_keep=None, end_learning_rate=0.0, eos=2, fast_stat_sync=False, find_unused_parameters=False, finetune_from_model=None, fix_batches_to_gpus=False, fixed_validation_seed=None, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, freq_weighted_replacement=False, gen_subset='test', heartbeat_timeout=-1, keep_best_checkpoints=-1, keep_interval_updates=-1, keep_last_epochs=-1, leave_unmasked_prob=0.1, load_checkpoint_heads=True, load_checkpoint_on_all_dp_ranks=False, localsgd_frequency=3, log_format='simple', log_interval=100, lr=[0.0005], lr_scheduler='polynomial_decay', mask_multiple_length=1, mask_prob=0.15, mask_stdev=0.0, mask_whole_words=False, max_epoch=0, max_positions=8192, max_tokens=None, max_tokens_valid=None, max_update=125000, maximize_best_checkpoint_metric=False, memory_efficient_bf16=False, memory_efficient_fp16=False, min_loss_scale=0.0001, model_parallel_size=1, no_epoch_checkpoints=False, no_last_checkpoints=False, no_progress_bar=False, no_save=False, no_save_optimizer_state=False, no_seed_provided=False, nprocs_per_node=8, num_shards=1, num_workers=1, optimizer='adam', optimizer_overrides='{}', pad=1, patience=-1, pipeline_balance=None, pipeline_checkpoint='never', pipeline_chunks=0, pipeline_decoder_balance=None, pipeline_decoder_devices=None, pipeline_devices=None, pipeline_encoder_balance=None, pipeline_encoder_devices=None, pipeline_model_parallel=False, pooler_activation_fn='tanh', pooler_dropout=0.0, power=1.0, profile=False, quant_noise_pq=0, quant_noise_pq_block_size=8, quant_noise_scalar=0, quantization_config_path=None, random_token_prob=0.1, required_batch_size_multiple=8, required_seq_len_multiple=1, reset_dataloader=False, reset_logging=True, reset_lr_scheduler=False, reset_meters=False, reset_optimizer=False, restore_file='checkpoints/checkpoint_last_bar_roberta_small.pt', sample_break_mode='complete', save_dir='checkpoints', save_interval=1, save_interval_updates=0, scoring='bleu', seed=1, sentence_avg=False, shard_id=0, shorten_data_split_list='', shorten_method='none', skip_invalid_size_inputs_valid_test=False, slowmo_algorithm='LocalSGD', slowmo_momentum=None, spectral_norm_classification_head=False, stop_min_lr=-1.0, stop_time_hours=0, task='masked_lm', tensorboard_logdir=None, threshold_loss_scale=None, tokenizer=None, tokens_per_sample=8192, total_num_update='125000', tpu=False, train_subset='train', unk=3, untie_weights_roberta=False, update_freq=[4], use_bmuf=False, use_old_adam=False, 
user_dir='musicbert', valid_subset='valid', validate_after_updates=0, validate_interval=1, validate_interval_updates=0, wandb_project=None, warmup_updates=25000, weight_decay=0.01, zero_sharding='none'). Available models: dict_keys(['transformer_lm', 'wav2vec', 'wav2vec2', 'wav2vec_ctc', 'wav2vec_seq2seq']) Requested model type: roberta_small

Environment

Thanks in advance!

Edit: When running the above command with the base checkpoint, I get the following:

Traceback (most recent call last):
  File "eval_genre.py", line 39, in <module>
    user_dir='musicbert'
  File "/home/aspil/anaconda3/envs/musicbert_01/lib/python3.6/site-packages/fairseq/models/roberta/model.py", line 251, in from_pretrained
    **kwargs,
  File "/home/aspil/anaconda3/envs/musicbert_01/lib/python3.6/site-packages/fairseq/hub_utils.py", line 75, in from_pretrained
    arg_overrides=kwargs,
  File "/home/aspil/anaconda3/envs/musicbert_01/lib/python3.6/site-packages/fairseq/checkpoint_utils.py", line 355, in load_model_ensemble_and_task
    model.load_state_dict(state["model"], strict=strict, model_cfg=cfg.model)
  File "/home/aspil/anaconda3/envs/musicbert_01/lib/python3.6/site-packages/fairseq/models/fairseq_model.py", line 115, in load_state_dict
    return super().load_state_dict(new_state_dict, strict)
  File "/home/aspil/anaconda3/envs/musicbert_01/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1483, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for RobertaModel:
        Unexpected key(s) in state_dict: "encoder.sentence_encoder.downsampling.0.weight", "encoder.sentence_encoder.downsampling.0.bias", "encoder.sentence_encoder.upsampling.0.weight", "encoder.sentence_encoder.upsampling.0.bias".

I don't know if I messed up something, I'd appreciate any help!

Normalizing OctupleMIDI between [-1., 1.]

Hi Muzic Team,

OctupleMIDI is an incredibly cool way to encode MIDI content. I am working on the LMD dataset and am able to convert MIDI -> OctupleMIDI -> MIDI. However, in order to train my model, I need to ensure the values are normalized to [-1., 1.].

I looked at your paper and inferred the following max values for the different columns in OctupleMIDI:

    [Time Sig, Tempo, Bar, Position, Instrument, Pitch, Duration, Velocity]
    [254, [16-256], [256], [256], [128], 128, 128, 128]

However, analyzing my data, I see the following max values in the OctupleMIDI-encoded files:

    [Time Sig, Tempo, Bar, Position, Instrument, Pitch, Duration, Velocity]
    [9280., 127., 128., 255., 127., 31., 157., 48.]

I am not sure if this is because of corrupt MIDI files or something else is amiss. Essentially, I am trying to find the max range of these values so I can use it to normalize the encoding.

Is this the right way to do it? Any help would be much appreciated!
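
For reference, the per-column scaling I have in mind is something like this (a sketch; FIELD_MAX holds the maxima I inferred from the paper, which may well be wrong):

import numpy as np

# Hypothetical per-field maxima, in the column order
# [Time Sig, Tempo, Bar, Position, Instrument, Pitch, Duration, Velocity].
FIELD_MAX = np.array([254, 256, 256, 256, 128, 128, 128, 128], dtype=np.float32)

def normalize(octuple_rows):
    # Map each column from [0, max] to [-1, 1].
    x = np.asarray(octuple_rows, dtype=np.float32)
    return 2.0 * x / FIELD_MAX - 1.0

def denormalize(x):
    # Inverse mapping back to the raw OctupleMIDI value ranges.
    return (np.asarray(x, dtype=np.float32) + 1.0) * FIELD_MAX / 2.0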

Thanks!

[telemelody] Training Process Stuck: Train template-to-melody model

Hi, I tried training on my own machine; however, it seems to get stuck during training (1.2 (3) Train template-to-melody model).
[screenshot: 2022-02-18 15:36]

Does this relate to my machine settings? It seems that the lmd_matched dataset is set up correctly.

P.S. This issue is also related to #31, thanks in advance.
