ethman / slakh-utils
Utilities for interfacing with Slakh2100
License: MIT License
After using Slakh for multi-instrument automatic music transcription, I've found several stems with different kinds of errors. At the time of writing (2021-06-07), 22 such stems have been found. Most of the errors are only relevant for music transcription.
The errors fall into the following classes:
white-noise: Something must have gone wrong with the audio rendering; the audio for the stem consists only of white noise.
wrong-pitch: The pitch of the label and the pitch of the audio are not the same.
wrong-octave: The octave of the label and the octave of the audio are not the same.
missing-audio: Not all the notes in the label are rendered.
short-labels: Some of the notes in the MIDI file parsed with PrettyMIDI are shorter than the rendered audio.
long-labels: Some of the notes in the MIDI file parsed with PrettyMIDI are longer than the rendered audio.
The errors are listed below and in this GitHub repository (which might be updated if more errors are found). The list does not include the systematic error described in issue #18, where the highest octave in the labels is not present in the audio.
{
"Track00262": {
"S01": "short-labels"
},
"Track00357": {
"S03": "white-noise"
},
"Track00377": {
"S07": "white-noise"
},
"Track00385": {
"S00": "white-noise"
},
"Track00398": {
"S00": "white-noise"
},
"Track00400": {
"S00": "white-noise"
},
"Track00404": {
"S03": "long-labels"
},
"Track00496": {
"S01": "wrong-pitch"
},
"Track00629": {
"S01": "white-noise"
},
"Track00633": {
"S01": "white-noise"
},
"Track00737": {
"S01": "long-labels"
},
"Track00749": {
"S01": "white-noise"
},
"Track00893": {
"S01": "long-labels"
},
"Track01629": {
"S00": "white-noise"
},
"Track01876": {
"S01": "missing-audio"
},
"Track01908": {
"S05": "missing-audio"
},
"Track01918": {
"S10": "wrong-pitch"
},
"Track01929": {
"S04": "wrong-octave"
},
"Track01931": {
"S01": "wrong-pitch"
},
"Track01993": {
"S01": "missing-audio"
},
"Track01937": {
"S03": "wrong-pitch"
},
"Track02024": {
"S13": "missing-audio"
}
}
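For readers who want to exclude the affected stems from training or evaluation, a minimal sketch follows. It assumes the JSON above has been saved to a local file and that stems are identified by (track, stem) pairs; the function name and the file handling are illustrative and not part of slakh-utils.

```python
import json


def load_bad_stems(path, error_classes=None):
    """Return a set of (track, stem) pairs listed in the error file.

    error_classes: optional collection of class names to select,
    e.g. {"white-noise"}; None selects every listed stem.
    """
    with open(path, encoding="utf-8") as f:
        errors = json.load(f)
    return {
        (track, stem)
        for track, stems in errors.items()
        for stem, error_class in stems.items()
        if error_classes is None or error_class in error_classes
    }
```

A data loader could then skip any stem whose (track, stem) pair is in the returned set.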
Right now, resampling and conversion to/from FLAC each create a new directory and copy the data during the operation. This consumes a lot of disk space. It would be nice to have an option to overwrite the existing data with the resampled or converted data.
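One way such an option could work, sketched under assumptions: convert each file to a temporary sibling, move it into place, and only then delete the original, so that at most one extra file exists at any time. The function name and the `convert` callable are hypothetical; this is not code from the repository.

```python
import os
import tempfile


def convert_in_place(src_path, new_suffix, convert):
    """Convert src_path to a sibling file with new_suffix, then drop the original.

    convert: callable(src, dst) doing the actual work (hypothetical,
    e.g. a wrapper around ffmpeg or a resampler).
    """
    base, _ = os.path.splitext(src_path)
    dst_path = base + new_suffix
    directory = os.path.dirname(os.path.abspath(src_path))
    # Write to a temporary file first so a failed conversion
    # never leaves a partial output under the final name.
    fd, tmp_path = tempfile.mkstemp(suffix=new_suffix, dir=directory)
    os.close(fd)
    try:
        convert(src_path, tmp_path)
        os.replace(tmp_path, dst_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    # When the suffix is unchanged (e.g. resampling), the original
    # has already been overwritten and must not be deleted.
    if os.path.abspath(dst_path) != os.path.abspath(src_path):
        os.remove(src_path)
    return dst_path
```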
Sorry, I am new to this topic. If I want to use the MUSDB18 and Slakh datasets together for my model, do I have to manually relabel all the Slakh tracks? I ask because MUSDB18 labels its tracks as follows:
0 - The mixture,
1 - The drums,
2 - The bass,
3 - The rest of the accompaniment,
4 - The vocals
Slakh, however, has many instruments and labels them differently, as shown in this file: https://github.com/ethman/slakh-utils/blob/master/midi_inst_values/general_midi_inst_0based.txt
Since Slakh has a large number of tracks, is there an efficient way to relabel them? Or is a combined set already available (I couldn't find one)? If not, would it be possible to provide a guide on how to relabel the Slakh tracks to match MUSDB18? Do I have to download the stem creator tool by Native Instruments that was used for MUSDB18 ( https://www.stems-music.com/stem-creator-tool/ ) and rearrange the tracks into the 4 categories?
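One possible approach is to group stems programmatically rather than by hand. The sketch below assumes each stem comes with a 0-based General MIDI program number and a drum flag; the field names `program_num` and `is_drum` are assumptions about the per-track metadata, and the function names are hypothetical. In 0-based General MIDI numbering, programs 32-39 are the bass family. The sketch never assigns a vocals category, so anything that is neither drums nor bass lands in "other".

```python
# 0-based General MIDI program numbers 32-39 are the bass family.
GM_BASS_PROGRAMS = range(32, 40)


def musdb_category(program_num, is_drum):
    """Map one stem to a MUSDB18-style category name."""
    if is_drum:
        return "drums"
    if program_num in GM_BASS_PROGRAMS:
        return "bass"
    return "other"


def group_stems(stems):
    """Group stem ids by category.

    stems: dict such as {"S00": {"program_num": 33, "is_drum": False}}
    (field names are assumptions, not confirmed by the source).
    """
    groups = {"drums": [], "bass": [], "other": []}
    for stem_id, info in sorted(stems.items()):
        category = musdb_category(info["program_num"], info["is_drum"])
        groups[category].append(stem_id)
    return groups
```

Summing the audio of the stems in each group would then give one MUSDB18-style source per category.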
Commands to replicate the issue:
conda create --name slakh3 python=3
conda activate slakh3
pip install -r requirements.txt
python flac_converter.py -i $INPUT_DIR -o $OUTPUT_DIR -c False
Traceback (most recent call last):
File "conversion/flac_converter.py", line 207, in <module>
args.end, args.num_threads, args.verbose)
File "conversion/flac_converter.py", line 111, in _apply_ffmpeg
pool.map(_apply_convert_dir, track_directories)
File "/home/jamie/anaconda3/lib/python3.7/multiprocessing/pool.py", line 268, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/jamie/anaconda3/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
File "/home/jamie/anaconda3/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/home/jamie/anaconda3/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "conversion/flac_converter.py", line 109, in _apply_convert_dir
ffmpeg_func, verbose=verbose)
File "conversion/flac_converter.py", line 85, in _convert_folder
ffmpeg_func(in_mix_path, out_track_dir)
File "conversion/flac_converter.py", line 45, in _flac_to_wav
ffmpeg.input(input_path).output(output_path).run_async(overwrite_output=not verbose)
File "/home/jamie/anaconda3/lib/python3.7/site-packages/ffmpeg/_run.py", line 285, in run_async
args, stdin=stdin_stream, stdout=stdout_stream, stderr=stderr_stream
File "/home/jamie/anaconda3/lib/python3.7/subprocess.py", line 800, in __init__
restore_signals, start_new_session)
File "/home/jamie/anaconda3/lib/python3.7/subprocess.py", line 1551, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'ffmpeg': 'ffmpeg'
The packages I have are as follows:
pip list
Package Version
alabaster 0.7.12
anaconda-client 1.7.2
anaconda-navigator 1.9.12
anaconda-project 0.8.3
argh 0.26.2
asn1crypto 1.3.0
astroid 2.3.3
astropy 4.0
atomicwrites 1.3.0
attrs 19.3.0
audioread 2.1.8
autopep8 1.4.4
Babel 2.8.0
backcall 0.1.0
backports.functools-lru-cache 1.6.1
backports.shutil-get-terminal-size 1.0.0
backports.tempfile 1.0
backports.weakref 1.0.post1
beautifulsoup4 4.8.2
bitarray 1.2.1
bkcharts 0.2
bleach 3.1.0
bokeh 1.4.0
boto 2.49.0
Bottleneck 1.3.2
certifi 2019.11.28
cffi 1.14.0
chardet 3.0.4
Click 7.0
cloudpickle 1.3.0
clyent 1.2.2
colorama 0.4.3
conda 4.8.3
conda-build 3.18.11
conda-package-handling 1.6.0
conda-verify 3.4.2
contextlib2 0.6.0.post1
cryptography 2.8
cycler 0.10.0
Cython 0.29.15
cytoolz 0.10.1
dask 2.11.0
decorator 4.4.1
defusedxml 0.6.0
diff-match-patch 20181111
distributed 2.11.0
docutils 0.16
entrypoints 0.3
et-xmlfile 1.0.1
fastcache 1.1.0
ffmpeg-python 0.2.0
filelock 3.0.12
flake8 3.7.9
Flask 1.1.1
fsspec 0.6.2
future 0.18.2
gevent 1.4.0
glob2 0.7
gmpy2 2.0.8
greenlet 0.4.15
h5py 2.10.0
HeapDict 1.0.1
html5lib 1.0.1
hypothesis 5.5.4
idna 2.8
imageio 2.6.1
imagesize 1.2.0
importlib-metadata 1.5.0
intervaltree 3.0.2
ipykernel 5.1.4
ipython 7.12.0
ipython-genutils 0.2.0
ipywidgets 7.5.1
isort 4.3.21
itsdangerous 1.1.0
jdcal 1.4.1
jedi 0.14.1
jeepney 0.4.2
Jinja2 2.11.1
joblib 0.14.1
json5 0.9.1
jsonschema 3.2.0
jupyter 1.0.0
jupyter-client 5.3.4
jupyter-console 6.1.0
jupyter-core 4.6.1
jupyterlab 1.2.6
jupyterlab-server 1.0.6
keyring 21.1.0
kiwisolver 1.1.0
lazy-object-proxy 1.4.3
libarchive-c 2.8
librosa 0.7.2
lief 0.9.0
llvmlite 0.31.0
locket 0.2.0
loguru 0.4.1
lxml 4.5.0
MarkupSafe 1.1.1
matplotlib 3.1.3
mccabe 0.6.1
mistune 0.8.4
mkl-fft 1.0.15
mkl-random 1.1.0
mkl-service 2.3.0
mock 4.0.1
more-itertools 8.2.0
mpmath 1.1.0
msgpack 0.6.1
multipledispatch 0.6.0
navigator-updater 0.2.1
nbconvert 5.6.1
nbformat 5.0.4
networkx 2.4
nltk 3.4.5
nose 1.3.7
notebook 6.0.3
numba 0.48.0
numexpr 2.7.1
numpy 1.18.1
numpydoc 0.9.2
olefile 0.46
openpyxl 3.0.3
packaging 20.1
pandas 1.0.1
pandocfilters 1.4.2
parso 0.5.2
partd 1.1.0
path 13.1.0
pathlib2 2.3.5
pathtools 0.1.2
patsy 0.5.1
pep8 1.7.1
pexpect 4.8.0
pickleshare 0.7.5
Pillow 7.0.0
pip 20.0.2
pkginfo 1.5.0.1
pluggy 0.13.1
ply 3.11
prometheus-client 0.7.1
prompt-toolkit 3.0.3
psutil 5.6.7
ptyprocess 0.6.0
py 1.8.1
pycodestyle 2.5.0
pycosat 0.6.3
pycparser 2.19
pycrypto 2.6.1
pycurl 7.43.0.5
pydocstyle 4.0.1
pyflakes 2.1.1
Pygments 2.5.2
pylint 2.4.4
pyodbc 4.0.0-unsupported
pyOpenSSL 19.1.0
pyparsing 2.4.6
pyrsistent 0.15.7
PySocks 1.7.1
pytest 5.3.5
pytest-arraydiff 0.3
pytest-astropy 0.8.0
pytest-astropy-header 0.1.2
pytest-doctestplus 0.5.0
pytest-openfiles 0.4.0
pytest-remotedata 0.3.2
python-dateutil 2.8.1
python-jsonrpc-server 0.3.4
python-language-server 0.31.7
pytz 2019.3
PyWavelets 1.1.1
pyxdg 0.26
PyYAML 5.3
pyzmq 18.1.1
QDarkStyle 2.8
QtAwesome 0.6.1
qtconsole 4.6.0
QtPy 1.9.0
requests 2.22.0
resampy 0.2.2
rope 0.16.0
Rtree 0.9.3
ruamel-yaml 0.15.87
scikit-image 0.16.2
scikit-learn 0.22.1
scipy 1.4.1
seaborn 0.10.0
SecretStorage 3.1.2
Send2Trash 1.5.0
setuptools 45.2.0.post20200210
simplegeneric 0.8.1
singledispatch 3.4.0.3
six 1.14.0
snowballstemmer 2.0.0
sortedcollections 1.1.2
sortedcontainers 2.1.0
SoundFile 0.10.3.post1
soupsieve 1.9.5
Sphinx 2.4.0
sphinxcontrib-applehelp 1.0.1
sphinxcontrib-devhelp 1.0.1
sphinxcontrib-htmlhelp 1.0.2
sphinxcontrib-jsmath 1.0.1
sphinxcontrib-qthelp 1.0.2
sphinxcontrib-serializinghtml 1.1.3
sphinxcontrib-websupport 1.2.0
spyder 4.0.1
spyder-kernels 1.8.1
SQLAlchemy 1.3.13
statsmodels 0.11.0
stempeg 0.1.8
sympy 1.5.1
tables 3.6.1
tblib 1.6.0
terminado 0.8.3
testpath 0.4.4
toolz 0.10.0
tornado 6.0.3
tqdm 4.42.1
traitlets 4.3.3
ujson 1.35
unicodecsv 0.14.1
urllib3 1.25.8
watchdog 0.10.2
wcwidth 0.1.8
webencodings 0.5.1
Werkzeug 1.0.0
wheel 0.34.2
widgetsnbextension 3.5.1
wrapt 1.11.2
wurlitzer 2.0.0
xlrd 1.2.0
XlsxWriter 1.2.7
xlwt 1.3.0
xmltodict 0.12.0
yapf 0.28.0
zict 1.0.0
zipp 2.2.0
I also tried pip uninstall ffmpeg-python && pip install ffmpeg-python.
I've also tried installing and uninstalling ffmpeg.
Not sure what to do from here. Thanks for any help!
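The ffmpeg-python package is only a wrapper that launches the ffmpeg executable as a subprocess, so the FileNotFoundError above is what one would expect when no ffmpeg binary is visible on PATH in the active environment. A quick check, as a sketch (the function name is hypothetical):

```python
import shutil


def find_ffmpeg(executable="ffmpeg"):
    """Return the full path of the ffmpeg binary, or None if it is not on PATH."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg binary not found on PATH")
    else:
        print(f"ffmpeg found at {path}")
```

If the binary is not found, installing ffmpeg itself (not only the ffmpeg-python package) into the environment that runs the script may resolve the error.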
The rendered S12 and S13 audio files of Track01500 seem to be just a single tone that lasts for the entire duration.
But the MIDI files for S12 and S13 show changing pitches and pauses.
Is there anything wrong with these two stems?
Questions to answer:
Scrub & update MIDI data accordingly.
I've already mentioned this in this comment, but I'm raising it again here to make it more visible to other researchers.
After looking at the bass labels and listening to the stems, I've noticed that the lowest note in the bass stems is B1 (one octave above the lowest note on a 5-string electric bass and seven semitones above the lowest note on a 4-string electric bass).
Likewise, the highest octave in the labels is not rendered in the audio, since it is one octave above the highest note on a 4/5-string 24-fret electric bass (note G5).
It would be great if this information were added to the slakh.com website, and even better if the bass stems were re-rendered!
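The intervals quoted for the lowest note can be checked with a small helper that converts note names to MIDI numbers. It assumes the convention in which C4 is MIDI note 60, and standard tunings with B0 (5-string) and E1 (4-string) as the lowest open strings; the function name is hypothetical.

```python
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_midi(name):
    """Convert a note name such as 'B1' or 'F#2' to a MIDI number (C4 = 60)."""
    letter, rest = name[0].upper(), name[1:]
    accidental = 0
    if rest.startswith("#"):
        accidental, rest = 1, rest[1:]
    elif rest.startswith("b"):
        accidental, rest = -1, rest[1:]
    octave = int(rest)
    return 12 * (octave + 1) + NOTE_OFFSETS[letter] + accidental


# Semitones between B1 and the 5-string low B (B0): prints 12
print(note_to_midi("B1") - note_to_midi("B0"))
# Semitones between B1 and the 4-string low E (E1): prints 7
print(note_to_midi("B1") - note_to_midi("E1"))
```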
Hello,
I have installed the requirements and when executing the code I get this error. I don't understand why, can someone please help me?
Thank you
(slakh) C:\Users\Charlie>python C:/Users/Charlie/Desktop/TFM/slakh-utils-master/conversion/flac_converter.py -i E:/DATASETS/Slakh2100/slakh2100_flac/test -o E:/DATASETS/Slakh2100_wav/slakh2100_flac/test -c False
Traceback (most recent call last):
File "C:/Users/Charlie/Desktop/TFM/slakh-utils-master/conversion/flac_converter.py", line 206, in <module>
_apply_ffmpeg(args.input_dir, args.output_dir, args.compress, args.start,
File "C:/Users/Charlie/Desktop/TFM/slakh-utils-master/conversion/flac_converter.py", line 111, in _apply_ffmpeg
pool.map(_apply_convert_dir, track_directories)
File "C:\Users\Charlie\anaconda3\envs\slakh\lib\multiprocessing\pool.py", line 364, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "C:\Users\Charlie\anaconda3\envs\slakh\lib\multiprocessing\pool.py", line 771, in get
raise self._value
File "C:\Users\Charlie\anaconda3\envs\slakh\lib\multiprocessing\pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "C:\Users\Charlie\anaconda3\envs\slakh\lib\multiprocessing\pool.py", line 48, in mapstar
return list(map(*args))
File "C:/Users/Charlie/Desktop/TFM/slakh-utils-master/conversion/flac_converter.py", line 108, in _apply_convert_dir
_convert_folder(in_track_dir, mix_name, output_dir,
File "C:/Users/Charlie/Desktop/TFM/slakh-utils-master/conversion/flac_converter.py", line 85, in _convert_folder
ffmpeg_func(in_mix_path, out_track_dir)
File "C:/Users/Charlie/Desktop/TFM/slakh-utils-master/conversion/flac_converter.py", line 45, in _flac_to_wav
ffmpeg.input(input_path).output(output_path).run_async(overwrite_output=not verbose)
File "C:\Users\Charlie\anaconda3\envs\slakh\lib\site-packages\ffmpeg\_run.py", line 284, in run_async
return subprocess.Popen(
File "C:\Users\Charlie\anaconda3\envs\slakh\lib\subprocess.py", line 854, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Users\Charlie\anaconda3\envs\slakh\lib\subprocess.py", line 1307, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified
I would first like to say that this project is a great initiative! I know that this might be a bit much to ask for, but I would like to see the full dataset provided as 16 kHz audio files.
Given that 128 GB and 256 GB are common disk sizes, a reduced dataset would be more feasible to work with. In many applications, such as the automatic music transcription that I work on, the audio will be downsampled to 16 kHz anyway.
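To give a rough sense of the saving, the sketch below computes the rational resampling factor and the relative size of uncompressed PCM at the same bit depth and channel count. It assumes a source rate of 44100 Hz, the rate ffmpeg reports for the stems in the conversion logs on this page; compressed FLAC sizes will not scale exactly linearly. The function name is hypothetical.

```python
from fractions import Fraction


def resample_factor(src_rate, dst_rate):
    """Return dst_rate / src_rate as a reduced fraction (up / down)."""
    return Fraction(dst_rate, src_rate)


factor = resample_factor(44100, 16000)
print(f"up={factor.numerator}, down={factor.denominator}")  # up=160, down=441
print(f"relative size: {float(factor):.3f}")                 # relative size: 0.363
```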
Hello,
I recently downloaded the Slakh dataset for use in the context of my thesis on musical source separation.
When trying to convert the data from .flac to .wav using the provided script (flac_converter.py in conversion/), which I call with
python conversion/flac_converter.py -i ../Data/slakh2100_flac/train/ -o ../Data/slakh2100_wav/train/ -c False
at some point I keep getting the following error message repeatedly:
Input #0, flac, from '../../Data/slakh2100_flac/train/Track00061/stems/S00.flac':
Metadata:
ENCODER : Lavf57.83.100
Duration: 00:04:20.17, start: 0.000000, bitrate: 129 kb/s
Stream #0:0: Audio: flac, 44100 Hz, mono, s16
Stream mapping:
Stream #0:0 -> #0:0 (flac (native) -> pcm_s16le (native))
Error while opening decoder for input stream #0:0 : Resource temporarily unavailable
with different stems each time.
After this a few huge chunks of tracks are skipped, and the script goes on to convert a few more tracks before stopping completely. I also get the following somewhere near the end:
Input #0, flac, from '../../Data/slakh2100_flac/train/Track01142/stems/S06.flac':
Metadata:
ENCODER : Lavf57.83.100
Duration: 00:06:12.01, start: 0.000000, bitrate: 24 kb/s
Stream #0:0: Audio: flac, 44100 Hz, mono, s16
Input #0, flac, from '../../Data/slakh2100_flac/train/Track01142/mix.flac':
Metadata:
ENCODER : Lavf57.83.100
Duration: 00:06:12.01, start: 0.000000, bitrate: 329 kb/s
Stream #0:0: Audio: flac, 44100 Hz, mono, s16
Stream mapping:
Stream #0:0 -> #0:0 (flac (native) -> pcm_s16le (native))
Error while opening decoder for input stream #0:0 : Resource temporarily unavailable
Stream mapping: time=00:00:56.63 bitrate= 703.6kbits/s speed=1.05x
Stream #0:0 -> #0:0 (flac (native) -> pcm_s16le (native))
Error while opening decoder for input stream #0:0 : Resource temporarily unavailable
Traceback (most recent call last):
File "flac_converter.py", line 207, in <module>
args.end, args.num_threads, args.verbose)
File "flac_converter.py", line 111, in _apply_ffmpeg
pool.map(_apply_convert_dir, track_directories)
File "/home/mthevoz/miniconda3/envs/MSS/lib/python3.7/multiprocessing/pool.py", line 268, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/mthevoz/miniconda3/envs/MSS/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
File "/home/mthevoz/miniconda3/envs/MSS/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/home/mthevoz/miniconda3/envs/MSS/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "flac_converter.py", line 109, in _apply_convert_dir
ffmpeg_func, verbose=verbose)
File "flac_converter.py", line 90, in _convert_folder
ffmpeg_func(src_path, out_stems_dir, verbose=verbose)
File "flac_converter.py", line 45, in _flac_to_wav
ffmpeg.input(input_path).output(output_path).run_async(overwrite_output=not verbose)
File "/home/mthevoz/miniconda3/envs/MSS/lib/python3.7/site-packages/ffmpeg/_run.py", line 285, in run_async
args, stdin=stdin_stream, stdout=stdout_stream, stderr=stderr_stream
File "/home/mthevoz/miniconda3/envs/MSS/lib/python3.7/subprocess.py", line 775, in __init__
restore_signals, start_new_session)
File "/home/mthevoz/miniconda3/envs/MSS/lib/python3.7/subprocess.py", line 1453, in _execute_child
restore_signals, start_new_session, preexec_fn)
BlockingIOError: [Errno 11] Resource temporarily unavailable
In the end my terminal becomes unresponsive and I end up with a bunch of tracks that seem to be properly converted but with big chunks missing in the middle and in the end. Any idea what's going on here?
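The traceback ends in run_async, which starts ffmpeg without waiting for it to exit. A plausible reading, which is an assumption and not confirmed in the report, is that ffmpeg processes accumulate until the system refuses to create more, hence Errno 11. A sketch of a blocking alternative using only the standard library follows; the function names are hypothetical and this is not the script's actual code.

```python
import subprocess


def build_ffmpeg_cmd(src, dst, overwrite=True):
    """Build the ffmpeg argument list for a single file conversion."""
    cmd = ["ffmpeg", "-hide_banner", "-loglevel", "error"]
    cmd.append("-y" if overwrite else "-n")
    cmd += ["-i", str(src), str(dst)]
    return cmd


def convert_blocking(src, dst):
    """Run ffmpeg and wait for it to finish before returning.

    Waiting keeps the number of live ffmpeg processes bounded by the
    number of pool workers, unlike a fire-and-forget launch.
    """
    subprocess.run(build_ffmpeg_cmd(src, dst), check=True)
```

With check=True, a failed conversion raises CalledProcessError instead of being skipped silently.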
Update: I fixed this issue and opened a pull request.
I am trying to use the submixes function on the babyslakh dataset, but there are a few bugs.
First, the options '-submix-definition-file', '-s' and '-src-dir', '-s' have the same short name, which causes an option string conflict. Second, starting at line 124:
    if args.root_dir is None and args.src_dir is None:
        raise ValueError('Must provide one of (root_dir, src_dir).')
    elif args.root_dir is not None and args.src_dir is not None:
        raise ValueError('Must provide only one of (root_dir, src_dir).')
the attribute name should be input_dir instead of root_dir.
Besides that, there are still some bugs that prevent this code from working.
Am I using this code incorrectly, or is this function still unfinished?
I believe it should be possible to convert using only SoundFile.
I can't seem to find the resplit_slakh.py script (mentioned in the readme) anywhere, and I can't find any alternative branches either.
I also tried
python split_slakh.py -s /home/carlnys/data/slakh/slakh2100_flac/ -n redux.json
but it raises KeyError: moved
There seems to be an issue in the file submixes.py: the -s flag is used both for submix-definition-file and src-dir, which raises the following error when running the script:
Traceback (most recent call last):
File "submixes/submixes.py", line 119, in <module>
help='Directory of a single track to create a submix for.')
File "/home/mthevoz/miniconda3/envs/MSS/lib/python3.7/argparse.py", line 1367, in add_argument
return self._add_action(action)
File "/home/mthevoz/miniconda3/envs/MSS/lib/python3.7/argparse.py", line 1730, in _add_action
self._optionals._add_action(action)
File "/home/mthevoz/miniconda3/envs/MSS/lib/python3.7/argparse.py", line 1571, in _add_action
action = super(_ArgumentGroup, self)._add_action(action)
File "/home/mthevoz/miniconda3/envs/MSS/lib/python3.7/argparse.py", line 1381, in _add_action
self._check_conflict(action)
File "/home/mthevoz/miniconda3/envs/MSS/lib/python3.7/argparse.py", line 1520, in _check_conflict
conflict_handler(action, confl_optionals)
File "/home/mthevoz/miniconda3/envs/MSS/lib/python3.7/argparse.py", line 1529, in _handle_conflict_error
raise ArgumentError(action, message % conflict_string)
argparse.ArgumentError: argument -src-dir/-s: conflicting option string: -s
Changing one of the flags to something else fixes this.
Also, another issue:
File "submixes/submixes.py", line 124, in <module>
if args.root_dir is None and args.src_dir is None:
AttributeError: 'Namespace' object has no attribute 'root_dir'
I'm guessing args.root_dir there and in the next few lines is supposed to refer to args.input_dir as defined in line 116? There seems to be a problem with the variable names used in the parser here.
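Both problems reported above (the duplicated -s flag and the root_dir / input_dir mismatch) can be illustrated with a small argparse sketch. It is not the repository's fix: the short flag -d is a hypothetical choice, the double-dash long options are a stylistic assumption, and a mutually exclusive group stands in for the manual checks.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Create submixes (sketch).")
    # '-d' is a hypothetical short flag chosen so it no longer clashes with '-s'.
    parser.add_argument("--submix-definition-file", "-d", required=True,
                        help="Path to the submix definition file.")
    # Exactly one of the two sources must be given; argparse enforces this.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--input-dir", "-i",
                        help="Root directory containing many tracks.")
    source.add_argument("--src-dir", "-s",
                        help="Directory of a single track to create a submix for.")
    return parser


args = build_parser().parse_args(["-d", "defs.json", "-i", "slakh/train"])
print(args.input_dir, args.src_dir)  # slakh/train None
```

Because the attribute names are derived from the long option names, args.input_dir and args.src_dir stay consistent with the flags, and passing both -i and -s makes argparse exit with a usage error.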
Google has disabled downloads for the link given by the Google form.
How can I get access to the dataset now?
By default, pretty_midi does not produce piano rolls for drum tracks. See the issue and workaround here:
I am trying to do source separation with Slakh2100, but during data processing I realized that the S00 stem is missing for some tracks in the dataset:
train/Track00446
train/Track00487
train/Track00590
train/Track01009
validation/Track01672
validation/Track01740
validation/Track01794
I thought I might have downloaded a corrupted dataset, so re-downloaded the whole dataset. But those S00 stems are still missing.
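A check like this can be automated. The sketch below assumes the layout seen in the conversion logs on this page (split/TrackXXXXX/stems/Sxx.flac); the function name is hypothetical.

```python
from pathlib import Path


def tracks_missing_stem(root, stem="S00", suffix=".flac"):
    """List track directories under root that lack stems/<stem><suffix>."""
    missing = []
    for track_dir in sorted(Path(root).glob("*/Track*")):
        stem_file = track_dir / "stems" / f"{stem}{suffix}"
        if track_dir.is_dir() and not stem_file.is_file():
            missing.append(track_dir.relative_to(root).as_posix())
    return missing
```

Calling it on the dataset root returns entries in the same split/Track form as the list above.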
Hi, I'm trying to train a transcriber using Slakh, but I found that some pitches lie in a range that is too low or too high (pitch < 21 or pitch > 108).
For example, in Track01170 there is a note with pitch = 0.
I guess such notes are ignored when the audio is synthesized, but I'm not sure about the exact rule (i.e. different instruments could ignore different pitch ranges).
Could I get more details about this?
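Until the rendering behaviour is clarified, one conservative option is to separate out-of-range notes from the labels before training. The sketch below assumes notes are available as (pitch, start, end) tuples, which is a hypothetical layout, and uses the bounds 21 and 108 from the question (the 88-key piano range). Whether the synthesizers actually ignore such notes is not established here.

```python
def split_by_pitch_range(notes, low=21, high=108):
    """Separate notes into in-range and out-of-range lists.

    notes: iterable of (pitch, start, end) tuples (hypothetical layout).
    low, high: inclusive MIDI pitch bounds.
    """
    kept, dropped = [], []
    for note in notes:
        (kept if low <= note[0] <= high else dropped).append(note)
    return kept, dropped
```

Inspecting the dropped list per instrument would also show which stems contain such notes at all.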