FMA: A Dataset For Music Analysis
Home Page: https://arxiv.org/abs/1612.01840
License: MIT License
File "utils.py", line 304
self.X = np.empty((self.batch_size, *loader.shape))
^
SyntaxError: invalid syntax
with Python 2.7 on Linux. Does the code support Python 2.7?
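For reference, `(self.batch_size, *loader.shape)` star-unpacks a tuple inside a tuple literal, which is Python 3.5+ syntax; Python 2.7 rejects it with exactly this SyntaxError. A minimal sketch of a version-agnostic equivalent (the batch size and shape below are made-up stand-ins for the real loader):

```python
import numpy as np

# Star-unpacking in a tuple, np.empty((batch_size, *shape)), only parses
# on Python 3.5+. Concatenating tuples works on both 2.7 and 3.x.
batch_size = 4
loader_shape = (128, 64)  # hypothetical loader.shape

X = np.empty((batch_size,) + tuple(loader_shape))
```

The concatenated form produces the same array shape on either interpreter.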
Hi, I am doing some practice. I was confused about this problem, and I have already installed this module. Can you help me? Thanks!
tracks = utils.load('tracks.csv')
What are the eight genres in the FMA_Small dataset? Thanks.
I have tried different ways to run this code, but every time I find a new error.
Can you help me with how to use this code?
I installed Python 3.5 and all of the packages in requirements; after that I ran the creation script and got this error:
C:\Users\l3lackwood\Downloads\WinPython-64bit-3.5.3.1Qt5\python-3.5.3.amd64\lib\site-packages\dotenv\main.py:24: UserWarning: Not loading - it doesn't exist.
warnings.warn("Not loading %s - it doesn't exist." % dotenv_path)
Traceback (most recent call last):
File "F:\farideh\python\WinPython-32bit-3.5.3.1Qt5\notebooks\genre\fma-master\creation.py", line 232, in
if sys.argv[1] == 'metadata':
IndexError: list index out of range
What should I do?
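The traceback points at `sys.argv[1]`, so `creation.py` evidently expects a mode name on the command line (the traceback shows it checks for `'metadata'`). A minimal sketch of the guard pattern, with a hypothetical usage message (check the script's source for the actual list of modes):

```python
import sys

def main(argv):
    # creation.py dispatches on its first argument; calling it with none
    # raises IndexError on sys.argv[1]. Guard against that explicitly.
    if len(argv) < 2:
        raise SystemExit("usage: creation.py <mode>  (e.g. 'metadata')")
    if argv[1] == 'metadata':
        return 'metadata'  # placeholder for the real work
    return None

mode = main(['creation.py', 'metadata'])
```

In other words, the IndexError here simply means the script was invoked without an argument such as `metadata`.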
Hi! I'm trying to figure out how I can obtain a subset of tracks using a list of genres. I picked a couple of genres. Using a list like ["genre1", "genre2", ...], I want to slice the multiindex tracks so I only have the metadata for the tracks of those genres.
By tracks I mean the result you get when loading tracks.csv.
This way I can feed tracks['track', 'genres_all'] to the fit_transform/LabelBinarizer, but now with only the tracks of the genres I picked.
Kind regards,
Dylan.
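One way to slice as described above is a boolean mask over the list-valued genres column. A sketch with a toy DataFrame standing in for the loaded tracks.csv (real `genres_all` cells hold genre IDs):

```python
import pandas as pd

# Toy stand-in for tracks = utils.load('tracks.csv'): MultiIndex
# columns, with ('track', 'genres_all') holding a list per track.
columns = pd.MultiIndex.from_tuples([('track', 'title'),
                                     ('track', 'genres_all')])
tracks = pd.DataFrame([['Song A', [21, 12]],
                       ['Song B', [4]],
                       ['Song C', [12]]], columns=columns)

wanted = {12}  # the genre IDs you picked
mask = tracks['track', 'genres_all'].apply(lambda gs: bool(wanted & set(gs)))
subset = tracks[mask]  # metadata for matching tracks only
```

`subset['track', 'genres_all']` can then be fed to the binarizer with only the chosen genres present.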
My team is conducting an academic research project using your dataset, and we were wondering if you could help us clarify what each of the three columns means.
Specifically, we are looking to understand how these columns are generated and if they can be a good measure of popularity, or if any other columns should be used instead.
Thank you!
I was trying to run the usage.ipynb
and the second cell crashed with the following error:
KeyError: ('track', 'genres_top')
I took a look inside the code and the csv file. For utils.load, in the case of tracks.csv, it appears that you only need to change the tuple ('track', 'genres_top') to ('track', 'genre_top') in the list; that is, remove an 's'.
I did the previous but when running again the code now I get this error:
<ipython-input-2-aa99f4d2677d> in <module>()
3
4 # Load metadata and features.
----> 5 tracks = utils.load('tracks.csv')
~/Desktop/Music_Project/fma/utils.py in load(filepath)
203 for column in COLUMNS:
204 print(column)
--> 205 tracks[column] = tracks[column].map(ast.literal_eval)
.
.
.
ValueError: malformed node or string: <_ast.Name object at 0x125eed2e8>
I didn't double check, but I couldn't open files with these indices on linux/ffmpeg/librosa. Just wanted to share so that others would get some hints.
2624,
3284,
8669,
10116,
11583,
12838,
13529,
14116,
14180,
20814,
22554,
23429,
23430,
23431,
25173,
25174,
25175,
25176,
25180,
29345,
29346,
29352,
29356,
33411,
33413,
33414,
33417,
33418,
33419,
33425,
35725,
39363,
41745,
42986,
43753,
50594,
50782,
53668,
54569,
54582,
61480,
61822,
63422,
63997,
72656,
72980,
73510,
80553,
82699,
84503,
84504,
84522,
84524,
86656,
86659,
86661,
86664,
87057,
90244,
90245,
90247,
90248,
90250,
90252,
90253,
90442,
90445,
91206,
92479,
94052,
94234,
95253,
96203,
96207,
96210,
98105,
98562,
101265,
101272,
101275,
102241,
102243,
102247,
102249,
102289,
106409,
106412,
106415,
106628,
108920,
109266,
110236,
115610,
117441,
127928,
129207,
129800,
130328,
130748,
130751,
131545,
133641,
133647,
134887,
140449,
140450,
140451,
140452,
140453,
140454,
140455,
140456,
140457,
140458,
140459,
140460,
140461,
140462,
140463,
140464,
140465,
140466,
140467,
140468,
140469,
140470,
140471,
140472,
142614,
144518,
144619,
145056,
146056,
147419,
147424,
148786,
148787,
148788,
148789,
148790,
148791,
148792,
148793,
148794,
148795,
151920,
155051,
Hello. Where can I find tempo information for the songs in FMA? I wasn't able to find any in the metadata.
Hello, I am just going over the usage example but I am unable to load the track metadata using utils.py
The first error I had was a bad key error for column ('track', 'genres_top'), but I was able to fix that by noticing that the tracks.csv column name is actually 'genre_top' (no s). After fixing that I still have an issue with ast.literal_eval. This is the error I am getting from the notebook:
ValueError Traceback (most recent call last)
in ()
3
4 # Load metadata and features.
----> 5 tracks = utils.load('tracks.csv')
6 genres = utils.load('genres.csv')
7 features = utils.load('features.csv')
~/projects/fma-stft/fma/utils.py in load(filepath)
202 for column in COLUMNS:
203 print("Column: {}".format(column))
--> 204 tracks[column] = tracks[column].map(ast.literal_eval)
205
206 COLUMNS = [('track', 'date_created'), ('track', 'date_recorded'),
~/projects/fmaenv/lib/python3.5/site-packages/pandas/core/series.py in map(self, arg, na_action)
2311 else:
2312 # arg is a function
-> 2313 new_values = map_f(values, arg)
2314
2315 return self._constructor(new_values,
pandas/_libs/src/inference.pyx in pandas._libs.lib.map_infer()
/usr/lib/python3.5/ast.py in literal_eval(node_or_string)
82 return left - right
83 raise ValueError('malformed node or string: ' + repr(node))
---> 84 return _convert(node_or_string)
85
86
/usr/lib/python3.5/ast.py in _convert(node)
81 else:
82 return left - right
---> 83 raise ValueError('malformed node or string: ' + repr(node))
84 return _convert(node_or_string)
85
ValueError: malformed node or string: <_ast.BinOp object at 0x7f25f0a53208>
The error occurs when processing column ('track', 'genre_top') in 'tracks[column] = tracks[column].map(ast.literal_eval)'; the other columns work normally. I downloaded the fma_metadata.zip and fma_small.zip from the provided links, and ensured that the SHA1 hashes were correct.
Edit: I've just tried Python 3.6.0 using the suggested method in the readme, but I still have the same issue.
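A likely reading of the two errors above: the list-valued columns (`genres`, `genres_all`) are stored as string-encoded Python literals and do need `ast.literal_eval`, while `genre_top` holds plain genre names. A bare name like `Rock` parses as an `ast.Name` node and a hyphenated one like `Hip-Hop` as an `ast.BinOp`, which is exactly what the "malformed node or string" messages show. This also suggests the checked-out utils.py revision may not match the downloaded metadata release. A hedged sketch of a parser (the `parse_cell` helper is hypothetical, not part of the repo) that tolerates both kinds of cell:

```python
import ast

def parse_cell(value):
    # "[21, 12]" -> [21, 12]; plain names like "Rock" or "Hip-Hop"
    # are not Python literals (literal_eval sees a Name / BinOp node
    # and raises), so fall back to returning the string untouched.
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return value
```

With such a fallback, `genre_top` passes through unchanged while the genre-list columns are still decoded.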
When I try to download the small dataset (haven't tried with the others) with a script I get a SSL certification verification error.
The error can be reproduced in this minimal form:
import urllib.request
with urllib.request.urlopen('https://os.unil.cloud.switch.ch/fma/fma_small.zip') as response:
pass
My traceback is:
---------------------------------------------------------------------------
SSLError Traceback (most recent call last)
c:\users\amarafioti\appdata\local\programs\python\python36\lib\urllib\request.py in do_open(self, http_class, req, **http_conn_args)
1317 h.request(req.get_method(), req.selector, req.data, headers,
-> 1318 encode_chunked=req.has_header('Transfer-encoding'))
1319 except OSError as err: # timeout error
c:\users\amarafioti\appdata\local\programs\python\python36\lib\http\client.py in request(self, method, url, body, headers, encode_chunked)
1238 """Send a complete request to the server."""
-> 1239 self._send_request(method, url, body, headers, encode_chunked)
1240
c:\users\amarafioti\appdata\local\programs\python\python36\lib\http\client.py in _send_request(self, method, url, body, headers, encode_chunked)
1284 body = _encode(body, 'body')
-> 1285 self.endheaders(body, encode_chunked=encode_chunked)
1286
c:\users\amarafioti\appdata\local\programs\python\python36\lib\http\client.py in endheaders(self, message_body, encode_chunked)
1233 raise CannotSendHeader()
-> 1234 self._send_output(message_body, encode_chunked=encode_chunked)
1235
c:\users\amarafioti\appdata\local\programs\python\python36\lib\http\client.py in _send_output(self, message_body, encode_chunked)
1025 del self._buffer[:]
-> 1026 self.send(msg)
1027
c:\users\amarafioti\appdata\local\programs\python\python36\lib\http\client.py in send(self, data)
963 if self.auto_open:
--> 964 self.connect()
965 else:
c:\users\amarafioti\appdata\local\programs\python\python36\lib\http\client.py in connect(self)
1399 self.sock = self._context.wrap_socket(self.sock,
-> 1400 server_hostname=server_hostname)
1401 if not self._context.check_hostname and self._check_hostname:
c:\users\amarafioti\appdata\local\programs\python\python36\lib\ssl.py in wrap_socket(self, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname, session)
400 server_hostname=server_hostname,
--> 401 _context=self, _session=session)
402
c:\users\amarafioti\appdata\local\programs\python\python36\lib\ssl.py in __init__(self, sock, keyfile, certfile, server_side, cert_reqs, ssl_version, ca_certs, do_handshake_on_connect, family, type, proto, fileno, suppress_ragged_eofs, npn_protocols, ciphers, server_hostname, _context, _session)
807 raise ValueError("do_handshake_on_connect should not be specified for non-blocking sockets")
--> 808 self.do_handshake()
809
c:\users\amarafioti\appdata\local\programs\python\python36\lib\ssl.py in do_handshake(self, block)
1060 self.settimeout(None)
-> 1061 self._sslobj.do_handshake()
1062 finally:
c:\users\amarafioti\appdata\local\programs\python\python36\lib\ssl.py in do_handshake(self)
682 """Start the SSL/TLS handshake."""
--> 683 self._sslobj.do_handshake()
684 if self.context.check_hostname:
SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:748)
During handling of the above exception, another exception occurred:
URLError Traceback (most recent call last)
<ipython-input-10-9d04e6e06fb8> in <module>()
----> 1 with urllib.request.urlopen('https://os.unil.cloud.switch.ch/fma/fma_small.zip') as response:
2 pass
c:\users\amarafioti\appdata\local\programs\python\python36\lib\urllib\request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
221 else:
222 opener = _opener
--> 223 return opener.open(url, data, timeout)
224
225 def install_opener(opener):
c:\users\amarafioti\appdata\local\programs\python\python36\lib\urllib\request.py in open(self, fullurl, data, timeout)
524 req = meth(req)
525
--> 526 response = self._open(req, data)
527
528 # post-process response
c:\users\amarafioti\appdata\local\programs\python\python36\lib\urllib\request.py in _open(self, req, data)
542 protocol = req.type
543 result = self._call_chain(self.handle_open, protocol, protocol +
--> 544 '_open', req)
545 if result:
546 return result
c:\users\amarafioti\appdata\local\programs\python\python36\lib\urllib\request.py in _call_chain(self, chain, kind, meth_name, *args)
502 for handler in handlers:
503 func = getattr(handler, meth_name)
--> 504 result = func(*args)
505 if result is not None:
506 return result
c:\users\amarafioti\appdata\local\programs\python\python36\lib\urllib\request.py in https_open(self, req)
1359 def https_open(self, req):
1360 return self.do_open(http.client.HTTPSConnection, req,
-> 1361 context=self._context, check_hostname=self._check_hostname)
1362
1363 https_request = AbstractHTTPHandler.do_request_
c:\users\amarafioti\appdata\local\programs\python\python36\lib\urllib\request.py in do_open(self, http_class, req, **http_conn_args)
1318 encode_chunked=req.has_header('Transfer-encoding'))
1319 except OSError as err: # timeout error
-> 1320 raise URLError(err)
1321 r = h.getresponse()
1322 except:
URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:748)>
I found a workaround by passing context=ssl.SSLContext(ssl.PROTOCOL_TLSv1) as an argument to urlopen, but I thought this is something you may want to be aware of.
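Spelled out, that workaround looks like the sketch below. One caveat worth noting: an SSLContext constructed this way performs no certificate verification, so it sidesteps CERTIFICATE_VERIFY_FAILED by dropping the safety check; only use it when you trust the host.

```python
import ssl
import urllib.request

url = 'https://os.unil.cloud.switch.ch/fma/fma_small.zip'

# A bare SSLContext defaults to no certificate verification, which is
# why this avoids the CERTIFICATE_VERIFY_FAILED error (and its check).
context = ssl.SSLContext(ssl.PROTOCOL_TLSv1)

# with urllib.request.urlopen(url, context=context) as response:
#     data = response.read()  # not executed here
```

On most systems, installing/refreshing the CA certificates for the Python install is the proper fix rather than disabling verification.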
Hi, I can't seem to find the musical key of the tracks, is it a feature that exists?
I'm not sure if this relates to #4, but I've found that at least sox (on Debian!) tries to parse out file duration using the reported bit-rate. Unfortunately for me, the reported bitrate is way wrong for at least ~90 tracks (of the 100k+), and probably wrong for another couple hundred. These particularly bad tracks claim to have bitrates in excess of "100M", which sox (at least) parses as bits per second. I'd point out that stereo 16-bit WAV is 1.4 Mbps.
The list of suspicious file IDs is here, if anyone wants to double-check / confirm. The extension is txt, but it's JSON formatted; keys point to the sox-reported bitrate.
More fortunately, removing all the ID3 tags fixes the issue. I'd propose perhaps exporting all ID3 tags to a static dump over the collection (per #4), and then removing all the ID3 tags to sanitize the collection.
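On the "removing all the ID3 tags" point: in practice a tagging library is the sensible route (e.g. mutagen's `File(path).delete()` followed by `save()`). As a self-contained illustration of what is actually being stripped, here is a minimal sketch that removes a leading ID3v2 tag from a bytestream; it ignores ID3v1 trailers, appended tags, and footer variants, so treat it as a sketch, not a sanitizer:

```python
def strip_id3v2(data):
    """Drop a leading ID3v2 tag from an MP3 bytestream, if present."""
    if len(data) < 10 or data[:3] != b'ID3':
        return data
    # Bytes 6-9 hold the tag size as a 28-bit "synchsafe" integer
    # (7 payload bits per byte), not counting the 10-byte header.
    size = 0
    for b in data[6:10]:
        size = (size << 7) | (b & 0x7F)
    return data[10 + size:]
```

A bloated or corrupt tag header of this kind is plausibly what throws off tools that derive bitrate or duration from file size.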
In line 201 of utils.py, one of the columns being called is
('track', 'genres_top')
but shouldn't it be
('track', 'genre_top')
based on tracks.csv?
However when I make that change, I get the following error:
ValueError Traceback (most recent call last)
<ipython-input-5-eecca7133c46> in <module>()
----> 1 tracks = utils.load('fma_metadata/tracks.csv')
2 genres = utils.load('fma_metadata/genres.csv')
3 features = utils.load('fma_metadata/features.csv')
4 echonest = utils.load('fma_metadata/echonest.csv')
5
~\OneDrive\Documents\GitHub\fma\utils.py in load(filepath)
201 ('track', 'genre_top')]
202 for column in COLUMNS:
--> 203 tracks[column] = tracks[column].map(ast.literal_eval)
204
205 COLUMNS = [('track', 'date_created'), ('track', 'date_recorded'),
C:\Anaconda3\lib\site-packages\pandas\core\series.py in map(self, arg, na_action)
2052 index=self.index).__finalize__(self)
2053 else:
-> 2054 mapped = map_f(values, arg)
2055 return self._constructor(mapped,
2056 index=self.index).__finalize__(self)
pandas\src\inference.pyx in pandas.lib.map_infer (pandas\lib.c:62578)()
C:\Anaconda3\lib\ast.py in literal_eval(node_or_string)
82 return left - right
83 raise ValueError('malformed node or string: ' + repr(node))
---> 84 return _convert(node_or_string)
85
86
C:\Anaconda3\lib\ast.py in _convert(node)
81 else:
82 return left - right
---> 83 raise ValueError('malformed node or string: ' + repr(node))
84 return _convert(node_or_string)
85
ValueError: malformed node or string: <_ast.BinOp object at 0x00000253FB0C02B0>
I can load in the features, echonest, and genres with no errors.
Hello, I am trying to browse through the archive and having a hard time understanding the metadata zip.
For example, if I want to get the title of file 020/020001.mp3 in fma_large.zip, how can I locate it in tracks.csv?
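As far as I can tell from the repository's path helper, the layout encodes the track ID: filenames are the zero-padded six-digit track ID, grouped into directories named after its first three digits, so `020/020001.mp3` is track 20001. A sketch (the commented `tracks.loc` line assumes tracks.csv has been loaded with utils.load):

```python
import os

def track_id_from_path(path):
    # '020/020001.mp3' -> 20001: the filename is the zero-padded
    # track ID, and the directory is its first three digits.
    return int(os.path.splitext(os.path.basename(path))[0])

tid = track_id_from_path('020/020001.mp3')
# title = tracks.loc[tid, ('track', 'title')]  # with tracks loaded
```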
KeyError Traceback (most recent call last)
in ()
3
4 # Load metadata and features.
----> 5 tracks = utils.load('tracks.csv')
6 genres = utils.load('genres.csv')
7 features = utils.load('features.csv')
~\Desktop\ML\DeepAudioClassification-master - Copy\utils.py in load(filepath)
201 ('track', 'genres_top')]
202 for column in COLUMNS:
--> 203 tracks[column] = tracks[column].map(ast.literal_eval)
204
205 COLUMNS = [('track', 'date_created'), ('track', 'date_recorded'),
~\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2055 if isinstance(i, slice):
2056 return self[i]
-> 2057 else:
2058 label = self.index[i]
2059 if isinstance(label, Index):
~\Anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_multilevel(self, key)
2099 # a 0-len ndarray. This is effectively catching
2100 # a numpy error (as numpy should really raise)
-> 2101 values = self._data.iget(i)
2102
2103 if index_len and not len(values):
~\Anaconda3\lib\site-packages\pandas\indexes\multi.py in get_loc(self, key, method)
pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:4433)()
pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:4279)()
pandas\src\hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:13742)()
pandas\src\hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:13696)()
KeyError: ('track', 'genres_top')
Error while loading csv files.
Does anyone know a fix?
Hi, there are 6 files that are much shorter than 30s:
fma_small/098/098565.mp3 --> 1.6s
fma_small/098/098567.mp3 --> 0.5s
fma_small/098/098569.mp3 --> 1.5s
fma_small/099/099134.mp3 --> 0s
fma_small/108/108925.mp3 --> 0s
fma_small/133/133297.mp3 --> 0s
in case it's not a known issue.
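A quick way to scan for such clips is to measure each file's duration and flag anything well under 30 s. The helper below (a hypothetical sketch, not part of the repo) separates the filtering from the duration measurement so the logic is testable without audio files; on real data `duration_fn` could be a librosa-based duration reader:

```python
def short_clips(paths, duration_fn, min_seconds=29.0):
    """Return the paths whose duration falls below min_seconds."""
    return [p for p in paths if duration_fn(p) < min_seconds]

# On real files duration_fn would decode the audio, e.g. (assuming
# librosa is installed): lambda p: librosa.get_duration(path=p)
```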
Below are issues affecting the rc1 data release that cannot be fixed without a data update. As updating is disruptive (it'll break code and make results non-comparable), it should be done sparingly, e.g., to fix a fatal flaw or many small ones discovered over time.
(master): small subset's list, medium subset's list (#8)
(next): metadata from mp3 not API, ensure 30s (8077afe, 00d5b71, 840b337)
(master): list the 937 duplicates
(next): remove them (try other methods and detect near duplicates)
Workarounds are explained in more detail in the wiki.
There are a couple of wrongly organized entries in tracks.csv: some text appears in the track listens column, e.g. rows 62, 64, 65.
Hi, the FMA dataset looks amazing, thank you so much for sharing this!
I'm planning a research project that will involve multimodal models trained on album covers as well as the audio signal of tracks from the respective albums. Does FMA include cover art?
If I understand correctly from the FMA paper, cover art is not yet included. Quoting from the discussion section:
Cover images for tracks, albums, and artists are another public asset which may be of interest.
From what I can tell on the freemusicarchive.org website, albums seem to usually (always?) come with a cover image. I think I might be able to automatically download these based on the album names in the FMA dataset. However, I'm wondering if there is a better way. I would appreciate any recommendation.
I apologize if I missed a step or did not do something on my part. Thank you for the data and all the examples.
The training using cnn after pre-processing the audio files starts off but as soon as some files are fetched, the training stops with the below error:
Unknown: CalledProcessError: Command '['ffmpeg', '-i', 'path-to-dataset\\fma_small\\099\\099134.mp3', '-f', 's16le', '-acodec', 'pcm_s16le', '-ac', '1', '-']' returned non-zero exit status 1.
Looking at this, I checked the file 099134: my default audio player could not play it, and the metadata (in File Explorer) for that file seems to be missing, as shown below.
I would like to download some untrimmed tracks; is there any way besides downloading the full dataset? Unfortunately I don't have 879 GiB available :)
A dataset the size of fma_small but with untrimmed tracks would suffice.
Thanks!
I cannot find a way to split the source audio zip into 8 genres with 1000 tracks each, and I can't find a file to help me do this. Would you mind helping me? Please... Thanks.
As title says. Probably it's not valid?
Hi,
Thanks very much for this great dataset.
I am working on a research project that requires lossless music files as input. I wonder if there is any way for us (or me) to get a .wav or .flac lossless version of your dataset by any chance.
When I unzip the file on my MacBook I get only around 1945 tracks instead of the 8000 mentioned.
The track "048367" is causing an issue while unzipping and it stops over there.
I've used the default application, Keka and The Unarchiver but all three are resulting in the same issue.
I tried unzipping with 7zip on a windows OS but I'm still getting the same 1945 tracks since it stops at track "048367"
Anybody else facing the same problem?
Hi, I've referred to the Usage section in the README as well as #9 and #10. I've checked out rc1 because it's appropriate for the version of fma_metadata.zip and fma_small.zip that I checked out, and also I've set my environment variables.
Nevertheless, running the line
tracks = utils.load('tracks.csv')
in either the usage.ipynb file or my own very simple Python script will produce a ValueError about categories:
Traceback (most recent call last):
File "proc_fma.py", line 3, in <module>
tracks = utils.load('fma_metadata/tracks.csv')
File "/media/datadrive/datasets/fma/utils.py", line 213, in load
'category', categories=SUBSETS, ordered=True)
File "/home/shawley/anaconda3/envs/panotti/lib/python3.6/site-packages/pandas/core/generic.py", line 5883, in astype
dtype=dtype, copy=copy, errors=errors, **kwargs
File "/home/shawley/anaconda3/envs/panotti/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 581, in astype
return self.apply("astype", dtype=dtype, **kwargs)
File "/home/shawley/anaconda3/envs/panotti/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 438, in apply
applied = getattr(b, f)(**kwargs)
File "/home/shawley/anaconda3/envs/panotti/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 557, in astype
return self._astype(dtype, copy=copy, errors=errors, values=values, **kwargs)
File "/home/shawley/anaconda3/envs/panotti/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 598, in _astype
"Got an unexpected argument: {}".format(deprecated_arg)
ValueError: Got an unexpected argument: categories
I haven't seen this error reported in any of the other issues. Can anyone help, e.g. @mdeff ?
Thanks!
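For reference, the `categories=` keyword to `.astype('category', ...)` was removed in newer pandas releases, which is what this ValueError reflects; the replacement is an explicit `pandas.CategoricalDtype`, and later revisions of utils.py use it. The pattern looks like this sketch (`SUBSETS` mirrors the name in the traceback; the actual values live in utils.py):

```python
import pandas as pd

SUBSETS = ('small', 'medium', 'large')

# Old, removed form:  s.astype('category', categories=SUBSETS, ordered=True)
# Current form: build the dtype first, then cast with it.
dtype = pd.CategoricalDtype(categories=SUBSETS, ordered=True)
subset = pd.Series(['small', 'large', 'medium']).astype(dtype)
```

So the practical fix is either an older pandas matching the checked-out code, or a utils.py revision that builds the `CategoricalDtype` explicitly.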
Hi, I'm trying to use the FMA dataset for CNN training.
I'm currently attempting to retrieve metadata for the fma_small subset (the track_id and genre_top) for the 8000 tracks; however, there seem to be 11 rows of missing data. Perhaps my csv file is corrupt or there is an error.
Appreciate your help!
I repackaged dataset with zstd and uploaded it to academictorrents.com.
http://academictorrents.com/details/dba20c45d4d6fa6453a4e99d2f8a4817893cfb94
Also, it is temporarily available as a direct link here:
http://fma.mine.toys/fma/checksums
http://fma.mine.toys/fma/fma_metadata.tar.zst
http://fma.mine.toys/fma/fma_small.tar.zst
http://fma.mine.toys/fma/fma_medium.tar.zst
http://fma.mine.toys/fma/fma_large.tar.zst
http://fma.mine.toys/fma/fma_full.tar.zst
Zstd is way faster than Zip to unpack. If you don't have a "tar" with zstd support, you can install it from conda:
conda install tar zstd
How to unpack
tar -xaf fma_small.tar.zst
How to pack
tar -caf fma_small.tar.zst fma_small/
Alternatively you can install the binary and use zstd as an external command for tar:
sudo apt install zstd
tar -I zstd -xvf fma_small.tar.zst
tar -I zstd -cf fma_small.tar.zst fma_small/
If that is desirable and appropriate then I can make a PR with changes to README.
I tried downloading the main metadata file to look at the underlying CSVs: https://os.unil.cloud.switch.ch/fma/fma_metadata.zip
I'm getting a rejection on both mac and windows when I try to unzip this. Am I missing something?
Is it possible to download only a specific subset of the FMA_full zip file?
Hello, I am trying to work with this dataset for a personal project, but there seems to be an issue with utils.load. I keep getting this error even with the new code that uses CategoricalDtype:
I went through the other closed issues and tried a new git clone, but it still does not work? Is there a way around this?
Edit: Was able to resolve. For some reason, even if I deleted the code where the old version was used, the error would still fall on the deleted line. I just had to copy-paste everything I needed into a new script and it worked.
The download of fma_full.zip
stops before completion.
# From the README
curl -O https://os.unil.cloud.switch.ch/fma/fma_full.zip
which, after a while, gives something like this:
transfer closed with n bytes remaining to read
Any advice/help would be greatly appreciated :)
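One mitigation, assuming the server honors HTTP range requests, is to resume from where the transfer stopped (the same idea as `curl -C - -O <url>`). A hedged sketch; the helper names here are made up for illustration:

```python
import os
import urllib.request

def resume_request(url, already_have):
    # Ask the server for bytes from `already_have` onward.
    return urllib.request.Request(
        url, headers={'Range': 'bytes=%d-' % already_have})

def resume_download(url, path, chunk=1 << 20):
    # Append to any partial file on disk instead of restarting.
    start = os.path.getsize(path) if os.path.exists(path) else 0
    req = resume_request(url, start)
    with urllib.request.urlopen(req) as r, open(path, 'ab') as f:
        while True:
            block = r.read(chunk)
            if not block:
                break
            f.write(block)

# resume_download('https://os.unil.cloud.switch.ch/fma/fma_full.zip',
#                 'fma_full.zip')  # re-run after each interruption
```

Re-running the call after each "transfer closed" error picks up where it left off, provided the server supports ranges.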
Hi, is the order of the genres list for each track sorted by significance, or is it random? It would be great to have that information, because you could say "This song is mostly jazz with elements of experimental rock and a bit of reggae". Even though that task is mostly too fuzzy to give strong claims, relying just a bit on this information seems better than having a collection of tags in random order.
Hello, I was trying to convert the small dataset to .wav using pydub and some files gave me errors trying to import. I tried them with librosa and they also failed. The files are as listed:
fma_small/099/099134.mp3
fma_small/108/108925.mp3
fma_small/133/133297.mp3
Please let me know if I did something wrong or if you are also getting the error. Thanks.
Nevermind, it was my mistake :)
A suggestion - I notice there are a few open issues about outdated data versions, so I presume the hosting of this data is inconvenient to update. As such, it might be worth hosting the data somewhere else.
According to the FAQ, Microsoft Research Open Data will host data sets up to 250 GB. Amazon and probably Google offer similar schemes.
After some digging, I'm reasonably confident that there are a fair number of files that have at least one exact duplicate in the fma_full
zipfile. This came up when I was trouble-shooting some weird behavior, and noticed that the ID3 metadata associated with a track didn't match the CSV file of track metadata, but did match a different row.
Metadata matching is at best a wicked pain, so instead I took a look at which files match based on a hash of the bytestream:
import hashlib, glob, os
from joblib import Parallel, delayed

def hash_one(fname):
    hsh = hashlib.sha384()
    hsh.update(open(fname, 'rb').read())
    return hsh.digest().hex()

pool = Parallel(n_jobs=-2, verbose=20)
dfx = delayed(hash_one)
fnames = glob.glob('fma_full/*/*mp3')
fhashes = pool(dfx(fn) for fn in fnames)  # takes approx 20min w/64 cores :oD

groups = dict()
for fh, fn in zip(fhashes, fnames):
    if fh not in groups:
        groups[fh] = []
    groups[fh].append(os.path.splitext(os.path.basename(fn))[0])
This produces 105637 unique file hashes from 106574, with 105042 pointing to a single file.
I've reproduced this twice decompressing the zipfile, so I'm pretty sure it's nothing I did. That said, I also downloaded the dataset a long time ago (last summer, maybe?), and I'm curious if it's been updated at all?
I'm curious what might have caused this, and wonder if the 105k tracks without duplicates map to accurate metadata in the raw_tracks.csv
file? I haven't had a chance to check the ID3 tag coverage yet, but that should be an easy thing to look into.
for what it's worth, I also haven't looked at the smaller partitions, so I'm not sure if / how this might affect other uses of the dataset. Will follow up later if / when I learn more.
I downloaded the fma_medium dataset. It has 161 folders with almost 1000 30-second tracks each, but there is no information about which genres they belong to. The description for fma_medium states it should have 25,000 tracks from 16 unbalanced genres, and if it is a subset of fma_large, the metadata file for genres has more than 161 genres for me to match.
When I run the baseline, I bump into this problem. Can anyone help me with this?
Dimensionality: (59953,)
Epoch 1/2
1664/19922 [=>............................] - ETA: 2559s - loss: 15.5950 - acc: 0.0325
Process Process-7:
Traceback (most recent call last):
File "/anaconda3/envs/deeplearning3.5/lib/python3.5/multiprocessing/process.py", line 252, in _bootstrap
self.run()
File "/anaconda3/envs/deeplearning3.5/lib/python3.5/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/anaconda3/envs/deeplearning3.5/lib/python3.5/site-packages/keras/engine/training.py", line 429, in data_generator_task
generator_output = next(self._generator)
File "/Users/dc/Downloads/fma-rc1/utils.py", line 327, in __next__
self.X[i] = self.loader.load(get_audio_path(audio_dir, tid))
ValueError: could not broadcast input array from shape (59943) into shape (59953)
1696/19922 [=>............................] - ETA: 2552s - loss: 15.5954 - acc: 0.0324
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-25-373babcd7ac0> in <module>()
16 model.compile(optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
17
---> 18 model.fit_generator(SampleLoader(train, batch_size=32), train.size, nb_epoch=2, **params)
19 loss = model.evaluate_generator(SampleLoader(val, batch_size=32), val.size, **params)
20 loss = model.evaluate_generator(SampleLoader(test, batch_size=32), test.size, **params)
/anaconda3/envs/deeplearning3.5/lib/python3.5/site-packages/keras/models.py in fit_generator(self, generator, samples_per_epoch, nb_epoch, verbose, callbacks, validation_data, nb_val_samples, class_weight, max_q_size, nb_worker, pickle_safe, initial_epoch, **kwargs)
933 nb_worker=nb_worker,
934 pickle_safe=pickle_safe,
--> 935 initial_epoch=initial_epoch)
936
937 def evaluate_generator(self, generator, val_samples,
/anaconda3/envs/deeplearning3.5/lib/python3.5/site-packages/keras/engine/training.py in fit_generator(self, generator, samples_per_epoch, nb_epoch, verbose, callbacks, validation_data, nb_val_samples, class_weight, max_q_size, nb_worker, pickle_safe, initial_epoch)
1530 '(x, y, sample_weight) '
1531 'or (x, y). Found: ' +
-> 1532 str(generator_output))
1533 if len(generator_output) == 2:
1534 x, y = generator_output
ValueError: output of generator should be a tuple (x, y, sample_weight) or (x, y). Found: None
It seems file utils.py is incorrect, as the usage notebook example does not work correctly with the current utils.py.
I get an error: "FileNotFoundError: no module called pymongo.dbref".
Subsequently editing __init__.py to add "bson.dbref" or pymongo.database, for some reason, renders every subsequent import useless. Can you please address this?
tracks = utils.load(r'data\fma_metadata\tracks.csv')
features = utils.load(r'data\fma_metadata\features.csv')
echonest = utils.load(r'data\fma_metadata\echonest.csv')
np.testing.assert_array_equal(features.index, tracks.index)
assert echonest.index.isin(tracks.index).all()
tracks.shape, features.shape, echonest.shape
This is the second block in the baseline.ipynb,
I get this KeyError:
<ipython-input-9-ed98c1f7f0d0> in <module>()
1 AUDIO_DIR = os.environ.get('AUDIO_DIR')
2
----> 3 tracks = utils.load(r'data\fma_metadata\tracks.csv')
4 features = utils.load(r'data\fma_metadata\features.csv')
5 echonest = utils.load(r'data\fma_metadata\echonest.csv')
G:\www\fma\utils.py in load(filepath)
201 ('track', 'genres_top')]
202 for column in COLUMNS:
--> 203 tracks[column] = tracks[column].map(ast.literal_eval)
204
205 COLUMNS = [('track', 'date_created'), ('track', 'date_recorded'),
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
1960 return self._getitem_frame(key)
1961 elif is_mi_columns:
-> 1962 return self._getitem_multilevel(key)
1963 else:
1964 return self._getitem_column(key)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_multilevel(self, key)
2004
2005 def _getitem_multilevel(self, key):
-> 2006 loc = self.columns.get_loc(key)
2007 if isinstance(loc, (slice, Series, np.ndarray, Index)):
2008 new_columns = self.columns[loc]
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\multi.py in get_loc(self, key, method)
1998 key = _values_from_object(key)
1999 key = tuple(map(_maybe_str_to_time_stamp, key, self.levels))
-> 2000 return self._engine.get_loc(key)
2001
2002 # -- partial selection or non-unique index
pandas\_libs\index.pyx in pandas._libs.index.MultiIndexObjectEngine.get_loc (pandas\_libs\index.c:12722)()
pandas\_libs\index.pyx in pandas._libs.index.MultiIndexObjectEngine.get_loc (pandas\_libs\index.c:12643)()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5280)()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5126)()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20523)()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20477)()
KeyError: ('track', 'genres_top')
Hi,
I am trying to use FMA for my project work and it seems almost half of the genre information in the data is NaN. How do you recommend we deal with these?
Thanks
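One common approach, sketched below with a toy frame in place of the loaded tracks.csv: keep only the rows whose `genre_top` is set before supervised training (the multi-label `genres` column may still be populated for the dropped rows, so falling back to it is another option):

```python
import pandas as pd

# Toy stand-in for tracks = utils.load('tracks.csv').
columns = pd.MultiIndex.from_tuples([('track', 'title'),
                                     ('track', 'genre_top')])
tracks = pd.DataFrame([['Song A', 'Rock'],
                       ['Song B', None],       # no top-level genre
                       ['Song C', 'Jazz']], columns=columns)

# Drop tracks without a top-level genre label.
labeled = tracks[tracks['track', 'genre_top'].notna()]
```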
Hi, as a radio station we have a much larger collection of losslessly encoded audio (FLAC). Would it be interesting to see the performance on our collection?
I'd like to download just the features.csv file without downloading the whole 7.2 GiB (I don't need the 30s samples). Is there a way to do it without downloading the whole dataset, since my connection is kinda slow?
Hello. I am going to use the fma_large dataset. I notice that the dataset is separated into folders 1-155 in order, which do not match genre_id. Hence, I wonder what I should do to get the exact genre of each folder?
Are there uncompressed versions of the audio, in a format like wav (or in formats that are losslessly compressed)?