FMA: A Dataset For Music Analysis
Home Page: https://arxiv.org/abs/1612.01840
License: MIT License
File "utils.py", line 304
self.X = np.empty((self.batch_size, *loader.shape))
^
SyntaxError: invalid syntax
with Python 2.7 on Linux. Does the code support Python 2.7?
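For reference, `(self.batch_size, *loader.shape)` star-unpacks a tuple inside a tuple literal, which is Python 3.5+ syntax; Python 2.7 rejects it with exactly this SyntaxError. A minimal sketch of a version-agnostic equivalent (the batch size and shape below are made-up stand-ins for the real loader):

```python
import numpy as np

# Star-unpacking in a tuple, np.empty((batch_size, *shape)), only parses
# on Python 3.5+. Concatenating tuples works on both 2.7 and 3.x.
batch_size = 4
loader_shape = (128, 64)  # hypothetical loader.shape

X = np.empty((batch_size,) + tuple(loader_shape))
```

The concatenated form produces the same array shape on either interpreter.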
Hi, I am doing some practice. I was confused about this problem, and I have already installed this module. Can you help me? Thanks!
tracks = utils.load('tracks.csv')
What are the eight genres in the FMA_Small dataset? Thanks.
I have tried different ways to run this code, but every time I find a new error.
Can you help me with how to use this code?
I installed Python 3.5 and all of the packages in requirements; after that I ran the creation script and got this error:
C:\Users\l3lackwood\Downloads\WinPython-64bit-3.5.3.1Qt5\python-3.5.3.amd64\lib\site-packages\dotenv\main.py:24: UserWarning: Not loading - it doesn't exist.
warnings.warn("Not loading %s - it doesn't exist." % dotenv_path)
Traceback (most recent call last):
File "F:\farideh\python\WinPython-32bit-3.5.3.1Qt5\notebooks\genre\fma-master\creation.py", line 232, in
if sys.argv[1] == 'metadata':
IndexError: list index out of range
What should I do?
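The traceback points at `sys.argv[1]`, so `creation.py` evidently expects a mode name on the command line (the traceback shows it checks for `'metadata'`). A minimal sketch of the guard pattern, with a hypothetical usage message (check the script's source for the actual list of modes):

```python
import sys

def main(argv):
    # creation.py dispatches on its first argument; calling it with none
    # raises IndexError on sys.argv[1]. Guard against that explicitly.
    if len(argv) < 2:
        raise SystemExit("usage: creation.py <mode>  (e.g. 'metadata')")
    if argv[1] == 'metadata':
        return 'metadata'  # placeholder for the real work
    return None

mode = main(['creation.py', 'metadata'])
```

In other words, the IndexError here simply means the script was invoked without an argument such as `metadata`.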
Hi! I'm trying to figure out how I can obtain a subset of tracks using a list of genres. I picked a couple of genres. Using a list like ["genre1", "genre2", ...], I want to slice the multiindex tracks so I only have the metadata for the tracks of those genres.
By tracks I mean the result you get when loading tracks.csv.
This way I can feed tracks['track', 'genres_all'] to the fit_transform/LabelBinarizer, but now with only the tracks of the genres I picked.
Kind regards,
Dylan.
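One way to slice as described above is a boolean mask over the list-valued genres column. A sketch with a toy DataFrame standing in for the loaded tracks.csv (real `genres_all` cells hold genre IDs):

```python
import pandas as pd

# Toy stand-in for tracks = utils.load('tracks.csv'): MultiIndex
# columns, with ('track', 'genres_all') holding a list per track.
columns = pd.MultiIndex.from_tuples([('track', 'title'),
                                     ('track', 'genres_all')])
tracks = pd.DataFrame([['Song A', [21, 12]],
                       ['Song B', [4]],
                       ['Song C', [12]]], columns=columns)

wanted = {12}  # the genre IDs you picked
mask = tracks['track', 'genres_all'].apply(lambda gs: bool(wanted & set(gs)))
subset = tracks[mask]  # metadata for matching tracks only
```

`subset['track', 'genres_all']` can then be fed to the binarizer with only the chosen genres present.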
My team is conducting an academic research project using your dataset, and we were wondering if you could help us clarify what each of the three columns means.
Specifically, we are looking to understand how these columns are generated and if they can be a good measure of popularity, or if any other columns should be used instead.
Thank you!
I was trying to run the usage.ipynb
and the second cell crashed with the following error:
KeyError: ('track', 'genres_top')
I took a look inside the code and the csv file. For utils.load, in the case of tracks.csv, it appears that you only need to change the tuple ('track', 'genres_top') to ('track', 'genre_top') in the list; that is, remove an 's'.
I did the previous but when running again the code now I get this error:
<ipython-input-2-aa99f4d2677d> in <module>()
3
4 # Load metadata and features.
----> 5 tracks = utils.load('tracks.csv')
~/Desktop/Music_Project/fma/utils.py in load(filepath)
203 for column in COLUMNS:
204 print(column)
--> 205 tracks[column] = tracks[column].map(ast.literal_eval)
.
.
.
ValueError: malformed node or string: <_ast.Name object at 0x125eed2e8>
I didn't double check, but I couldn't open files with these indices on linux/ffmpeg/librosa. Just wanted to share so that others would get some hints.
2624,
3284,
8669,
10116,
11583,
12838,
13529,
14116,
14180,
20814,
22554,
23429,
23430,
23431,
25173,
25174,
25175,
25176,
25180,
29345,
29346,
29352,
29356,
33411,
33413,
33414,
33417,
33418,
33419,
33425,
35725,
39363,
41745,
42986,
43753,
50594,
50782,
53668,
54569,
54582,
61480,
61822,
63422,
63997,
72656,
72980,
73510,
80553,
82699,
84503,
84504,
84522,
84524,
86656,
86659,
86661,
86664,
87057,
90244,
90245,
90247,
90248,
90250,
90252,
90253,
90442,
90445,
91206,
92479,
94052,
94234,
95253,
96203,
96207,
96210,
98105,
98562,
101265,
101272,
101275,
102241,
102243,
102247,
102249,
102289,
106409,
106412,
106415,
106628,
108920,
109266,
110236,
115610,
117441,
127928,
129207,
129800,
130328,
130748,
130751,
131545,
133641,
133647,
134887,
140449,
140450,
140451,
140452,
140453,
140454,
140455,
140456,
140457,
140458,
140459,
140460,
140461,
140462,
140463,
140464,
140465,
140466,
140467,
140468,
140469,
140470,
140471,
140472,
142614,
144518,
144619,
145056,
146056,
147419,
147424,
148786,
148787,
148788,
148789,
148790,
148791,
148792,
148793,
148794,
148795,
151920,
155051,
Hello. Where can I find tempo information for the songs in FMA? I wasn't able to find any in the metadata.
Hello, I am just going over the usage example but I am unable to load the track metadata using utils.py
The first error I had was a bad key error for column ('track', 'genres_top'), but I was able to fix that by noticing that the tracks.csv column name is actually 'genre_top' (no s). After fixing that I still have an issue with ast.literal_eval. This is the error I am getting from the notebook:
ValueError Traceback (most recent call last)
in ()
3
4 # Load metadata and features.
----> 5 tracks = utils.load('tracks.csv')
6 genres = utils.load('genres.csv')
7 features = utils.load('features.csv')
~/projects/fma-stft/fma/utils.py in load(filepath)
202 for column in COLUMNS:
203 print("Column: {}".format(column))
--> 204 tracks[column] = tracks[column].map(ast.literal_eval)
205
206 COLUMNS = [('track', 'date_created'), ('track', 'date_recorded'),
~/projects/fmaenv/lib/python3.5/site-packages/pandas/core/series.py in map(self, arg, na_action)
2311 else:
2312 # arg is a function
-> 2313 new_values = map_f(values, arg)
2314
2315 return self._constructor(new_values,
pandas/_libs/src/inference.pyx in pandas._libs.lib.map_infer()
/usr/lib/python3.5/ast.py in literal_eval(node_or_string)
82 return left - right
83 raise ValueError('malformed node or string: ' + repr(node))
---> 84 return _convert(node_or_string)
85
86
/usr/lib/python3.5/ast.py in _convert(node)
81 else:
82 return left - right
---> 83 raise ValueError('malformed node or string: ' + repr(node))
84 return _convert(node_or_string)
85
ValueError: malformed node or string: <_ast.BinOp object at 0x7f25f0a53208>
The error occurs when processing column ('track', 'genre_top') in 'tracks[column] = tracks[column].map(ast.literal_eval)'; the other columns work normally. I downloaded the fma_metadata.zip and fma_small.zip from the provided links, and ensured that the SHA1 hashes were correct.
Edit: I've just tried Python 3.6.0 using the suggested method in the readme, but I still have the same issue.
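A likely reading of the two errors above: the list-valued columns (`genres`, `genres_all`) are stored as string-encoded Python literals and do need `ast.literal_eval`, while `genre_top` holds plain genre names. A bare name like `Rock` parses as an `ast.Name` node and a hyphenated one like `Hip-Hop` as an `ast.BinOp`, which is exactly what the "malformed node or string" messages show. This also suggests the checked-out utils.py revision may not match the downloaded metadata release. A hedged sketch of a parser (the `parse_cell` helper is hypothetical, not part of the repo) that tolerates both kinds of cell:

```python
import ast

def parse_cell(value):
    # "[21, 12]" -> [21, 12]; plain names like "Rock" or "Hip-Hop"
    # are not Python literals (literal_eval sees a Name / BinOp node
    # and raises), so fall back to returning the string untouched.
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return value
```

With such a fallback, `genre_top` passes through unchanged while the genre-list columns are still decoded.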
When I try to download the small dataset (haven't tried with the others) with a script I get a SSL certification verification error.
The error can be reproduced in this minimal form:
import urllib.request
with urllib.request.urlopen('https://os.unil.cloud.switch.ch/fma/fma_small.zip') as response:
pass
My traceback is:
---------------------------------------------------------------------------
SSLError Traceback (most recent call last)
c:\users\amarafioti\appdata\local\programs\python\python36\lib\urllib\request.py in do_open(self, http_class, req, **http_conn_args)
1317 h.request(req.get_method(), req.selector, req.data, headers,
-> 1318 encode_chunked=req.has_header('Transfer-encoding'))
1319 except OSError as err: # timeout error
c:\users\amarafioti\appdata\local\programs\python\python36\lib\http\client.py in request(self, method, url, body, headers, encode_chunked)
1238 """Send a complete request to the server."""
-> 1239 self._send_request(method, url, body, headers, encode_chunked)
1240
c:\users\amarafioti\appdata\local\programs\python\python36\lib\http\client.py in _send_request(self, method, url, body, headers, encode_chunked)
1284 body = _encode(body, 'body')
-> 1285 self.endheaders(body, encode_chunked=encode_chunked)
1286
c:\users\amarafioti\appdata\local\programs\python\python36\lib\http\client.py in endheaders(self, message_body, encode_chunked)
1233 raise CannotSendHeader()
-> 1234 self._send_output(message_body, encode_chunked=encode_chunked)
1235
c:\users\amarafioti\appdata\local\programs\python\python36\lib\http\client.py in _send_output(self, message_body, encode_chunked)
1025 del self._buffer[:]
-> 1026 self.send(msg)
1027
c:\users\amarafioti\appdata\local\programs\python\python36\lib\http\client.py in send(self, data)
963 if self.auto_open:
--> 964 self.connect()
965 else:
c:\users\amarafioti\appdata\local\programs\python\python36\lib\http\client.py in connect(self)
1399 self.sock = self._context.wrap_socket(self.sock,
-> 1400 server_hostname=server_hostname)
1401 if not self._context.check_hostname and self._check_hostname:
c:\users\amarafioti\appdata\local\programs\python\python36\lib\ssl.py in wrap_socket(self, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname, session)
400 server_hostname=server_hostname,
--> 401 _context=self, _session=session)
402
c:\users\amarafioti\appdata\local\programs\python\python36\lib\ssl.py in __init__(self, sock, keyfile, certfile, server_side, cert_reqs, ssl_version, ca_certs, do_handshake_on_connect, family, type, proto, fileno, suppress_ragged_eofs, npn_protocols, ciphers, server_hostname, _context, _session)
807 raise ValueError("do_handshake_on_connect should not be specified for non-blocking sockets")
--> 808 self.do_handshake()
809
c:\users\amarafioti\appdata\local\programs\python\python36\lib\ssl.py in do_handshake(self, block)
1060 self.settimeout(None)
-> 1061 self._sslobj.do_handshake()
1062 finally:
c:\users\amarafioti\appdata\local\programs\python\python36\lib\ssl.py in do_handshake(self)
682 """Start the SSL/TLS handshake."""
--> 683 self._sslobj.do_handshake()
684 if self.context.check_hostname:
SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:748)
During handling of the above exception, another exception occurred:
URLError Traceback (most recent call last)
<ipython-input-10-9d04e6e06fb8> in <module>()
----> 1 with urllib.request.urlopen('https://os.unil.cloud.switch.ch/fma/fma_small.zip') as response:
2 pass
c:\users\amarafioti\appdata\local\programs\python\python36\lib\urllib\request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
221 else:
222 opener = _opener
--> 223 return opener.open(url, data, timeout)
224
225 def install_opener(opener):
c:\users\amarafioti\appdata\local\programs\python\python36\lib\urllib\request.py in open(self, fullurl, data, timeout)
524 req = meth(req)
525
--> 526 response = self._open(req, data)
527
528 # post-process response
c:\users\amarafioti\appdata\local\programs\python\python36\lib\urllib\request.py in _open(self, req, data)
542 protocol = req.type
543 result = self._call_chain(self.handle_open, protocol, protocol +
--> 544 '_open', req)
545 if result:
546 return result
c:\users\amarafioti\appdata\local\programs\python\python36\lib\urllib\request.py in _call_chain(self, chain, kind, meth_name, *args)
502 for handler in handlers:
503 func = getattr(handler, meth_name)
--> 504 result = func(*args)
505 if result is not None:
506 return result
c:\users\amarafioti\appdata\local\programs\python\python36\lib\urllib\request.py in https_open(self, req)
1359 def https_open(self, req):
1360 return self.do_open(http.client.HTTPSConnection, req,
-> 1361 context=self._context, check_hostname=self._check_hostname)
1362
1363 https_request = AbstractHTTPHandler.do_request_
c:\users\amarafioti\appdata\local\programs\python\python36\lib\urllib\request.py in do_open(self, http_class, req, **http_conn_args)
1318 encode_chunked=req.has_header('Transfer-encoding'))
1319 except OSError as err: # timeout error
-> 1320 raise URLError(err)
1321 r = h.getresponse()
1322 except:
URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:748)>
I found a workaround by passing context=ssl.SSLContext(ssl.PROTOCOL_TLSv1) as an argument to urlopen, but I thought this is something you may want to be aware of.
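Spelled out, that workaround looks like the sketch below. One caveat worth noting: an SSLContext constructed this way performs no certificate verification, so it sidesteps CERTIFICATE_VERIFY_FAILED by dropping the safety check; only use it when you trust the host.

```python
import ssl
import urllib.request

url = 'https://os.unil.cloud.switch.ch/fma/fma_small.zip'

# A bare SSLContext defaults to no certificate verification, which is
# why this avoids the CERTIFICATE_VERIFY_FAILED error (and its check).
context = ssl.SSLContext(ssl.PROTOCOL_TLSv1)

# with urllib.request.urlopen(url, context=context) as response:
#     data = response.read()  # not executed here
```

On most systems, installing/refreshing the CA certificates for the Python install is the proper fix rather than disabling verification.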
Hi, I can't seem to find the musical key of the tracks, is it a feature that exists?
I'm not sure if this relates to #4, but I've found that at least sox (on Debian!) tries to parse out file duration using the reported bit-rate. Unfortunately for me, the reported bitrate is way wrong for at least ~90 tracks (of the 100k+), and probably wrong for another couple hundred. These particularly bad tracks claim to have bitrates in excess of "100M", which sox (at least) parses as bits per second. I'd point out that stereo 16-bit WAV is 1.4 Mbps.
The list of suspicious file IDs is here, if anyone wants to double-check / confirm. The extension is txt, but it's JSON formatted; keys point to the sox-reported bitrate.
More fortunately, removing all the ID3 tags fixes the issue. I'd propose perhaps exporting all ID3 tags to a static dump over the collection (per #4), and then removing all the ID3 tags to sanitize the collection.
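On the "removing all the ID3 tags" point: in practice a tagging library is the sensible route (e.g. mutagen's `File(path).delete()` followed by `save()`). As a self-contained illustration of what is actually being stripped, here is a minimal sketch that removes a leading ID3v2 tag from a bytestream; it ignores ID3v1 trailers, appended tags, and footer variants, so treat it as a sketch, not a sanitizer:

```python
def strip_id3v2(data):
    """Drop a leading ID3v2 tag from an MP3 bytestream, if present."""
    if len(data) < 10 or data[:3] != b'ID3':
        return data
    # Bytes 6-9 hold the tag size as a 28-bit "synchsafe" integer
    # (7 payload bits per byte), not counting the 10-byte header.
    size = 0
    for b in data[6:10]:
        size = (size << 7) | (b & 0x7F)
    return data[10 + size:]
```

A bloated or corrupt tag header of this kind is plausibly what throws off tools that derive bitrate or duration from file size.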
In line 201 of utils.py, one of the columns being called is
('track', 'genres_top')
but shouldn't it be
('track', 'genre_top')
based on tracks.csv?
However when I make that change, I get the following error:
ValueError Traceback (most recent call last)
<ipython-input-5-eecca7133c46> in <module>()
----> 1 tracks = utils.load('fma_metadata/tracks.csv')
2 genres = utils.load('fma_metadata/genres.csv')
3 features = utils.load('fma_metadata/features.csv')
4 echonest = utils.load('fma_metadata/echonest.csv')
5
~\OneDrive\Documents\GitHub\fma\utils.py in load(filepath)
201 ('track', 'genre_top')]
202 for column in COLUMNS:
--> 203 tracks[column] = tracks[column].map(ast.literal_eval)
204
205 COLUMNS = [('track', 'date_created'), ('track', 'date_recorded'),
C:\Anaconda3\lib\site-packages\pandas\core\series.py in map(self, arg, na_action)
2052 index=self.index).__finalize__(self)
2053 else:
-> 2054 mapped = map_f(values, arg)
2055 return self._constructor(mapped,
2056 index=self.index).__finalize__(self)
pandas\src\inference.pyx in pandas.lib.map_infer (pandas\lib.c:62578)()
C:\Anaconda3\lib\ast.py in literal_eval(node_or_string)
82 return left - right
83 raise ValueError('malformed node or string: ' + repr(node))
---> 84 return _convert(node_or_string)
85
86
C:\Anaconda3\lib\ast.py in _convert(node)
81 else:
82 return left - right
---> 83 raise ValueError('malformed node or string: ' + repr(node))
84 return _convert(node_or_string)
85
ValueError: malformed node or string: <_ast.BinOp object at 0x00000253FB0C02B0>
I can load in the features, echonest, and genres with no errors.
Hello, I am trying to browse through the archive and having a hard time understanding the metadata zip.
For example, if I want to get the title of file 020/020001.mp3 in fma_large.zip, how can I locate it in tracks.csv?
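As far as I can tell from the repository's path helper, the layout encodes the track ID: filenames are the zero-padded six-digit track ID, grouped into directories named after its first three digits, so `020/020001.mp3` is track 20001. A sketch (the commented `tracks.loc` line assumes tracks.csv has been loaded with utils.load):

```python
import os

def track_id_from_path(path):
    # '020/020001.mp3' -> 20001: the filename is the zero-padded
    # track ID, and the directory is its first three digits.
    return int(os.path.splitext(os.path.basename(path))[0])

tid = track_id_from_path('020/020001.mp3')
# title = tracks.loc[tid, ('track', 'title')]  # with tracks loaded
```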
KeyError Traceback (most recent call last)
in ()
3
4 # Load metadata and features.
----> 5 tracks = utils.load('tracks.csv')
6 genres = utils.load('genres.csv')
7 features = utils.load('features.csv')
~\Desktop\ML\DeepAudioClassification-master - Copy\utils.py in load(filepath)
201 ('track', 'genres_top')]
202 for column in COLUMNS:
--> 203 tracks[column] = tracks[column].map(ast.literal_eval)
204
205 COLUMNS = [('track', 'date_created'), ('track', 'date_recorded'),
~\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2055 if isinstance(i, slice):
2056 return self[i]
-> 2057 else:
2058 label = self.index[i]
2059 if isinstance(label, Index):
~\Anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_multilevel(self, key)
2099 # a 0-len ndarray. This is effectively catching
2100 # a numpy error (as numpy should really raise)
-> 2101 values = self._data.iget(i)
2102
2103 if index_len and not len(values):
~\Anaconda3\lib\site-packages\pandas\indexes\multi.py in get_loc(self, key, method)
pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:4433)()
pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:4279)()
pandas\src\hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:13742)()
pandas\src\hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:13696)()
KeyError: ('track', 'genres_top')
Error while loading csv files.
Does anyone know a fix?
Hi, there are 6 files that are much shorter than 30s:
fma_small/098/098565.mp3 --> 1.6s
fma_small/098/098567.mp3 --> 0.5s
fma_small/098/098569.mp3 --> 1.5s
fma_small/099/099134.mp3 --> 0s
fma_small/108/108925.mp3 --> 0s
fma_small/133/133297.mp3 --> 0s
in case it's not a known issue.
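A quick way to scan for such clips is to measure each file's duration and flag anything well under 30 s. The helper below (a hypothetical sketch, not part of the repo) separates the filtering from the duration measurement so the logic is testable without audio files; on real data `duration_fn` could be a librosa-based duration reader:

```python
def short_clips(paths, duration_fn, min_seconds=29.0):
    """Return the paths whose duration falls below min_seconds."""
    return [p for p in paths if duration_fn(p) < min_seconds]

# On real files duration_fn would decode the audio, e.g. (assuming
# librosa is installed): lambda p: librosa.get_duration(path=p)
```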
Below are issues affecting the rc1 data release that cannot be fixed without a data update. As updating is disruptive (it'll break code and make results non-comparable), it should be done sparingly, e.g., to fix a fatal flaw or many small ones discovered over time.
(master): small subset's list, medium subset's list (#8)
(next): metadata from mp3 not API, ensure 30s (8077afe, 00d5b71, 840b337)
(master): list the 937 duplicates
(next): remove them (try other methods and detect near duplicates)
Workarounds are explained in more detail in the wiki.
There are a couple of wrongly organized entries in tracks.csv: some text appears in the track listens column, e.g. rows 62, 64, 65.
Hi, the FMA dataset looks amazing, thank you so much for sharing this!
I'm planning a research project that will involve multimodal models trained on album covers as well as the audio signal of tracks from the respective albums. Does FMA include cover art?
If I understand correctly from the FMA paper, cover art is not yet included. Quoting from the discussion section:
Cover images for tracks, albums, and artists are another public asset which may be of interest.
From what I can tell on the freemusicarchive.org website, albums seem to usually (always?) come with a cover image. I think I might be able to automatically download these based on the album names in the FMA dataset. However, I'm wondering if there is a better way. I would appreciate any recommendation.
I apologize if I missed a step or did not do something on my part. Thank you for the data and all the examples.
The training using cnn after pre-processing the audio files starts off but as soon as some files are fetched, the training stops with the below error:
Unknown: CalledProcessError: Command '['ffmpeg', '-i', 'path-to-dataset\\fma_small\\099\\099134.mp3', '-f', 's16le', '-acodec', 'pcm_s16le', '-ac', '1', '-']' returned non-zero exit status 1.
Looking at this, I checked the file 099134: my default audio player could not play it, and the metadata (in File Explorer) for that file seems to be missing, as shown below.
I would like to download some untrimmed tracks; is there any way besides downloading the full dataset? Unfortunately I don't have 879 GiB available :)
A dataset the size of fma_small but with untrimmed tracks would suffice.
Thanks!
I cannot find a way to split the source audio zip into 8 genres with 1000 tracks each, and I can't find a file to help me do this. Would you mind helping me? Please... Thanks.
As title says. Probably it's not valid?
Hi,
Thanks very much for this great dataset.
I am working on a research project that requires lossless music files as input. I wonder if there is any way for us (or me) to get a .wav or .flac lossless version of your dataset by any chance.
When I unzip the file on my MacBook I get only around 1945 tracks instead of the 8000 mentioned.
The track "048367" is causing an issue while unzipping and it stops over there.
I've used the default application, Keka and The Unarchiver but all three are resulting in the same issue.
I tried unzipping with 7zip on a windows OS but I'm still getting the same 1945 tracks since it stops at track "048367"
Anybody else facing the same problem?
Hi, I've referred to the Usage section in the README as well as #9 and #10. I've checked out rc1 because it's appropriate for the version of fma_metadata.zip and fma_small.zip that I checked out, and also I've set my environment variables.
Nevertheless, running the line
tracks = utils.load('tracks.csv')
in either the usage.ipynb file or my own very simple Python script will produce a ValueError about categories:
Traceback (most recent call last):
File "proc_fma.py", line 3, in <module>
tracks = utils.load('fma_metadata/tracks.csv')
File "/media/datadrive/datasets/fma/utils.py", line 213, in load
'category', categories=SUBSETS, ordered=True)
File "/home/shawley/anaconda3/envs/panotti/lib/python3.6/site-packages/pandas/core/generic.py", line 5883, in astype
dtype=dtype, copy=copy, errors=errors, **kwargs
File "/home/shawley/anaconda3/envs/panotti/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 581, in astype
return self.apply("astype", dtype=dtype, **kwargs)
File "/home/shawley/anaconda3/envs/panotti/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 438, in apply
applied = getattr(b, f)(**kwargs)
File "/home/shawley/anaconda3/envs/panotti/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 557, in astype
return self._astype(dtype, copy=copy, errors=errors, values=values, **kwargs)
File "/home/shawley/anaconda3/envs/panotti/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 598, in _astype
"Got an unexpected argument: {}".format(deprecated_arg)
ValueError: Got an unexpected argument: categories
I haven't seen this error reported in any of the other issues. Can anyone help, e.g. @mdeff ?
Thanks!
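For reference, the `categories=` keyword to `.astype('category', ...)` was removed in newer pandas releases, which is what this ValueError reflects; the replacement is an explicit `pandas.CategoricalDtype`, and later revisions of utils.py use it. The pattern looks like this sketch (`SUBSETS` mirrors the name in the traceback; the actual values live in utils.py):

```python
import pandas as pd

SUBSETS = ('small', 'medium', 'large')

# Old, removed form:  s.astype('category', categories=SUBSETS, ordered=True)
# Current form: build the dtype first, then cast with it.
dtype = pd.CategoricalDtype(categories=SUBSETS, ordered=True)
subset = pd.Series(['small', 'large', 'medium']).astype(dtype)
```

So the practical fix is either an older pandas matching the checked-out code, or a utils.py revision that builds the `CategoricalDtype` explicitly.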
Hi, I'm trying to use the FMA dataset for CNN training.
I'm currently attempting to retrieve metadata for the fma_small subset (the track_id and genre_top) for the 8000 tracks; however, there seem to be 11 rows of missing data. Perhaps my csv file is corrupt or there is an error.
Appreciate your help!
I repackaged dataset with zstd and uploaded it to academictorrents.com.
http://academictorrents.com/details/dba20c45d4d6fa6453a4e99d2f8a4817893cfb94
Also, it is temporarily available as a direct link here:
http://fma.mine.toys/fma/checksums
http://fma.mine.toys/fma/fma_metadata.tar.zst
http://fma.mine.toys/fma/fma_small.tar.zst
http://fma.mine.toys/fma/fma_medium.tar.zst
http://fma.mine.toys/fma/fma_large.tar.zst
http://fma.mine.toys/fma/fma_full.tar.zst
Zstd is way faster than Zip to unpack. If you don't have a "tar" with zstd support, you can install it from conda:
conda install tar zstd
How to unpack
tar -xaf fma_small.tar.zst
How to pack
tar -caf fma_small.tar.zst fma_small/
Alternatively you can install the binary and use zstd as an external command for tar:
sudo apt install zstd
tar -I zstd -xvf fma_small.tar.zst
tar -I zstd -cf fma_small.tar.zst fma_small/
If that is desirable and appropriate then I can make a PR with changes to README.
I tried downloading the main metadata file to look at the underlying CSVs: https://os.unil.cloud.switch.ch/fma/fma_metadata.zip
I'm getting a rejection on both mac and windows when I try to unzip this. Am I missing something?
Is it possible to download only a specific subset of the FMA_full zip file?
Hello, I am trying to work with this dataset for a personal project, but there seems to be an issue with utils.load. I keep getting this error even with the new code that uses CategoricalDtype:
I went through the other closed issues and tried a new git clone, but it still does not work? Is there a way around this?
Edit: Was able to resolve. For some reason, even if I deleted the code where the old version was used, the error would still fall on the deleted line. I just had to copy-paste everything I needed into a new script and it worked.
The download of fma_full.zip
stops before completion.
# From the README
curl -O https://os.unil.cloud.switch.ch/fma/fma_full.zip
which, after a while, gives something like this:
transfer closed with n bytes remaining to read
Any advice/help would be greatly appreciated :)
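One mitigation, assuming the server honors HTTP range requests, is to resume from where the transfer stopped (the same idea as `curl -C - -O <url>`). A hedged sketch; the helper names here are made up for illustration:

```python
import os
import urllib.request

def resume_request(url, already_have):
    # Ask the server for bytes from `already_have` onward.
    return urllib.request.Request(
        url, headers={'Range': 'bytes=%d-' % already_have})

def resume_download(url, path, chunk=1 << 20):
    # Append to any partial file on disk instead of restarting.
    start = os.path.getsize(path) if os.path.exists(path) else 0
    req = resume_request(url, start)
    with urllib.request.urlopen(req) as r, open(path, 'ab') as f:
        while True:
            block = r.read(chunk)
            if not block:
                break
            f.write(block)

# resume_download('https://os.unil.cloud.switch.ch/fma/fma_full.zip',
#                 'fma_full.zip')  # re-run after each interruption
```

Re-running the call after each "transfer closed" error picks up where it left off, provided the server supports ranges.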
Hi, is the order of the genres list for each track sorted by significance, or is it random? It would be great to have that information, because you could say "This song is mostly jazz with elements of experimental rock and a bit of reggae". Even though that task is mostly too fuzzy to give strong claims, relying just a bit on this information seems better than having a collection of tags in random order.
Hello, I was trying to convert the small dataset to .wav using pydub and some files gave me errors trying to import. I tried them with librosa and they also failed. The files are as listed:
fma_small/099/099134.mp3
fma_small/108/108925.mp3
fma_small/133/133297.mp3
Please let me know if I did something wrong or if you are also getting the error. Thanks.
Nevermind, it was my mistake :)
A suggestion - I notice there are a few open issues about outdated data versions, so I presume the hosting of this data is inconvenient to update. As such, it might be worth hosting the data somewhere else.
According to the FAQ, Microsoft Research Open Data will host data sets up to 250 GB. Amazon and probably Google offer similar schemes.
After some digging, I'm reasonably confident that there are a fair number of files that have at least one exact duplicate in the fma_full
zipfile. This came up when I was trouble-shooting some weird behavior, and noticed that the ID3 metadata associated with a track didn't match the CSV file of track metadata, but did match a different row.
Metadata matching is at best a wicked pain, so instead I took a look at which files match based on a hash of the bytestream:
import hashlib, glob, os
from joblib import Parallel, delayed

def hash_one(fname):
    hsh = hashlib.sha384()
    hsh.update(open(fname, 'rb').read())
    return hsh.digest().hex()

pool = Parallel(n_jobs=-2, verbose=20)
dfx = delayed(hash_one)
fnames = glob.glob('fma_full/*/*mp3')
fhashes = pool(dfx(fn) for fn in fnames)  # takes approx 20min w/64 cores :oD

groups = dict()
for fh, fn in zip(fhashes, fnames):
    if fh not in groups:
        groups[fh] = []
    groups[fh].append(os.path.splitext(os.path.basename(fn))[0])
This produces 105637 unique file hashes from 106574, with 105042 pointing to a single file.
I've reproduced this twice decompressing the zipfile, so I'm pretty sure it's nothing I did. That said, I also downloaded the dataset a long time ago (last summer, maybe?), and I'm curious if it's been updated at all?
I'm curious what might have caused this, and wonder if the 105k tracks without duplicates map to accurate metadata in the raw_tracks.csv
file? I haven't had a chance to check the ID3 tag coverage yet, but that should be an easy thing to look into.
for what it's worth, I also haven't looked at the smaller partitions, so I'm not sure if / how this might affect other uses of the dataset. Will follow up later if / when I learn more.
I downloaded the fma_medium dataset. It has 161 folders with almost 1000 30-second tracks each, but there is no information about which genres they belong to. The description for fma_medium states it should have 25,000 tracks from 16 unbalanced genres, and if it is a subset of fma_large, the metadata file for genres has more than 161 genres for me to match.
When I run the baseline, I bump into this problem. Can anyone help me with this?
Dimensionality: (59953,)
Epoch 1/2
1664/19922 [=>............................] - ETA: 2559s - loss: 15.5950 - acc: 0.0325
Process Process-7:
Traceback (most recent call last):
File "/anaconda3/envs/deeplearning3.5/lib/python3.5/multiprocessing/process.py", line 252, in _bootstrap
self.run()
File "/anaconda3/envs/deeplearning3.5/lib/python3.5/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/anaconda3/envs/deeplearning3.5/lib/python3.5/site-packages/keras/engine/training.py", line 429, in data_generator_task
generator_output = next(self._generator)
File "/Users/dc/Downloads/fma-rc1/utils.py", line 327, in __next__
self.X[i] = self.loader.load(get_audio_path(audio_dir, tid))
ValueError: could not broadcast input array from shape (59943) into shape (59953)
1696/19922 [=>............................] - ETA: 2552s - loss: 15.5954 - acc: 0.0324
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-25-373babcd7ac0> in <module>()
16 model.compile(optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
17
---> 18 model.fit_generator(SampleLoader(train, batch_size=32), train.size, nb_epoch=2, **params)
19 loss = model.evaluate_generator(SampleLoader(val, batch_size=32), val.size, **params)
20 loss = model.evaluate_generator(SampleLoader(test, batch_size=32), test.size, **params)
/anaconda3/envs/deeplearning3.5/lib/python3.5/site-packages/keras/models.py in fit_generator(self, generator, samples_per_epoch, nb_epoch, verbose, callbacks, validation_data, nb_val_samples, class_weight, max_q_size, nb_worker, pickle_safe, initial_epoch, **kwargs)
933 nb_worker=nb_worker,
934 pickle_safe=pickle_safe,
--> 935 initial_epoch=initial_epoch)
936
937 def evaluate_generator(self, generator, val_samples,
/anaconda3/envs/deeplearning3.5/lib/python3.5/site-packages/keras/engine/training.py in fit_generator(self, generator, samples_per_epoch, nb_epoch, verbose, callbacks, validation_data, nb_val_samples, class_weight, max_q_size, nb_worker, pickle_safe, initial_epoch)
1530 '(x, y, sample_weight) '
1531 'or (x, y). Found: ' +
-> 1532 str(generator_output))
1533 if len(generator_output) == 2:
1534 x, y = generator_output
ValueError: output of generator should be a tuple (x, y, sample_weight) or (x, y). Found: None
It seems file utils.py is incorrect, as the usage notebook example does not work correctly with the current utils.py.
I get an error: "FileNotFoundError: no module called pymongo.dbref".
Subsequently editing __init__.py to add "bson.dbref" or pymongo.database, for some reason, renders every subsequent import useless. Can you please address this?
tracks = utils.load(r'data\fma_metadata\tracks.csv')
features = utils.load(r'data\fma_metadata\features.csv')
echonest = utils.load(r'data\fma_metadata\echonest.csv')
np.testing.assert_array_equal(features.index, tracks.index)
assert echonest.index.isin(tracks.index).all()
tracks.shape, features.shape, echonest.shape
This is the second block in the baseline.ipynb,
I get this KeyError:
<ipython-input-9-ed98c1f7f0d0> in <module>()
1 AUDIO_DIR = os.environ.get('AUDIO_DIR')
2
----> 3 tracks = utils.load(r'data\fma_metadata\tracks.csv')
4 features = utils.load(r'data\fma_metadata\features.csv')
5 echonest = utils.load(r'data\fma_metadata\echonest.csv')
G:\www\fma\utils.py in load(filepath)
201 ('track', 'genres_top')]
202 for column in COLUMNS:
--> 203 tracks[column] = tracks[column].map(ast.literal_eval)
204
205 COLUMNS = [('track', 'date_created'), ('track', 'date_recorded'),
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
1960 return self._getitem_frame(key)
1961 elif is_mi_columns:
-> 1962 return self._getitem_multilevel(key)
1963 else:
1964 return self._getitem_column(key)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_multilevel(self, key)
2004
2005 def _getitem_multilevel(self, key):
-> 2006 loc = self.columns.get_loc(key)
2007 if isinstance(loc, (slice, Series, np.ndarray, Index)):
2008 new_columns = self.columns[loc]
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\multi.py in get_loc(self, key, method)
1998 key = _values_from_object(key)
1999 key = tuple(map(_maybe_str_to_time_stamp, key, self.levels))
-> 2000 return self._engine.get_loc(key)
2001
2002 # -- partial selection or non-unique index
pandas\_libs\index.pyx in pandas._libs.index.MultiIndexObjectEngine.get_loc (pandas\_libs\index.c:12722)()
pandas\_libs\index.pyx in pandas._libs.index.MultiIndexObjectEngine.get_loc (pandas\_libs\index.c:12643)()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5280)()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5126)()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20523)()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20477)()
KeyError: ('track', 'genres_top')
Hi,
I am trying to use FMA for my project work and it seems almost half of the genre information in the data is NaN. How do you recommend we deal with these?
Thanks
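One common approach, sketched below with a toy frame in place of the loaded tracks.csv: keep only the rows whose `genre_top` is set before supervised training (the multi-label `genres` column may still be populated for the dropped rows, so falling back to it is another option):

```python
import pandas as pd

# Toy stand-in for tracks = utils.load('tracks.csv').
columns = pd.MultiIndex.from_tuples([('track', 'title'),
                                     ('track', 'genre_top')])
tracks = pd.DataFrame([['Song A', 'Rock'],
                       ['Song B', None],       # no top-level genre
                       ['Song C', 'Jazz']], columns=columns)

# Drop tracks without a top-level genre label.
labeled = tracks[tracks['track', 'genre_top'].notna()]
```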
Hi, as a radio station we have a much larger collection of losslessly encoded audio (FLAC). Would it be interesting to see the performance on our collection?
I'd like to download just the features.csv file without downloading the whole 7.2 GiB (I don't need the 30s samples). Is there a way to do it without downloading the whole dataset, since my connection is kinda slow?
Hello. I am going to use the fma_large dataset. I notice that the dataset is separated into folders 1-155 in order, which do not match genre_id. Hence, I wonder what I should do to get the exact genre of each folder?
Are there uncompressed versions of the audio, in a format like wav (or in formats that are losslessly compressed)?