Music genre recognition on GTZAN dataset

Music genre recognition on GTZAN dataset
- Understanding GTZAN dataset
- Training results

Dataset: https://www.kaggle.com/datasets/andradaolteanu/gtzan-dataset-music-genre-classification

Understanding GTZAN dataset

The dataset contains audio samples labeled by genre.
Each of the 10 genres has 100 samples.

Training results

The GTZAN dataset ships with two CSV files of precomputed features:

features_30_sec.csv
features_3_sec.csv
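
Either file can be read the same way. The sketch below uses only the standard library and assumes each row has a `filename` column, a `length` column, numeric feature columns, and a `label` column holding the genre. Those column names are an assumption about the Kaggle CSV layout, `load_features` is a hypothetical helper, and the two inline rows are made-up placeholders, not real dataset values.

```python
import csv
import io

# Assumed non-feature columns; adjust if the CSV layout differs.
NON_FEATURE_COLUMNS = {"filename", "length", "label"}


def load_features(handle):
    """Read a features CSV and return (feature_names, X, y)."""
    reader = csv.DictReader(handle)
    feature_names = [c for c in reader.fieldnames if c not in NON_FEATURE_COLUMNS]
    X, y = [], []
    for row in reader:
        # Every remaining column is treated as a numeric feature.
        X.append([float(row[name]) for name in feature_names])
        y.append(row["label"])
    return feature_names, X, y


# Tiny made-up sample standing in for features_30_sec.csv.
sample = io.StringIO(
    "filename,length,tempo,rms_mean,label\n"
    "a.wav,1,120.0,0.10,blues\n"
    "b.wav,1,90.5,0.25,rock\n"
)
feature_names, X, y = load_features(sample)
print(feature_names, X, y)
```

For the real files, the same function could be called as `load_features(open("features_30_sec.csv", newline=""))`, assuming the CSV sits in the working directory.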

We'll train on these two sets to observe differences:

30 sec samples (features_30_sec.csv)

The first test used the features extracted from the full 30-second samples.

Training details

Epoch	Train Loss	Valid Loss	Accuracy	Time
0	2.410511	2.301348	0.100000	00:00
1	2.358546	2.294634	0.150000	00:00
2	2.308661	2.264426	0.200000	00:00
3	2.234084	2.203711	0.237500	00:00
4	2.171810	2.121932	0.243750	00:00
5	2.107840	2.050719	0.256250	00:00
6	2.039669	1.963892	0.281250	00:00
7	1.977348	1.895904	0.306250	00:00
8	1.910779	1.822734	0.331250	00:00
9	1.841186	1.764186	0.337500	00:00
10	1.774352	1.701079	0.356250	00:00
11	1.703704	1.629614	0.356250	00:00
12	1.633385	1.574726	0.387500	00:00
13	1.564862	1.509628	0.412500	00:00
14	1.501113	1.446676	0.475000	00:00
15	1.449700	1.387999	0.500000	00:00
16	1.374840	1.333856	0.575000	00:00
17	1.309132	1.269413	0.606250	00:00
18	1.251293	1.207156	0.625000	00:00
19	1.191447	1.157811	0.643750	00:00
20	1.127759	1.115797	0.650000	00:00
21	1.067127	1.049386	0.687500	00:00
22	1.009375	1.008271	0.706250	00:00
23	0.951300	0.977570	0.718750	00:00
24	0.893850	0.922465	0.731250	00:00
25	0.838244	0.878590	0.712500	00:00
26	0.787135	0.868914	0.725000	00:00
27	0.732795	0.853740	0.737500	00:00
28	0.678135	0.883555	0.693750	00:00
29	0.625145	0.845650	0.706250	00:00
30	0.583684	0.802656	0.725000	00:00
31	0.532835	0.794608	0.731250	00:00
32	0.489864	0.795410	0.743750	00:00
33	0.443835	0.774997	0.743750	00:00
34	0.409310	0.803072	0.725000	00:00
35	0.375299	0.759579	0.731250	00:00
36	0.340224	0.802695	0.743750	00:00
37	0.306562	0.796328	0.743750	00:00
38	0.280238	0.809968	0.750000	00:00
39	0.254191	0.783211	0.743750	00:00
40	0.228860	0.777353	0.762500	00:00
41	0.208046	0.811578	0.725000	00:00
42	0.186007	0.822938	0.731250	00:00
43	0.170087	0.735695	0.737500	00:00
44	0.159519	0.791559	0.743750	00:00
45	0.149130	0.859067	0.718750	00:00
46	0.136737	0.820262	0.762500	00:00
47	0.122899	0.873859	0.743750	00:00
48	0.116056	0.791122	0.750000	00:00
49	0.107419	0.854898	0.706250	00:00
50	0.100285	0.879887	0.718750	00:00
51	0.089521	0.862034	0.743750	00:00
52	0.081394	0.828892	0.737500	00:00
53	0.073154	0.887935	0.743750	00:00
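
In the log above, training loss keeps falling while validation loss levels off around 0.8, a pattern usually read as overfitting. One common response is to keep the checkpoint with the lowest validation loss. The original run does not say whether it did this; the sketch below only shows how such an epoch could be picked from the log, using a few rows copied from the table. `parse_log` and `best_epoch` are hypothetical helper names.

```python
# Excerpt of the log above: epoch, train loss, valid loss, accuracy.
LOG_EXCERPT = """
40 0.228860 0.777353 0.762500
41 0.208046 0.811578 0.725000
42 0.186007 0.822938 0.731250
43 0.170087 0.735695 0.737500
44 0.159519 0.791559 0.743750
45 0.149130 0.859067 0.718750
46 0.136737 0.820262 0.762500
"""


def parse_log(text):
    """Turn whitespace-separated log rows into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        epoch, train_loss, valid_loss, accuracy = line.split()[:4]
        rows.append({
            "epoch": int(epoch),
            "train_loss": float(train_loss),
            "valid_loss": float(valid_loss),
            "accuracy": float(accuracy),
        })
    return rows


def best_epoch(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row["valid_loss"])


best = best_epoch(parse_log(LOG_EXCERPT))
gap = best["valid_loss"] - best["train_loss"]
print(f"best epoch: {best['epoch']}, valid loss: {best['valid_loss']}, gap: {gap:.3f}")
```

Epoch 43 also has the lowest validation loss in the full table, although the highest accuracy (0.7625) falls on epochs 40 and 46, so the two criteria do not have to agree.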

And classification report results:

Class	Precision	Recall	F1-Score	Support
0 (blues)	0.80	0.80	0.80	20
1 (classical)	0.86	1.00	0.92	12
2 (country)	0.60	0.75	0.67	16
3 (disco)	0.67	0.67	0.67	18
4 (hiphop)	0.92	0.55	0.69	22
5 (jazz)	0.94	0.71	0.81	21
6 (metal)	0.87	0.93	0.90	14
7 (pop)	0.65	0.85	0.73	13
8 (reggae)	0.46	0.67	0.55	9
9 (rock)	0.71	0.67	0.69	15

Accuracy	0.74			160
Macro Avg	0.75	0.76	0.74	160
Weighted Avg	0.77	0.74	0.74	160
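
The two average rows differ only in how classes are weighted: the macro average treats every genre equally, while the weighted average scales each genre by its support. As a check, this sketch recomputes both for the precision column from the per-class values above. Because the inputs are already rounded to two decimals, other columns may come out 0.01 off from the report.

```python
# (precision, support) per class, copied from the report above.
PRECISION_AND_SUPPORT = {
    "blues": (0.80, 20),
    "classical": (0.86, 12),
    "country": (0.60, 16),
    "disco": (0.67, 18),
    "hiphop": (0.92, 22),
    "jazz": (0.94, 21),
    "metal": (0.87, 14),
    "pop": (0.65, 13),
    "reggae": (0.46, 9),
    "rock": (0.71, 15),
}


def macro_average(values):
    """Unweighted mean: every class counts the same."""
    return sum(values) / len(values)


def weighted_average(values, weights):
    """Mean weighted by support: larger classes count more."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


precisions = [p for p, _ in PRECISION_AND_SUPPORT.values()]
supports = [s for _, s in PRECISION_AND_SUPPORT.values()]

macro_precision = round(macro_average(precisions), 2)
weighted_precision = round(weighted_average(precisions, supports), 2)
print(macro_precision, weighted_precision)  # 0.75 0.77
```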

Repeating the training a few more times yielded accuracies between 70% and 74%.

3 sec samples (features_3_sec.csv)

The second test used the features extracted from 3-second segments.

Training details

Epoch	Train Loss	Valid Loss	Accuracy	Time
0	2.032246	1.873680	0.360451	00:01
1	1.751166	1.637501	0.448686	00:01
2	1.570411	1.483692	0.508761	00:01
3	1.430684	1.365427	0.556320	00:01
4	1.307044	1.269431	0.596370	00:01
5	1.219212	1.175792	0.637672	00:01
6	1.138508	1.086589	0.667710	00:01
7	1.026570	1.008204	0.693367	00:01
8	0.954476	0.930218	0.705882	00:01
9	0.889936	0.864621	0.731539	00:01
10	0.815705	0.815270	0.740926	00:01
11	0.761109	0.770075	0.759700	00:01
12	0.719386	0.718789	0.777847	00:01
13	0.663675	0.683248	0.783479	00:01
14	0.603862	0.653833	0.793492	00:01
15	0.555469	0.619850	0.801627	00:01
16	0.506831	0.575025	0.820400	00:01
17	0.471152	0.554522	0.825407	00:01
18	0.433732	0.531214	0.836045	00:01
19	0.416024	0.520105	0.828536	00:01
20	0.362679	0.504946	0.831039	00:01
21	0.333553	0.481875	0.839800	00:01
22	0.313346	0.454316	0.848561	00:01
23	0.295104	0.459885	0.836671	00:01
24	0.260602	0.424484	0.864205	00:01
25	0.246882	0.449828	0.852941	00:01
26	0.212048	0.420795	0.857322	00:01
27	0.194751	0.419301	0.862328	00:01
28	0.193910	0.427121	0.860451	00:01
29	0.177843	0.416969	0.860451	00:01
30	0.161258	0.408608	0.875469	00:01
31	0.168606	0.395215	0.866083	00:01
32	0.144902	0.402985	0.866083	00:01
33	0.127127	0.395520	0.867334	00:01
34	0.130257	0.389912	0.882979	00:01
35	0.136157	0.402041	0.869837	00:01
36	0.128909	0.403417	0.877347	00:01
37	0.124474	0.409458	0.867334	00:01
38	0.112614	0.412239	0.881727	00:01
39	0.120688	0.421026	0.873592	00:01
40	0.101785	0.377680	0.886733	00:01
41	0.107204	0.402054	0.880476	00:01
42	0.104357	0.389439	0.877347	00:01
43	0.101453	0.407585	0.881101	00:01
44	0.088406	0.386060	0.884230	00:01
45	0.090783	0.396249	0.880476	00:01
46	0.082592	0.372472	0.879850	00:01
47	0.086191	0.373930	0.894243	00:01
48	0.081868	0.362440	0.889862	00:01
49	0.065329	0.384066	0.882353	00:01
50	0.083963	0.380745	0.889862	00:01
51	0.082650	0.417959	0.889862	00:01
52	0.075654	0.388021	0.892365	00:01
53	0.080464	0.403859	0.891740	00:01
54	0.063071	0.371227	0.891114	00:01
55	0.073778	0.353450	0.899249	00:01
56	0.073829	0.391176	0.884230	00:01
57	0.064231	0.365528	0.894243	00:01
58	0.062055	0.373422	0.895494	00:01
59	0.063687	0.391158	0.893617	00:01
60	0.061825	0.417160	0.882979	00:01
61	0.052974	0.394985	0.903004	00:01
62	0.063245	0.395809	0.887985	00:01
63	0.054693	0.358832	0.903004	00:01
64	0.052507	0.388410	0.895494	00:01
65	0.052654	0.352484	0.906758	00:01
66	0.051661	0.413555	0.892991	00:01
67	0.056109	0.394127	0.895494	00:01
68	0.056794	0.372550	0.903004	00:01
69	0.058300	0.374250	0.896746	00:01
70	0.046908	0.352735	0.909262	00:01
71	0.054862	0.368875	0.904881	00:01
72	0.042489	0.400974	0.891740	00:01
73	0.045906	0.384763	0.897372	00:01
74	0.053988	0.396204	0.896746	00:01
75	0.042612	0.384841	0.898623	00:01

And classification report results:

Class	Precision	Recall	F1-Score	Support
0 (blues)	0.94	0.89	0.91	163
1 (classical)	0.91	0.93	0.92	146
2 (country)	0.82	0.82	0.82	161
3 (disco)	0.84	0.93	0.88	168
4 (hiphop)	0.94	0.91	0.93	162
5 (jazz)	0.91	0.91	0.91	151
6 (metal)	0.95	0.94	0.94	172
7 (pop)	0.93	0.90	0.92	150
8 (reggae)	0.93	0.90	0.91	163
9 (rock)	0.82	0.86	0.84	162

Accuracy	0.90			1598
Macro Avg	0.90	0.90	0.90	1598
Weighted Avg	0.90	0.90	0.90	1598
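
To see where the 3-second features help most, the per-class F1 scores of the two reports can be compared directly. The sketch below copies the F1 columns from both tables. The validation sets differ in size (160 vs. 1598 rows), so the gains are only indicative.

```python
GENRES = ["blues", "classical", "country", "disco", "hiphop",
          "jazz", "metal", "pop", "reggae", "rock"]

# F1-scores copied from the two classification reports.
F1_30_SEC = [0.80, 0.92, 0.67, 0.67, 0.69, 0.81, 0.90, 0.73, 0.55, 0.69]
F1_3_SEC = [0.91, 0.92, 0.82, 0.88, 0.93, 0.91, 0.94, 0.92, 0.91, 0.84]

# Per-genre change in F1 when moving from 30-second to 3-second features.
gains = {
    genre: round(short - full, 2)
    for genre, full, short in zip(GENRES, F1_30_SEC, F1_3_SEC)
}

largest = max(gains, key=gains.get)
for genre, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{genre:<10} {gain:+.2f}")
```

Reggae gains the most (0.55 to 0.91), while classical, which already scored 0.92 on the 30-second features, does not change.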

Repeating the training a few more times yielded accuracies of around 90%.

Image-based training

TODO
