
deep_audio_features's Introduction

deep_audio_features: training and using CNNs on audio classification tasks

1 About

deep_audio_features is a Python library for training Convolutional Neural Networks as audio classifiers. It provides wrappers around PyTorch for training CNNs on audio classification tasks and for using the trained CNNs as feature extractors.

2 Installation

Either use the source code

git clone https://github.com/tyiannak/deep_audio_features

Or install using pip

pip3 install deep-audio-features -U 

3 Functionality

3.1 Training a CNN

To train a CNN you can use the following command:

python3 deep_audio_features/bin/basic_training.py -i /path/to/folder1 /path/to/folder2

-i : select the folders where the data will be loaded from.

-o : select the exported file name.

Or call the following function in Python:

from deep_audio_features.bin import basic_training as bt
bt.train_model(["low","medium","high"], "energy")

The code above reads the WAV files in the 3 folders, uses the folder names as class names, extracts spectrogram representations from the respective sounds, trains and validates the CNN, and saves the trained model to pkl/energy.pt.

3.2 Testing a CNN

3.2.1 Inference

To perform inference for one file only, run:

python3 deep_audio_features/bin/basic_test.py -m /path/to/model/ -i /path/to/file (-s)

-i : select the file where the testing data will be loaded from.

-m : select the model to use for testing.

-s : if included, extracts segment-level predictions of a sequence.

Or call the following function in Python:

from deep_audio_features.bin import basic_test as btest
d, p = btest.test_model("pkl/energy.pt", 'some_file.wav', layers_dropped=0, test_segmentation=False)

The code above will use the CNN trained before to classify an audio signal stored in some_file.wav. d stores the decision (class indices) and p the soft outputs of the classes. If layers_dropped is positive, d is empty and p contains the outputs of the (N - layers_dropped)-th layer (N is the total number of layers in the CNN). E.g. if layers_dropped=1, p will contain the outputs of the last fully connected layer, before softmax.
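For example, a minimal sketch of this feature-extraction mode (the exact dimensionality of p depends on the trained architecture):

from deep_audio_features.bin import basic_test as btest
# layers_dropped=1: p holds the outputs of the last fully connected
# layer (before softmax) instead of class posteriors, and d is empty
d, p = btest.test_model("pkl/energy.pt", 'some_file.wav', layers_dropped=1, test_segmentation=False)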

3.2.2 Evaluate on new data

To perform evaluation on different data, run:

python3 deep_audio_features/bin/classification_report.py -m /path/to/model/ -i /path/to/folder1 /path/to/folder2

-i : select the folders where the testing data will be loaded from.

-m : select the model to use for testing.

Or call the following function in Python:

from deep_audio_features.bin import classification_report as creport
creport.test_report("/path/to/model/", ["low","medium","high"], layers_dropped=0)

3.3 Transfer learning

To transfer knowledge from a pre-trained model and fit it on a new target task you can use the following command:

python3 deep_audio_features/bin/transfer_learning.py -m /path/to/model -i /path/to/folder1 /path/to/folder2 -l layers_freezed -s

-m : select a model to apply fine-tuning to.

-i : select the folders where the data will be loaded from.

-l : the number of layers (layers_freezed) to be frozen (counting from the first convolutional layer to the last linear layer).

-s : an optional default strategy (it overrides the -l flag) that freezes all the convolutional layers and trains just the linear ones.

Similarly, you will need the same parameters to call the deep_audio_features.bin.transfer_learning.transfer_learning() function to transfer knowledge from one task to another:

from deep_audio_features.bin import transfer_learning as tl
tl.transfer_learning('pkl/emotion_energy.pt', ['test/low/', 'test/high'], strategy=0, layers_freezed=0)

(The model will be saved in a local filename based on the timestamp)

3.4 Combine CNN features

In deep_audio_features/combine/config.yaml choose:

(i) which CNN models you want to combine, by modifying either the model_paths or the google_drive_ids field (in case the models are stored in Google Drive), and

(ii) whether you want to combine different CNN models (extract_nn_features boolean variable), use hand-crafted audio features from the pyAudioAnalysis library (extract_basic_features boolean variable), or combine both choices (both variables set to True).
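A hypothetical sketch of such a config (the field names are taken from the description above; the paths and values are illustrative only, not shipped defaults):

model_paths:
  - pkl/energy.pt
  - pkl/valence.pt
google_drive_ids: []
extract_nn_features: true
extract_basic_features: true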

3.4.1 Train a combination of CNNs

python3 deep_audio_features/combine/trainer.py -i 4class_small/music_small 4class_small/speech_small -c deep_audio_features/combine/config.yaml

or in Python:

from deep_audio_features.combine import trainer
trainer.train(["4class_small/music_small", "4class_small/speech_small"], None, "config.yaml")

3.4.2 Evaluate the combiner

python3 deep_audio_features/combine/classification_report.py -m pkl/SVM_Thu_Jul_29_20:06:51_2021.pt -i 4class_small/music_small 4class_small/speech_small

or in Python:

from deep_audio_features.combine import classification_report
import pickle
modification = pickle.load(open("pkl/SVM_Thu_Jul_29_20:31:35_2021.pt", 'rb'))
classification_report.combine_test_report(["4class_small/music_small", "4class_small/speech_small"], modification)

3.4.3 Predict on an unknown sample using the combiner

python3 deep_audio_features/combine/predict.py -m pkl/SVM_Thu_Jul_29_20:06:51_2021.pt -i 4class_balanced/speech/s_BDYDHQBQMX_30.0_31.0.wav

or in Python:

from deep_audio_features.combine import predict
predict.predict("4class_balanced/speech/s_BDYDHQBQMX_30.0_31.0.wav", modification)

(load model as above)

3.5 Train a Convolutional Autoencoder

To train a Convolutional Autoencoder you can use the following command:

python3 deep_audio_features/bin/basic_training.py -t representation -i /path/to/folder1 /path/to/folder2

-t : the performed task is representation learning.

-i : select the folder or folders where the data will be loaded from.

-o : select the exported file name.

Or call the following function in Python:

from deep_audio_features.bin import basic_training as bt
bt.train_model(["low","medium","high"], "energy", task="representation")

The code above reads the WAV files in the 3 folders, but does not use the folder names as class names, since the process is unsupervised. It then extracts spectrogram representations from the respective sounds, trains and validates the ConvAE, and saves the trained model to pkl/energy.pt.

The number of channels in the final representation is set by the REPRESENTATION_CHANNELS variable found in deep_audio_features/bin/config.py.

3.6 Testing a Convolutional Autoencoder

python3 deep_audio_features/bin/basic_test.py -m /path/to/model/ -i /path/to/file (-s)

-i : select the file where the testing data will be loaded from.

-m : select the model to use for testing.

-s : if included, extracts segment-level predictions of a sequence.

Or call the following function in Python:

from deep_audio_features.bin import basic_test as btest
emb, _ = btest.test_model("pkl/energy.pt", 'some_file.wav', test_segmentation=False)

The code above will detect that the model is a ConvAE, load it, and extract the embedding from the audio signal stored in some_file.wav. emb stores the produced embeddings.
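As a hedged follow-up (the numpy flattening and the exact structure of emb are assumptions, not documented API), the embedding can be turned into a fixed-size vector for a downstream classifier:

from deep_audio_features.bin import basic_test as btest
import numpy as np

emb, _ = btest.test_model("pkl/energy.pt", 'some_file.wav', test_segmentation=False)
vec = np.asarray(emb).flatten()  # 1-d feature vector, e.g. for an SVM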

deep_audio_features's People

Contributors

dkatsiros, nikosmichas, pakoromilas, sofiaele, tyiannak


deep_audio_features's Issues

Error in histogram file name on Windows 10.

Got this error on Windows 10 running:

C:\Python310\lib\site-packages\deep_audio_features\bin\basic_training.py -i "genres/blues" "genres/classical" "genres/country" "genres/disco" "genres/hiphop", "genres/jazz" "genres/metal" "genres/pop" "genres/reggae" "genres/rock" -o "energy"

...
--> Plotting histogram of spectrogram sizes.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python310\lib\site-packages\deep_audio_features\bin\basic_training.py", line 64, in train_model
    train_set = FeatureExtractorDataset(X=files_train, y=y_train,
  File "C:\Python310\lib\site-packages\deep_audio_features\dataloading\dataloading.py", line 86, in __init__
    self.plot_hist(spec_sizes, y)
  File "C:\Python310\lib\site-packages\deep_audio_features\dataloading\dataloading.py", line 261, in plot_hist
    plt.savefig(ct.strftime("%m_%d_%Y, %H:%M:%S") + ".png")
  File "C:\Python310\Lib\site-packages\deep_audio_features\utils\..\..\matplotlib\pyplot.py", line 1023, in savefig
    res = fig.savefig(*args, **kwargs)
  File "C:\Python310\Lib\site-packages\deep_audio_features\utils\..\..\matplotlib\figure.py", line 3378, in savefig
    self.canvas.print_figure(fname, **kwargs)
  File "C:\Python310\Lib\site-packages\deep_audio_features\utils\..\..\matplotlib\backend_bases.py", line 2366, in print_figure
    result = print_method(
  File "C:\Python310\Lib\site-packages\deep_audio_features\utils\..\..\matplotlib\backend_bases.py", line 2232, in <lambda>
    print_method = functools.wraps(meth)(lambda *args, **kwargs: meth(
  File "C:\Python310\Lib\site-packages\deep_audio_features\utils\..\..\matplotlib\backends\backend_agg.py", line 509, in print_png
    self._print_pil(filename_or_obj, "png", pil_kwargs, metadata)
  File "C:\Python310\Lib\site-packages\deep_audio_features\utils\..\..\matplotlib\backends\backend_agg.py", line 458, in _print_pil
    mpl.image.imsave(
  File "C:\Python310\Lib\site-packages\deep_audio_features\utils\..\..\matplotlib\image.py", line 1689, in imsave
    image.save(fname, **pil_kwargs)
  File "C:\Python310\Lib\site-packages\deep_audio_features\utils\..\..\PIL\Image.py", line 2410, in save
    fp = builtins.open(filename, "w+b")
OSError: [Errno 22] Invalid argument: '01_09_2024, 08:51:43.png'
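The colons produced by %H:%M:%S are invalid in Windows file names, which is what raises the OSError above. A hedged sketch of a fix in plot_hist (the surrounding code is inferred from the traceback) is to use a separator that is legal on all platforms:

# avoid ':' in the saved file name; colons are not allowed in Windows paths
plt.savefig(ct.strftime("%m_%d_%Y_%H-%M-%S") + ".png")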

Refactor

  1. Configure architectures from config file
  2. Class for training and validation

CNN + TRL

CNN and Tensor Regression Layer instead of linear

Max sequence length computation error

The code below does not compute the max sequence length correctly. Please check the length formula.

with contextlib.closing(wave.open(f, 'r')) as fp:
    frames = fp.getnframes()
    fs = fp.getframerate()
    duration = frames / float(fs)
    length = int((duration -
                  (config.HOP_LENGTH - config.HOP_LENGTH)) /
                 (config.HOP_LENGTH) + 1)
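For reference, the usual frame-count formula subtracts the analysis window length (not the hop length) from the total number of samples before dividing by the hop. A hedged sketch of a correction (config.WINDOW_LENGTH is an assumed name, and both constants are assumed to be in samples):

# frames = total number of samples; number of spectrogram frames:
length = int((frames - config.WINDOW_LENGTH) / config.HOP_LENGTH) + 1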

Bug in classification report?

There's a bug related to path depth in the classification report. To reproduce:

from deep_audio_features.bin import classification_report as cr
cr.test_report('/Users/tyiannak/Downloads/soundscape_8k_1s.pt', ['/Users/tyiannak/Downloads/soundscape_8k_1sec/test/1', '/Users/tyiannak/Downloads/soundscape_8k_1sec/test/2/', '/Users/tyiannak/Downloads/soundscape_8k_1sec/test/3', '/Users/tyiannak/Downloads/soundscape_8k_1sec/test/4', '/Users/tyiannak/Downloads/soundscape_8k_1sec/test/5'])

Loaded model class mapping: {0: '1', 1: '2', 2: '3', 3: '4', 4: '5'}
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-2-3e59e19efe16> in <module>
----> 1 cr.test_report('/Users/tyiannak/Downloads/soundscape_8k_1s.pt', ['/Users/tyiannak/Downloads/soundscape_8k_1sec/test/1', '/Users/tyiannak/Downloads/soundscape_8k_1sec/test/2/', '/Users/tyiannak/Downloads/soundscape_8k_1sec/test/3', '/Users/tyiannak/Downloads/soundscape_8k_1sec/test/4', '/Users/tyiannak/Downloads/soundscape_8k_1sec/test/5'])

/usr/local/lib/python3.9/site-packages/deep_audio_features/bin/classification_report.py in test_report(model_path, folders)
     58 
     59     max_seq_length = model.max_sequence_length
---> 60     files_test, y_test, class_mapping = load_dataset.load(
     61         folders=folders, test=False,
     62         validation=False, class_mapping=class_mapping)

/usr/local/lib/python3.9/site-packages/deep_audio_features/utils/load_dataset.py in load(folders, test_val, test, validation, class_mapping)
     71         folder2idx = {v: k for k, v in idx2folder.items()}
     72 
---> 73     labels = list(map(lambda x: folder2idx[x], labels))
     74 
     75     class_mapping = {}

/usr/local/lib/python3.9/site-packages/deep_audio_features/utils/load_dataset.py in <lambda>(x)
     71         folder2idx = {v: k for k, v in idx2folder.items()}
     72 
---> 73     labels = list(map(lambda x: folder2idx[x], labels))
     74 
     75     class_mapping = {}

KeyError: '/Users/tyiannak/Downloads/soundscape_8k_1sec/test/1'

If I go to the soundscape_8k_1sec path and then run

cr.test_report('../soundscape_8k_1s.pt', ['test/1', 'test/2/', 'test/3', 'test/4', 'test/5'])

Everything runs ok.

Also, if I use the long paths in the bin.basic_training script, it runs OK. So probably something is going wrong in load_dataset.load(), around the class mapping assignment, when classification_report is used.

"ValueError" in Transfer Learning script

Hi,
While running the transfer learning script in terminal, I get a "ValueError":
Resetting model to epoch 14.
Traceback (most recent call last):
  File "bin/transfer_learning.py", line 179, in <module>
    transfer_learning(model=modelpath, folders=folders, strategy=strategy)
  File "bin/transfer_learning.py", line 122, in transfer_learning
    best_model, train_losses, valid_losses, train_accuracy, \
ValueError: too many values to unpack (expected 6)

What can I do?

Thanks!

error in audioTrainTest.extract_features_and_train if a class folder contains only 1 sample

295 for feat in features:
296     temp = []
297     for i in range(feat.shape[0]):
298         temp_fv = feat[i, :]

If one of the class folders has only 1 sample, feat will be a 1-d array of shape (136,), and line 298 will give an error as it tries to access the 1-d array with 2-d indices (feat[i, :]).

I propose the following fix

    for feat in features:
        if feat.ndim == 1: # this class has only 1 sample
            feat = feat.reshape((1, feat.shape[0]))
        temp = []
        for i in range(feat.shape[0]):
            temp_fv = feat[i, :]
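An equivalent one-liner (assuming numpy is imported as np) uses a standard helper that promotes 1-d arrays to 2-d and leaves 2-d arrays unchanged:

feat = np.atleast_2d(feat)  # shape (136,) becomes (1, 136)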

"pop up windows" in scripts

Hi,

When I execute the "training script" (and I think same happens with the other two scripts also) it starts like this:
(screenshot)

I see the terminal window and a pop-up window named "Figure 1". In order to proceed I must close the pop-up window; if I don't, nothing happens. When I close it, the script continues like this:
(screenshot)
Again, I have to close the window to continue the execution. When I close it, the script continues as expected but:
(screenshot)

This time I can't close the pop up "Figure 1" window until the script is finished running.

So, supposing this problem is something you can reproduce, is there a way to fix it?
It would be great if the histograms could be saved as images automatically, without user intervention.

Thank you very much,
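A common workaround (not confirmed in this thread) is to force a non-interactive matplotlib backend before importing the library, so figures are rendered to files instead of blocking pop-up windows:

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no "Figure 1" pop-ups
from deep_audio_features.bin import basic_training as bt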

Testing Script failure

Hello,
After a successful run of the training script, I got a model and applied it to the testing script. This is what I got after the execution:

Traceback (most recent call last):
  File "basic_test.py", line 94, in <module>
    test_model(modelpath=model, ifile=ifile, layers_dropped=layers_dropped)
  File "basic_test.py", line 54, in test_model
    fuse=fuse)
TypeError: __init__() got an unexpected keyword argument 'spec_size'

Thanks,

Int16 melgram

Check if an Int16 melgram has comparable performance to a float32 melgram.
