yuangongnd / psla

Code for the TASLP paper "PSLA: Improving Audio Tagging With Pretraining, Sampling, Labeling, and Aggregation".

License: BSD 3-Clause "New" or "Revised" License

Shell 3.61% Python 96.39%
audio-classification audio deep-learning

psla's Issues

Output of MHA EfficientNet model

Hi Yuan,

Thanks for open-sourcing this repo. I have a quick question about the MHA EfficientNet model you proposed. When I tried EfficientNet-b2 with the multi-head attention model, I found that some values in the out variable were greater than 1 rather than within the range 0 to 1. Is that intentional?

Many Thanks
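
A quick way to quantify the observation above is to measure what fraction of the output values fall outside [0, 1] and, if a downstream loss expects probabilities, to clamp them first. The following is only a minimal sketch on plain Python floats: the function names are hypothetical, and whether clamping matches the author's intent is an assumption, not something the repo confirms.

```python
from typing import Iterable, List


def out_of_range_fraction(scores: Iterable[float],
                          low: float = 0.0,
                          high: float = 1.0) -> float:
    """Return the fraction of scores lying outside [low, high]."""
    values = list(scores)
    if not values:
        return 0.0
    bad = sum(1 for s in values if s < low or s > high)
    return bad / len(values)


def clamp_scores(scores: Iterable[float],
                 low: float = 0.0,
                 high: float = 1.0) -> List[float]:
    """Clip every score into [low, high] (assumed workaround, see lead-in)."""
    return [min(max(s, low), high) for s in scores]
```

For a tensor output, flattening it to a list (or using the framework's own clamp operation) would give the same information.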

Pretrained models

Hello Yuan,
great work and thank you for making it available for other researchers.
I am currently testing deep learning models on my audio dataset to see which model performs better.
I saw you made available the pretrained EfficientNet B2 models with 4-headed attention. I was wondering if it would be possible to download other pretrained models too, e.g. EfficientNet B2 with Mean Pooling.
Thank you in advance,

Annalisa

regarding dataset prep scripts and audioset splits

Hi,

I couldn't find in any of your recent AudioSet publications how you split the unbalanced (or even the balanced) training segments into train and validation sets for hyperparameter tuning. I am asking so that I can try to replicate your results. Also, the Dropbox link for the PSLA experiments you listed is down.
On another note, regarding FSD50K, could you elaborate on what the "forbidden" classes are and why they are forbidden? Could you also explain the purpose of this comment in prep_fsd50k.py when generating the JSON files?
"# only apply to the vocal sound data"

Thanks

prep_fsd.py problems

I'm following the step-by-step PSLA instructions here on GitHub, but when I run 'python3 prep_fsd.py', it creates the folders FSD50K.dev_audio_16k and FSD50K.eval_audio_16k at the specified dataset path without generating the converted audio files inside them. Any idea what might be happening?

P.S.: The terminal indicates that the samples were created, but they were not.
The data path is defined as fsd_path = './dataset/', and this is the folder structure:

dataset
|
|--FSD50K.dev_audio
|--FSD50K.doc
|--FSD50K.eval_audio
|--FSD50K.ground_truth
|--FSD50K.metadata
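
To confirm which files are actually absent, the source and output folders can be compared directly. This is a minimal sketch, assuming the converted files keep the same file names as the originals (an assumption about prep_fsd.py, not verified here); `missing_conversions` is a hypothetical helper and it only reports what is missing, it does not diagnose why.

```python
from pathlib import Path
from typing import List


def missing_conversions(src_dir: str, dst_dir: str,
                        pattern: str = "*.wav") -> List[str]:
    """List file names present in src_dir but absent from dst_dir."""
    src = Path(src_dir)
    dst = Path(dst_dir)
    # An output folder that does not exist counts as fully empty.
    done = {p.name for p in dst.glob(pattern)} if dst.is_dir() else set()
    return sorted(p.name for p in src.glob(pattern) if p.name not in done)
```

With the layout above, a call such as `missing_conversions('./dataset/FSD50K.dev_audio', './dataset/FSD50K.dev_audio_16k')` would return every unconverted clip; a full list suggests the conversion step itself is failing silently.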

using gen_weight_File

Hi,

I don't understand what you do with the weights in the CSV file that gen_weight_File creates.

How do you use them afterward?

Thanks ;)
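
For readers with the same question: the paper title names "Sampling", so one plausible reading is that the per-sample weights drive weighted random sampling with replacement, so that clips carrying rare labels are drawn more often. The sketch below illustrates that idea with the standard library only. It is an assumption about how the weights are consumed, not the repo's confirmed code, and `draw_weighted_indices` is a hypothetical name.

```python
import random
from typing import List, Sequence


def draw_weighted_indices(weights: Sequence[float],
                          num_draws: int,
                          seed: int = 0) -> List[int]:
    """Draw sample indices with replacement, proportional to their weights."""
    rng = random.Random(seed)  # seeded for reproducible draws
    return rng.choices(range(len(weights)), weights=weights, k=num_draws)
```

In PyTorch, the analogous utility is `torch.utils.data.WeightedRandomSampler`, which takes a sequence of per-sample weights and a number of samples to draw.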

Number of parameters of the model

Hello!

I have a small doubt regarding the number of parameters of the EfficientNet-B2 model with 4 attention heads. The paper reports 13.64M. However, in practice, after removing the final classification layers from EfficientNet and adding the multi-head attention module, I get 7.71M instead of 13.64M. As you can see in the following screenshot, the EfficientNet-B2 parameter count drops to 7.7M as soon as the classification layer is removed. On top of that, the multi-head module only has around 11,000 parameters, which gives 7.71M in total.

[Screenshot 2022-07-04 at 13.27.21]

Am I missing something? I am reporting back the number of parameters of this model for my project but I am a bit confused about it. Could you clarify this for me? :)
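
When comparing counts like these, it helps to state exactly what is being summed. A minimal sketch of a counter follows; it relies only on the object exposing `parameters()` whose items have `numel()` and `requires_grad`, which is the interface of a PyTorch `nn.Module`. The helper name is hypothetical, and this does not explain the gap between 13.64M and 7.71M.

```python
def count_parameters(model, trainable_only: bool = False) -> int:
    """Sum the element counts of all (or only trainable) parameters."""
    return sum(
        p.numel()
        for p in model.parameters()
        if p.requires_grad or not trainable_only
    )
```

Calling `count_parameters(model) / 1e6` on the full model and on each submodule separately shows which part accounts for each share of the total.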

class_labels_indices.csv is missing in psla/egs/fsd50k/class_labels_indices.csv

File 'class_labels_indices.csv' is missing from the following path 'egs/fsd50k/class_labels_indices.csv'.

To reproduce:
python psla/egs/fsd50k/prep_fsd.py >> No such file or directory: './class_labels_indices.csv'
python psla/src/label_enhancement/fix_type1.py >> No such file or directory: '../../egs/fsd50k/class_labels_indices.csv'
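
Both errors involve relative paths ('./...' and '../../...'), which Python resolves against the current working directory rather than the script's location. A minimal sketch of resolving the CSV relative to the script instead is shown below; `find_next_to` is a hypothetical helper, and it cannot help if the file is genuinely absent from the repository.

```python
from pathlib import Path


def find_next_to(script_file: str, relative: str) -> Path:
    """Resolve `relative` against the folder containing `script_file`."""
    candidate = (Path(script_file).resolve().parent / relative).resolve()
    if not candidate.is_file():
        raise FileNotFoundError(
            f"{relative} not found at {candidate}; "
            "check the repository checkout or the working directory"
        )
    return candidate
```

Inside a script, `find_next_to(__file__, "class_labels_indices.csv")` gives the same result regardless of the directory the command is launched from.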

Pretrained enhanced label set

In README.md, the link under "(Optional) Step 2. Enhance the label of the balanced AudioSet training set" is broken. It points to a page that doesn't exist:
[pretrained enhanced label set](https://github.com/YuanGongND/psla/blob/main/here)

Can anyone supply this file?

impact of enhanced labeling on fsd50k

Hi,

I used your pretrained 5th percentile, but it doesn't seem to have a considerable effect, as you can see from the label histograms (Music and Musical instrument are still dominating). What would make FSD50K more balanced?
Does the provided JSON also include the balancing process described in the article?

Thanks

Classic labels (we split the 200 categories into 8 histograms for visibility):
[image]

With label enhancement:
[image]
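
The histograms above can be reproduced as plain numbers by counting labels in the data JSON. This is a minimal sketch that assumes the file has a top-level "data" list whose entries carry a comma-separated "labels" string; that layout is an assumption and should be checked against the actual file. Both function names are hypothetical.

```python
import json
from collections import Counter


def label_counts(json_path: str) -> Counter:
    """Count how many clips carry each label (assumed JSON layout)."""
    with open(json_path, "r", encoding="utf-8") as f:
        entries = json.load(f)["data"]
    counts = Counter()
    for entry in entries:
        counts.update(lab for lab in entry["labels"].split(",") if lab)
    return counts


def imbalance_ratio(counts: Counter) -> float:
    """Ratio between the most and least frequent label counts."""
    return max(counts.values()) / min(counts.values())
```

Comparing `imbalance_ratio` for the original and enhanced JSON files gives a single number for how much the enhancement changes the label balance.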