
Comments (4)

JoshVarty commented on August 11, 2024

Parameters to try:

from easydict import EasyDict

conf = EasyDict()

conf.sampling_rate = 44100            # Highest quality
conf.duration = 3                     # About double the length of time we look at
conf.hop_length = 500                 # We're looking at about 1.4 seconds
conf.fmin = 20                        # Near the lowest a human can hear
conf.fmax = conf.sampling_rate // 2   # The maximum frequency we can represent
conf.n_mels = 128                     # Our crops are 128 in height
conf.n_fft = conf.n_mels * 20         # 2560-sample FFT window (about 58 ms at 44.1 kHz)

conf.samples = conf.sampling_rate * conf.duration   # Total samples per clip
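
As a rough check of the comments in this config, the sketch below works out how much audio a crop covers. The crop width of 128 frames is an assumption (the config only says crops are 128 in height), the frame count assumes a centre-padded STFT, and the helper names are made up for illustration.

```python
from types import SimpleNamespace

# Stand-in for the EasyDict config, using the same values.
conf = SimpleNamespace(sampling_rate=44100, duration=3, hop_length=500, n_mels=128)
conf.n_fft = conf.n_mels * 20
conf.samples = conf.sampling_rate * conf.duration

CROP_WIDTH = 128  # assumption: square 128x128 crops


def frames_for(samples, hop_length):
    """Number of STFT frames, assuming the signal is centre-padded."""
    return 1 + samples // hop_length


def seconds_for(frames, hop_length, sampling_rate):
    """Time spanned by `frames` hops of `hop_length` samples."""
    return frames * hop_length / sampling_rate


total_frames = frames_for(conf.samples, conf.hop_length)
crop_seconds = seconds_for(CROP_WIDTH, conf.hop_length, conf.sampling_rate)
window_ms = 1000 * conf.n_fft / conf.sampling_rate

print(f"frames per 3 s clip: {total_frames}")
print(f"seconds covered by one crop: {crop_seconds:.2f}")
print(f"FFT window length: {window_ms:.1f} ms")
```

Under these assumptions a crop spans roughly 1.45 seconds, which is where the "about 1.4 seconds" and "about double" comments come from.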

from audiotagging.

JoshVarty commented on August 11, 2024

This didn't lead to any noticeable improvement :'(


JoshVarty commented on August 11, 2024

From: Section C of https://arxiv.org/pdf/1905.00078.pdf

The receptive field (the number of samples or spectra
involved in computing a prediction) of a CNN is fixed by
its architecture. It can be increased by using larger kernels
or stacking more layers. Especially for raw waveform inputs
with a high sample rate, reaching a sufficient receptive field
size may result in a large number of parameters of the CNN
and high computational complexity.

While we're not using raw waveforms, I've increased the sampling rate from 32 kHz to 44.1 kHz. Perhaps we need a larger receptive field for our convolutions? We could try stacking another 3x3 convolution at the beginning of the network.
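
To get a feel for how much an extra 3x3 convolution would buy, here is a small sketch that computes the receptive field along one axis for a stack of conv layers. The stem below (three 3x3 convs, the first with stride 2) is a hypothetical example and not necessarily what our network uses; the function name is also made up.

```python
def receptive_field(layers):
    """Receptive field along one axis for a stack of conv layers.

    `layers` is a list of (kernel_size, stride, dilation) tuples, in order
    from the input to the output.
    """
    rf = 1    # a single output position sees one input position to start with
    jump = 1  # distance in input pixels between adjacent positions at this depth
    for kernel_size, stride, dilation in layers:
        rf += dilation * (kernel_size - 1) * jump
        jump *= stride
    return rf


# Hypothetical stem: three 3x3 convs, the first one with stride 2.
stem = [(3, 2, 1), (3, 1, 1), (3, 1, 1)]

# Option 1: stack another 3x3 conv (stride 1) in front of the stem.
stem_extra_conv = [(3, 1, 1)] + stem

# Option 2: keep three layers but dilate the first conv by 2.
stem_dilated = [(3, 2, 2), (3, 1, 1), (3, 1, 1)]

for name, layers in [("baseline", stem),
                     ("extra conv", stem_extra_conv),
                     ("dilated first conv", stem_dilated)]:
    print(f"{name}: receptive field = {receptive_field(layers)}")
```

For this made-up stem, both options widen the receptive field by the same amount; the extra conv adds weights, the dilation does not.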

They also suggest dilated convolutions, something I've never used before. Basically you take your 3x3 conv filter and stick zeros between the values. With one zero between each pair of values (a dilation of 2), I think this makes your 3x3 conv effectively 5x5 in size. The zeros mean that no new weights are added, but the filter's receptive field now covers a larger area.

It looks like it's built into PyTorch via the dilation parameter of Conv2d: https://pytorch.org/docs/stable/nn.html#conv2d
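
To convince myself of the "zeros between the values" picture, here is a minimal pure-Python sketch (no PyTorch, so it only illustrates the idea, not the library's implementation). It checks that a 3x3 filter applied with dilation 2 gives the same output as the equivalent 5x5 filter with zeros inserted. The helper names, the toy image and the kernel values are all made up for illustration.

```python
def dilate_kernel(kernel, dilation):
    """Insert (dilation - 1) zeros between the values of a square 2D kernel."""
    k = len(kernel)
    size = dilation * (k - 1) + 1
    out = [[0.0] * size for _ in range(size)]
    for i in range(k):
        for j in range(k):
            out[i * dilation][j * dilation] = kernel[i][j]
    return out


def conv2d_valid(image, kernel, dilation=1):
    """'Valid' cross-correlation that samples the input every `dilation` pixels."""
    k = len(kernel)
    span = dilation * (k - 1) + 1  # effective filter size
    h, w = len(image), len(image[0])
    return [
        [
            sum(kernel[i][j] * image[y + i * dilation][x + j * dilation]
                for i in range(k) for j in range(k))
            for x in range(w - span + 1)
        ]
        for y in range(h - span + 1)
    ]


# Toy 8x8 input and a 3x3 filter (values chosen arbitrarily).
image = [[float((3 * r + 5 * c) % 7) for c in range(8)] for r in range(8)]
kernel = [[1.0, 0.0, -1.0],
          [2.0, 0.0, -2.0],
          [1.0, 0.0, -1.0]]

dilated_out = conv2d_valid(image, kernel, dilation=2)
expanded_out = conv2d_valid(image, dilate_kernel(kernel, 2), dilation=1)

print("effective filter size:", len(dilate_kernel(kernel, 2)))
print("outputs match:", dilated_out == expanded_out)
```

The dilated version still has only 9 weights; the 5x5 footprint comes entirely from the spacing.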


JoshVarty commented on August 11, 2024

I would like to try with an extra conv layer.

xresnet18: 0.8409990
xresnet18: 0.8438536
xresnet18: 0.8428597
Avg: 0.842570767

xresnet18 3 channels: 0.845929
xresnet18 3 channels: 0.844033
xresnet18 3 channels: 0.846719
Avg: 0.845560333

xresnet18 3 channels + extra conv: 0.8452421
xresnet18 3 channels + extra conv: 0.8453489
xresnet18 3 channels + extra conv: 0.8484291
Avg: 0.846340033

xresnet18 3 channels + dilated convs: 0.83400726
xresnet18 3 channels + dilated convs: 0.8337481
xresnet18 3 channels + dilated convs: 0.8354515
Avg: 0.834402287
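
With only three runs per setup, it helps to look at the run-to-run spread next to the averages. A quick sketch using the scores listed in this comment (the dictionary keys are just labels; nothing here is new data):

```python
from statistics import mean, stdev

# Scores copied from the runs listed above, three per configuration.
scores = {
    "xresnet18": [0.8409990, 0.8438536, 0.8428597],
    "3 channels": [0.845929, 0.844033, 0.846719],
    "3 channels + extra conv": [0.8452421, 0.8453489, 0.8484291],
    "3 channels + dilated convs": [0.83400726, 0.8337481, 0.8354515],
}

# Mean and sample standard deviation for each configuration.
summary = {name: (mean(runs), stdev(runs)) for name, runs in scores.items()}

for name, (avg, spread) in summary.items():
    print(f"{name}: avg={avg:.6f} stdev={spread:.6f} (n={len(scores[name])})")
```

If the gap between two averages is about the same size as the standard deviations, three runs are probably not enough to call one setup better.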

