Comments (4)
Parameters to try:
from easydict import EasyDict

conf = EasyDict()
conf.sampling_rate = 44100 # Highest quality
conf.duration = 3 # About double the length of time we look at
conf.hop_length = 500 # We're looking at about 1.4 seconds per crop
conf.fmin = 20 # Near the lowest a human can hear
conf.fmax = conf.sampling_rate // 2 # The maximum frequency we can represent (Nyquist)
conf.n_mels = 128 # Our crops are 128 in height
conf.n_fft = conf.n_mels * 20 # 2560-sample FFT window
conf.samples = conf.sampling_rate * conf.duration # 132300 samples per clip
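As a quick sanity check on the comments above, the arithmetic behind these settings can be sketched as follows. This is only an illustration: `SimpleNamespace` stands in for `EasyDict`, the 128-frame crop width is an assumption (the config only states that crops are 128 in height), and the frame count assumes centered framing with one frame per hop plus one.

```python
from types import SimpleNamespace

# Stand-in for the EasyDict config above (same values, no extra dependency).
conf = SimpleNamespace(sampling_rate=44100, duration=3, hop_length=500, n_mels=128)
conf.n_fft = conf.n_mels * 20
conf.samples = conf.sampling_rate * conf.duration

# Assumption: crops are square, i.e. 128 frames wide as well as 128 mel bins high.
crop_frames = 128

# Time covered by one crop: frames * hop_length / sampling_rate.
crop_seconds = crop_frames * conf.hop_length / conf.sampling_rate

# Frames in a full clip, assuming centered framing (1 + samples // hop_length).
clip_frames = 1 + conf.samples // conf.hop_length

# Length of one FFT window in milliseconds, and the width of one FFT bin in Hz.
window_ms = 1000 * conf.n_fft / conf.sampling_rate
bin_hz = conf.sampling_rate / conf.n_fft

print(f"crop covers {crop_seconds:.2f} s of a {conf.duration} s clip")
print(f"clip has {clip_frames} frames; FFT window is {window_ms:.1f} ms, bins are {bin_hz:.1f} Hz wide")
```

Under these assumptions a crop covers roughly 1.45 seconds, which matches the "about 1.4 seconds" comment and is about half of the 3 second clip.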
from audiotagging.
This didn't lead to any noticeable improvement :'(
From: Section C of https://arxiv.org/pdf/1905.00078.pdf
"The receptive field (the number of samples or spectra involved in computing a prediction) of a CNN is fixed by its architecture. It can be increased by using larger kernels or stacking more layers. Especially for raw waveform inputs with a high sample rate, reaching a sufficient receptive field size may result in a large number of parameters of the CNN and high computational complexity."
While we're not using raw waveforms, I've increased the sampling rate from 32 kHz to 44.1 kHz, so perhaps we need a larger receptive field for our convolutions? We could try stacking another 3x3 convolution at the beginning of the network.
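To make the effect of stacking concrete, here is a minimal sketch of the standard receptive-field recurrence along one axis. The helper `receptive_field` and the layer lists are made up for illustration; they are not taken from the xresnet code.

```python
def receptive_field(layers):
    """Receptive field along one axis for a stack of conv layers.

    `layers` is a list of (kernel_size, stride, dilation) tuples,
    ordered from the input towards the output.
    """
    rf = 1    # a single output position sees one input position to start with
    jump = 1  # distance (in input positions) between neighbouring outputs
    for kernel_size, stride, dilation in layers:
        # Each layer widens the field by its (dilated) kernel extent,
        # scaled by how far apart its inputs are in the original signal.
        rf += dilation * (kernel_size - 1) * jump
        jump *= stride
    return rf


one_conv = receptive_field([(3, 1, 1)])                  # a single 3x3 conv
two_convs = receptive_field([(3, 1, 1), (3, 1, 1)])      # an extra 3x3 conv stacked on top
strided_stem = receptive_field([(3, 2, 1), (3, 1, 1)])   # first conv has stride 2

print(one_conv, two_convs, strided_stem)
```

Each extra stride-1 3x3 layer adds two positions per axis, so a single extra conv at the start of the network only widens the field slightly; strides in earlier layers multiply the contribution of later ones.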
They also suggest dilated convolutions, something I've never used before. Basically you take your 3x3 conv filter and stick zeros between the values. I think this makes your 3x3 conv 5x5 in size (with a dilation of 2). The zeros mean the filter gains no extra weights, but its receptive field now covers a larger area.
It looks like it's built into PyTorch via the dilation parameter: https://pytorch.org/docs/stable/nn.html#conv2d
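A small sketch of what the `dilation` argument does, assuming a single-channel 128x128 input and 8 output channels (both chosen only for illustration, not taken from our network). Without padding, the output shrinks by the effective kernel size minus one, which makes the 5x5 footprint visible in the output shape.

```python
import torch
import torch.nn as nn

# Hypothetical input: one single-channel 128x128 mel-spectrogram crop.
x = torch.randn(1, 1, 128, 128)

# Ordinary 3x3 convolution with no padding.
plain = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3)

# Same 3x3 kernel, but the taps are spaced 2 apart, so it spans a 5x5 area.
dilated = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, dilation=2)

plain_shape = tuple(plain(x).shape)      # loses 2 pixels per axis (3 - 1)
dilated_shape = tuple(dilated(x).shape)  # loses 4 pixels per axis (5 - 1)

# Both layers hold the same number of parameters: 8 * 1 * 3 * 3 weights + 8 biases.
n_plain = sum(p.numel() for p in plain.parameters())
n_dilated = sum(p.numel() for p in dilated.parameters())

print(plain_shape, dilated_shape, n_plain, n_dilated)
```

To keep the spatial size unchanged with a 3x3 kernel, `padding` would need to equal the dilation (so `padding=2` for `dilation=2`).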
I would like to try with an extra conv layer.
xresnet18: 0.8409990
xresnet18: 0.8438536
xresnet18: 0.8428597
Avg: 0.842570767

xresnet18 3 channels: 0.845929
xresnet18 3 channels: 0.844033
xresnet18 3 channels: 0.846719
Avg: 0.845560333

xresnet18 3 channels + extra conv: 0.8452421
xresnet18 3 channels + extra conv: 0.8453489
xresnet18 3 channels + extra conv: 0.8484291
Avg: 0.846340033

xresnet18 3 channels + dilated convs: 0.83400726
xresnet18 3 channels + dilated convs: 0.8337481
xresnet18 3 channels + dilated convs: 0.8354515
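For comparing the variants, a short sketch that recomputes the averages from the scores listed above and adds the run-to-run standard deviation. The dictionary keys are just labels for this snippet; the numbers are copied from the runs above, and with only three runs per variant the spread is a rough guide at best.

```python
from statistics import mean, stdev

# Scores copied from the runs listed above (three runs per variant).
runs = {
    "xresnet18": [0.8409990, 0.8438536, 0.8428597],
    "3 channels": [0.845929, 0.844033, 0.846719],
    "3 channels + extra conv": [0.8452421, 0.8453489, 0.8484291],
    "3 channels + dilated convs": [0.83400726, 0.8337481, 0.8354515],
}

# Mean and sample standard deviation for each variant.
summary = {name: (mean(scores), stdev(scores)) for name, scores in runs.items()}

for name, (avg, spread) in summary.items():
    print(f"{name}: avg={avg:.6f} std={spread:.6f}")
```

Looking at the spread alongside the mean helps judge whether a gap such as extra conv versus plain 3 channels is larger than the variation between repeated runs.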