Comments (3)
The window size for computing spectra trades temporal
resolution (short windows) against frequential resolution (long
windows). Both for log-mel and constant-Q spectra, it is
possible to use shorter windows for higher frequencies, but this
results in inhomogeneously blurred spectrograms unsuitable
for spatially local models. Alternatives include computing
spectra with different window lengths, projected down to the
same frequency bands, and treated as separate channels [16].
In [17] the authors also investigated combinations of different
spectral features.
from audiotagging.
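To make the trade-off in the quoted passage concrete, here is a small sketch of the arithmetic. The sample rate of 44100 Hz and the three `n_fft` values are assumptions chosen for illustration, not settings taken from this project.

```python
def window_resolution(sr, n_fft):
    """Return (window duration in ms, FFT bin spacing in Hz) for one window size."""
    duration_ms = 1000.0 * n_fft / sr  # longer window -> coarser time resolution
    bin_hz = sr / n_fft                # longer window -> finer frequency resolution
    return duration_ms, bin_hz


sr = 44100  # assumed sample rate, for illustration only
for n_fft in (512, 1024, 2048):
    duration_ms, bin_hz = window_resolution(sr, n_fft)
    print(f"n_fft={n_fft}: {duration_ms:.1f} ms window, {bin_hz:.1f} Hz per bin")
```

Doubling `n_fft` halves the bin spacing but doubles the time span each frame covers, which is why no single window size is ideal for every kind of sound.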
I think there are a few experiments here worth carrying out:
- Create logmel spectrograms with 3 different window sizes (`n_fft`) and stack them together into RGB images
- Try stacking other representations with the logmel spectrogram. Constant Q Transform seems to be a popular one. Continuous Wavelet Transform is another. For more information see: https://arxiv.org/pdf/1706.07156.pdf
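A rough sketch of the first experiment, using NumPy only so that it is self-contained. Everything here is an assumption for illustration: the helper names (`mel_filterbank`, `logmel`, `multires_logmel`), the window sizes 512/1024/2048, the hop length, the number of mel bands and the test tone are hypothetical, and this is not the pipeline used in this repo. In practice something like `librosa.feature.melspectrogram` with different `n_fft` values would probably replace the hand-rolled parts.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def logmel(y, sr, n_fft, hop_length, n_mels):
    """Log-mel spectrogram in dB, shape (n_mels, n_frames)."""
    # Centre-pad so the number of frames depends on hop_length only, not n_fft.
    y = np.pad(y, n_fft // 2, mode="reflect")
    n_frames = 1 + (len(y) - n_fft) // hop_length
    window = np.hanning(n_fft)
    frames = np.stack(
        [y[t * hop_length : t * hop_length + n_fft] * window for t in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft // 2 + 1)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T  # same mel bands for every n_fft
    return 10.0 * np.log10(mel + 1e-10)


def multires_logmel(y, sr, n_ffts=(512, 1024, 2048), hop_length=256, n_mels=64):
    """Stack one log-mel spectrogram per window size as image channels."""
    channels = [logmel(y, sr, n_fft, hop_length, n_mels) for n_fft in n_ffts]
    return np.stack(channels)  # (len(n_ffts), n_mels, n_frames)


sr = 22050  # assumed sample rate
y = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)  # 1 s test tone standing in for a clip
image = multires_logmel(y, sr)
print(image.shape)  # (3, 64, 87)
```

Keeping `hop_length` and `n_mels` fixed while only `n_fft` varies is what lets the three spectrograms line up pixel for pixel, so they can be treated as the R, G and B channels of one image.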
- `xresnet18`: 0.8409990
- `xresnet18`: 0.8438536
- `xresnet18`: 0.8428597
- `xresnet18` with 3 channels: 0.845929
- `xresnet18` with 3 channels: 0.844033
- `xresnet18` with 3 channels: 0.846719
Looks like there's a marginal increase of about 0.003.
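For reference, the figure can be checked by averaging the six scores listed above. This is just the arithmetic on the reported numbers; the variable names are mine.

```python
from statistics import mean

# Scores copied from the runs listed above.
single_channel = [0.8409990, 0.8438536, 0.8428597]
three_channel = [0.845929, 0.844033, 0.846719]

gain = mean(three_channel) - mean(single_channel)
print(f"single: {mean(single_channel):.4f}")
print(f"3 channels: {mean(three_channel):.4f}")
print(f"gain: {gain:+.4f}")
```

Every 3-channel run scores above every single-channel run, but the gain is about the same size as the spread between the single-channel runs, so with only three runs each it should probably be read as suggestive rather than conclusive.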