hearbenchmark / hear-eval-kit
Evaluation kit for the HEAR Benchmark
Home Page: https://hearbenchmark.com
License: Apache License 2.0
Might have the same scope as #120.
Should have the following components:
• Background and motivation (from proposal doc)
• Summary of relevant prior works and a synthesis of general trends in the literature, both in audio and in adjacent ML fields whose progress in representation learning has not yet been borne out in audio ML research.
• A high-level description of the variety of domains and tasks that the model will be evaluated on. A particular emphasis will be made on high-societal-impact audio tasks that are currently underrepresented, such as low-resource languages, environmental and ecological safety, clinical speech applications, and ethnomusicology, thus encouraging participants to devise impactful datasets rather than relying solely upon popular and/or commercially viable benchmarks.
https://github.com/neuralaudio/hear2021-eval-kit/pull/18/files#r641180248
Implement evaluation for auto-tagging tasks (multi-label) using LWLRAP; see https://arxiv.org/abs/1906.02975
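A minimal numpy sketch of LWLRAP following the definition in that paper (function and variable names here are illustrative, not the kit's API):

```python
import numpy as np

def lwlrap(truth: np.ndarray, scores: np.ndarray) -> float:
    """Label-weighted label-ranking average precision.

    truth:  (n_samples, n_labels) binary ground-truth matrix.
    scores: (n_samples, n_labels) predicted scores.
    """
    precisions = []  # one precision value per positive (sample, label) pair
    for y, s in zip(truth, scores):
        pos = np.flatnonzero(y)
        if len(pos) == 0:
            continue
        # Rank of every label for this sample, where rank 1 is the highest score.
        ranks = np.argsort(np.argsort(-s)) + 1
        for label in pos:
            # Number of true labels ranked at or above this label's rank.
            hits = np.sum(ranks[pos] <= ranks[label])
            precisions.append(hits / ranks[label])
    # Averaging over all positive (sample, label) pairs equals the label-weighted average.
    return float(np.mean(precisions))
```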
As spotted by @jorshi
Could also include full sandbox testing of the API and its speed.
Also might be nice to show how to do training in a separate notebook (could be a separate issue)
A .version() API method, returning a string, which we can use to segregate different output runs of a particular model.
This should also go into the website API description.
This might not actually be necessary if we are always working with numpy embeddings that were cached to disk.
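A minimal sketch of what such a method might look like (the string format shown is only a placeholder):

```python
# Hypothetical module-level API method; the exact version-string format is an assumption.
def version() -> str:
    """Return a version string used to segregate different output runs of this model."""
    return "2021.0.1"
```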
Much easier than mixing it with Luigi
Can we avoid creating a whole new task?
Maybe we include nonmale in the task name?
We might want to slugify the person/instrument and then the filename, for partitioning. Or just have a partition slug, which is the person/instrument.
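A small illustrative slug helper for building such partition slugs (a library like python-slugify would work equally well; nothing here is the kit's actual code):

```python
import re

def slugify(text: str) -> str:
    """Lowercase, replace runs of non-alphanumerics with hyphens, and trim."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

# e.g. a partition slug built from the person/instrument:
# slugify("Bassoon Player 03") -> "bassoon-player-03"
```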
Implement an evaluation metric for multi-class predictions using MRR (mean reciprocal rank); a small sketch follows below.
Related to #11
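A minimal numpy sketch of MRR for multi-class predictions (names are illustrative):

```python
import numpy as np

def mean_reciprocal_rank(truth: np.ndarray, scores: np.ndarray) -> float:
    """truth: (n_samples,) integer class indices; scores: (n_samples, n_classes)."""
    # Rank of every class for each sample, where rank 1 is the highest score.
    ranks = np.argsort(np.argsort(-scores, axis=1), axis=1) + 1
    true_ranks = ranks[np.arange(len(truth)), truth]
    return float(np.mean(1.0 / true_ranks))
```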
For transcription I suggest we go with Jesse's paper (https://arxiv.org/pdf/1710.11153.pdf)
Explain our versioning: year-major-minor
This will allow us to do regression testing.
The idea of hop_size might be confusing. Another proposal from @maxsolomonhenry is frame_rate (as the number of frames per second); see the small conversion sketch after this discussion.
What any user wants from an audio embedding for downstream use (e.g. frame-based transcription or SED) is that the embedding is based on the prediction at each particular timestep. However, the input to the embedding might be variable-length or use multi-scale centered frames.
The concern was that this distinction might not be clear, and hop_size suggests classic overlap-add processing.
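For concreteness, a small sketch of how the two parameterizations relate (documentation-style only; the actual API naming is still open):

```python
# hop_size (in milliseconds) and frame_rate (in frames per second) are two views
# of the same spacing between embedding timestamps, independent of how wide or
# multi-scale the input window around each timestamp is.

def hop_size_ms_to_frame_rate(hop_size_ms: float) -> float:
    return 1000.0 / hop_size_ms

def frame_rate_to_hop_size_ms(frame_rate: float) -> float:
    return 1000.0 / frame_rate

# e.g. hop_size_ms=25.0 -> 40 frames per second.
```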
@jorshi "should we pull out all the models from this repo since we have the separate repo for the baseline now?" yeah
Originally posted by @turian in #68 (comment)
Check that get_audio_embedding is framing things correctly based upon the hop size and centering.
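A hedged sketch of the kind of check meant here, assuming the API returns one timestamp (in milliseconds) per embedding frame:

```python
import numpy as np

def check_framing(timestamps: np.ndarray, hop_size_ms: float, centered: bool) -> None:
    """Verify that frame timestamps are spaced hop_size apart and, if frames
    are centered, that the first frame sits at t=0."""
    hops = np.diff(timestamps)
    assert np.allclose(hops, hop_size_ms), "timestamps are not hop_size apart"
    if centered:
        assert np.isclose(timestamps[0], 0.0), "centered frames should start at t=0"
```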
We don't want NaN labels for any of the audio, so a filter step should be applied towards the top of the pipeline
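A minimal pandas sketch of that filter step (the column name is an assumption):

```python
import pandas as pd

def drop_nan_labels(metadata: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with missing labels before any splitting or subsampling."""
    return metadata.dropna(subset=["label"])
```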
If there are multiple requires in a Luigi task, use a dict; this is less brittle than numerical indexing (see the sketch below).
This will also require changing utils/luigi.py for the stage number.
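A minimal Luigi sketch of dict-style requires (the upstream task names and paths are hypothetical):

```python
import luigi


class DownloadCorpus(luigi.Task):  # hypothetical upstream task
    def output(self):
        return luigi.LocalTarget("_workdir/corpus")


class DownloadLabels(luigi.Task):  # hypothetical upstream task
    def output(self):
        return luigi.LocalTarget("_workdir/labels.csv")


class ProcessMetadata(luigi.Task):
    def requires(self):
        # A dict keeps downstream access readable and robust to reordering,
        # unlike positional indexing such as self.input()[0].
        return {"audio": DownloadCorpus(), "labels": DownloadLabels()}

    def run(self):
        audio_dir = self.input()["audio"].path
        labels_csv = self.input()["labels"].path
        # ... build the process-metadata CSV from these inputs ...
```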
Given the task type, test.csv, and predicted-test.csv, output evaluation scores. (Note that we can implement this now just by creating random test.csv and predicted-test.csv files; Christian is starting this task.)
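A hedged sketch of that entry point for the multiclass case (the file layout and column names are assumptions, not the kit's format):

```python
import pandas as pd

def evaluate(task_type: str, test_csv: str, predicted_csv: str) -> dict:
    """Compare ground-truth and predicted labels and return metric scores."""
    truth = pd.read_csv(test_csv)
    pred = pd.read_csv(predicted_csv)
    merged = truth.merge(pred, on="filename", suffixes=("_true", "_pred"))
    if task_type == "multiclass":
        accuracy = (merged["label_true"] == merged["label_pred"]).mean()
        return {"accuracy": float(accuracy)}
    raise NotImplementedError(task_type)
```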
This would be a higher-level convenience.
In this case, the return value would be a numpy array.
This code exists in heareval/task_embeddings.py; however, we might consider exposing a convenience higher-level API over all embeddings that follow our lower-level API.
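One hedged idea of what such a convenience wrapper could look like on top of the lower-level API (every name and signature here is an assumption):

```python
import numpy as np
import soundfile as sf

def compute_all_embeddings(module, audio_paths, hop_size_ms=25.0):
    """Run a module's lower-level embedding API over a list of audio files
    and return a dict mapping each path to a numpy embedding array."""
    embeddings = {}
    for path in audio_paths:
        audio, _sr = sf.read(path)  # assumes files are already at the model's sample rate
        # get_audio_embedding and its hop_size argument are assumed from the lower-level API.
        emb = module.get_audio_embedding(audio, hop_size=hop_size_ms)
        embeddings[path] = np.asarray(emb)
    return embeddings
```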
Most of the util/luigi.py stuff can be put into pipeline.py
The ProcessMetadata pattern should be better documented and should include column headers in the CSV.
Add sanity-check code for ProcessMetadata that checks the existence of the desired columns.
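A minimal sketch of that sanity check (the required column set is an assumption):

```python
import pandas as pd

REQUIRED_COLUMNS = {"relpath", "label", "partition"}  # assumed column set

def check_metadata_columns(metadata: pd.DataFrame) -> None:
    missing = REQUIRED_COLUMNS - set(metadata.columns)
    if missing:
        raise ValueError(f"ProcessMetadata output is missing columns: {sorted(missing)}")
```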
Luigi should create README and LICENSE, as described in the tasks/README.md
Implement evaluation of ranking tasks. Spearman seems like an appropriate metric for these tasks.
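A minimal scipy sketch, assuming per-item scalar targets and predictions:

```python
from scipy.stats import spearmanr

def ranking_score(true_scores, predicted_scores) -> float:
    """Spearman rank correlation between ground-truth and predicted orderings."""
    rho, _pvalue = spearmanr(true_scores, predicted_scores)
    return float(rho)
```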
I don't think there's a need for this. Timestamps will be the same whether or not the frames are centered.
Originally posted by @maxsolomonhenry in #2 (comment)
This should be run both on the original partitions and also the subsampled partitions
_workdir/{config.TASKNAME}/
Originally posted by @turian in #12 (comment)
Full nsynth is quite large. Should we subsample it? At least for training / val. I think we should keep the full test set.
Originally posted by @jorshi in #78 (comment)