semantic-health / allennlp-multi-label
A multi-label classification plugin for AllenNLP.
License: Apache License 2.0
Currently, FBetaMeasureMultiLabel will break if a batch has no gold labels. Make whatever changes are needed to handle this edge case and then write the accompanying unit tests.
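The issue does not say where the failure occurs. A batch with no gold labels does make the recall denominator (true positives plus false negatives) zero for every label, so one plausible fix is to guard each division. The following is a minimal pure-Python sketch of that idea, not the plugin's implementation; `safe_div` and `fbeta_per_label` are hypothetical names, and predictions and gold labels are assumed to be 0/1 rows of equal length.

```python
from typing import List


def safe_div(numerator: float, denominator: float) -> float:
    """Return 0.0 instead of raising when the denominator is zero."""
    return numerator / denominator if denominator != 0 else 0.0


def fbeta_per_label(
    preds: List[List[int]], golds: List[List[int]], beta: float = 1.0
) -> List[float]:
    """Per-label F-beta over a batch of 0/1 indicator rows (hypothetical helper)."""
    num_labels = len(preds[0]) if preds else 0
    beta2 = beta ** 2
    scores = []
    for j in range(num_labels):
        tp = sum(1 for p, g in zip(preds, golds) if p[j] == 1 and g[j] == 1)
        pred_sum = sum(p[j] for p in preds)  # tp + fp
        true_sum = sum(g[j] for g in golds)  # tp + fn, zero when no gold labels
        precision = safe_div(tp, pred_sum)
        recall = safe_div(tp, true_sum)
        scores.append(
            safe_div((1 + beta2) * precision * recall, beta2 * precision + recall)
        )
    return scores


# A batch in which no instance has any gold label no longer raises.
empty_batch_scores = fbeta_per_label([[1, 0], [0, 1]], [[0, 0], [0, 0]])
```

A unit test for the edge case could then assert that such a batch yields a score of 0.0 for every label rather than an exception or NaN.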
Each AllenNLP model has a method make_output_human_readable, which is typically used to add a new field to the output_dict with human-readable labels. This will need to be rewritten for the multi-label setup (likely just by dropping the argmax).
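As a rough illustration of "dropping the argmax": instead of picking the single highest-scoring class, keep every label whose probability clears a threshold. The sketch below is pure Python and makes several assumptions: `human_readable_labels` is a hypothetical helper, the inputs are per-label probabilities, and `index_to_label` is a plain dict standing in for the model's label vocabulary (in AllenNLP this mapping could come from `vocab.get_index_to_token_vocabulary("labels")`).

```python
from typing import Dict, List


def human_readable_labels(
    probs: List[List[float]],
    index_to_label: Dict[int, str],
    threshold: float = 0.5,
) -> List[List[str]]:
    """Map each row of per-label probabilities to the names of all labels
    at or above the threshold (zero, one, or many per instance)."""
    return [
        [index_to_label[i] for i, p in enumerate(row) if p >= threshold]
        for row in probs
    ]


# Hypothetical label names, for illustration only.
labels = human_readable_labels(
    [[0.9, 0.2, 0.7], [0.1, 0.1, 0.1]],
    {0: "a", 1: "b", 2: "c"},
)
```

Unlike the multi-class case, an instance can legitimately map to an empty list here, which downstream consumers of the output_dict would need to tolerate.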
true_negative_sum is incorrectly computed for the multi-label case. We don't actually use it for anything directly, so this is low priority, but it should be fixed at some point.
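The issue does not describe how the current computation goes wrong. For reference, a label's true negatives in the multi-label setting are the instances where the label is neither predicted nor present in the gold annotation. A minimal pure-Python sketch of that definition follows; `true_negatives_per_label` is a hypothetical name and the inputs are assumed to be 0/1 indicator rows.

```python
from typing import List


def true_negatives_per_label(
    preds: List[List[int]], golds: List[List[int]]
) -> List[int]:
    """Count, for each label, the instances where both prediction and gold are 0."""
    num_labels = len(golds[0]) if golds else 0
    return [
        sum(1 for p, g in zip(preds, golds) if p[j] == 0 and g[j] == 0)
        for j in range(num_labels)
    ]


tn = true_negatives_per_label(
    [[1, 0], [0, 0], [0, 1]],
    [[1, 0], [0, 1], [0, 0]],
)
```

A useful sanity check in a unit test is that, per label, true positives, false positives, false negatives, and true negatives sum to the number of instances.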
Write extensive unit tests for the MultiLabelTextClassificationJsonReader. We can likely copy those of TextClassificationJsonReader from AllenNLP.
Side note: in the future we might consider rewriting TextClassificationJsonReader to be both multi-label and multi-class. It could just inspect the label field of the file it was passed to figure this out.
In FBetaMeasureMultiLabel there is a function that is supposed to detach any tensors involved in the computation of metrics from the computation graph, but this does not work, so we currently detach them manually in the model's forward pass. Why is this not working?
Write extensive unit tests for the multi-label metrics. We have to be sure that these work as expected.
You're calculating the F-score using logits but passing a default threshold of 0.5. I'm fairly sure you should be passing probabilities instead, or setting the threshold to 0.0. As far as I can see, F1MultiLabelMeasure doesn't convert logits to probabilities; it simply compares the predictions with the threshold.
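The two suggested fixes are equivalent because the sigmoid of 0.0 is 0.5: thresholding probabilities at 0.5 selects the same labels as thresholding raw logits at 0.0. The pure-Python sketch below illustrates the mismatch; `sigmoid` and `threshold_values` are hypothetical helpers, and the use of `>=` for the comparison is an assumption about how the metric compares against its threshold.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def threshold_values(values: List[float], threshold: float) -> List[int]:
    """Return 1 for every value at or above the threshold, else 0."""
    return [1 if v >= threshold else 0 for v in values]


logits = [-1.0, 0.2, 2.0]

# Raw logits compared against 0.5: the label with logit 0.2 is missed.
logits_at_half = threshold_values(logits, 0.5)

# Either fix recovers it: probabilities at 0.5, or logits at 0.0.
probs_at_half = threshold_values([sigmoid(z) for z in logits], 0.5)
logits_at_zero = threshold_values(logits, 0.0)
```

Here a logit of 0.2 corresponds to a probability of roughly 0.55, so the label should be predicted, yet comparing the logit directly with 0.5 drops it.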
Hi,
I tried to use pip to install but it did not work.
We are working with conda so using poetry will be problematic for us.
Could you add this project to pypi.org or add a setup.py to install from git?
Thanks,
Naama
Using a mask with FBetaMeasureMultiLabel is not currently supported. Make whatever modifications are needed to support this, then write the accompanying unit tests.
I get the exception
ValueError: Found a TextField (tokens) with token_indexers already applied, but you're using num_workers > 0 in your data loader. Make sure your dataset reader's text_to_instance() method doesn't add any token_indexers to the TextFields it creates. Instead, the token_indexers should be added to the instances in the apply_token_indexers() method of your dataset reader (which you'll have to implement if you haven't done so already).
In your dataset reader's text_to_instance, instead of
    fields["tokens"] = TextField(tokens, self._token_indexers)
you should simply use
    fields["tokens"] = TextField(tokens)
and override apply_token_indexers, i.e.:
    @overrides
    def apply_token_indexers(self, instance: Instance) -> None:
        if self._segment_sentences:
            # "tokens" holds one TextField per sentence; index each of them.
            for text_field in instance.fields["tokens"]:  # type: ignore
                text_field._token_indexers = self._token_indexers
        else:
            instance.fields["tokens"]._token_indexers = self._token_indexers  # type: ignore
Also, the read method should call shard_iterable, i.e.:
    for line in self.shard_iterable(data_file.readlines()):
We should have a way to automatically generate weights for the BinaryCrossEntropy loss function when reading in the data. The weight for each label should be:
    weight = (num_datapoints - num_occurences_label) / num_occurences_label
for labels with count > 0, and num_datapoints + 1 otherwise.
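The weighting rule above can be sketched in a few lines of pure Python. This is an illustration under stated assumptions rather than the plugin's implementation: `label_weights` is a hypothetical helper, each datapoint is represented as a list of label strings, and a label is counted at most once per datapoint.

```python
from collections import Counter
from typing import Dict, List


def label_weights(
    label_sets: List[List[str]], all_labels: List[str]
) -> Dict[str, float]:
    """Per-label weight: (N - count) / count if the label occurs, else N + 1."""
    num_datapoints = len(label_sets)
    # set() ensures a label repeated within one datapoint is counted once.
    counts = Counter(label for labels in label_sets for label in set(labels))
    return {
        label: (num_datapoints - counts[label]) / counts[label]
        if counts[label] > 0
        else float(num_datapoints + 1)
        for label in all_labels
    }


# Hypothetical toy data: four datapoints, label "d" never occurs.
weights = label_weights(
    [["a", "b"], ["a"], ["a", "c"], []],
    ["a", "b", "c", "d"],
)
```

Frequent labels end up with weights below 1 and rare labels with weights above 1. In PyTorch, a negatives-to-positives ratio of this kind is what the `pos_weight` argument of `torch.nn.BCEWithLogitsLoss` accepts; whether the plugin would pass the weights that way is an assumption.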