dice-group / gerbil Goto Github PK
GERBIL - General Entity annotatoR Benchmark
License: GNU Affero General Public License v3.0
Add the possibility that a NIF file can be uploaded, parsed and used as a dataset.
Build a checkbox into the configuration screen that stops the experiment from running if the disclaimer has not been read and understood.
Refactor frontend
GOAL: have a short description/citation per dataset, annotator, matching, experiment type (maybe a tile ("Kachel") layout)
SUBGOAL: provide provenance information here
Write a wrapper for the Ritter dataset.
Annotate the license, experiment type and language.
Give provenance.
Update https://github.com/AKSW/gerbil/wiki/Licences-for-datasets
Have a look at the current literature for this experiment type and implement it in the GERBIL interface.
Have a look at the current literature for this experiment type and implement it in the GERBIL interface.
Add a Dexter http://dexter.isti.cnr.it/ wrapper and make sure the annotator API is openly available since we want only open software.
Babelfy consists of the following two EL setups. Running both versions on all the datasets should be the right way to go for now.
Regarding the parameter to set: if you are using the Java interface you should change the MATCHING enum; otherwise you should add one of the following parameters to the URL:
&partMatching=true
&partMatching=false
The first one will enable partial matching (i.e., for kore50, twitter), while the second will enable exact matching (i.e., for news, long documents, wsd).
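The two parameters above can be sketched as a small helper that appends the flag to a request URL. This is a minimal sketch, assuming the parameter name `&partMatching=` from the issue text; the helper and base URL are illustrative, not Babelfy's real client API.

```java
// Hypothetical helper: append Babelfy's matching flag to a request URL.
// Partial matching suits short, noisy texts (KORE50, tweets);
// exact matching suits long documents (news, WSD).
public class BabelfyUrlBuilder {
    public static String withMatching(String baseUrl, boolean partialMatching) {
        return baseUrl + "&partMatching=" + partialMatching;
    }
}
```

Running both settings over every dataset then only requires building the URL twice per request.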
GOAL: Adapt source code to two Babelfy Adapters and give credits to the used version.
Implement language as a feature of a dataset so we can move forward towards multilingual benchmarking of annotators.
GOAL: As a user I want to use http://somurl/gerbil/sparql to retrieve, for example, the best annotator for microposts in an A2W task.
Therefore, implement a SPARQL interface within GERBIL so that the SPARQL endpoint is bound to the GERBIL webapp.
One of our "customers" said that it might be helpful if there were a large table for every experiment type containing all results of all (open) annotator/dataset combinations.
GERBIL license?
TODO: add a boilerplate license template to all source code files with a common license that will then be filled in according to the authors, affiliations, etc.
Write a wrapper for the Microposts2013 dataset.
Annotate the license, experiment type and language.
Give provenance.
Update https://github.com/AKSW/gerbil/wiki/Licences-for-datasets
Create wiki pages that describe the addition of annotators and datasets.
Start a survey about the effort it takes to implement a GERBIL annotator/dataset wrapper, to prove that evaluation via GERBIL is an order of magnitude more effective and efficient than evaluating alone.
Type: C2W
Annotator: DBpedia Spotlight
Dataset: KORE50
Testing with tagger: DBpedia Spotlight (Default) dataset: KORE50 (no score thr.)
2014-11-03 00:01:04,025 ERROR [org.aksw.gerbil.execute.ExperimentTaskExecuter] -
GerbilException: java.lang.ClassCastException: it.acubelab.batframework.data.Tag cannot be cast to it.acubelab.batframework.data.Annotation (error type -106: Got an unexpected exception while running the experiment.)
at org.aksw.gerbil.execute.ExperimentTaskExecuter.runExperiment(ExperimentTaskExecuter.java:167)
at org.aksw.gerbil.execute.ExperimentTaskExecuter.run(ExperimentTaskExecuter.java:82)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.ClassCastException: it.acubelab.batframework.data.Tag cannot be cast to it.acubelab.batframework.data.Annotation
at it.acubelab.batframework.metrics.ConceptAnnotationMatch.preProcessOutput(ConceptAnnotationMatch.java:51)
at it.acubelab.batframework.metrics.Metrics.getResult(Metrics.java:18)
at it.acubelab.batframework.utils.RunExperiments.computeMetricsC2W(RunExperiments.java:162)
at it.acubelab.batframework.utils.RunExperiments.performC2WExpVarThreshold(RunExperiments.java:263)
at org.aksw.gerbil.execute.ExperimentTaskExecuter.runExperiment(ExperimentTaskExecuter.java:164)
... 2 more
I load the file path via the GerbilConfiguration.
The code is the following:
private static final String NIF_DATASET_FILE_PROPERTY_NAME = "org.aksw.gerbil.datasets.Datahub";
GerbilConfiguration.getInstance().getString(NIF_DATASET_FILE_PROPERTY_NAME) + datasetName;
In the gerbil.properties the property name is defined as the following
org.aksw.gerbil.datasets.Datahub=${org.aksw.gerbil.DataPath}/datasets/datahub/
but the file will be in the gerbil folder and look like the following:
nullbrown-corpus-in-rdf-nif
Maybe I did something wrong with the loading of the path?
The "corrected" version of the KORE50 dataset in NIF contains only 1 context, which turns the 50 test sentences into one document. Is this use of context correct?
http://www.yovisto.com/labs/ner-benchmarks/
Please talk to Jörg Waitelonis about that.
Build a mechanism to output results using the RDF Data Cube vocabulary in order to have a SPARQL-able results page.
The idea of this important step is to become knowledge base agnostic. Most datasets are annotated to some form of Wikipedia or DBpedia. Currently, we only support annotation to these systems.
GOAL: transform each method to understand URIs. Annotate each corpus with its annotation knowledge base and each annotator with the annotations it is capable of.
Add experiment types from http://nlp.cs.rpi.edu/kbp/2014/KBP2014EL_V0.2.pdf
Have a look at the current literature for this experiment type and implement it in the GERBIL interface.
@Andrea Moro: Add a Senseval wrapper and make sure the dataset is openly available and the licence is ensured, since we want only open datasets in GERBIL.
Write a wrapper for the Microposts2014 dataset.
Annotate the license, experiment type and language.
Give provenance.
Update https://github.com/AKSW/gerbil/wiki/Licences-for-datasets
Hi,
mvn clean tomcat:run -Dmaven.tomcat.port=1234 fails due to static paths pointing to /data/m.roeder/workspace/ ...
Perhaps it would be better to use relative paths, or to add a variable in the pom.xml?
To enable diagnostics, provide a spider diagram (D3.js) underneath each experiment in the overview tab.
Upload the needed corpora and configuration files and document the URL of it.
The result caching mechanism should check whether a result contains an error code.
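A minimal sketch of such a check, assuming (as the -106 code in the log above suggests) that error states are encoded as negative values in the result row; the class and method names are illustrative, not GERBIL's actual caching code:

```java
// Only reuse a cached result if none of its values encodes an error state.
// Assumption: negative values (e.g. -106) mark errors, real measures are >= 0.
public class ResultCacheGuard {
    public static boolean isCacheable(double[] results) {
        for (double value : results) {
            if (value < 0) {
                return false; // an error code was stored; recompute instead
            }
        }
        return true;
    }
}
```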
Write a wrapper for the WEKEX'11 dataset.
Annotate the license, experiment type and language.
Give provenance.
Update https://github.com/AKSW/gerbil/wiki/Licences-for-datasets
Write a wrapper for the WSDM2012 dataset.
Annotate the license, experiment type and language.
Give provenance.
Update https://github.com/AKSW/gerbil/wiki/Licences-for-datasets
I get an error when I try to run an experiment with AGDISTIS:
2014-11-01 19:04:22,544 ERROR [org.aksw.gerbil.execute.ExperimentTaskExecuter] - <Error while trying to execute experiment.>
java.lang.ClassCastException: org.aksw.gerbil.bat.annotator.ErrorCountingAnnotatorDecorator$ErrorCountingD2W cannot be cast to it.acubelab.batframework.problems.Sa2WSystem
at org.aksw.gerbil.execute.ExperimentTaskExecuter.runExperiment(ExperimentTaskExecuter.java:144)
at org.aksw.gerbil.execute.ExperimentTaskExecuter.run(ExperimentTaskExecuter.java:82)
at java.lang.Thread.run(Thread.java:745)
I selected the Sa2W experiment type, AGDISTIS and one of the new corpora (from Datahub).
When using DBpedia Spotlight with the same experiment type and corpora, it worked.
Add a NERD wrapper and make sure the annotator API is openly available since we want only open software.
Write a wrapper for the UMBC dataset.
Annotate the license, experiment type and language.
Give provenance.
Update https://github.com/AKSW/gerbil/wiki/Licences-for-datasets
Write documentation about the startup procedure in the wiki.
Find a solution to add topical tags to a NIF document (needed for C2W, Rc2W and Sc2W)
Have a look at the current literature for this experiment type and implement it in the GERBIL interface. Especially, look at Babelfy.
When we automatically load corpora from Datahub.io, it is not clear whether they are usable for certain experiments, e.g., http://datahub.io/dataset/brown-corpus-in-rdf-nif
GOAL: Implement tests to ensure that, e.g., annotations, its:ref etc. are present.
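Such a test could start as a plain scan of the downloaded file for the core NIF/ITS properties before offering it for an experiment. This is a rough sketch, not real RDF parsing (a proper check would parse the file with an RDF library); the exact property set required per experiment type is an assumption:

```java
// Sanity check: does a NIF file (in Turtle) mention the properties an
// annotation experiment needs at all? String matching only; a well-formedness
// check, not a substitute for RDF validation.
public class NifSanityCheck {
    private static final String[] REQUIRED = {
        "nif:isString",      // the document text of a nif:Context
        "nif:anchorOf",      // the surface form of a mention
        "itsrdf:taIdentRef"  // the entity a mention is linked to
    };

    public static boolean looksAnnotated(String nifTurtle) {
        for (String property : REQUIRED) {
            if (!nifTurtle.contains(property)) {
                return false;
            }
        }
        return true;
    }
}
```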
I would like to replace the *Mapping.java classes (org.aksw.gerbil.utils) with CSV files. These files could then be loaded into a database.
Adding an annotator or dataset would then be easier.
A more dynamic way of loading annotators/datasets via the frontend would also become possible. The added annotators/datasets could then be stored in the database and in the CSV files.
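The proposed registry could be sketched as follows. The two-column `name,url` layout is an assumption for illustration; the real file format (and whether a URL, a class name, or both is stored) would need to be agreed on.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a CSV-based adapter registry replacing the hard-coded
// *Mapping.java classes: one "name,url" line per annotator or dataset.
public class CsvAdapterRegistry {
    public static Map<String, String> parse(String csv) {
        Map<String, String> mapping = new HashMap<>();
        for (String line : csv.split("\\R")) {
            String[] cells = line.split(",", 2);
            if (cells.length < 2) {
                continue; // skip blank or malformed lines
            }
            mapping.put(cells[0].trim(), cells[1].trim());
        }
        return mapping;
    }
}
```

The same parse step could back both the database import and the frontend upload path.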
Add the usage of NIF datasets from Datahub.io.
@jörg Waitelonis: Add HPI annotator and give us your github account name to add you to the project.
Write a wrapper for the CoNLL2003 dataset.
Annotate the license, experiment type and language.
Give provenance.
Update https://github.com/AKSW/gerbil/wiki/Licences-for-datasets
For typing:
For linking:
For typing and linking, together in one dataset:
Depending on the definition of "open" in this context (see #16), we may consider using those corpora.
Make the NIF dataset adapter more robust. It should be able to handle NIF documents even if they contain additional markup (like sentences).
Please overhaul and add to the wiki page all discussed considerations https://github.com/AKSW/gerbil/wiki/Disclaimer-and-Licensing-of-Datasets
Write a wrapper for the Derczynski dataset.
Annotate the license, experiment type and language.
Give provenance.
Update https://github.com/AKSW/gerbil/wiki/Licences-for-datasets
Besides micro and macro F1-measure, there are other interesting metrics which could be added. GERBIL should have an infrastructure which makes this possible.
Workaround cleaning db?
How?
Add additional RDFa data to the result page containing DataId information for the datasets and/or adapters.
If the title of an entity contains a "/", the system is not able to get its Wikipedia ID. It seems that somewhere the URI is decoded, which leads to the unescaped "/" inside the entity name.
from the log: "Wiki title minor_compositions of url http://dbpedia.org/resource/List_of_major/minor_compositions (decoded http://dbpedia.org/resource/List_of_major/minor_compositions) could not be found."
"Wiki title catcher) of url http://dbpedia.org/resource/Bill_Morgan_(outfielder/catcher) (decoded http://dbpedia.org/resource/Bill_Morgan_(outfielder/catcher)) could not be found."
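A minimal sketch of a fix, assuming the title is everything after the `/resource/` segment of a DBpedia URI (the class and method names are illustrative, not GERBIL's actual code): cutting at the last "/" truncates titles, cutting after "/resource/" keeps them intact.

```java
// Extract the Wikipedia title from a DBpedia resource URI without breaking
// on titles that themselves contain "/" (e.g. "outfielder/catcher").
public class DBpediaTitleExtractor {
    private static final String RESOURCE_PREFIX = "/resource/";

    public static String titleOf(String dbpediaUri) {
        int pos = dbpediaUri.indexOf(RESOURCE_PREFIX);
        if (pos < 0) {
            throw new IllegalArgumentException("Not a DBpedia resource URI: " + dbpediaUri);
        }
        // Keep everything after /resource/, including any inner "/".
        return dbpediaUri.substring(pos + RESOURCE_PREFIX.length());
    }
}
```

With this, the two URIs from the log yield "List_of_major/minor_compositions" and "Bill_Morgan_(outfielder/catcher)" instead of the truncated fragments.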
During initialization, the system should check the DB for running experiment tasks and set them to an error code.