dice-group / gerbil Goto Github PK
GERBIL - General Entity annotatoR Benchmark
License: GNU Affero General Public License v3.0
Add the possibility that a NIF file can be uploaded, parsed and used as a dataset.
Build a checkbox into the configuration screen that stops the experiment from running if the disclaimer has not been read and understood.
Refactor frontend
GOAL: have a short description/citation per dataset, annotator, matching, experiment type (maybe a tile ("Kachel") layout)
SUBGOAL: provide provenance information here
Write a wrapper for the Ritter dataset.
Annotate the license, experiment type and language.
Give provenance.
Update https://github.com/AKSW/gerbil/wiki/Licences-for-datasets
Have a look at the current literature for this experiment type and implement it in the GERBIL interface.
Have a look at the current literature for this experiment type and implement it in the GERBIL interface.
Add a Dexter http://dexter.isti.cnr.it/ wrapper and make sure the annotator API is openly available since we want only open software.
Babelfy consists of the following two EL setups. Running both versions on all the datasets should be the right way to go for now.
Regarding the parameter to set: if you are using the Java interface you should change the MATCHING enum; otherwise you should add one of the following parameters to the URL:
&partMatching=true
&partMatching=false
The first one will enable partial matching (i.e., for kore50, twitter), while the second will enable exact matching (i.e., for news, long documents, wsd).
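The two parameters above can be sketched as a small helper that appends the flag to a request URL. This is a minimal sketch, assuming the parameter name `&partMatching=` from the issue text; the helper and base URL are illustrative, not Babelfy's real client API.

```java
// Hypothetical helper: append Babelfy's matching flag to a request URL.
// Partial matching suits short, noisy texts (KORE50, tweets);
// exact matching suits long documents (news, WSD).
public class BabelfyUrlBuilder {
    public static String withMatching(String baseUrl, boolean partialMatching) {
        return baseUrl + "&partMatching=" + partialMatching;
    }
}
```

Running both settings over every dataset then only requires building the URL twice per request.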
GOAL: Adapt source code to two Babelfy Adapters and give credits to the used version.
Implement language as a feature of a dataset so we can move forward towards multilingual benchmarking of annotators.
GOAL: As a user I want to use http://somurl/gerbil/sparql to retrieve, for example, the best annotator for microposts in an A2W task.
Therefore, implement a SPARQL interface within GERBIL so that the SPARQL endpoint is bound to the GERBIL webapp.
One of our "customers" said that it might be helpful if there were a large table for every experiment type containing all results of all (open) annotator/dataset combinations.
GERBIL license?
TODO: add a boilerplate license template to all source code files with a common license that will then be filled in according to the authors, affiliations, etc.
Write a wrapper for the Microposts2013 dataset.
Annotate the license, experiment type and language.
Give provenance.
Update https://github.com/AKSW/gerbil/wiki/Licences-for-datasets
Create wiki pages that describe the addition of annotators and datasets.
Start a survey about the effort it takes to implement a GERBIL annotator/dataset wrapper, to prove that evaluation via GERBIL is an order of magnitude more effective and efficient than evaluating alone.
Type: C2W
Annotator: DBpedia Spotlight
Dataset: KORE50
Testing with tagger: DBpedia Spotlight (Default) dataset: KORE50 (no score thr.)
2014-11-03 00:01:04,025 ERROR [org.aksw.gerbil.execute.ExperimentTaskExecuter] -
GerbilException: java.lang.ClassCastException: it.acubelab.batframework.data.Tag cannot be cast to it.acubelab.batframework.data.Annotation (error type -106: Got an unexpected exception while running the experiment.)
at org.aksw.gerbil.execute.ExperimentTaskExecuter.runExperiment(ExperimentTaskExecuter.java:167)
at org.aksw.gerbil.execute.ExperimentTaskExecuter.run(ExperimentTaskExecuter.java:82)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.ClassCastException: it.acubelab.batframework.data.Tag cannot be cast to it.acubelab.batframework.data.Annotation
at it.acubelab.batframework.metrics.ConceptAnnotationMatch.preProcessOutput(ConceptAnnotationMatch.java:51)
at it.acubelab.batframework.metrics.Metrics.getResult(Metrics.java:18)
at it.acubelab.batframework.utils.RunExperiments.computeMetricsC2W(RunExperiments.java:162)
at it.acubelab.batframework.utils.RunExperiments.performC2WExpVarThreshold(RunExperiments.java:263)
at org.aksw.gerbil.execute.ExperimentTaskExecuter.runExperiment(ExperimentTaskExecuter.java:164)
... 2 more
I load the file path via the GerbilConfiguration.
The code is the following:
private static final String NIF_DATASET_FILE_PROPERTY_NAME = "org.aksw.gerbil.datasets.Datahub";
GerbilConfiguration.getInstance().getString(NIF_DATASET_FILE_PROPERTY_NAME) + datasetName;
In the gerbil.properties the property name is defined as the following
org.aksw.gerbil.datasets.Datahub=${org.aksw.gerbil.DataPath}/datasets/datahub/
but the file will be in the gerbil folder and look like the following:
nullbrown-corpus-in-rdf-nif
Maybe I did something wrong with the loading of the path?
The "corrected" version of the KORE50 dataset in NIF contains only 1 context, which turns the 50 test sentences into one document. Is this use of context correct?
http://www.yovisto.com/labs/ner-benchmarks/
Please talk to Jörg Waitelonis about that.
Build a mechanism to output results using the RDF Data Cube vocabulary in order to have a SPARQL-able results page.
The idea of this important step is to become knowledge base agnostic. Most datasets are annotated to some form of Wikipedia or DBpedia. Currently, we only support annotation to these systems.
GOAL: transform each method to understand URIs. Annotate each corpus with its annotation knowledge base and each annotator with the annotations it is capable of.
Add experiment types from http://nlp.cs.rpi.edu/kbp/2014/KBP2014EL_V0.2.pdf
Have a look at the current literature for this experiment type and implement it in the GERBIL interface.
@Andrea Moro: Add a Senseval wrapper and make sure the dataset is openly available and the licence is ensured, since we want only open datasets in GERBIL.
Write a wrapper for the Microposts2014 dataset.
Annotate the license, experiment type and language.
Give provenance.
Update https://github.com/AKSW/gerbil/wiki/Licences-for-datasets
Hi,
mvn clean tomcat:run -Dmaven.tomcat.port=1234 fails due to static paths pointing to /data/m.roeder/workspace/ ...
Perhaps it would be better to use relative paths, or to add a variable in the pom.xml?
To enable diagnostics, provide a spider diagram (D3.js) underneath each experiment in the overview tab.
Upload the needed corpora and configuration files and document the URL of it.
The result caching mechanism should check whether a result contains an error code.
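A minimal sketch of such a check, assuming (as the -106 code in the log above suggests) that error states are encoded as negative values in the result row; the class and method names are illustrative, not GERBIL's actual caching code:

```java
// Only reuse a cached result if none of its values encodes an error state.
// Assumption: negative values (e.g. -106) mark errors, real measures are >= 0.
public class ResultCacheGuard {
    public static boolean isCacheable(double[] results) {
        for (double value : results) {
            if (value < 0) {
                return false; // an error code was stored; recompute instead
            }
        }
        return true;
    }
}
```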
Write a wrapper for the WEKEX'11 dataset.
Annotate the license, experiment type and language.
Give provenance.
Update https://github.com/AKSW/gerbil/wiki/Licences-for-datasets
Write a wrapper for the WSDM2012 dataset.
Annotate the license, experiment type and language.
Give provenance.
Update https://github.com/AKSW/gerbil/wiki/Licences-for-datasets
I get an error when I try to run an experiment with AGDISTIS:
2014-11-01 19:04:22,544 ERROR [org.aksw.gerbil.execute.ExperimentTaskExecuter] - <Error while trying to execute experiment.>
java.lang.ClassCastException: org.aksw.gerbil.bat.annotator.ErrorCountingAnnotatorDecorator$ErrorCountingD2W cannot be cast to it.acubelab.batframework.problems.Sa2WSystem
at org.aksw.gerbil.execute.ExperimentTaskExecuter.runExperiment(ExperimentTaskExecuter.java:144)
at org.aksw.gerbil.execute.ExperimentTaskExecuter.run(ExperimentTaskExecuter.java:82)
at java.lang.Thread.run(Thread.java:745)
I selected the Sa2W experiment type, AGDISTIS and one of the new corpora (from Datahub).
When using DBpedia Spotlight with the same experiment type and corpora, it worked.
Add a NERD wrapper and make sure the annotator API is openly available since we want only open software.
Write a wrapper for the UMBC dataset.
Annotate the license, experiment type and language.
Give provenance.
Update https://github.com/AKSW/gerbil/wiki/Licences-for-datasets
Write documentation about the startup procedure in the wiki.
Find a solution to add topical tags to a NIF document (needed for C2W, Rc2W and Sc2W)
Have a look at the current literature for this experiment type and implement it in the GERBIL interface. Especially, look at Babelfy.
When we automatically load corpora from Datahub.io, it is not clear whether they are usable for certain experiments, e.g., http://datahub.io/dataset/brown-corpus-in-rdf-nif
GOAL: Implement tests to ensure that, e.g., annotations, its:ref etc. are present.
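Such a test could start as a plain scan of the downloaded file for the core NIF/ITS properties before offering it for an experiment. This is a rough sketch, not real RDF parsing (a proper check would parse the file with an RDF library); the exact property set required per experiment type is an assumption:

```java
// Sanity check: does a NIF file (in Turtle) mention the properties an
// annotation experiment needs at all? String matching only; a well-formedness
// check, not a substitute for RDF validation.
public class NifSanityCheck {
    private static final String[] REQUIRED = {
        "nif:isString",      // the document text of a nif:Context
        "nif:anchorOf",      // the surface form of a mention
        "itsrdf:taIdentRef"  // the entity a mention is linked to
    };

    public static boolean looksAnnotated(String nifTurtle) {
        for (String property : REQUIRED) {
            if (!nifTurtle.contains(property)) {
                return false;
            }
        }
        return true;
    }
}
```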
I would like to replace the *Mapping.java classes (org.aksw.gerbil.utils) with CSV files. These files could then be loaded into a database.
Adding an annotator or dataset would then be easier.
A more dynamic way of loading annotators/datasets via the frontend would also become possible. The added annotators/datasets could then be stored in the database and in the CSV files.
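The proposed registry could be sketched as follows. The two-column `name,url` layout is an assumption for illustration; the real file format (and whether a URL, a class name, or both is stored) would need to be agreed on.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a CSV-based adapter registry replacing the hard-coded
// *Mapping.java classes: one "name,url" line per annotator or dataset.
public class CsvAdapterRegistry {
    public static Map<String, String> parse(String csv) {
        Map<String, String> mapping = new HashMap<>();
        for (String line : csv.split("\\R")) {
            String[] cells = line.split(",", 2);
            if (cells.length < 2) {
                continue; // skip blank or malformed lines
            }
            mapping.put(cells[0].trim(), cells[1].trim());
        }
        return mapping;
    }
}
```

The same parse step could back both the database import and the frontend upload path.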
Add the usage of NIF datasets from Datahub.io.
@jörg Waitelonis: Add HPI annotator and give us your github account name to add you to the project.
Write a wrapper for the CoNLL2003 dataset.
Annotate the license, experiment type and language.
Give provenance.
Update https://github.com/AKSW/gerbil/wiki/Licences-for-datasets
For typing:
For linking:
For typing and linking, together in one dataset:
Depending on the definition of "open" in this context (see #16), we may consider using those corpora.
Make the NIF dataset adapter more robust. It should be able to handle NIF documents even if they contain additional markup (like sentences).
Please overhaul and add to the wiki page all discussed considerations https://github.com/AKSW/gerbil/wiki/Disclaimer-and-Licensing-of-Datasets
Write a wrapper for the Derczynski dataset.
Annotate the license, experiment type and language.
Give provenance.
Update https://github.com/AKSW/gerbil/wiki/Licences-for-datasets
Besides micro and macro F1-measure, there are other interesting metrics which could be added. GERBIL should have an infrastructure which makes this possible.
Workaround cleaning db?
How?
Add additional RDFa data to the result page containing DataId information for the datasets and/or adapters.
If the title of an entity contains a "/", the system is not able to get its Wikipedia ID. It seems that somewhere the URI is decoded, which leads to the unescaped "/" inside the entity name.
from the log: "Wiki title minor_compositions of url http://dbpedia.org/resource/List_of_major/minor_compositions (decoded http://dbpedia.org/resource/List_of_major/minor_compositions) could not be found."
"Wiki title catcher) of url http://dbpedia.org/resource/Bill_Morgan_(outfielder/catcher) (decoded http://dbpedia.org/resource/Bill_Morgan_(outfielder/catcher)) could not be found."
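A minimal sketch of a fix, assuming the title is everything after the `/resource/` segment of a DBpedia URI (the class and method names are illustrative, not GERBIL's actual code): cutting at the last "/" truncates titles, cutting after "/resource/" keeps them intact.

```java
// Extract the Wikipedia title from a DBpedia resource URI without breaking
// on titles that themselves contain "/" (e.g. "outfielder/catcher").
public class DBpediaTitleExtractor {
    private static final String RESOURCE_PREFIX = "/resource/";

    public static String titleOf(String dbpediaUri) {
        int pos = dbpediaUri.indexOf(RESOURCE_PREFIX);
        if (pos < 0) {
            throw new IllegalArgumentException("Not a DBpedia resource URI: " + dbpediaUri);
        }
        // Keep everything after /resource/, including any inner "/".
        return dbpediaUri.substring(pos + RESOURCE_PREFIX.length());
    }
}
```

With this, the two URIs from the log yield "List_of_major/minor_compositions" and "Bill_Morgan_(outfielder/catcher)" instead of the truncated fragments.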
During initialization, the system should check the DB for running experiment tasks and set them to an error code.