dice-group / gerbil

GERBIL - General Entity annotatoR Benchmark

License: GNU Affero General Public License v3.0

Java 96.56% JavaScript 2.09% HTML 0.90% Shell 0.42% Dockerfile 0.03%
benchmarking benchmarking-framework entity-annotation entity-linking named-entity-recognition

gerbil's Issues

Refactor layout of the configuration screen

Refactor frontend

GOAL: have a short description/citation per dataset, annotator, matching, and experiment type (maybe in a tile layout)

SUBGOAL: provide provenance information here

Adapt Babelfy URI according to dataset

Babelfy covers the following two entity linking (EL) setups:

  1. short and highly ambiguous text like kore50 (Matching.PARTIAL)
  2. coherent and long documents like aida-conll (Matching.EXACT)

Running both versions on all the datasets should be the right way to go for now.

As for the parameter to set: if you are using the Java interface, change the MATCHING enum; otherwise, append one of the following parameters to the URL:

&partMatching=true
&partMatching=false

The first one enables partial matching (e.g., for kore50, twitter), while the second enables exact matching (e.g., for news, long documents, WSD).
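The parameter choice described above could be sketched roughly as follows. This is only an illustration: the endpoint URL, the class name, and the set of datasets that get partial matching are assumptions made for the example, not the real Babelfy endpoint or GERBIL's dataset registry. Only the `partMatching` parameter name comes from the text above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Sketch only: class name, endpoint and dataset table are hypothetical.
class BabelfyUrlSketch {

    // Datasets with short, highly ambiguous text (taken from the examples in this issue).
    private static final Set<String> PARTIAL_MATCHING_DATASETS =
            new HashSet<>(Arrays.asList("kore50", "twitter"));

    // True if the dataset should be run with partial matching.
    static boolean usesPartialMatching(String datasetName) {
        return PARTIAL_MATCHING_DATASETS.contains(datasetName.toLowerCase(Locale.ROOT));
    }

    // Appends the partMatching parameter to a request URL.
    static String withMatching(String requestUrl, String datasetName) {
        String separator = requestUrl.contains("?") ? "&" : "?";
        return requestUrl + separator + "partMatching=" + usesPartialMatching(datasetName);
    }

    public static void main(String[] args) {
        String base = "http://example.org/babelfy?lang=EN"; // hypothetical endpoint
        System.out.println(withMatching(base, "KORE50"));
        System.out.println(withMatching(base, "aida-conll"));
    }
}
```

Keeping the decision in one helper would make it easy to run both variants on every dataset, as suggested above, by passing the flag explicitly instead of deriving it from the dataset name.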

GOAL: Adapt the source code to provide two Babelfy adapters and credit the version used.

License/add boilerplate

GERBIL license?

TODO: add a boilerplate license template with a common license to all source code files; it will then be filled in with the authors, affiliations, etc.

Survey of effort implementing a GERBIL wrapper

Start a survey about the effort it takes to implement a GERBIL annotator/dataset wrapper, to show that evaluation via GERBIL is an order of magnitude more effective and efficient than evaluating each system on its own.

Got an error while running the task: C2W/Spotlight/KORE50

Type: C2W
Annotator: DBpedia Spotlight
Dataset: KORE50

Testing with tagger: DBpedia Spotlight (Default) dataset: KORE50 (no score thr.)
2014-11-03 00:01:04,025 ERROR [org.aksw.gerbil.execute.ExperimentTaskExecuter] -
GerbilException: java.lang.ClassCastException: it.acubelab.batframework.data.Tag cannot be cast to it.acubelab.batframework.data.Annotation (error type -106: Got an unexpected exception while running the experiment.)
at org.aksw.gerbil.execute.ExperimentTaskExecuter.runExperiment(ExperimentTaskExecuter.java:167)
at org.aksw.gerbil.execute.ExperimentTaskExecuter.run(ExperimentTaskExecuter.java:82)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.ClassCastException: it.acubelab.batframework.data.Tag cannot be cast to it.acubelab.batframework.data.Annotation
at it.acubelab.batframework.metrics.ConceptAnnotationMatch.preProcessOutput(ConceptAnnotationMatch.java:51)
at it.acubelab.batframework.metrics.Metrics.getResult(Metrics.java:18)
at it.acubelab.batframework.utils.RunExperiments.computeMetricsC2W(RunExperiments.java:162)
at it.acubelab.batframework.utils.RunExperiments.performC2WExpVarThreshold(RunExperiments.java:263)
at org.aksw.gerbil.execute.ExperimentTaskExecuter.runExperiment(ExperimentTaskExecuter.java:164)
... 2 more

Wrong filepath for datahub files

I load the file path via GerbilConfiguration, using the following code:

 private static final String NIF_DATASET_FILE_PROPERTY_NAME = "org.aksw.gerbil.datasets.Datahub";
 GerbilConfiguration.getInstance().getString(NIF_DATASET_FILE_PROPERTY_NAME) + datasetName;

In gerbil.properties, the property is defined as follows:

org.aksw.gerbil.datasets.Datahub=${org.aksw.gerbil.DataPath}/datasets/datahub/

but the file ends up in the gerbil folder with a name like the following:

nullbrown-corpus-in-rdf-nif

Maybe I did something wrong when loading the path?
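The "null" prefix in the file name suggests that the property lookup returned null and was then concatenated with the dataset name. The sketch below only demonstrates that Java behaviour and one way to fail early; `resolveDatasetPath` and the class name are hypothetical helpers, not part of GERBIL, and the sketch does not explain why the lookup would return null.

```java
// Sketch only: DatasetPathSketch and resolveDatasetPath are hypothetical names.
class DatasetPathSketch {

    // Joins the configured directory and the dataset name, failing early if the
    // directory is missing instead of silently producing a "null..." file name.
    static String resolveDatasetPath(String baseDir, String datasetName) {
        if (baseDir == null) {
            throw new IllegalStateException(
                    "Dataset directory property is not set or could not be resolved");
        }
        return baseDir + datasetName;
    }

    public static void main(String[] args) {
        String missing = null; // stands in for a failed property lookup
        // Java converts a null reference to the text "null" during string concatenation.
        System.out.println(missing + "brown-corpus-in-rdf-nif");
        // With a directory present, the path is built as intended.
        System.out.println(resolveDatasetPath("data/datasets/datahub/", "brown-corpus-in-rdf-nif"));
    }
}
```

A check like this would turn the silent misplacement of the file into an immediate error that names the missing property.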

Transform the Wikipedia-ID-centric view into a URI-centric view

The idea of this important step is to become knowledge-base agnostic. Most datasets are annotated against some version of Wikipedia or DBpedia, and currently we only support annotations to these knowledge bases.

GOAL: change each method so that it works with URIs. Annotate each corpus with the knowledge base it is annotated against, and each annotator with the knowledge bases it can annotate to.

Startup fails

Hi,
mvn clean tomcat:run -Dmaven.tomcat.port=1234 fails due to static paths pointing to /data/m.roeder/workspace/ ...
Perhaps it would be better to use relative paths, or to add a variable in the pom.xml?

Error while experimenting with AGDISTIS

I get an error when I try to run an experiment with AGDISTIS:

2014-11-01 19:04:22,544 ERROR [org.aksw.gerbil.execute.ExperimentTaskExecuter] - <Error while trying to execute experiment.>
java.lang.ClassCastException: org.aksw.gerbil.bat.annotator.ErrorCountingAnnotatorDecorator$ErrorCountingD2W cannot be cast to it.acubelab.batframework.problems.Sa2WSystem
    at org.aksw.gerbil.execute.ExperimentTaskExecuter.runExperiment(ExperimentTaskExecuter.java:144)
    at org.aksw.gerbil.execute.ExperimentTaskExecuter.run(ExperimentTaskExecuter.java:82)
    at java.lang.Thread.run(Thread.java:745)

I selected the Sa2W experiment type, AGDISTIS, and one of the new corpora (from datahub).
With DBpedia Spotlight, the same experiment type, and the same corpora, it worked.

Replace *Mapping classes

I would like to replace the *Mapping.java classes (org.aksw.gerbil.utils) with CSV files. These files could then be loaded into a database.
This should make it easier to add an annotator or dataset.
A more dynamic way of loading annotators/datasets via the frontend should then also be possible. The added annotators/datasets could then be stored in a database and in the CSV files.
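As a rough illustration of the proposal, a CSV-backed registry could look like the sketch below. The two-column layout (display name, adapter class name), the class name `AnnotatorCsvSketch`, and the `org.example` entries are all assumptions made for this example; they do not reflect the existing *Mapping classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: the CSV layout and all names here are hypothetical.
class AnnotatorCsvSketch {

    // Parses lines of the form "display name,adapter class" into an ordered map.
    // Blank lines and lines starting with '#' are skipped.
    static Map<String, String> parse(String csvText) {
        Map<String, String> nameToClass = new LinkedHashMap<>();
        for (String rawLine : csvText.split("\\R")) {
            String line = rawLine.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] cells = line.split(",", 2);
            if (cells.length < 2) {
                throw new IllegalArgumentException("Malformed line: " + line);
            }
            nameToClass.put(cells[0].trim(), cells[1].trim());
        }
        return nameToClass;
    }

    public static void main(String[] args) {
        String csv = "# name,class\n"
                + "Example Annotator,org.example.ExampleAnnotator\n"
                + "Other Annotator,org.example.OtherAnnotator\n";
        // Print each registered annotator with its adapter class name.
        parse(csv).forEach((name, cls) -> System.out.println(name + " -> " + cls));
    }
}
```

The same parsed map could be written to a database table, so that entries added through the frontend and entries shipped in the CSV file share one representation.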

Benchmark corpora for typing and linking

For typing:

For linking:

For typing and linking, together in one dataset:

Depending on the definition of open in this context (see #16), we may consider using those corpora

RDF/DataId

Add additional RDFa data to the result page containing DataId information for the datasets and/or adapters.

Fix URL encoding

If the title of an entity contains a "/", the system cannot retrieve its Wikipedia ID. It seems that the URI is decoded somewhere internally, which leaves an unescaped "/" inside the entity name.

From the log: "Wiki title minor_compositions of url http://dbpedia.org/resource/List_of_major/minor_compositions (decoded http://dbpedia.org/resource/List_of_major/minor_compositions) could not be found."

"Wiki title catcher) of url http://dbpedia.org/resource/Bill_Morgan_(outfielder/catcher) (decoded http://dbpedia.org/resource/Bill_Morgan_(outfielder/catcher)) could not be found."
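The truncated titles in the log ("minor_compositions", "catcher)") are consistent with the title being taken as everything after the last "/". Whether that is what the code actually does is an assumption; the sketch below only contrasts that approach with stripping the known DBpedia resource prefix. The class and method names are hypothetical.

```java
// Sketch only: WikiTitleSketch and its methods are hypothetical names.
class WikiTitleSketch {

    private static final String DBPEDIA_PREFIX = "http://dbpedia.org/resource/";

    // Naive extraction: everything after the last slash.
    // This cuts titles that themselves contain a "/".
    static String titleAfterLastSlash(String uri) {
        return uri.substring(uri.lastIndexOf('/') + 1);
    }

    // Prefix-based extraction: keeps slashes that belong to the title.
    static String titleAfterPrefix(String uri) {
        if (!uri.startsWith(DBPEDIA_PREFIX)) {
            throw new IllegalArgumentException("Not a DBpedia resource URI: " + uri);
        }
        return uri.substring(DBPEDIA_PREFIX.length());
    }

    public static void main(String[] args) {
        String uri = "http://dbpedia.org/resource/List_of_major/minor_compositions";
        System.out.println(titleAfterLastSlash(uri)); // minor_compositions
        System.out.println(titleAfterPrefix(uri));    // List_of_major/minor_compositions
    }
}
```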