
dice-group / gerbil


GERBIL - General Entity annotatoR Benchmark

License: GNU Affero General Public License v3.0

Java 96.56% JavaScript 2.09% HTML 0.90% Shell 0.42% Dockerfile 0.03%
benchmarking benchmarking-framework entity-annotation entity-linking named-entity-recognition

gerbil's Introduction

GERBIL

General Entity Annotator Benchmark


General Information

This project is a benchmarking platform for entity annotation and disambiguation tools. It has also been extended for Question Answering (see the QuestionAnswering branch).

For further information, please take a look at the wiki or visit the project home page.

How to cite

If you are using GERBIL for your research, we would be happy to be cited.

For the Knowledge Extraction GERBIL, please cite the journal paper "GERBIL–Benchmarking Named Entity Recognition and Linking Consistently"

For the Question Answering GERBIL, please cite the journal paper "Benchmarking Question Answering Systems"

Other papers, such as the WWW paper, can be found at: http://aksw.org/Projects/GERBIL.html

gerbil's People

Contributors

arjanoop, cirola2000, dcherix, der-bruemmer, firmao, giusepperizzo, isspek, jlleitschuh, jonathanhuthmann, larswese, lithiumh, lukasbluebaum, michaelroeder, nikit91, nopper, philippkuntschik, renespeck, ricardousbeck, shatu, tgalery, vhf, wetneb


gerbil's Issues

Adapt Babelfy URI according to dataset

Babelfy supports the following two entity linking (EL) setups:

  1. short and highly ambiguous text like kore50 (Matching.PARTIAL)
  2. coherent and long documents like aida-conll (Matching.EXACT)

Running both versions on all the datasets should be the right way to go for now.

Regarding the parameter to set: if you are using the Java interface, change the MATCHING enum; otherwise, add one of the following parameters to the URL:

&partMatching=true
&partMatching=false

The first one enables partial matching (e.g., for KORE50 or Twitter), while the second enables exact matching (e.g., for news, long documents, or WSD).

GOAL: Adapt the source code to provide two Babelfy adapters and give credit to the version used by each.
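A minimal Java sketch of the URL-based option, for an adapter that talks to Babelfy over HTTP. Only the `partMatching` parameter name comes from the issue; the `BabelfyMatching` class, its `Matching` enum, and the example URL are hypothetical and are not Babelfy's Java interface.

```java
/**
 * Hypothetical helper: appends Babelfy's partMatching parameter to a request URL.
 * Only the parameter name "partMatching" is taken from the issue text.
 */
class BabelfyMatching {

    /** The two EL setups described in the issue. */
    enum Matching {
        PARTIAL, // short, highly ambiguous text (e.g. KORE50, tweets)
        EXACT    // long, coherent documents (e.g. AIDA/CoNLL, news)
    }

    /** Returns the URL with "partMatching=true" (PARTIAL) or "partMatching=false" (EXACT). */
    static String withMatching(String requestUrl, Matching matching) {
        // Start a query string if there is none yet, otherwise append to it
        String separator = requestUrl.contains("?") ? "&" : "?";
        boolean partial = matching == Matching.PARTIAL;
        return requestUrl + separator + "partMatching=" + partial;
    }
}
```

For example, `withMatching("http://example.org/babelfy?text=x", Matching.PARTIAL)` returns `http://example.org/babelfy?text=x&partMatching=true`. Two adapters could then differ only in the `Matching` value they pass, which keeps the credited Babelfy version identical for both.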

Benchmark corpora for typing and linking

For typing:

For linking:

For typing and linking, together in one dataset:

Depending on the definition of "open" in this context (see #16), we may consider using these corpora.

Replace *Mapping classes

I would like to replace the *Mapping.java classes (org.aksw.gerbil.utils) with CSV files. These files could then be loaded into a database.
This should make it easier to add annotators or datasets.
It should also make it possible to load annotators/datasets more dynamically via the frontend. The added annotators/datasets could then be stored in the database and in the CSV files.
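A minimal sketch of what loading such a CSV file could look like, assuming a hypothetical two-column layout (`name,value`, e.g. annotator name to adapter class) with `#` comment lines. `CsvMappingLoader` and the file layout are illustrative assumptions, not an existing GERBIL format.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical replacement for a hard-coded *Mapping class: parses
 * "name,value" lines (e.g. annotator name -> adapter class) from CSV text.
 * The two-column layout is an assumption, not an existing GERBIL format.
 */
class CsvMappingLoader {

    static Map<String, String> parse(String csv) {
        Map<String, String> mapping = new LinkedHashMap<>(); // keeps file order, e.g. for the frontend
        for (String rawLine : csv.split("\\R")) {
            String line = rawLine.trim();
            // Skip blank lines and "#" comment lines
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            // Split on the first comma only, so the value may itself contain commas
            int comma = line.indexOf(',');
            if (comma < 0) {
                throw new IllegalArgumentException("Malformed mapping line: " + line);
            }
            mapping.put(line.substring(0, comma).trim(), line.substring(comma + 1).trim());
        }
        return mapping;
    }
}
```

A file could be read with `Files.readString(path)` (Java 11+) and passed to `parse`. The sketch does not handle quoted CSV fields; a CSV library would be the safer choice if names may contain quotes.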

Got an error while running the task: C2W/Spotlight/KORE50

Type: C2W
Annotator: DBpedia Spotlight
Dataset: KORE50

Testing with tagger: DBpedia Spotlight (Default) dataset: KORE50 (no score thr.)
2014-11-03 00:01:04,025 ERROR [org.aksw.gerbil.execute.ExperimentTaskExecuter] -
GerbilException: java.lang.ClassCastException: it.acubelab.batframework.data.Tag cannot be cast to it.acubelab.batframework.data.Annotation (error type -106: Got an unexpected exception while running the experiment.)
at org.aksw.gerbil.execute.ExperimentTaskExecuter.runExperiment(ExperimentTaskExecuter.java:167)
at org.aksw.gerbil.execute.ExperimentTaskExecuter.run(ExperimentTaskExecuter.java:82)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.ClassCastException: it.acubelab.batframework.data.Tag cannot be cast to it.acubelab.batframework.data.Annotation
at it.acubelab.batframework.metrics.ConceptAnnotationMatch.preProcessOutput(ConceptAnnotationMatch.java:51)
at it.acubelab.batframework.metrics.Metrics.getResult(Metrics.java:18)
at it.acubelab.batframework.utils.RunExperiments.computeMetricsC2W(RunExperiments.java:162)
at it.acubelab.batframework.utils.RunExperiments.performC2WExpVarThreshold(RunExperiments.java:263)
at org.aksw.gerbil.execute.ExperimentTaskExecuter.runExperiment(ExperimentTaskExecuter.java:164)
... 2 more

Survey of effort implementing a GERBIL wrapper

Start a survey about the effort it takes to implement a GERBIL annotator/dataset wrapper, to prove that evaluation via GERBIL is an order of magnitude more effective and efficient than evaluating each tool on its own.

Transform the Wikipedia-ID-centric viewpoint into a URI-centric viewpoint

The idea of this important step is to become knowledge-base agnostic. Most datasets are annotated with some form of Wikipedia or DBpedia entities. Currently, we only support annotations against these knowledge bases.

GOAL: Transform each method so that it understands URIs. Annotate each corpus with the knowledge base it is annotated against, and each annotator with the annotations it is capable of producing.
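A minimal sketch of the URI-centric checks this goal implies, assuming a knowledge base is identified by the URI namespace(s) of its resources. The class, method names, and namespace strings are illustrative assumptions.

```java
import java.util.Collections;
import java.util.Set;

/**
 * Hypothetical KB-agnostic checks: a knowledge base is identified by the
 * URI namespace(s) of its resources, datasets declare the namespaces they
 * are annotated with, and annotators declare the namespaces they can produce.
 */
class KnowledgeBaseSupport {

    /** True if the URI lies in one of the given KB namespaces (prefix match). */
    static boolean isInKnowledgeBase(String uri, Set<String> namespaces) {
        return namespaces.stream().anyMatch(uri::startsWith);
    }

    /** An annotator can only be scored on a dataset if they share at least one KB. */
    static boolean compatible(Set<String> datasetNamespaces, Set<String> annotatorNamespaces) {
        return !Collections.disjoint(datasetNamespaces, annotatorNamespaces);
    }
}
```

With this design, adding a new knowledge base means declaring its namespace on the datasets and annotators that use it, rather than adding another Wikipedia-ID conversion.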

Refactor layout of the configuration screen

Refactor frontend

GOAL: have a short description/citation per dataset, annotator, matching, and experiment type (maybe a tile layout)

SUBGOAL: provide provenance information here

Error while experimenting with AGDISTIS

I get an error when I try to run an experiment with AGDISTIS:

2014-11-01 19:04:22,544 ERROR [org.aksw.gerbil.execute.ExperimentTaskExecuter] - <Error while trying to execute experiment.>
java.lang.ClassCastException: org.aksw.gerbil.bat.annotator.ErrorCountingAnnotatorDecorator$ErrorCountingD2W cannot be cast to it.acubelab.batframework.problems.Sa2WSystem
    at org.aksw.gerbil.execute.ExperimentTaskExecuter.runExperiment(ExperimentTaskExecuter.java:144)
    at org.aksw.gerbil.execute.ExperimentTaskExecuter.run(ExperimentTaskExecuter.java:82)
    at java.lang.Thread.run(Thread.java:745)

I selected the Sa2W experiment type, AGDISTIS, and one of the new corpora (from DataHub).
With DBpedia Spotlight, the same experiment type, and the same corpus, it worked.
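The stack trace shows a D2W-wrapped annotator (`ErrorCountingD2W`) being cast to `Sa2WSystem` for the Sa2W experiment. A sketch of one way to turn this into a clear error: check the annotator's capability before casting. `D2WCapable`, `Sa2WCapable`, and `ExperimentTypeGuard` are hypothetical stand-ins, not the BAT-framework types.

```java
/** Hypothetical stand-in for an annotator that only supports D2W (disambiguation). */
interface D2WCapable { }

/** Hypothetical stand-in for an annotator that supports Sa2W. */
interface Sa2WCapable { }

/**
 * Checks an annotator's capability before casting, so an unsupported
 * (annotator, experiment type) combination produces a descriptive error
 * instead of a ClassCastException deep inside the experiment.
 */
class ExperimentTypeGuard {

    static <T> T requireCapability(Object annotator, Class<T> required, String experimentType) {
        if (!required.isInstance(annotator)) {
            throw new IllegalArgumentException("Annotator " + annotator.getClass().getName()
                    + " cannot be used for " + experimentType + " experiments: it does not implement "
                    + required.getSimpleName());
        }
        return required.cast(annotator);
    }
}
```

The same check could also be used in the frontend to hide annotators that do not support the selected experiment type.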

Startup fails

Hi,
mvn clean tomcat:run -Dmaven.tomcat.port=1234 fails due to static paths pointing to /data/m.roeder/workspace/ ...
Perhaps it would be better to use relative paths, or to add a variable to the pom.xml?
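A sketch of the relative-path option, assuming a hypothetical system property `gerbil.dataPath` that overrides a default directory `gerbil_data` relative to the working directory. Neither name is an existing GERBIL setting.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical resolver for the data directory: an explicit system property
 * wins, otherwise a path relative to the working directory is used.
 * Property name and default directory are assumptions, not GERBIL settings.
 */
class DataPathResolver {

    static final String DATA_PATH_PROPERTY = "gerbil.dataPath"; // hypothetical
    static final String DEFAULT_DATA_PATH = "gerbil_data";      // hypothetical, relative

    static Path dataPath() {
        String configured = System.getProperty(DATA_PATH_PROPERTY);
        String base = (configured == null || configured.trim().isEmpty()) ? DEFAULT_DATA_PATH : configured;
        // Relative paths are resolved against the current working directory
        return Paths.get(base).toAbsolutePath().normalize();
    }
}
```

The override could then be passed as a JVM flag such as `-Dgerbil.dataPath=/some/dir`, assuming the web application runs in the JVM that receives the flag; without it, nothing depends on a developer's home directory.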

License/add boilerplate

GERBIL license?

TODO: Add a boilerplate license header to all source code files, based on a common license template that is then filled in with the authors, affiliations, etc.

Fix URL encoding

If the title of an entity contains a "/", the system is not able to retrieve its Wikipedia ID. It seems that the URI is decoded somewhere, which leaves an unescaped "/" inside the entity name.

From the log: "Wiki title minor_compositions of url http://dbpedia.org/resource/List_of_major/minor_compositions (decoded http://dbpedia.org/resource/List_of_major/minor_compositions) could not be found."

"Wiki title catcher) of url http://dbpedia.org/resource/Bill_Morgan_(outfielder/catcher) (decoded http://dbpedia.org/resource/Bill_Morgan_(outfielder/catcher)) could not be found."
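The logged titles (`minor_compositions`, `catcher)`) are exactly the text after the last "/" of each URI. A sketch contrasting that behavior with stripping the known namespace prefix instead; `DbpediaTitles` is hypothetical, handles only the English DBpedia namespace, and is not GERBIL's actual lookup code.

```java
/**
 * Hypothetical helper: derives a Wikipedia title from a DBpedia resource URI.
 * Only the English DBpedia resource namespace is handled in this sketch.
 */
class DbpediaTitles {

    static final String RESOURCE_PREFIX = "http://dbpedia.org/resource/";

    /** Naive variant: truncates titles that contain "/". */
    static String titleAfterLastSlash(String uri) {
        return uri.substring(uri.lastIndexOf('/') + 1);
    }

    /** Prefix-based variant: keeps any "/" that is part of the title. */
    static String titleFromUri(String uri) {
        if (!uri.startsWith(RESOURCE_PREFIX)) {
            throw new IllegalArgumentException("Not a DBpedia resource URI: " + uri);
        }
        return uri.substring(RESOURCE_PREFIX.length());
    }
}
```

For `http://dbpedia.org/resource/Bill_Morgan_(outfielder/catcher)`, the naive variant yields `catcher)` while the prefix-based one yields `Bill_Morgan_(outfielder/catcher)`.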

Wrong filepath for datahub files

I load the file path via GerbilConfiguration.
The code is the following:

 private static final String NIF_DATASET_FILE_PROPERTY_NAME = "org.aksw.gerbil.datasets.Datahub";
 GerbilConfiguration.getInstance().getString(NIF_DATASET_FILE_PROPERTY_NAME) + datasetName;

In gerbil.properties, the property is defined as follows:

org.aksw.gerbil.datasets.Datahub=${org.aksw.gerbil.DataPath}/datasets/datahub/

but the file ends up in the gerbil folder, with the following name:

nullbrown-corpus-in-rdf-nif

Maybe I did something wrong with the loading of the path?
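The `null` prefix is what Java produces when a `null` String is concatenated with `"brown-corpus-in-rdf-nif"`, so the property lookup most likely returned `null`. A sketch of a fail-fast variant that joins directory and file name with `Path.resolve`; `java.util.Properties` stands in for GerbilConfiguration here, and unlike the real configuration it does not expand `${...}` placeholders.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

/**
 * Hypothetical helper: builds a dataset file path from a configured
 * directory, failing fast instead of silently producing "null<name>".
 * java.util.Properties is a stand-in for GerbilConfiguration.
 */
class DatasetPaths {

    static Path datasetFile(Properties config, String key, String datasetName) {
        String dir = config.getProperty(key);
        if (dir == null) {
            // "dir + datasetName" would yield "null" + datasetName here
            throw new IllegalStateException("Property " + key + " is not set");
        }
        // resolve() inserts the separator, whether or not dir ends with "/"
        return Paths.get(dir).resolve(datasetName);
    }
}
```

With this check, a missing or unloaded `org.aksw.gerbil.datasets.Datahub` property surfaces as an explicit error instead of a stray file in the working directory.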

RDF/DataId

Add additional RDFa data to the result page containing DataId information for the datasets and/or adapters.
