kg-agnostic-entity-linking


Licenses

| Sub-Project Name | License Info | JAR/Source/Dependency Location |
| --- | --- | --- |
| RBBNPE (NPComplete) | Copyright GPLv3, Laurenz Vorderwoelbecke & Michael Färber | lib/npcomplete.jar; bmw.annprocessor.executable.NPExtractionManager; bmw.annprocessor.executable.NPExtractionManagerTest |
| PageRankRDF | MIT License | eu.wdaqua |
| java-LSH | MIT License | LSHMinhash |
| RDF2Vec | / | de.dwslab.petar |

Guide(s)

Please note that the execution of Agnos creates the appropriate file/folder tree structure for each expected knowledge graph (as defined in EnumModelType).

(Maven) Setting up the virt-jena/JDBC drivers in a local repository (these dependencies are required for Virtuoso execution and are not available on public repositories)

Install Jena's Virtuoso driver: https://stackoverflow.com/questions/41137342/is-there-any-usable-dependency-for-virtuoso-jena-driver
Download Maven: https://maven.apache.org/download.cgi
Maven installation instructions (Windows): https://www.mkyong.com/maven/how-to-install-maven-in-windows/
To install the Virtuoso JARs into the local Maven repository, run the commands below from the folder containing the JAR files. Unlike the solution proposed in the linked answer, each argument is wrapped in quotes, since Maven otherwise fails with errors (explained here: https://stackoverflow.com/questions/16348459/error-the-goal-you-specified-requires-a-project-to-execute-but-there-is-no-pom).
Commands:
mvn install:install-file -q "-Dfile=./virt_jena3.jar" "-DgroupId=com.openlink.virtuoso" "-DartifactId=virt_jena3" "-Dversion=3.0" "-Dpackaging=jar" "-DgeneratePom=true"
mvn install:install-file -q "-Dfile=./virtjdbc4.jar" "-DgroupId=com.openlink.virtuoso" "-DartifactId=virtjdbc4" "-Dversion=4.0" "-Dpackaging=jar" "-DgeneratePom=true"

Setting up for a new KG

Add the new KG as an additional enumeration value in EnumModelType and define a SPARQL query for it (in EntityQuery.java) that fetches its entities.
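As a rough illustration of this step, the sketch below pairs each KG with the query that fetches its entities. It is not the project's code: the class `KgSketch`, the enum `ModelType` and its values, the method `entityQuery`, and the SPARQL strings are all hypothetical stand-ins, and the actual layout of EnumModelType and EntityQuery.java may differ.

```java
import java.util.EnumMap;
import java.util.Map;

class KgSketch {
    // Hypothetical stand-in for EnumModelType: one constant per supported KG.
    enum ModelType { EXAMPLE_KG, MY_NEW_KG }

    // Hypothetical stand-in for EntityQuery.java: maps each KG to its entity query.
    static final Map<ModelType, String> ENTITY_QUERIES = new EnumMap<>(ModelType.class);
    static {
        // Illustrative queries only; a real KG usually needs a more specific pattern.
        ENTITY_QUERIES.put(ModelType.EXAMPLE_KG,
                "SELECT DISTINCT ?s WHERE { ?s a <http://www.w3.org/2002/07/owl#Thing> }");
        ENTITY_QUERIES.put(ModelType.MY_NEW_KG,
                "SELECT DISTINCT ?s WHERE { ?s ?p ?o }");
    }

    // Fails loudly if a KG was added to the enum without a matching query.
    static String entityQuery(ModelType kg) {
        String query = ENTITY_QUERIES.get(kg);
        if (query == null) {
            throw new IllegalStateException("No entity query defined for " + kg);
        }
        return query;
    }

    public static void main(String[] args) {
        for (ModelType kg : ModelType.values()) {
            System.out.println(kg + " -> " + entityQuery(kg));
        }
    }
}
```

Keeping the lookup in one place means a KG that was added to the enum but has no query is reported at the first lookup rather than silently returning nothing.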

Set up precomputation structures

1) Importing the KG, setting up mention detection & PageRank, then computing embeddings
   1. Import the KG into Jena (local) or Virtuoso (possibly a remote connection through the JDBC driver). For Jena, execute LauncherSetupTDB, specifying the desired EnumModelType (i.e. the KG) and the input location, in order to set up the TDB. If the KG is accessed via the Virtuoso endpoint/JDBC driver (and is already loaded within it), this step is not necessary.
   2. Execute LauncherExecuteQueries, specifying the desired EnumModelType (i.e. the KG), to run all desired queries on the KG (surface forms, helping surface forms, etc.). Their results are required for mention detection and for deciding which embeddings need to be computed.
   3. Execute LauncherSetup, specifying the desired EnumModelType (i.e. the KG). This runs the mention detection setup and computes a priori PageRank values for the KG file at the location defined in FilePaths.FILE_PAGERANK. Note that a local version of the KG needs to be present (even when using Virtuoso) in order to compute PageRank on it.
   4. Compute graph walks (Java) by setting the desired arguments in LauncherWalkGenerator and then executing it. The walks are written to ./[KG]/resources/data/walks.txt.
   5. Specify the location of the generated graph walks in ./sentencesPaths.txt (the walks can be split among multiple files).
   6. Specify hyperparameters and compute embeddings (Python) by executing ./scripts/trainModel.py.
   7. Move the embeddings into the KG's file tree structure at the location defined in FilePaths.FILE_EMBEDDINGS_GRAPH_WALK_ENTITY_EMBEDDINGS (or change that value to match the embeddings output location).
2) (Optional) Mention detection tuning (LSH)
   0. Execute LauncherMentionDetectionTuning, defining a sample input and, if desired, different numbers of bands and buckets, in order to tune the LSH arguments for execution time (not for inclusiveness/exclusiveness/quality of results). Then adapt the arguments LSH_BANDS and LSH_BUCKETS in Numbers.java accordingly.
3) Done. Ready to apply entity linking! (E.g. by executing LauncherContinuousMentionDetector, or by using it as a NIF API type of annotator with calls to GERBILAnnotator.)
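The graph-walk and embedding steps of stage 1 follow the RDF2Vec idea: walks over the KG are written out as "sentences", and embeddings are then trained on them. The sketch below shows one way such walks could be generated from an in-memory edge list. It is not the LauncherWalkGenerator implementation; the class `WalkSketch`, the toy triples, the walk depth, and the one-walk-per-line, space-separated output format are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class WalkSketch {
    // Adjacency list: subject -> list of {predicate, object} pairs.
    private final Map<String, List<String[]>> outgoing = new HashMap<>();

    void addTriple(String subject, String predicate, String object) {
        outgoing.computeIfAbsent(subject, k -> new ArrayList<>())
                .add(new String[] {predicate, object});
    }

    // One random walk of at most `depth` hops, rendered as "entity predicate entity ...".
    String walk(String start, int depth, Random rnd) {
        StringBuilder sentence = new StringBuilder(start);
        String current = start;
        for (int hop = 0; hop < depth; hop++) {
            List<String[]> edges = outgoing.get(current);
            if (edges == null) {
                break; // dead end: the node has no outgoing edges
            }
            String[] edge = edges.get(rnd.nextInt(edges.size()));
            sentence.append(' ').append(edge[0]).append(' ').append(edge[1]);
            current = edge[1];
        }
        return sentence.toString();
    }

    // Generates `walksPerEntity` walks starting from every subject in the graph.
    List<String> walks(int walksPerEntity, int depth, long seed) {
        Random rnd = new Random(seed);
        List<String> result = new ArrayList<>();
        for (String entity : outgoing.keySet()) {
            for (int i = 0; i < walksPerEntity; i++) {
                result.add(walk(entity, depth, rnd));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        WalkSketch graph = new WalkSketch();
        graph.addTriple("ex:Alice", "ex:knows", "ex:Bob");
        graph.addTriple("ex:Bob", "ex:worksAt", "ex:Acme");
        // Each printed line is one "sentence" for the embedding training step.
        graph.walks(2, 4, 42L).forEach(System.out::println);
    }
}
```

Each line of such output can be treated as one sentence by a word-embedding trainer, which is why the walk files are listed in ./sentencesPaths.txt before running the training script.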

Package Details

*.debug
Classes used for debugging / analysis
*.deprecated
Classes that are deprecated for the current version, but might make a comeback depending on direction and progress of research.
alu.linking
Main Project Folder
alu.linking.api
Classes related to API calls (currently only NIF API for GERBIL)
alu.linking.candidategeneration
Classes related to candidate generation
alu.linking.config
Configuration-related classes and packages
alu.linking.config.constants
Configuration-related runtime constants (e.g. file locations, knowledge graphs, server connections, ...)
alu.linking.config.kg
Knowledge graph-related constants (e.g. supported KGs and related entity queries)
alu.linking.disambiguation
Disambiguation-related classes and packages
alu.linking.disambiguation.hops
Classes related to graph-hopping through KGs
alu.linking.disambiguation.hops.graph
In-memory graph related classes
alu.linking.disambiguation.hops.pathbuilding
Classes used for crawling paths within KGs (for 'hopping')
alu.linking.disambiguation.pagerank
PageRank-related classes (Note: the a priori PageRank computation can be found in eu.wdaqua.pagerank; the contextual disambiguation 'Sub-PageRank' can be found in alu.linking.disambiguation.scorers.subpagerank)
alu.linking.disambiguation.scorers
Scorers and scorer-related classes used for disambiguation
alu.linking.disambiguation.scorers.embedhelp
Helper Classes used by various embeddings-related scorers
alu.linking.disambiguation.scorers.hillclimbing
Choosing / 'Picking' Schemes related to Hill-Climbing
alu.linking.disambiguation.scorers.pairwise
Choosing / 'Picking' Schemes related to pairwise strategies
alu.linking.disambiguation.scorers.subpagerank
Choosing / 'Picking' Schemes related to contextual (i.e. not a priori) versions of PageRank
alu.linking.executable
All kinds of classes that can be executed through our Pipeline instance
alu.linking.executable.preprocessing
Preprocessing-related executable classes
alu.linking.executable.preprocessing.loader
Executable classes related to loading (precomputed) data (e.g. PageRank, Mention possibilities, ...)
alu.linking.executable.preprocessing.nounphrases
Executable classes related to nounphrases and their extraction
alu.linking.executable.preprocessing.setup
Executable preprocessing classes related to project setup (prior to proper linking pipeline execution)
alu.linking.executable.preprocessing.setup.surfaceform
Executable preprocessing classes related to surface form
alu.linking.executable.preprocessing.setup.surfaceform.processing
Executable classes related to surface forms' processing for proper use
alu.linking.executable.preprocessing.setup.surfaceform.processing.url
Executable classes related to URL-based surface forms' processing
alu.linking.executable.preprocessing.setup.surfaceform.query
Executable classes related to surface forms' acquisition through query executions (e.g. to a Jena or Virtuoso-loaded KG)
alu.linking.executable.preprocessing.util
Executable classes mostly boiling down to utility classes for other executables or their related classes
alu.linking.launcher
Various entry points to the code, either for preprocessing steps or for different kinds of disambiguation (e.g. via API, stdin, etc.)
alu.linking.mentiondetection
Mention detection-related classes (i.e. detecting mentions in plain text)
alu.linking.mentiondetection.exact
Classes related to exact mention detection
alu.linking.mentiondetection.fuzzy
Classes related to fuzzy mention detection
alu.linking.postprocessing
Classes related to post processing of files
alu.linking.preprocessing
Classes related to preprocessing (non-executables, but potential dependencies for executable classes)
alu.linking.structure
Mostly global interfaces or bean-type classes relating to code as well as execution structure
alu.linking.utils
Contains largely singleton or static utility classes potentially used by a multitude of classes
de.dwslab.petar
RDF2Vec-related classes (including modifications)
eu.wdaqua
PageRank implementation-related classes
org.aksw.gerbil
GERBIL evaluation framework-related classes (mostly used by alu.linking.api)

