Sub-Project Name | License Info | JAR/Source/Dependency Location |
---|---|---|
RBBNPE (NPComplete) | Copyright GPLv3, Laurenz Vorderwoelbecke & Michael Färber | lib/npcomplete.jar; bmw.annprocessor.executable.NPExtractionManager; bmw.annprocessor.executable.NPExtractionManagerTest |
PageRankRDF | MIT License | eu.wdaqua |
java-LSH | MIT License | LSHMinhash |
RDF2Vec | / | de.dwslab.petar |
(Maven) Set up the virt-jena/JDBC drivers (a dependency required for Virtuoso execution) in a local repository, since they are not available in public repositories
- Install Jena's Virtuoso Driver
- https://stackoverflow.com/questions/41137342/is-there-any-usable-dependency-for-virtuoso-jena-driver
- Maven
- https://maven.apache.org/download.cgi
- Maven instructions
- https://www.mkyong.com/maven/how-to-install-maven-in-windows/
- Installing the Virtuoso JARs into the local Maven repository
- Run the commands below. Compared to the solution proposed in the answer linked above, each argument is wrapped in quotes, as Maven otherwise fails with errors.
- The reason is explained here: https://stackoverflow.com/questions/16348459/error-the-goal-you-specified-requires-a-project-to-execute-but-there-is-no-pom
- Commands
- mvn install:install-file -q "-Dfile=./virt_jena3.jar" "-DgroupId=com.openlink.virtuoso" "-DartifactId=virt_jena3" "-Dversion=3.0" "-Dpackaging=jar" "-DgeneratePom=true"
- mvn install:install-file -q "-Dfile=./virtjdbc4.jar" "-DgroupId=com.openlink.virtuoso" "-DartifactId=virtjdbc4" "-Dversion=4.0" "-Dpackaging=jar" "-DgeneratePom=true"
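To check that the two install commands worked, you can look for the JARs in the local repository. The sketch below builds the expected file path from the coordinates used in the commands above. It assumes Maven's default repository location (`~/.m2/repository`, which a custom `settings.xml` may override); the class and method names are hypothetical and not part of this project.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class LocalRepoCheck {

    /** Maps Maven coordinates to the file path used by the default local repository layout. */
    static Path artifactPath(Path repoRoot, String groupId, String artifactId,
                             String version, String packaging) {
        Path dir = repoRoot;
        // Each dot-separated part of the groupId becomes one directory level.
        for (String part : groupId.split("\\.")) {
            dir = dir.resolve(part);
        }
        return dir.resolve(artifactId)
                  .resolve(version)
                  .resolve(artifactId + "-" + version + "." + packaging);
    }

    public static void main(String[] args) {
        // Assumption: the default local repository location is in use.
        Path repo = Paths.get(System.getProperty("user.home"), ".m2", "repository");
        Path[] expected = {
            artifactPath(repo, "com.openlink.virtuoso", "virt_jena3", "3.0", "jar"),
            artifactPath(repo, "com.openlink.virtuoso", "virtjdbc4", "4.0", "jar")
        };
        for (Path p : expected) {
            System.out.println(p + " -> " + (Files.exists(p) ? "found" : "missing"));
        }
    }
}
```

If a path is reported as missing, re-run the matching `mvn install:install-file` command from the directory that contains the JAR.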
- 1) Importing KG, setting up mention detection & pagerank, then computing embeddings
- 1. Import the KG into Jena (local) or Virtuoso (remote connection possible through the JDBC driver) by executing LauncherSetupTDB, specifying the desired EnumModelType (i.e. the KG) and the input location, in order to set up the TDB for Jena usage. If the KG is accessed via the Virtuoso endpoint/JDBC driver (and is already loaded in it), this step is not necessary.
- 2. Execute LauncherExecuteQueries, specifying the desired EnumModelType (i.e. the KG), to run all desired queries (surface form, helping surface form, etc.) on the KG. This is required for mention detection and for deciding which embeddings are required.
- 3. Execute LauncherSetup, specifying the desired EnumModelType (i.e. the KG). This runs the mention detection setup and computes a priori PageRank values for the KG file at the location defined in FilePaths.FILE_PAGERANK. Note that a local version of the KG needs to be present (even when using Virtuoso) in order to compute PageRank on it.
- 4. Compute graph walks (Java): set the desired arguments in LauncherWalkGenerator, then execute it. The walks are written to ./[KG]/resources/data/walks.txt.
- 5. Specify the location of the output graph walks in ./sentencesPaths.txt (the walks can be split among multiple files).
- 6. Specify the hyperparameters and compute the embeddings (Python) by executing ./scripts/trainModel.py.
- 7. Move the embeddings to the location within the KG's file tree defined in FilePaths.FILE_EMBEDDINGS_GRAPH_WALK_ENTITY_EMBEDDINGS (or change that value to match the embeddings output location).
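Steps 4 to 6 above turn the KG into "sentences" (graph walks) on which an embedding model is then trained. The following is a minimal sketch of that idea on a tiny in-memory graph. It is not the code behind LauncherWalkGenerator: the class name, the `ex:` identifiers and the space-separated output format are assumptions made for illustration, and the actual format of walks.txt may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

final class WalkSketch {

    /** Adjacency list: subject -> list of {predicate, object} pairs. */
    private final Map<String, List<String[]>> edges = new HashMap<>();

    void addTriple(String subject, String predicate, String object) {
        edges.computeIfAbsent(subject, k -> new ArrayList<>())
             .add(new String[] {predicate, object});
    }

    /** One random walk of at most `depth` hops, rendered as "s p o p o ...". */
    String walk(String start, int depth, Random rnd) {
        StringBuilder sb = new StringBuilder(start);
        String current = start;
        for (int hop = 0; hop < depth; hop++) {
            List<String[]> out = edges.get(current);
            if (out == null) {
                break; // dead end: the current node has no outgoing edges
            }
            String[] edge = out.get(rnd.nextInt(out.size()));
            sb.append(' ').append(edge[0]).append(' ').append(edge[1]);
            current = edge[1];
        }
        return sb.toString();
    }

    /** Generates `count` walks from `start`; the seed makes runs repeatable. */
    List<String> walks(String start, int count, int depth, long seed) {
        Random rnd = new Random(seed);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            result.add(walk(start, depth, rnd));
        }
        return result;
    }

    public static void main(String[] args) {
        WalkSketch graph = new WalkSketch();
        graph.addTriple("ex:Berlin", "ex:capitalOf", "ex:Germany");
        graph.addTriple("ex:Germany", "ex:memberOf", "ex:EU");
        graph.addTriple("ex:Berlin", "ex:locatedIn", "ex:Europe");
        // Each printed line is one "sentence" for the embedding training step.
        for (String w : graph.walks("ex:Berlin", 3, 2, 42L)) {
            System.out.println(w);
        }
    }
}
```

Each line would then correspond to one training sentence, with entities and predicates acting as the tokens for which embeddings are learned.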
- 2) (Optional) Mention Detection Tuning (LSH)
- 0. Execute LauncherMentionDetectionTuning with a sample input and, optionally, different bin (bucket) and band dimensions, in order to tune the LSH arguments for execution time (not for inclusiveness/exclusiveness/quality of results). Then adapt LSH_BANDS and LSH_BUCKETS in Numbers.java accordingly.
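To give a feel for what the band and bucket parameters control, here is a self-contained sketch of MinHash LSH with banding. It does not use the java-LSH API or this project's classes; the class name, the character-trigram shingling, the `rowsPerBand` parameter and the hash functions are all assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

final class LshBandingSketch {

    private static final long PRIME = 2147483647L; // 2^31 - 1
    private final int bands;
    private final int rowsPerBand;
    private final int buckets;
    private final long[] a;
    private final long[] b;

    LshBandingSketch(int bands, int rowsPerBand, int buckets, long seed) {
        this.bands = bands;
        this.rowsPerBand = rowsPerBand;
        this.buckets = buckets;
        int n = bands * rowsPerBand; // total number of MinHash functions
        a = new long[n];
        b = new long[n];
        Random rnd = new Random(seed);
        for (int i = 0; i < n; i++) {
            a[i] = 1 + rnd.nextInt(Integer.MAX_VALUE - 1); // non-zero multiplier
            b[i] = rnd.nextInt(Integer.MAX_VALUE);
        }
    }

    /** Represents a string as the set of hash codes of its lower-cased character trigrams. */
    static Set<Integer> trigrams(String text) {
        Set<Integer> grams = new HashSet<>();
        String s = text.toLowerCase();
        for (int i = 0; i + 3 <= s.length(); i++) {
            grams.add(s.substring(i, i + 3).hashCode());
        }
        return grams;
    }

    /** MinHash signature: per hash function, the minimum hash value over all trigrams. */
    int[] signature(Set<Integer> grams) {
        int[] sig = new int[a.length];
        Arrays.fill(sig, Integer.MAX_VALUE);
        for (int g : grams) {
            long x = Math.floorMod((long) g, PRIME);
            for (int i = 0; i < sig.length; i++) {
                int h = (int) ((a[i] * x + b[i]) % PRIME);
                if (h < sig[i]) {
                    sig[i] = h;
                }
            }
        }
        return sig;
    }

    /** Hashes each band of the signature into one of `buckets` buckets. */
    int[] bucketsFor(String text) {
        int[] sig = signature(trigrams(text));
        int[] out = new int[bands];
        for (int band = 0; band < bands; band++) {
            int[] slice = Arrays.copyOfRange(sig, band * rowsPerBand, (band + 1) * rowsPerBand);
            out[band] = Math.floorMod(Arrays.hashCode(slice), buckets);
        }
        return out;
    }

    /** Two items become a candidate pair if they share a bucket in at least one band. */
    static boolean isCandidatePair(int[] x, int[] y) {
        for (int i = 0; i < x.length; i++) {
            if (x[i] == y[i]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        LshBandingSketch lsh = new LshBandingSketch(20, 5, 1000, 42L);
        int[] x = lsh.bucketsFor("Angela Merkel");
        int[] y = lsh.bucketsFor("Angela Merkl");
        System.out.println(Arrays.toString(x));
        System.out.println(Arrays.toString(y));
        System.out.println("candidate pair: " + isCandidatePair(x, y));
    }
}
```

In a banding scheme like this one, more bands mean more hash computations and more chances for two items to collide, while fewer buckets mean more accidental collisions and therefore more candidates to compare afterwards. Both settings affect how much work is done per input, which is the trade-off the tuning step measures.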
- 3) Done. Ready to apply entity linking! (E.g. by executing LauncherContinuousMentionDetector or by using it as a NIF API annotator through calls to GERBILAnnotator)
- *.debug
- Classes used for debugging / analysis
- *.deprecated
- Classes that are deprecated for the current version, but might make a comeback depending on direction and progress of research.
- alu.linking
- Main Project Folder
- alu.linking.api
- Classes related to API calls (currently only NIF API for GERBIL)
- alu.linking.candidategeneration
- Classes related to candidate generation
- alu.linking.config
- Configuration-related classes and packages
- alu.linking.config.constants
- Configuration-related runtime constants (e.g. file locations, knowledge graphs, server connections, ...)
- alu.linking.config.kg
- Knowledge graph-related constants (e.g. supported KGs and related entity queries)
- alu.linking.disambiguation
- Disambiguation-related classes and packages
- alu.linking.disambiguation.hops
- Classes related to graph-hopping through KGs
- alu.linking.disambiguation.hops.graph
- In-memory graph related classes
- alu.linking.disambiguation.hops.pathbuilding
- Classes used for crawling paths within KGs (for 'hopping')
- alu.linking.disambiguation.pagerank
- PageRank-related classes (Note: the a priori PageRank computation can be found in eu.wdaqua.pagerank; the contextual disambiguation 'Sub-PageRank' can be found in alu.linking.disambiguation.scorers.subpagerank)
- alu.linking.disambiguation.scorers
- Scorers and scorer-related classes used for disambiguation
- alu.linking.disambiguation.scorers.embedhelp
- Helper classes used by various embeddings-related scorers
- alu.linking.disambiguation.scorers.hillclimbing
- Choosing / 'Picking' Schemes related to Hill-Climbing
- alu.linking.disambiguation.scorers.pairwise
- Choosing / 'Picking' Schemes related to pairwise strategies
- alu.linking.disambiguation.scorers.subpagerank
- Choosing / 'Picking' Schemes related to contextual (i.e. not a priori) versions of PageRank
- alu.linking.executable
- All kinds of classes that can be executed through our Pipeline instance
- alu.linking.executable.preprocessing
- Preprocessing-related executable classes
- alu.linking.executable.preprocessing.loader
- Executable classes related to loading (precomputed) data (e.g. PageRank, Mention possibilities, ...)
- alu.linking.executable.preprocessing.nounphrases
- Executable classes related to noun phrases and their extraction
- alu.linking.executable.preprocessing.setup
- Executable preprocessing classes related to project setup (prior to proper linking pipeline execution)
- alu.linking.executable.preprocessing.setup.surfaceform
- Executable preprocessing classes related to surface forms
- alu.linking.executable.preprocessing.setup.surfaceform.processing
- Executable classes related to surface forms' processing for proper use
- alu.linking.executable.preprocessing.setup.surfaceform.processing.url
- Executable classes related to URL-based surface forms' processing
- alu.linking.executable.preprocessing.setup.surfaceform.query
- Executable classes related to surface forms' acquisition through query executions (e.g. to a Jena or Virtuoso-loaded KG)
- alu.linking.executable.preprocessing.util
- Executable classes that mostly serve as utilities for other executables or their related classes
- alu.linking.launcher
- Various entry points to the code, either for preprocessing steps or for the different disambiguation alternatives (e.g. via API, stdin, etc.)
- alu.linking.mentiondetection
- Mention detection-related classes (i.e. detecting mentions in plain text)
- alu.linking.mentiondetection.exact
- Classes related to exact mention detection
- alu.linking.mentiondetection.fuzzy
- Classes related to fuzzy mention detection
- alu.linking.postprocessing
- Classes related to post processing of files
- alu.linking.preprocessing
- Classes related to preprocessing (non-executables, but potential dependencies of executable classes)
- alu.linking.structure
- Mostly global interfaces or bean-type classes relating to code as well as execution structure
- alu.linking.utils
- Mostly singleton or static utility classes that may be used by many other classes
- de.dwslab.petar
- RDF2Vec-related classes (including modifications)
- eu.wdaqua
- PageRank implementation-related classes
- org.aksw.gerbil
- GERBIL evaluation framework-related classes (mostly used by alu.linking.api)