niladi / clit_backend

This project forked from kmdn/clit_backend

Backend (aka. logic) code for the Combining Linking Techniques framework project.

License: Apache License 2.0

Languages: JavaScript 0.33%, Python 1.02%, Java 96.43%, CSS 0.17%, HTML 2.06%

clit_backend's Introduction

Agnos_mini

Agnos is a KG-agnostic entity linking framework designed for ease of extension and deployment. Ease of extension covers both support for various knowledge graphs and alternative methods for mention detection, candidate generation, entity disambiguation, and pruning.

Quick Start Guide

  1. Clone Repository
  2. Run install/InstallFiletree.java - as the name implies, it simply creates the file tree to make it easier to place required files.
  3. Add your desired RDF KG as an N-Triples (.nt) file to the execution environment's directory under "./default/resources/data/kg.nt"
  4. Run install.LauncherInstallation.java - it (1) loads the placed KG into an RDF store, (2) computes PageRank on it and (3) extracts mentions (by default with a SPARQL query that retrieves all rdfs:label elements).
  5. Setup complete!
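
Step 4 computes PageRank on the loaded KG. As a rough illustration of what such a computation involves, the sketch below runs a plain power iteration over a small in-memory directed graph. It is not the project's implementation; the class name PageRankSketch, the adjacency-map input, the damping factor of 0.85 and the uniform handling of dangling nodes are all assumptions made for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-alone sketch; not part of the Agnos code base.
class PageRankSketch {

    /** Power iteration over a directed graph given as node -> outgoing targets. */
    static Map<String, Double> pageRank(Map<String, List<String>> outLinks,
                                        double damping, int iterations) {
        // Collect every node, including those that only appear as link targets.
        Set<String> nodes = new TreeSet<>(outLinks.keySet());
        for (List<String> targets : outLinks.values()) {
            nodes.addAll(targets);
        }
        int n = nodes.size();

        Map<String, Double> rank = new HashMap<>();
        for (String node : nodes) {
            rank.put(node, 1.0 / n);
        }

        for (int it = 0; it < iterations; it++) {
            Map<String, Double> next = new HashMap<>();
            double danglingMass = 0.0;
            for (String node : nodes) {
                next.put(node, (1.0 - damping) / n); // teleportation share
                List<String> targets = outLinks.get(node);
                if (targets == null || targets.isEmpty()) {
                    danglingMass += rank.get(node); // node without outgoing links
                }
            }
            for (String node : nodes) {
                List<String> targets =
                        outLinks.getOrDefault(node, Collections.<String>emptyList());
                for (String target : targets) {
                    next.merge(target, damping * rank.get(node) / targets.size(), Double::sum);
                }
            }
            // Assumption: rank held by dangling nodes is spread uniformly over all nodes.
            for (String node : nodes) {
                next.merge(node, damping * danglingMass / n, Double::sum);
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("A", Arrays.asList("C"));
        graph.put("B", Arrays.asList("C"));
        graph.put("C", Arrays.asList("A"));
        System.out.println(pageRank(graph, 0.85, 100));
    }
}
```

In the sample graph, C receives links from both A and B and therefore ends up with the highest score, while B, which nobody links to, keeps only the teleportation share.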

Running Agnos

After setup, you may run Agnos by executing launcher.LauncherLinking.
It takes a string input and applies exact, case-insensitive mention detection to it, followed by candidate generation and the default disambiguation behaviour.
Results are output to the console.
We also provide launcher.LauncherLinkingSample - an easily modifiable sample showing what the annotation process looks like in code.
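
To make "exact case-insensitive mention detection" more concrete, here is a minimal sketch of the idea: every span of up to a few tokens is lower-cased and looked up in a surface form map. The class ExactMentionSketch, its detect method, the tokenisation rule and the example URIs are hypothetical and only illustrate the technique; they are not the code behind launcher.LauncherLinking.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch; not part of the Agnos code base.
class ExactMentionSketch {

    // Assumption: a token is a run of letters or digits.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+");

    /** Returns every span of up to maxTokens tokens whose lower-cased text is a key of surfaceForms. */
    static List<String> detect(String input, Map<String, String> surfaceForms, int maxTokens) {
        // Record the start and end offset of each token.
        List<int[]> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            tokens.add(new int[] {matcher.start(), matcher.end()});
        }

        List<String> mentions = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            for (int j = i; j < Math.min(i + maxTokens, tokens.size()); j++) {
                String span = input.substring(tokens.get(i)[0], tokens.get(j)[1]);
                // Keys of the map are expected to be lower-cased already.
                if (surfaceForms.containsKey(span.toLowerCase(Locale.ROOT))) {
                    mentions.add(span);
                }
            }
        }
        return mentions;
    }

    public static void main(String[] args) {
        Map<String, String> surfaceForms = new HashMap<>();
        surfaceForms.put("new york", "http://example.org/New_York");
        surfaceForms.put("york", "http://example.org/York");
        // prints [NEW York, York]
        System.out.println(detect("I flew to NEW York yesterday.", surfaceForms, 3));
    }
}
```

Overlapping mentions such as "NEW York" and "York" are both reported here; deciding between them is left to later pipeline steps.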

Full Startup Guide

  1. Clone Repository
  2. Add wanted Knowledge Graph (KG) as a new enum item to structure/config/kg/EnumModelType
    e.g. MY_KNOWLEDGE_GRAPH("./my_kg/") - henceforth we will denote the chosen KG's root path as $KG$.
    Note: This allows:
    1. A user/developer to specify particular configurations for a specific KG (e.g. surface forms for mention detection, underlying caching structures etc.)
    2. Agnos to create the file tree as required by the system in the defined location, in this case under the execution's current directory in a $KG$ folder.
    3. KG isolation in order to avoid unexpected interactions on the user-side.
  3. Run install/BuildFiletree - as the name implies, it simply creates the file tree to make it easier to place files appropriately
  4. Load KG into an RDF Store by defining the location of your RDF-based KG within install.LauncherSetupTDB:KGpath and running it for your defined KG (in install.LauncherSetupTDB:KG). Note:
    1. If you define an input folder, all files it contains will be added to the Jena TDB.
    2. If you already have an existing Apache Jena(-compatible) RDF Store, simply put it into $KG$/resources/data/datasets/graph.dataset .
  5. Semi-OPTIONAL
    1. Put the SPARQL queries to be executed on the loaded KG for surface form extraction into the appropriate folders.
      If you already have a file containing surface forms and their related resources,
      please put it in $KG$/resources/data/links_surfaceForms.txt (the file path may be changed in structure.config.FilePaths:FILE_ENTITY_SURFACEFORM_LINKING).
      The delimiter used to split each line may be defined under structure.config.Strings:ENTITY_SURFACE_FORM_LINKING_DELIM, where the resource is in first position and the defined literal in second.
    2. Set install.LauncherExecuteQueries:KG to your defined KG and run it. The program will extract surface forms from the KG and write them in the format the system expects.
  6. Setup complete! Simple mention detection and candidate generation may now be performed. As for disambiguation, depending on the scoring scheme you would like to use, a file containing PageRank scores or embeddings may have to be provided. For RDF PageRank computation, we provide code under install.PageRankComputer; it generates the $KG$/resources/data/pagerank.nt file, which disambiguation algorithms may then load using a PageRankLoader.
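
The surface form file from step 5 holds one resource and one literal per line, split by a configurable delimiter. A minimal sketch of how such lines could be turned into a lookup map follows. The class SurfaceFormFileSketch, the tab delimiter and the lower-casing of literals are assumptions made for illustration; the actual delimiter is whatever structure.config.Strings:ENTITY_SURFACE_FORM_LINKING_DELIM is set to.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch; not part of the Agnos code base.
class SurfaceFormFileSketch {

    /**
     * Parses lines of the form "resource<delim>literal" into a map from
     * lower-cased literal to all resources that carry it.
     */
    static Map<String, List<String>> parse(List<String> lines, String delim) {
        Map<String, List<String>> literalToResources = new HashMap<>();
        for (String line : lines) {
            // Split into at most two parts: resource first, literal second.
            String[] parts = line.split(Pattern.quote(delim), 2);
            if (parts.length < 2) {
                continue; // skip malformed or empty lines
            }
            String resource = parts[0].trim();
            // Assumption: literals are lower-cased to support case-insensitive matching.
            String literal = parts[1].trim().toLowerCase(Locale.ROOT);
            literalToResources.computeIfAbsent(literal, k -> new ArrayList<>()).add(resource);
        }
        return literalToResources;
    }

    public static void main(String[] args) {
        // In practice the lines could come from java.nio.file.Files.readAllLines(...).
        List<String> lines = Arrays.asList(
                "http://example.org/Paris_France\tParis",
                "http://example.org/Paris_Texas\tparis",
                "line without delimiter");
        System.out.println(parse(lines, "\t"));
    }
}
```

Both example resources end up under the single key "paris", which is exactly the ambiguity that disambiguation later has to resolve.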


API


Code for NIF-format-based queries (api.NIFAPIAnnotator) and for calls through JSON (api.JSONAPIAnnotator) is provided mainly for API usage.
A very basic API front-end page may be downloaded from Agnos and used locally.

Mention Detection


Out of the box, Agnos provides users with two main mention detection mechanisms: exact and fuzzy matching.
The former performs mention detection by checking whether a possible input is contained within a passed map instance.
The latter utilizes locality-sensitive hashing techniques (MinHash), allowing detection with a user-defined degree of fuzziness.
Please note that linking.mentiondetection.fuzzy.MentionDetectorLSH requires its (surface form) structures to be computed prior to linking in order to remain highly scalable at linking time.
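
As a rough illustration of the MinHash idea behind fuzzy detection, the sketch below builds signatures over character trigrams and estimates the Jaccard similarity of two strings from the fraction of matching signature positions. It is not the code of MentionDetectorLSH: the class MinHashSketch, the trigram shingling, the hash family and the seed are assumptions, and a full LSH index would additionally group signature positions into bands for fast lookup.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Random;
import java.util.Set;

// Hypothetical stand-alone sketch; not part of the Agnos code base.
class MinHashSketch {

    private static final long PRIME = 2147483647L; // 2^31 - 1
    private final long[] a;
    private final long[] b;

    MinHashSketch(int numHashes, long seed) {
        Random random = new Random(seed);
        a = new long[numHashes];
        b = new long[numHashes];
        for (int i = 0; i < numHashes; i++) {
            a[i] = 1 + random.nextInt(Integer.MAX_VALUE - 1); // multiplier in [1, PRIME - 1]
            b[i] = random.nextInt(Integer.MAX_VALUE);         // offset in [0, PRIME - 1]
        }
    }

    /** Lower-cased character n-grams; strings shorter than n form a single shingle. */
    static Set<String> shingles(String text, int n) {
        String s = text.toLowerCase(Locale.ROOT);
        Set<String> result = new HashSet<>();
        if (s.length() <= n) {
            result.add(s);
            return result;
        }
        for (int i = 0; i + n <= s.length(); i++) {
            result.add(s.substring(i, i + n));
        }
        return result;
    }

    /** For each hash function, keeps the minimum hash value over all shingles. */
    long[] signature(Set<String> shingles) {
        long[] sig = new long[a.length];
        Arrays.fill(sig, Long.MAX_VALUE);
        for (String shingle : shingles) {
            long x = shingle.hashCode() & 0x7fffffffL; // non-negative base hash
            for (int i = 0; i < a.length; i++) {
                long h = (a[i] * x + b[i]) % PRIME;
                if (h < sig[i]) {
                    sig[i] = h;
                }
            }
        }
        return sig;
    }

    /** Fraction of equal positions, an estimate of the Jaccard similarity of the shingle sets. */
    static double similarity(long[] s1, long[] s2) {
        int equal = 0;
        for (int i = 0; i < s1.length; i++) {
            if (s1[i] == s2[i]) {
                equal++;
            }
        }
        return (double) equal / s1.length;
    }

    public static void main(String[] args) {
        MinHashSketch minHash = new MinHashSketch(128, 42L);
        long[] known = minHash.signature(shingles("Barack Obama", 3));
        long[] typo = minHash.signature(shingles("Barak Obama", 3));
        long[] other = minHash.signature(shingles("Angela Merkel", 3));
        System.out.println("typo:  " + similarity(known, typo));
        System.out.println("other: " + similarity(known, other));
    }
}
```

A "user-defined degree of fuzziness" then corresponds to a threshold on this estimated similarity: spans whose estimate exceeds the threshold are accepted as mentions of the known surface form.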

Custom Mention Detection


Mention detection standards are enforced through the structure.interfaces.MentionDetector interface.
It keeps detection easy to implement and easy for the rest of the text annotation pipeline to process.
As such, any custom mention detection technique should simply implement it in order to ensure compatibility with the other steps.
This makes it relatively straightforward to, e.g., combine Agnos' mention detection with part-of-speech (POS) methods.
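
The shape of a custom detector can be sketched as follows. The real method signatures of structure.interfaces.MentionDetector are not shown in this document, so the example defines its own hypothetical stand-in interface (SketchDetector) with a single detect method. The implementation treats runs of capitalised words as mentions, a crude heuristic standing in for a proper POS-based detector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch; not part of the Agnos code base.
class CustomDetectorSketch {

    /** Stand-in for the real interface, whose signature is assumed here. */
    interface SketchDetector {
        List<String> detect(String input);
    }

    /** Reports each run of consecutive capitalised words as one mention. */
    static class CapitalisedRunDetector implements SketchDetector {
        private static final Pattern RUN =
                Pattern.compile("\\b\\p{Lu}\\p{L}*(?:\\s+\\p{Lu}\\p{L}*)*");

        @Override
        public List<String> detect(String input) {
            List<String> mentions = new ArrayList<>();
            Matcher matcher = RUN.matcher(input);
            while (matcher.find()) {
                mentions.add(matcher.group());
            }
            return mentions;
        }
    }

    public static void main(String[] args) {
        SketchDetector detector = new CapitalisedRunDetector();
        // prints [We, Angela Merkel, Berlin]
        System.out.println(detector.detect("We met Angela Merkel in Berlin."));
    }
}
```

The sentence-initial "We" shows the weakness of the heuristic; a POS tagger would filter it out as a pronoun, which is the kind of refinement the interface is meant to make easy to plug in.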

Candidate Generation


Agnos mainly utilizes a single candidate generation mechanism: dictionary look-up.
It is implemented within linking.candidategeneration.CandidateGeneratorMap and can be used with a defined mapping.
Custom candidate generation may be performed through implementation of the structure.interfaces.CandidateGenerator interface.
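
Dictionary look-up itself is simple, as the following sketch suggests: a detected mention is normalised and used as a key into a map from surface forms to candidate resources. The class CandidateLookupSketch and its candidates method are hypothetical and do not mirror the signatures of CandidateGeneratorMap or structure.interfaces.CandidateGenerator; lower-casing as the normalisation step is also an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-alone sketch; not part of the Agnos code base.
class CandidateLookupSketch {

    private final Map<String, List<String>> dictionary = new HashMap<>();

    /** Copies the mapping and lower-cases its keys so that look-ups ignore case. */
    CandidateLookupSketch(Map<String, List<String>> surfaceFormToResources) {
        for (Map.Entry<String, List<String>> entry : surfaceFormToResources.entrySet()) {
            dictionary.put(entry.getKey().toLowerCase(Locale.ROOT), entry.getValue());
        }
    }

    /** Returns all candidate resources for a mention, or an empty list if it is unknown. */
    List<String> candidates(String mention) {
        return dictionary.getOrDefault(
                mention.toLowerCase(Locale.ROOT), Collections.<String>emptyList());
    }

    public static void main(String[] args) {
        Map<String, List<String>> mapping = new HashMap<>();
        mapping.put("Paris", Arrays.asList(
                "http://example.org/Paris_France", "http://example.org/Paris_Texas"));
        CandidateLookupSketch generator = new CandidateLookupSketch(mapping);
        System.out.println(generator.candidates("PARIS"));
        System.out.println(generator.candidates("Atlantis"));
    }
}
```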

Disambiguation


Agnos allows for simple extension of its disambiguation repertoire, among others through its structure.interfaces.Scorer and structure.interfaces.PostScorer interfaces.
The difference between the two is that a structure.interfaces.Scorer is assumed to be a so-called a priori scoring mechanism (meaning a single candidate's score is independent of other candidates), whereas structure.interfaces.PostScorer instances attribute different scores to candidate entities depending on the other candidate entities they are detected with, thereby allowing the notion of "context" to play a role.
An example of a structure.interfaces.Scorer instance would be our PageRankScorer.
For structure.interfaces.PostScorer instances, we provide linking.disambiguation.scorers.GraphWalkEmbeddingScorer and VicinityScorerDirectedSparseGraph, among others.
Which scoring mechanisms are used for disambiguation is configured on the defined linking.disambiguation.Disambiguator instance by calling the addScorer(...) and addPostScorer(...) methods, respectively.
How the individual scorers' scores are combined may be defined within their own implementation, which is then applied through our consolidation mechanism linking.disambiguation.ScoreCombiner.
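
The distinction between a priori and context-dependent scoring can be sketched with two small interfaces. Everything in the example is hypothetical: AprioriScorer and ContextScorer are stand-ins whose signatures are assumed rather than taken from structure.interfaces.Scorer and structure.interfaces.PostScorer, the weighted sum is just one possible combination rule, and the scores and neighbourhoods are made-up toy data.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-alone sketch; not part of the Agnos code base.
class DisambiguationSketch {

    /** A priori: the score depends on the candidate alone. */
    interface AprioriScorer {
        double score(String candidate);
    }

    /** Context-dependent: the score also depends on the other entities in the text. */
    interface ContextScorer {
        double score(String candidate, Collection<String> context);
    }

    /** Combines both scores as prior + contextWeight * contextual and returns the best candidate. */
    static String pickBest(List<String> candidates, Collection<String> context,
                           AprioriScorer prior, ContextScorer contextual, double contextWeight) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String candidate : candidates) {
            double total = prior.score(candidate)
                    + contextWeight * contextual.score(candidate, context);
            if (total > bestScore) {
                bestScore = total;
                best = candidate;
            }
        }
        return best;
    }

    /** Toy scenario: the mention "Paris" appears in a text that also mentions Texas. */
    static String demo(double contextWeight) {
        Map<String, Double> pageRank = new HashMap<>();
        pageRank.put("Paris_France", 0.9);
        pageRank.put("Paris_Texas", 0.1);

        Map<String, Set<String>> neighbours = new HashMap<>();
        neighbours.put("Paris_France", new HashSet<>(Arrays.asList("France")));
        neighbours.put("Paris_Texas", new HashSet<>(Arrays.asList("Texas")));

        AprioriScorer prior = candidate -> pageRank.getOrDefault(candidate, 0.0);
        ContextScorer vicinity = (candidate, context) -> {
            Set<String> linked =
                    neighbours.getOrDefault(candidate, Collections.<String>emptySet());
            int hits = 0;
            for (String other : context) {
                if (linked.contains(other)) {
                    hits++;
                }
            }
            return hits;
        };

        return pickBest(Arrays.asList("Paris_France", "Paris_Texas"),
                Collections.singletonList("Texas"), prior, vicinity, contextWeight);
    }

    public static void main(String[] args) {
        System.out.println(demo(0.0)); // prints Paris_France (prior only)
        System.out.println(demo(1.0)); // prints Paris_Texas (context outweighs the prior)
    }
}
```

With the context weight at zero, the more prominent entity wins on its prior alone; once context is taken into account, the co-occurring "Texas" shifts the decision.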

clit_backend's People

Contributors

phdkris, hashpad, kmdn, niladi
