cholan's Introduction

CHOLAN - Q105079136

CHOLAN : A Modular Approach for Neural Entity Linking on Wikipedia and Wikidata (paper)

Wikidata

  • Dataset - We extracted an EL dataset from the (T-REx dataset). Please refer to this (link) to download the dataset used in our experiments.

Wikipedia

  • Dataset - AIDA-CoNLL. We used the version of the dataset from the DCA paper. Please refer to this (repository).

Candidate Generation

  • FALCON 2.0 - Locally indexed KG items are used. Please refer to this (repository) for the setup using the Wikidata dump.
  • (DCA) - A predefined candidate set is used. (Wikipedia)

Setup

Requirements: Python 3.6 or 3.7, torch>=1.2.0
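The requirements above can be checked before running anything. The following is a minimal sketch, not part of the CHOLAN code: the helper names `parse_version` and `check_requirements` are hypothetical, and the version bounds are taken directly from the requirements line.

```python
import re

def parse_version(text):
    """Turn a version string such as '1.13.1+cu117' into (1, 13, 1).

    Local build suffixes after '+' are ignored; only the leading digits
    of each of the first three components are used.
    """
    core = text.split("+")[0]
    parts = []
    for piece in core.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)

def check_requirements(python_version, torch_version):
    """Return a list of readable problems; an empty list means all is fine.

    python_version: a tuple like sys.version_info (major, minor, ...)
    torch_version:  a string like torch.__version__
    """
    problems = []
    major_minor = tuple(python_version[:2])
    if major_minor not in {(3, 6), (3, 7)}:
        problems.append("Python 3.6 or 3.7 expected, found %d.%d" % major_minor)
    if parse_version(torch_version) < (1, 2, 0):
        problems.append("torch>=1.2.0 expected, found %s" % torch_version)
    return problems
```

In an environment where torch is installed, it could be called as `check_requirements(sys.version_info, torch.__version__)` and the returned messages printed.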

Running

python cholan.py

Citation

@inproceedings{kannan-ravi-etal-2021-cholan,
    title = {CHOLAN: A Modular Approach for Neural Entity Linking on Wikipedia and Wikidata},
    author = {Kannan Ravi, Manoj Prabhakar and Singh, Kuldeep and Mulang, Isaiah Onando and Shekarpour, Saeedeh and Hoffart, Johannes and Lehmann, Jens},
    booktitle = {Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume},
    year = {2021}
}

cholan's People

Contributors

manojprabhakar

cholan's Issues

What are the train/test splits of T-Rex used in the paper?

Hi. Thank you for the great work.

I was wondering if you could provide the train/test splits of T-Rex used in the paper?
In your README file, there is a link to download the file CHOLAN-EL-TREX.tsv. But there is no indication of which line in the file belongs to the train set or the test set.

Furthermore, I counted the number of data lines in that file. There were 1,089,661 data lines (except the header line). However, your paper mentions that "the dataset has 983,257 sentences". So was the file the same data you used in your paper?

Thank you.
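The row count quoted in this issue can be reproduced with a few lines of Python. This is a sketch under assumptions: the function name `count_data_rows` is hypothetical, and the file is assumed to be UTF-8, tab-separated, with a single header line. Whether one row corresponds to one sentence is not stated in the README, so the result is not necessarily comparable to the sentence count reported in the paper.

```python
import csv

def count_data_rows(tsv_path):
    """Count the non-empty rows that follow the header of a TSV file."""
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE treats quote characters as ordinary text, so every
        # physical line is read as exactly one row.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        next(reader, None)  # skip the header line
        return sum(1 for row in reader if row)
```

For example, `count_data_rows("CHOLAN-EL-TREX.tsv")` would return the number of data lines in the downloaded file.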

Files missing

First of all, thank you for making your code publicly available. We are working on an evaluation tool for entity linking systems and would love to include your system and reproduce your results. However, I did not succeed in running your code, and the provided instructions are a bit sparse.

More specifically, when calling python cholan.py as instructed here in the directory CHOLAN/Cholan_T-REx/End2End, I get the error message

Traceback (most recent call last):
  File "cholan.py", line 60, in <module>
    df_target = pd.read_csv(predict_data_dir + "ned_target_data.tsv", sep="\t", encoding='utf-8')
    ...
FileNotFoundError: [Errno 2] No such file or directory: '/data/prabhakar/CG/prediction_data/data_10000/ned_target_data.tsv'

When running python cholan.py in the directory Cholan_CoNLL_AIDA/End2End, I get the error message

Traceback (most recent call last):
  File "cholan.py", line 65, in <module>
    df_ned = pd.read_csv(predict_data_dir + "ned_data.tsv", sep='\t', encoding='utf-8', usecols=['sequence1', 'sequence2', 'label'])
    ...
FileNotFoundError: [Errno 2] No such file or directory: '/data/prabhakar/CG/WNED/msnbc/prediction_data/data_full/Zeroshot/ned_data.tsv'

Neither of these files is included in any of the linked data packages or the linked repositories.

Could you please provide the necessary data and provide some more instructions on how to use your code and reproduce your results?
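Both tracebacks come from input files being read from absolute paths under /data/prabhakar/ that exist only on the author's machine. As a sketch of how such a failure could be surfaced earlier, the snippet below checks for the expected files in a configurable directory before any loading is attempted. It is not taken from the CHOLAN code: the helper `find_missing_inputs`, the environment variable `CHOLAN_PREDICT_DATA_DIR`, and the fallback directory name are all hypothetical, and only the two file names come from the tracebacks above.

```python
import os

def find_missing_inputs(predict_data_dir, filenames):
    """Return the full paths of expected input files that do not exist."""
    paths = [os.path.join(predict_data_dir, name) for name in filenames]
    return [path for path in paths if not os.path.isfile(path)]

# Hypothetical override: take the data directory from the environment
# instead of hardcoding a machine-specific absolute path.
predict_data_dir = os.environ.get("CHOLAN_PREDICT_DATA_DIR", "prediction_data")

# File names as they appear in the two tracebacks.
expected = ["ned_target_data.tsv", "ned_data.tsv"]

for path in find_missing_inputs(predict_data_dir, expected):
    print("missing input file:", path)
```

This only reports which files are absent; it does not recreate them, so the data itself would still have to be provided by the authors.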

Use CHOLAN for inference

I find your project very interesting, but a strong drawback is that you don't provide enough documentation on how to use your model for inference. Do you plan to add this information in the future?
Looking forward to your reply :)
