Coder Social home page Coder Social logo

knowledge-based-wsd's Introduction

This code is an implementation of the Knowledge-based Word Sense Disambiguation (KWSD) Framework that exploits knowledge from different perspectives. In detail, it makes use of knowledge from Wikipedia and WordNet in a similarity-based WSD method and combines the method with a graph-based WSD method in some personalized settings such as top_3 senses filtration, customized sense graph construction and sense importance inheritance. State-of-the-art performance on several standard WSD datasets has proven the effectiveness of such a knowledge-exploitation framework.

Quick Evaluation

You can make use of the evaluation code provided in [1]. Before that, you should download the evaluation framework on the website EACL17. In this resource, you can use the scoror.java to evaluate the performance of our system on different datasets once at a time with the following code, using the files such as 'senseval2.raw.KWSD.key'.

java Scorer senseval2/senseval2.gold.key.txt senseval2.raw.KWSD.key

Also, you can use the evaluation code from UKB website, UKB[2]. Then use evaluate.sh to conduct evaluation after you move our result document 'raw.KWSD.key' into ukb-3.2/wsdeval/Keys/raw. The result is supposed to be as follows:

./evaluate.sh

Evaluating in Keys
/* raw.KWSD

ALL P= 68.0% R= 68.0% F1= 68.0%
semeval2007 P= 56.9% R= 56.9% F1= 56.9%
semeval2013 P= 68.4% R= 68.4% F1= 68.4%
semeval2015 P= 72.3% R= 72.3% F1= 72.3%
senseval2 P= 69.6% R= 69.6% F1= 69.6%
senseval3 P= 66.1% R= 66.1% F1= 66.1%

Reproduction of the system's result

  1. Quick: Using the embeddings (say "eLSA01" for senseval2) in the folder, you can run disambiguation.py to reproduce the exact reported results. The code itself can evalute the results and also output a file named 'raw.KWSD.key' which can still be evaluated with the above method.
  2. Slow: If you want to reproduce the results starting from the domain knowledge document retrieval, it might take a few hours. You also need to download a few documents including British National Corpus(BNC)[3] and Wikipedia dump for document retriever in [4]. The details will be given in the following section.

Reproduce results from scratch

  1. Prepare the 'document retriever' in [4]. Also, you need to download the TF-IDF model and Wikipedia database on that website for a quick implementation. We use the model for document name retrieval in docname_retrieval.py and use the database to retrieve the corresponding documents in doc_retrieve.py. query_access.py is to access the query for document retrieval.

  2. The retrieved documents are combined with BNC documents which are pre-processed with bnc_process.py.The combined document set is then used to learn word representations via lsa in gensim_lsa.py.

  3. Run disambiguation.py for disambiguation of each dataset with the following settings.

    python disambiguation.py -l True -d domain_doc_name

[1] Raganato A.; Camacho-Collados J.; and Navigli R. 2017. Word Sense Disambiguation: A Unified Evaluation Framework and Empirical Comparison. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, 99-110, Valencia, Spain: Association for Computational Linguistics.
[2] Agirre E.; de Lacalle O. L.; and Soroa A. 2018. The risk of sub-optimal use of Open Source NLP Software: UKB is inadvertently state-of-the-art in knowledge-based WSD. In Proceedings of Workshop for NLP Open Source Software, 29โ€“33, Melbourne, Australia: Association for Computational Linguistics.
[3] The British National Corpus, version 3 (BNC XML Edition). 2007. Distributed by Bodleian Libraries, University of Oxford, on behalf of the BNC Consortium. URL: http://www.natcorp.ox.ac.uk/
[4] Chen D.; Fisch A.; Weston J.; and Bordes A. 2017. Reading Wikipedia to Answer Open-Domain Questions, In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 1870-1879, Vancouver, Canada: Association for Computational Linguistics.# Knowledge-based-WSD

knowledge-based-wsd's People

Contributors

lwmlyy avatar

Stargazers

 avatar

Forkers

bartonlin

knowledge-based-wsd's Issues

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.