yayuanzi8 / sparqa

This project is forked from nju-websoft/sparqa

SPARQA: Skeleton-based Semantic Parsing for Complex Questions over Knowledge Bases, AAAI 2020

License: MIT License

Python 100.00%

SPARQA: question answering over knowledge bases

Code for the paper "SPARQA: Skeleton-based Semantic Parsing for Complex Questions over Knowledge Bases" (AAAI 2020). If you have any questions, please email the author (ywsun at smail.nju.edu.cn).

Project Structure:

File      Description
code      codes
skeleton  skeleton bank of complex questions
slides    slides and poster

Requirements

Configuration

  • Root of dataset: defaults to D:/dataset. You can change it in common/globals_args.py.

Common Resources

  • Eight resources: GloVe (glove.6B.300d), Stanford CoreNLP server, SUTime Java library, BERT pre-trained models, and four preprocessing files (stopwords.txt, ordinal_fengli.tsv, unimportantphrase, and unimportantwords). Download from pan (extraction code: kbqa), then unzip and save in the root.
  • Two versions of Freebase: the latest version and the 2013 version (we provide the 2013 version db file; the extraction code is kbqa). Next, download a Virtuoso server and load the KBs. The file is helpful if you run into problems.
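
Once Virtuoso is running and a Freebase version is loaded, a quick smoke test can confirm that the endpoint answers queries. The sketch below is only an illustration: it assumes Virtuoso's default SPARQL endpoint at http://localhost:8890/sparql (your host and port may differ), and the helper names build_sparql_url and run_query are hypothetical, not part of SPARQA.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Virtuoso's default SPARQL endpoint; change it to match your server.
ENDPOINT = "http://localhost:8890/sparql"

# Fetch one arbitrary triple; an empty result suggests the KB is not loaded.
SMOKE_TEST = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 1"


def build_sparql_url(query, endpoint=ENDPOINT):
    """Return a GET URL that asks the endpoint for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urllib.parse.urlencode(params)


def run_query(query, endpoint=ENDPOINT, timeout=30):
    """Send the query and return the list of result bindings."""
    url = build_sparql_url(query, endpoint)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["results"]["bindings"]
```

Calling run_query(SMOKE_TEST) against a loaded server should return a single binding. The same address is what you would later put into the SPARQL setting in common/globals_args.py.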

Specific CWQ 1.1 Resources

  • CWQ 1.1 dataset, skeleton parsing models, word-level scorer model, and sentence-level scorer model. Download from pan (extraction code: kbqa), then unzip and save in the root.
  • Entity-related lexicons and schema-related lexicons. Download from pan (extraction code: kbqa), then unzip and save in the root.

Specific GraphQuestions Resources

  • GraphQuestions dataset, skeleton parsing models, and word-level scorer model. Download from pan (extraction code: kbqa), then unzip and save in the root.
  • Entity-related lexicons and schema-related lexicons. Download from pan (extraction code: kbqa), then unzip and save in the default root/kb_freebase_en_2013.

Run SPARQA Pipeline

The pipeline has two steps for answering questions:

  • (1) KB-independent graph-structured ungrounded query generation.
  • (2) KB-dependent graph-structured grounded query generation and ranking.

To run CWQ 1.1, see running/freebase/pipeline_cwq.py. To run GraphQuestions, see running/freebase/pipeline_grapqh.py. The example below uses GraphQuestions.

Specific-dataset Configuration

  • Set the dataset in common/globals_args.py: q_mode=graphq. (Use q_mode=cwq for CWQ 1.1.)
  • Set the parsing mode in common/globals_args.py: parser_mode=head, which selects skeleton parsing. (parser_mode=dep selects dependency parsing.)
  • Replace freebase_pyodbc_info and freebase_sparql_html_info in common/globals_args.py with your local addresses. (The 2013 version is for GraphQuestions, and the latest version is for CWQ 1.1.)
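
The three settings above interact: the dataset choice determines which Freebase version the endpoints should point at. The following sketch restates that mapping as a small check. It is not SPARQA's own code; q_mode and parser_mode are the variable names from common/globals_args.py, while describe_config and the two dictionary names are hypothetical.

```python
# Values taken from the configuration notes above.
FREEBASE_VERSION = {"graphq": "2013", "cwq": "latest"}
PARSER_MODES = {"head": "skeleton parsing", "dep": "dependency parsing"}


def describe_config(q_mode, parser_mode):
    """Validate the two settings and summarize what they imply."""
    if q_mode not in FREEBASE_VERSION:
        raise ValueError(f"unknown q_mode: {q_mode!r}")
    if parser_mode not in PARSER_MODES:
        raise ValueError(f"unknown parser_mode: {parser_mode!r}")
    return {
        "dataset": q_mode,
        "parser": PARSER_MODES[parser_mode],
        "freebase_version": FREEBASE_VERSION[q_mode],
    }


print(describe_config("graphq", "head"))
```

A check like this can catch a mismatch early, for example running GraphQuestions against an endpoint that serves the latest Freebase version.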

KB-independent query generation

  • Run KB-independent query generation. Set variable module=1.0. The input: graph_questions_filepath. The output: structure_with_1_ungrounded_graphq_file. We provide one sample to make the complete structure easier to understand. We can provide the structures of all questions if you need them.

KB-dependent query generation

  • Run variant generation. Set variable module=2.1. The input: structure_with_1_ungrounded_graphq_file. The output: structure_with_2_1_grounded_graph_file.
  • Ground candidate queries. Set module=2.2. The input: structure_with_2_1_grounded_graph_file. The output: structure_with_2_2_grounded_graph_folder.
  • Rank using word-level scorer. Set module=2.3_word_match. The input: structure_with_2_2_grounded_graph_folder.
  • Combine sentence-level scorer and word-level scorer. Set module=2.3_add_question_match. The input: structure_with_2_2_grounded_graph_folder.
  • Run evaluation. Set module=3_evaluation. The input: structure_with_2_2_grounded_graph_folder. The output: results.
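
The module values form a chain in which each step consumes the output of an earlier one. As a reading aid, the sketch below records the steps listed above and checks that every input is produced before it is used. The STAGES table and the check_order helper are illustrative only; the module values and file variable names are copied from the steps above, and nothing here calls SPARQA itself.

```python
# (module value, input, output); None marks an output the steps above do not name.
STAGES = [
    ("1.0", "graph_questions_filepath", "structure_with_1_ungrounded_graphq_file"),
    ("2.1", "structure_with_1_ungrounded_graphq_file", "structure_with_2_1_grounded_graph_file"),
    ("2.2", "structure_with_2_1_grounded_graph_file", "structure_with_2_2_grounded_graph_folder"),
    ("2.3_word_match", "structure_with_2_2_grounded_graph_folder", None),
    ("2.3_add_question_match", "structure_with_2_2_grounded_graph_folder", None),
    ("3_evaluation", "structure_with_2_2_grounded_graph_folder", "results"),
]


def check_order(stages, initial_inputs):
    """Return the module values in order, failing if an input is not yet available."""
    available = set(initial_inputs)
    order = []
    for module, source, target in stages:
        if source not in available:
            raise ValueError(f"module={module} needs {source}, which no earlier step produced")
        if target is not None:
            available.add(target)
        order.append(module)
    return order


print(check_order(STAGES, ["graph_questions_filepath"]))
```

Starting from a later step (for example module=2.2) only works if the file from the preceding step already exists, which is what the check makes explicit.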

Skeleton Parsing

  • SPARQA also provides a skeleton parsing tool. The input is a question and the output is the skeleton of the question. (It currently supports only English; Chinese support is planned.)
  • You can train SPARQA's skeleton parser for your own language. (This requires replacing the pre-trained models and annotated data with ones for your language.)

Multi-Strategy Scoring

  • SPARQA provides a trained word-level scorer model and a trained sentence-level scorer model in the pan downloads above.

Oracle Grounded Graph

  • Complex questions typically involve multiple relations in the knowledge base, which makes the search space grow exponentially. We have tried two approaches: online and offline. The online approach generates candidate queries on the fly, which is very slow because of high-degree vertices. The offline approach first retrieves oracle graphs (stored in path format to reduce storage space) and then generates candidate queries from the oracle graphs. For more about oracle graphs, please see this paper.
  • We provide the code for the offline approach, the oracle graphs of CWQ 1.1, and the oracle graphs of GraphQuestions. The extraction code for each is kbqa.
  • We can also provide the code for the online approach.
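
To see why high-degree vertices slow down online generation, and what a path-format store might look like, consider the toy sketch below. It is an illustration under stated assumptions, not the authors' implementation: the triples, the entity and relation names, and the enumerate_paths helper are all invented, and paths are limited to two hops.

```python
from collections import defaultdict


def enumerate_paths(triples, start, max_hops=2):
    """List relation paths of up to max_hops that begin at `start`.

    Each path is a tuple alternating vertices and relations,
    e.g. (start, r1, v1, r2, v2).
    """
    out_edges = defaultdict(list)
    for subj, rel, obj in triples:
        out_edges[subj].append((rel, obj))

    paths = []
    frontier = [(start,)]
    for _ in range(max_hops):
        next_frontier = []
        for path in frontier:
            for rel, obj in out_edges[path[-1]]:
                if obj in path[::2]:  # skip paths that revisit a vertex
                    continue
                next_frontier.append(path + (rel, obj))
        paths.extend(next_frontier)
        frontier = next_frontier
    return paths


# Toy KB: "hub" has 50 outgoing edges, so two-hop paths through it multiply.
triples = [("e0", "r_a", "hub"), ("e0", "r_b", "e1"), ("e1", "r_c", "e2")]
triples += [("hub", "r_d", f"x{i}") for i in range(50)]

paths = enumerate_paths(triples, "e0")
print(len(paths))
```

In this toy graph a single high-degree vertex accounts for 50 of the 53 paths. Precomputing and storing such paths once, rather than expanding them per question, is the intuition behind the offline approach.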

Compare with Baselines

  • GraphQuestions: PARA4QA, SCANNER, UDEPLAMBDA.
  • CWQ 1.1: PullNet, SPLITQA, and MHQA-GRN. Note that PullNet used annotated topic entities of questions in its KB-only setting. SPARQA, an end-to-end method, does not use annotated topic entities. Thus, the two are not directly comparable.

Citation

@inproceedings{SunZ0Q20,
  author    = {Yawei Sun and Lingling Zhang and Gong Cheng and Yuzhong Qu},
  title     = {{SPARQA:} Skeleton-Based Semantic Parsing for Complex Questions over Knowledge Bases},
  booktitle = {The Thirty-Fourth {AAAI} Conference on Artificial Intelligence, {AAAI} 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, {IAAI} 2020, The Tenth {AAAI} Symposium on Educational Advances in Artificial Intelligence, {EAAI} 2020, New York, NY, USA, February 7-12, 2020},
  pages     = {8952--8959},
  publisher = {{AAAI} Press},
  year      = {2020},
  url       = {https://aaai.org/ojs/index.php/AAAI/article/view/6426},
}

Contacts

If you have any difficulty or questions about running the code, reproducing the experimental results, or skeleton parsing, please email the author (ywsun at smail.nju.edu.cn).
