WebAsKB

This repository contains the code for our paper, The Web as a Knowledge-base for Answering Complex Questions. It can be used to train a neural model that answers complex questions whose answers must be derived from multiple web snippets. The model was trained on the ComplexWebQuestions dataset, and the code is written in PyTorch.

Setup

Setting up a virtual environment

  1. First, clone the repository:

    git clone https://github.com/alontalmor/webaskb.git
    
  2. Change your directory to where you cloned the files:

    cd webaskb
    
  3. Create a virtual environment with Python 3.6:

    virtualenv -p python3 venv
    
  4. Activate the virtual environment. You will need to activate the venv environment in each terminal in which you want to use WebAsKB.

    source venv/bin/activate    # csh users: source venv/bin/activate.csh
    
  5. Install the required dependencies:

    pip3 install -r requirements.txt
    
  6. Install PyTorch 0.3.1 by following the instructions on the PyTorch website.

  7. Download external libraries:

    wget https://www.dropbox.com/s/k867s25qitdo8bc/Lib.zip
    unzip Lib.zip
    
  8. Download the data:

    wget https://www.dropbox.com/s/440f4096rkeo7xc/Data.zip
    unzip Data.zip
    
  9. Optional: install and run the Stanford CoreNLP server, which is used to generate the noisy supervision:

    wget http://nlp.stanford.edu/software/stanford-corenlp-full-2016-10-31.zip
    unzip stanford-corenlp-full-2016-10-31.zip
    cd stanford-corenlp-full-2016-10-31
    java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
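
Before generating the noisy supervision, it can help to confirm that the server from step 9 is reachable. The following is a minimal sketch, not part of this repository: the helper names `build_corenlp_url` and `annotate` are hypothetical, and it assumes the server is listening on localhost:9000 as in the command above and accepts its properties as a JSON-encoded query parameter.

```python
import json
import urllib.parse
import urllib.request


def build_corenlp_url(host="localhost", port=9000, annotators="tokenize,ssplit,pos"):
    """Build a CoreNLP server URL; properties travel as a JSON query parameter."""
    properties = {"annotators": annotators, "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(properties)})
    return "http://{}:{}/?{}".format(host, port, query)


def annotate(text, url=None, timeout=15):
    """POST raw text to the server and return the parsed JSON response.

    Hypothetical helper for a quick connectivity check; it is not how
    webaskb_run.py itself talks to the server.
    """
    request = urllib.request.Request(
        url or build_corenlp_url(), data=text.encode("utf-8")
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, calling `annotate("Who directed Titanic?")` should return a dictionary with a `sentences` entry. A connection error usually means the server has not started or is listening on a different port.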
    

Data

By default, source data and preprocessed data are expected in the "data" directory. To change the expected file locations, edit config.py. Note: the dataset downloaded here contains only the question-answer pairs. The full dataset, including web snippets, can be downloaded from ComplexWebQuestions.
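
To check that the download unpacked where you expect, a short script can load one of the question files and report what it contains. This is a sketch under assumptions: the path is the dev file named in the Running section, the file is assumed to hold a JSON list of records, and `load_questions` and `summarize` are hypothetical helpers rather than functions from this repository.

```python
import json
import os


def load_questions(path):
    """Load a JSON file that is assumed to hold a list of question records."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize(records):
    """Return the record count and the sorted field names of the first record."""
    fields = sorted(records[0].keys()) if records else []
    return len(records), fields


if __name__ == "__main__":
    # Path taken from the evaluation command in the Running section.
    dev_path = os.path.join("Data", "complex_web_questions", "ComplexWebQuestions_dev.json")
    if os.path.exists(dev_path):
        count, fields = summarize(load_questions(dev_path))
        print(count, fields)
    else:
        print("Not found:", dev_path)
```

Printing the field names rather than hard-coding them keeps the sketch independent of the exact record schema.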

Running

Now you can do any of the following:

  • Generate the noisy supervision data for training: python -m webaskb_run.py gen_noisy_sup
  • Run a pointer network to generate split points in the question: python -m webaskb_run.py run_ptrnet
  • Train the pointer network: python -m webaskb_run.py train_ptrnet
  • Create the final predictions and calculate p@1 scores: python -m webaskb_run.py splitqa
  • NEW! Run the evaluation script for the dev set: python -m eval_script.py Data/complex_web_questions/ComplexWebQuestions_dev.json Data/predictions_dev.json

Options: pass --eval_set dev or --eval_set test to choose between the development and test sets.
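
For readers unfamiliar with the p@1 metric mentioned above, it is the share of questions whose top-ranked prediction is an accepted answer. The sketch below illustrates the idea only; it is not the logic of eval_script.py. The input shapes (dictionaries keyed by question ID) and the normalization rule are assumptions chosen for the example.

```python
import re
import string


def normalize(text):
    """Lowercase, strip ASCII punctuation, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return re.sub(r"\s+", " ", text).strip()


def precision_at_1(predictions, gold):
    """Percentage of gold questions whose top prediction is an accepted answer.

    predictions: question id -> ranked list of answer strings (assumed shape).
    gold: question id -> list of accepted answer strings (assumed shape).
    Questions with no prediction count as misses.
    """
    if not gold:
        return 0.0
    hits = 0
    for qid, accepted in gold.items():
        ranked = predictions.get(qid, [])
        if ranked and normalize(ranked[0]) in {normalize(a) for a in accepted}:
            hits += 1
    return 100.0 * hits / len(gold)
```

Only the first entry of each ranked list is scored, so a correct answer in second place still counts as a miss.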

Please note that Reading Comprehension (RC) answer prediction data is provided in Data/RC_answer_cache. However, the WebAnswer model is not included because of its complexity and its reliance on querying a search engine. You may replace the RC component with any other RC model and use it with the web snippets in ComplexWebQuestions.
