
This project forked from juliabuffinton/w210-capstone-qg



AutoQ: Improving reading comprehension through automatic question generation

Repository for Question Generation project, W210 Capstone for UCB MIDS. Julia Buffinton, Saurav Datta, Joanna Huang, Kathryn Plath

About Our Project

AutoQ is the first free web app with automatically generated reading comprehension questions targeted to English language learners. Our product uses state-of-the-art machine learning and natural language processing techniques to create a large, topical collection of multiple-choice questions that mimic English-language exam formats. We currently offer over 100,000 practice questions on over 10,000 Wikipedia articles.

Our site is designed to help improve reading comprehension for English-language learners or for anyone looking to study up on a new topic. Starting with articles from Wikipedia, we developed algorithms that produce reading comprehension questions in the style of those seen on the TOEFL exam. Reading comprehension improves with practice on new material.

Do it Yourself

We employed several techniques to achieve this result. The instructions below walk through our approach. Input and output folders can be changed by editing the scripts.

1. Preprocess Wikipedia articles

Raw text of Wikipedia articles was obtained from Wikimedia dumps. For the rest of the pipeline, it should be formatted the same way as the SQuAD datasets.

From this directory, run the preprocessing script to break up the Wikipedia articles into paragraphs and save them into the wikipedia_squad folder.

sh preprocess.sh
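
As a rough illustration of the target format, the sketch below wraps an article's paragraphs in a SQuAD-style record. It is not the code behind preprocess.sh: the function names, the blank-line paragraph splitting, and the empty "qas" placeholder are all assumptions made for this example.

```python
import json


def article_to_squad(title, raw_text):
    """Split an article into paragraphs and wrap them in a SQuAD-style record."""
    # Assumption: blank lines mark paragraph boundaries; empty chunks are dropped.
    paragraphs = [p.strip() for p in raw_text.split("\n\n") if p.strip()]
    return {
        "title": title,
        # "qas" starts empty; later pipeline steps would fill in questions and answers.
        "paragraphs": [{"context": p, "qas": []} for p in paragraphs],
    }


def build_dataset(articles):
    """Build a SQuAD-style dataset from an iterable of (title, raw_text) pairs."""
    return {
        "version": "1.1",
        "data": [article_to_squad(title, text) for title, text in articles],
    }


if __name__ == "__main__":
    sample = [("Example", "First paragraph.\n\nSecond paragraph.")]
    print(json.dumps(build_dataset(sample), indent=2))
```

Keeping the same nesting as SQuAD (data, paragraphs, context, qas) means tools written for SQuAD files can read the Wikipedia articles without changes.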

2. Select relevant sentences to query

Using the paragraphs from the Wikipedia articles, we identify the most “important” sentences to ask questions about. The first and last sentences of a paragraph often introduce key information or summarize the paragraph, so we examine those. We assume that the words that appear most frequently are the most important (and are likely good indicators of the main topic), and therefore that the most “important” sentences are the ones that contain these frequent words.

From this directory, run the sentence selection script to select important sentences and save them, both labeled with their location (in labeled_sentences) and unlabeled for question generation (in unlabeled_sentences).

sh sentence_selection.sh
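
The frequency heuristic described above can be sketched as follows. This is a simplified illustration rather than the logic in sentence_selection.sh: the regex sentence splitter, the small stopword list, and the average-frequency score are assumptions chosen to keep the example short.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real run would use a fuller one).
STOPWORDS = {
    "the", "a", "an", "of", "and", "in", "to", "is", "are",
    "was", "it", "that", "for", "on", "as", "by", "with",
}


def split_sentences(paragraph):
    # Naive splitter: break after '.', '!' or '?' when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def tokenize(text):
    # Lowercase word tokens with stopwords removed.
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def select_sentence(paragraph):
    """Return (location_label, sentence) for the higher-scoring of the
    first and last sentences, or None for an empty paragraph."""
    sentences = split_sentences(paragraph)
    if not sentences:
        return None
    freq = Counter(tokenize(paragraph))

    def score(sentence):
        tokens = tokenize(sentence)
        # Average paragraph-level frequency of the sentence's words.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    candidates = [("first", sentences[0])]
    if len(sentences) > 1:
        candidates.append(("last", sentences[-1]))
    return max(candidates, key=lambda c: score(c[1]))
```

Averaging rather than summing the word frequencies keeps long sentences from winning on length alone; whether the original scripts normalize this way is not stated in the source.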

3. Question Generation

We then feed these important sentences into our Question Generation model, an attention-based bidirectional LSTM, inspired by Du et al.'s 2017 paper, Learning to Ask: Neural Question Generation for Reading Comprehension. We implement it using Torch on top of the Open Neural Machine Translation framework. Our approach is adapted from GenerationQ.

From this directory, run the question generation script to generate questions for the important sentences, save them in the questions folder, and add them back to the SQuAD-formatted Wikipedia articles (done with a call to add_questions.py, overwriting the files in wikipedia_squad).

sh question_generation.sh
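
To show what adding questions back to the SQuAD-formatted articles might look like, here is a minimal sketch. It does not reproduce add_questions.py: the function signature, the (article index, paragraph index) keying, and the id scheme are hypothetical.

```python
def add_questions(dataset, questions):
    """Attach generated questions to paragraphs of a SQuAD-style dataset.

    questions maps (article_index, paragraph_index) -> question string.
    This keying scheme is hypothetical and chosen only for illustration.
    """
    for (a_idx, p_idx), question in questions.items():
        paragraph = dataset["data"][a_idx]["paragraphs"][p_idx]
        paragraph["qas"].append({
            # Hypothetical id: article, paragraph, and position within the paragraph.
            "id": f"{a_idx}-{p_idx}-{len(paragraph['qas'])}",
            "question": question,
            "answers": [],  # left empty for the question answering step
        })
    return dataset
```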

4. Question Answering

To make these questions useful as a tool for reading comprehension, we must also generate their answers, so users can check their own responses. We generate the answers with another bidirectional LSTM, which we chose for its relative simplicity and competitive performance on our validation set. Our implementation was modelled after part of Facebook’s 2017 paper Reading Wikipedia to Answer Open-Domain Questions, and implemented via PyTorch.

From this directory, run the question answering script to generate answers to our questions, save them in the answers folder, and add them back to the SQuAD-formatted Wikipedia articles (done with a call to add_answers.py, saving the files in wikipedia_squad_w_answers).

sh question_answering.sh
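
SQuAD stores each answer as the answer text plus its character offset in the paragraph. The sketch below records a predicted answer in that form. It is an illustration, not the contents of add_answers.py: the function name and the use of a plain substring search to find the offset are assumptions.

```python
def add_answer(paragraph, qa_index, answer_text):
    """Store a predicted answer on one question of a SQuAD-style paragraph.

    Returns True if the answer text occurs in the context, False otherwise.
    """
    # Assumption: locate the answer by its first occurrence in the context.
    start = paragraph["context"].find(answer_text)
    if start == -1:
        return False  # answer is not a span of this paragraph; leave the entry unchanged
    paragraph["qas"][qa_index]["answers"] = [
        {"text": answer_text, "answer_start": start}
    ]
    return True
```

A model that predicts span offsets directly could pass the start position through instead of searching, which avoids ambiguity when the answer text appears more than once.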
