NLP-Question-Answer-System

Plan for 11-411 Final Project

Week 12 Plans

  • 100 Generated Questions Review

Week 11 Plans

  • Reminder: Final Video (8 mins)
  • Will be over Spring Carnival

Week 10 Plans

  • Final Project due April 14, midnight

Week 9 Plans

  • Reminder: Prepare for Dry Run
  • Dry Run due April 7, before class

Week 8 Plans

  • Put together basic working system
  • Complete simple tests on raw data (html files and our own questions)

Week 7 Plans

  • Ariel: Parse raw text for question generation
  • Vijay: Question generation template filling
  • Caitlin: Yes/No question answering
  • Emily: Factoid question answering
  • Reminder: Progress Report 2 Due (video)

Week 4 Plans

  • Continue to make progress
  • Prepare for Progress Report 2 (video)

Week 2 Plans

  • Vijay: Question Templates
  • Caitlin/Emily: Pronoun Resolution
  • Ariel: Entity-Factoid Database (consider issues with synonyms)
  • Reminder: Submit Progress Report 1
  • March 4: Instructor Meeting

Week 1 Plans

  • Vijay: Question generation using templating
  • Caitlin: Research useful capabilities of NLTK/Stanford, write code to parse text in basic ways
  • Ariel: Entity-factoid database
  • Emily: Answer generation

Parsing the Text

We will use NLTK for part-of-speech tagging and the Stanford parser for entity-relationship modeling. Relying on external packages for the implementation details will let us focus on tuning our algorithms at a higher level. We will run these tools over the text first, before the system splits into the separate tasks of asking and answering. Because pronouns are fundamentally ambiguous, we will consider using a probabilistic model for anaphora resolution. Then we can build an offline database of entity-factoid pairs to query when answering.
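As a rough illustration of the entity-factoid database idea, here is a minimal sketch. It assumes the input is already tagged as lists of (word, tag) pairs with Penn Treebank tags, which is the shape `nltk.pos_tag` returns. The names `extract_entities` and `build_factoid_index` are hypothetical, and treating every run of proper nouns as one entity is a simplification, not the team's actual method. It does not attempt anaphora resolution or synonym handling.

```python
from collections import defaultdict

# Penn Treebank tags for proper nouns, as emitted by nltk.pos_tag.
PROPER_NOUN_TAGS = {"NNP", "NNPS"}


def extract_entities(tagged_sentence):
    """Group consecutive proper-noun tokens into multi-word entity names."""
    entities, current = [], []
    for word, tag in tagged_sentence:
        if tag in PROPER_NOUN_TAGS:
            current.append(word)
        elif current:
            entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities


def build_factoid_index(tagged_sentences):
    """Map each lowercased entity name to the sentences that mention it."""
    index = defaultdict(list)
    for tagged in tagged_sentences:
        sentence = " ".join(word for word, _ in tagged)
        for entity in extract_entities(tagged):
            index[entity.lower()].append(sentence)
    return dict(index)


# Hand-tagged input standing in for real tagger output (an assumption).
tagged_sentences = [
    [("Marie", "NNP"), ("Curie", "NNP"), ("was", "VBD"), ("born", "VBN"),
     ("in", "IN"), ("Warsaw", "NNP"), (".", ".")],
    [("Warsaw", "NNP"), ("is", "VBZ"), ("the", "DT"), ("capital", "NN"),
     ("of", "IN"), ("Poland", "NNP"), (".", ".")],
]
index = build_factoid_index(tagged_sentences)
```

Because the index is a plain dictionary, it can be built once offline and serialized, then loaded at answer time and queried by lowercased entity name.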

Asking

Since we have already tagged the text when parsing, we can identify the candidate subjects of each question in an article and build a collection of question templates. We will then extract meta-information (such as the “Categories” section) from the Wikipedia HTML structure to generate topical questions for sections of text. Given a new article, we will consider the attributes of each subject in the text and apply the most probable question template based on the critical words in each sentence.
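Template selection could look something like the sketch below. The templates, their trigger words, and the function `generate_question` are all hypothetical, and counting trigger-word overlap is only a crude stand-in for a real "most probable template" model.

```python
import re

# Hypothetical templates: each pairs trigger words with a question pattern.
TEMPLATES = [
    {"triggers": {"born", "birth"},
     "pattern": "When was {subject} born?"},
    {"triggers": {"located", "capital", "city"},
     "pattern": "Where is {subject} located?"},
    {"triggers": {"invented", "discovered", "founded"},
     "pattern": "What did {subject} do?"},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def generate_question(subject, sentence):
    """Fill the template whose triggers overlap most with the sentence.

    Returns None when no template has any trigger word in the sentence.
    """
    words = set(tokenize(sentence))
    best = max(TEMPLATES, key=lambda t: len(t["triggers"] & words))
    if not best["triggers"] & words:
        return None
    return best["pattern"].format(subject=subject)


question = generate_question("Marie Curie", "Marie Curie was born in Warsaw.")
```

Returning `None` when nothing matches lets the caller skip sentences that no template fits rather than emit a poorly matched question.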

Answering

Identifying the type of question being asked (yes/no, location, date, etc.) will help determine which relationships between words we should consider. Parsing sentences into phrases and then deciding the function of each phrase will help us answer questions by type. For example, a prepositional phrase that describes a location can answer a location question while maintaining proper grammar. If the relevant information or relationships are successfully extracted from the article, we will deliver that information as the answer. Otherwise, we will retrieve the most likely sentence, treating the keywords of the question as a vector to match against, and extract the most salient section of the retrieved sentence as the answer. We will likely use term frequency to rank sentences within a document. Both the asking and answering modules will share the structured data extracted from parsing and will use wrappers over useful NLP algorithms from NLTK and other libraries. Beyond that, the two components’ system designs will be independent.
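The fallback retrieval step can be sketched as follows, using only the standard library. It classifies a question by its first word and ranks sentences by cosine similarity between term-frequency vectors. The stopword list, the type labels, and the function names are assumptions made for illustration. Extracting the most salient span from the retrieved sentence is left out.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, kept tiny for the sketch.
STOPWORDS = {"the", "a", "an", "of", "in", "is", "was", "did", "does",
             "where", "when", "who", "what"}
YES_NO_STARTS = {"is", "are", "was", "were", "do", "does", "did",
                 "can", "has", "have"}
WH_TYPES = {"where": "location", "when": "date", "who": "person"}


def classify_question(question):
    """Guess the question type from its first word."""
    tokens = question.strip().lower().split()
    if not tokens:
        return "other"
    if tokens[0] in YES_NO_STARTS:
        return "yes/no"
    return WH_TYPES.get(tokens[0], "other")


def tf_vector(text):
    """Count non-stopword tokens: a raw term-frequency vector."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def best_sentence(question, sentences):
    """Return the sentence whose vector is closest to the question's."""
    q = tf_vector(question)
    return max(sentences, key=lambda s: cosine(q, tf_vector(s)))


sentences = [
    "Marie Curie was born in Warsaw.",
    "She won the Nobel Prize in Physics in 1903.",
    "Warsaw is the capital of Poland.",
]
question = "Where was Marie Curie born?"
question_type = classify_question(question)
retrieved = best_sentence(question, sentences)
```

In this example the question is typed as a location question and the first sentence is retrieved, so a later step could look for a locative prepositional phrase such as "in Warsaw" inside it.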

Evaluation

We will automatically evaluate each generated question for grammar and syntax to ensure fluency. We will also evaluate answers for surface-level factual accuracy and for how well they meet the information need of the corresponding question, in addition to grammar and syntax. We will use the quality of candidate answers as a ranking criterion for the questions we generate.

Team Coordination

Our team will use Git for version control and to share code and data. Because our group is split between people with a programming background and people with linguistics experience, we will divide tasks accordingly. One team expectation is to have a full “skeleton” of the functional system by the first progress report; we will then assign specific coding tasks. We also have a standing weekly meeting (Thursdays) to delineate tasks and track our progress.

Team: Tune a Fish

Emily Bram
Caitlin Lohman
Ariel Rao
Vijay Viswanathan
