NLP-Question-Answer-System

####Plan for 11-411 Final Project

Week 12 Plans

100 Generated Questions Review

Week 11 Plans

Reminder: Final Video (8 mins)
Will be over Spring Carnival

Week 10 Plans

Final Project due April 14, midnight

Week 9 Plans

Reminder: Prepare for Dry Run
Dry Run due April 7, before class

Week 8 Plans

Put together basic working system
Complete simple tests on raw data (html files and our own questions)

Week 7 Plans

Ariel: Parse raw text for question generation
Vijay: Question generation template filling
Caitlin: Yes/No question answering
Emily: Factoid question answering
Reminder: Progress Report 2 Due (video)

Week 4 Plans

Continue to make progress
Prepare for Progress Report 2 (video)

Week 2 Plans

Vijay: Question Templates
Caitlin/Emily: Pronoun Resolution
Ariel: Entity-Factoid Database (consider issues with synonyms)
Reminder: Submit Progress Report 1
March 4: Instructor Meeting

Week 1 Plans

Vijay: Question generation using templating
Caitlin: Research useful capabilities of NLTK/Stanford, write code to parse text in basic ways
Ariel: Entity-factoid database
Emily: Answer generation

Parsing the Text

We will use NLTK to apply part of speech labeling and the Stanford parser for entity relationship modeling. Using an external package to take care of the details of implementation will allow us to focus on tweaking our algorithms on a higher level. We will use these tools to parse text before both distinguishing between the tasks of asking and answering. Because pronouns are fundamentally ambiguous, we will consider using a probabilistic model for anaphora resolution. Then, we can build an offline database of entity-factoid pairs that we can query for answering.

Asking

Since we have already tagged the text when parsing, we can then identify the candidate subjects of each question in an article and build a collection of question templates. We will then extract meta-information (such as the “Categories” section) from the Wikipedia HTML structure to topically generate questions for sections of text. Given a new article, we will consider the attributes of each subject in the text and apply the most probable question template based on all critical words in each sentence.

Answering

Figuring out what type of question (yes/no, location, date, etc) is being asked will be useful for determining which relationships between words we should be considering. Parsing sentences into phrases and then deciding the functionality of the phrase will be useful for answering questions based on types. For example, a prepositional phrase that describes location will be useful for answering a location question while maintaining proper grammar. If information or relationships in the article are successfully extracted, then this information will be delivered as the answer. Otherwise we will retrieve the most likely sentence, treating the keywords of the question as a vector that we are trying to match, and extract the most salient section of this retrieved sentence as the answer. We will likely use term frequency to rank sentences within a document. Both the asking and answering modules will share the structured data extracted from parsing text and use wrappers over useful NLP algorithms from NLTK/other libraries. Otherwise the two components’ system designs will be independent.

Evaluation

We will be automatically evaluating each question generated for grammar and syntax, to ensure fluency. We will also evaluate answers for surface-level factual accuracy and adherence to the information need of the corresponding question, in addition to grammar and syntax. We will use the quality of candidate answers as a ranking criterion for the questions we generate.

Team Coordination

Our team will be using Git for version control and to share code/data. Because our group is divided among people with a background in programming vs. experience with linguistics, we will be dividing tasks accordingly. One of our team expectations by the first progress report will be to have a full “skeleton” of the functional system, and we will then assign specific coding tasks accordingly. We also have a standing weekly meeting (Thursdays) to delineate tasks and manage our progress.

Team: Tune a Fish

Emily Bram
Caitlin Lohman
Ariel Rao
Vijay Viswanathan

divyawadehra / nlp-question-answer-system Goto Github PK