allenai / aristo-mini

Aristo mini is a light-weight question answering system that can quickly evaluate Aristo science questions with an evaluation web server and the provided baseline solvers.

License: Apache License 2.0

Python 85.66% HTML 13.23% CSS 1.11%

aristo-mini's Introduction

Aristo mini

Overview
Quick-start guide
Component overview
Terminology
Solvers
The evaluation UI
Question sets
Feedback
History

Overview

Aristo mini is a light-weight question answering system that can quickly evaluate Aristo science questions with an evaluation web server and the provided baseline solvers. You can also extend the provided solvers with your own implementations to try out new approaches and compare results.

Quick-start guide

To experiment, you'll need Python 3.6. We recommend creating a dedicated virtual environment for aristo-mini and its dependencies. Then follow these steps:

Clone this repo:

git clone git@github.com:allenai/aristo-mini.git
cd aristo-mini

Install the requirements:

pip install -r requirements.txt

Add the project to your PYTHONPATH:

export PYTHONPATH=$PYTHONPATH:`pwd`

Run the random solver in one terminal window:
```
python aristomini/solvers/randomguesser.py
```
Run the evaluation web UI in another terminal window:
```
python aristomini/evalui/evalui.py
```
Try the UI in your browser at http://localhost:9000/

Component overview

Included are these components:

Simple solvers: Simple example solvers with JSON APIs that can answer multiple choice questions.
Simple Evaluation system: A web UI to a simple evaluation process that pairs questions with a solver to produce a score.
Question sets: A subset of Aristo's science questions are included for convenience.

Terminology

Consider a question that might be represented on an exam like this:

What is the color of the sky?

(A) blue
(B) green
(C) red
(D) black

Parts of this question are named like this:

Question stem: The non-choices part of the question. Example: What is the color of the sky?
Answer key: The correct answer's choice label. Example: A
Choice: One of the possible answers, consisting of a choice label (e.g., A) and choice text (e.g., blue).

These are modeled as NamedTuples in aristomini/common/models.py.
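As a rough illustration of how these terms map onto NamedTuples, here is a minimal sketch. The class and field names are assumptions based on the terminology above and the JSON examples in this README (`answer_key` in particular is hypothetical); the real definitions in aristomini/common/models.py may differ.

```python
from typing import List, NamedTuple


class Choice(NamedTuple):
    """One possible answer: a choice label such as "A" plus its text."""
    label: str
    text: str


class MultipleChoiceQuestion(NamedTuple):
    """A question stem, its choices, and optionally the answer key."""
    stem: str
    choices: List[Choice]
    answer_key: str = ""  # hypothetical field name; empty when unknown


sky = MultipleChoiceQuestion(
    stem="What is the color of the sky?",
    choices=[Choice("A", "blue"), Choice("B", "green"),
             Choice("C", "red"), Choice("D", "black")],
    answer_key="A",
)

# Look up the text of the correct choice by its label.
correct = next(c for c in sky.choices if c.label == sky.answer_key)
print(correct.text)
```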

Solvers

Available solvers

Several solvers are included in this distribution of Aristo mini. You can run one solver at a time for the Evaluation UI to use.

Random solver

This solver answers questions randomly. It illustrates the question-answer interface for a solver.

As above, you can start it with

python aristomini/solvers/randomguesser.py

Then you can go to http://localhost:8000/solver-info to confirm that it is running.

To answer a question you can POST to /answer. To try it on the command line:

Make a JSON file with the question, structured like this:

% cat question.json
{
   "stem" : "What color is the sky?",
   "choices" : [
      { "label" : "A", "text" : "red" },
      { "label" : "B", "text" : "green" },
      { "label" : "C", "text" : "blue" }
   ]
}

Submit the request with curl:

% curl -H "Content-Type: application/json" --data @question.json http://localhost:8000/answer

Look at the response:

{
   "multipleChoiceAnswer" : {
      "choiceConfidences" : [
         {
            "choice" : { "text" : "red", "label" : "A" },
            "confidence" : 0.398084282084622
         },
         {
            "choice" : { "text" : "green", "label" : "B" },
            "confidence" : 0.984916549460303
         },
         {
            "confidence" : 0.13567292440745,
            "choice" : { "text" : "blue", "label" : "C" }
         }
      ]
   },
   "solverInfo" : "RandomGuesser"
}
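The same request can be made from Python. The sketch below uses only the standard library; the helper names `ask_solver` and `best_choice` are made up for this example, and it assumes the solver is listening on localhost:8000 as described above.

```python
import json
import urllib.request


def ask_solver(question, url="http://localhost:8000/answer"):
    """POST a question dict to a running solver and return the parsed JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(question).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read().decode("utf-8"))


def best_choice(response):
    """Return the label of the choice with the highest confidence."""
    scored = response["multipleChoiceAnswer"]["choiceConfidences"]
    top = max(scored, key=lambda item: item["confidence"])
    return top["choice"]["label"]


# The response shown above, with confidences shortened for readability.
sample = {
    "multipleChoiceAnswer": {
        "choiceConfidences": [
            {"choice": {"text": "red", "label": "A"}, "confidence": 0.398},
            {"choice": {"text": "green", "label": "B"}, "confidence": 0.985},
            {"choice": {"text": "blue", "label": "C"}, "confidence": 0.136},
        ]
    },
    "solverInfo": "RandomGuesser",
}
print(best_choice(sample))  # B
```

With a solver running, `best_choice(ask_solver(question))` would return the solver's top pick for a question dict shaped like question.json.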

Text search solver

See aristomini/solvers/textsearch.md for setup and running instructions.

Word vector similarity solver (in Python)

Use the scripts/train_word2vec_model.py script to train a Word2Vec model from a text file of sentences (one per line). For instance, you could use the same sentences as the text search solver.

Then start the solver with the path to the word2vec model:

python aristomini/solvers/wordvectorsimilarity.py /path/to/word2vec/model

Writing your own solver

The Easy Way

Modify aristomini/solvers/mysolver.py. It has two TODOs for the parts you need to update.

The Hard Way

Your solver has to be an HTTP server that responds to the GET /solver-info and POST /answer APIs. The POST /answer API has to consume a JSON-formatted question document and must produce a JSON-formatted response document with the answer. You can start reading at aristomini/common/solver.py (which is extended by the provided solvers) to understand the input and output document structures.

Since a solver is just an HTTP server, you can write it in any language you like. You should follow the existing solvers for the input and output JSON formats.
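To make the two endpoints concrete, here is a minimal sketch of a standalone solver built on Python's standard `http.server` module rather than on aristomini/common/solver.py. The solver name, the "always pick the first choice" strategy, and the plain-text reply to GET /solver-info are assumptions for illustration; the answer document mirrors the RandomGuesser response shown earlier.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

SOLVER_NAME = "FirstChoiceSolver"  # hypothetical name for this sketch


def answer_question(question):
    """Build a response that puts all confidence on the first choice."""
    confidences = [
        {"choice": choice, "confidence": 1.0 if index == 0 else 0.0}
        for index, choice in enumerate(question["choices"])
    ]
    return {
        "multipleChoiceAnswer": {"choiceConfidences": confidences},
        "solverInfo": SOLVER_NAME,
    }


class SolverHandler(BaseHTTPRequestHandler):
    def _send(self, status, body, content_type):
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def do_GET(self):
        if self.path == "/solver-info":
            # Assumption: a plain-text solver name is an acceptable reply.
            self._send(200, SOLVER_NAME, "text/plain")
        else:
            self._send(404, "not found", "text/plain")

    def do_POST(self):
        if self.path != "/answer":
            self._send(404, "not found", "text/plain")
            return
        length = int(self.headers.get("Content-Length", 0))
        question = json.loads(self.rfile.read(length).decode("utf-8"))
        self._send(200, json.dumps(answer_question(question)), "application/json")


def run(port=8000):
    """Serve until interrupted; port 8000 is where the evaluation UI looks."""
    HTTPServer(("localhost", port), SolverHandler).serve_forever()
```

Calling `run()` starts the server. Replacing the body of `answer_question` is the only change needed to try a different strategy.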

The evaluation UI

Once started (see above) you can go to http://localhost:9000/ and click around.

The UI is hard-coded to connect to a solver on localhost:8000. If you started a solver as above, it will be automatically used. You can restart solvers (on localhost:8000) while the evaluation UI remains running.
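The README does not spell out how the evaluation score is computed. One plausible rule, sketched here purely as an assumption, is the fraction of questions for which the solver's highest-confidence choice matches the answer key. The function names and the stand-in solver are hypothetical.

```python
def predicted_label(response):
    """Label of the highest-confidence choice in a solver response."""
    scored = response["multipleChoiceAnswer"]["choiceConfidences"]
    return max(scored, key=lambda item: item["confidence"])["choice"]["label"]


def score(questions, answer_keys, solve):
    """Fraction of questions whose predicted label equals the answer key."""
    if not questions:
        return 0.0
    correct = sum(
        1
        for question, key in zip(questions, answer_keys)
        if predicted_label(solve(question)) == key
    )
    return correct / len(questions)


def first_choice_solver(question):
    """Stand-in solver: full confidence on the first choice."""
    confidences = [
        {"choice": choice, "confidence": 1.0 if index == 0 else 0.0}
        for index, choice in enumerate(question["choices"])
    ]
    return {"multipleChoiceAnswer": {"choiceConfidences": confidences}}


questions = [
    {"stem": "What color is the sky?",
     "choices": [{"label": "A", "text": "red"}, {"label": "B", "text": "blue"}]},
    {"stem": "What color is grass?",
     "choices": [{"label": "A", "text": "green"}, {"label": "B", "text": "black"}]},
]
print(score(questions, ["B", "A"], first_choice_solver))  # 0.5
```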

Question sets

Several question sets are provided in the questions/ directory.

These question sets are written in the JSONL format, each line corresponding to an instance of MultipleChoiceQuestion.

To try other question sets in this format, add them to the above questions directory and restart the evaluation UI.
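For readers unfamiliar with JSONL, the sketch below reads such a file one JSON object per line. The field names in the sample records (`stem`, `choices`, `answerKey`) are assumptions modeled on the question JSON shown earlier; check a file in questions/ for the actual schema. Opening the file with an explicit UTF-8 encoding avoids depending on the platform default.

```python
import json
import os
import tempfile


def read_jsonl(path):
    """Yield one parsed JSON object per non-blank line of a JSONL file."""
    with open(path, encoding="utf-8") as lines:
        for line in lines:
            if line.strip():
                yield json.loads(line)


# Write a two-question sample file, then read it back.
sample = [
    {"stem": "What color is the sky?", "answerKey": "B",  # hypothetical key name
     "choices": [{"label": "A", "text": "red"}, {"label": "B", "text": "blue"}]},
    {"stem": "What color is grass?", "answerKey": "A",
     "choices": [{"label": "A", "text": "green"}, {"label": "B", "text": "black"}]},
]
with tempfile.NamedTemporaryFile(
    "w", suffix=".jsonl", delete=False, encoding="utf-8"
) as handle:
    for record in sample:
        handle.write(json.dumps(record) + "\n")

questions = list(read_jsonl(handle.name))
os.remove(handle.name)
print(len(questions), questions[0]["stem"])
```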

AI2 provides more questions at http://allenai.org/data.html

Feedback

Please tell us what you think!

If you have a question or suggestion for a change, take a look at existing issues or file a new issue.
If you'd like to propose a change to this code, please submit a pull request.

History

November, 2016: Initial public release, version 1.
February, 2018: Delete all Scala code.
March, 2018: Update README.

aristo-mini's Issues

It would help to have a working example

Why not set up a repository with a working example using word2vec? I find that most documentation is badly written; so I prefer to just look through working code.

UnsupportedOperation: not writable

Hi,

I hit an issue when I run the project. The details are in the following message. Does anyone know how to solve it? Thank you so much.

Python 3.6.5 |Anaconda, Inc.| (default, Mar 29 2018, 13:32:41) [MSC v.1900 64 bit (AMD64)]
Type "copyright", "credits" or "license" for more information.

IPython 6.4.0 -- An enhanced Interactive Python.

runfile('H:/789/aristo-mini-master/aristo-mini-master/aristomini/solvers/randomguesser.py', wdir='H:/789/aristo-mini-master/aristo-mini-master/aristomini/solvers')
Traceback (most recent call last):
  File "<ipython-input-1-1b29279ad79e>", line 1, in <module>
    runfile('H:/789/aristo-mini-master/aristo-mini-master/aristomini/solvers/randomguesser.py', wdir='H:/789/aristo-mini-master/aristo-mini-master/aristomini/solvers')
  File "C:\Users\qbao775\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)
  File "C:\Users\qbao775\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "H:/789/aristo-mini-master/aristo-mini-master/aristomini/solvers/randomguesser.py", line 27, in <module>
    solver.run()
  File "H:\789\aristo-mini-master\aristo-mini-master\aristomini\common\solver.py", line 41, in run
    app.run(host=host, port=port)
  File "C:\Users\qbao775\AppData\Local\Continuum\anaconda3\lib\site-packages\flask\app.py", line 938, in run
    cli.show_server_banner(self.env, self.debug, self.name, False)
  File "C:\Users\qbao775\AppData\Local\Continuum\anaconda3\lib\site-packages\flask\cli.py", line 629, in show_server_banner
    click.echo(message)
  File "C:\Users\qbao775\AppData\Local\Continuum\anaconda3\lib\site-packages\click\utils.py", line 259, in echo
    file.write(message)

function annotations are giving syntax errors

The function annotations (->) are giving syntax errors whenever I run the randomguesser server. Are they necessary? Can we strip the code of function annotations?

Enhancement: document how to change network port & concurrency factor via YAML

The README suggests changing the ports that solvers run on / the number of questions sent to a solver in parallel by modifying Evaluator.scala and Evaluation.scala. It'd be better to show how to tweak these values in per-project YAML config. The base project could then be set up to pull these values from app config rather than have magic numbers floating around.

Text search solver: better initial corpus

The text search solver setup instructions point to an old text which isn't very relevant to the provided science questions:

https://github.com/allenai/aristo-mini/blob/master/solvers/textsearch/src/main/scala/org/allenai/aristomini/solver/textsearch/README.md

It'd be nice if the corpus used was a bit more relevant.

missing insert_text_to_elasticsearch file

Filed on behalf of a user:

I had forked the aristo-mini repository a few weeks back, and the read-me said that to run the Word2Vec and/or the LSTM models there needed to be a script titled insert_text_to_elasticsearch.py, which will train the Elasticsearch model. However, I could find no such file and consequently copied the script from an older commit. After piping results from the provided training file through the script using zcat, my shell simply froze. I am not sure whether there should be an updated version of this script or whether I need to use a different command.

Text search solver: easier ElasticSearch startup

The text search solver setup instructions require manually populating an Elastic Search index:

https://github.com/allenai/aristo-mini/blob/master/solvers/textsearch/src/main/scala/org/allenai/aristomini/solver/textsearch/README.md

This could be simplified:

Pre-process a corpus and distribute bulk commands directly
Pre-populate the search corpus and distribute the data/ directory contents

Codec error

When I run the model on my Windows system I get an error saying:

File "C:\Anaconda3\envs\aristo-mini\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1412: character maps to <undefined>

I'm guessing it's something to do with the encoding of the JSON files.
Could someone help resolve this error? Also, which encoding do the JSON files use?
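One likely explanation, offered as an assumption since the thread does not confirm the files' encoding: byte 0x9d is undefined in cp1252 (the Windows default in this traceback) but is the last byte of a UTF-8 right curly quote. The sketch below reproduces the failure on curly-quoted text and shows that decoding as UTF-8 succeeds.

```python
# Curly-quoted text encoded as UTF-8; the closing quote ends in byte 0x9d.
raw = "\u201cblue\u201d".encode("utf-8")

try:
    raw.decode("cp1252")
    decoded_as_cp1252 = True
except UnicodeDecodeError:
    # cp1252 has no character assigned to 0x9d, so decoding fails here.
    decoded_as_cp1252 = False

print(decoded_as_cp1252)
print(raw.decode("utf-8"))
```

If that is the cause, passing `encoding="utf-8"` to `open()` wherever the files are read should avoid the error.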

Code: unit tests

Write more unit tests to cover known usage.

missing concepts.txt file

Filed on behalf of a user:

Additionally, in acme.py under solvers there is a call to a file called concepts.txt. Since my LSTM isn't working, I don't know whether this is an intermediate file or whether it should already exist, but it doesn't exist in the repository as it stands. I hope my questions were clear enough.

migrate to elasticsearch 5.x client

Migrating to the Elasticsearch 5.x server and client means dealing with breaking changes:

https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking_50_java_api_changes.html

I've given it a try in branch https://github.com/allenai/aristo-mini/tree/elasticsearch-5.x with change 888458f.

That compiles but the textsearch solver fails at runtime:

~/code/aristo-mini/solvers/textsearch/target/universal/stage % cd `pwd`; bin/solver-textsearch
ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...
Exception in thread "main" java.lang.NoClassDefFoundError: io/netty/channel/RecvByteBufAllocator
        at org.elasticsearch.transport.Netty4Plugin.getSettings(Netty4Plugin.java:39)
        at org.elasticsearch.plugins.PluginsService.lambda$getPluginSettings$0(PluginsService.java:85)
        at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:267)
        at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
        at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
        at org.elasticsearch.plugins.PluginsService.getPluginSettings(PluginsService.java:85)
        at org.elasticsearch.client.transport.TransportClient.buildTemplate(TransportClient.java:115)
        at org.elasticsearch.client.transport.TransportClient.<init>(TransportClient.java:228)
        at org.elasticsearch.transport.client.PreBuiltTransportClient.<init>(PreBuiltTransportClient.java:69)
        at org.elasticsearch.transport.client.PreBuiltTransportClient.<init>(PreBuiltTransportClient.java:65)
        at org.allenai.aristomini.solver.textsearch.TextSearchSolver$.makeEsClient(TextSearchSolver.scala:87)
        at org.allenai.aristomini.solver.textsearch.TextSearchSolver$.<init>(TextSearchSolver.scala:36)
        at org.allenai.aristomini.solver.textsearch.TextSearchSolver$.<clinit>(TextSearchSolver.scala)
        at org.allenai.aristomini.solver.textsearch.TextSearchSolverServer$.<init>(TextSearchSolverServer.scala:6)
        at org.allenai.aristomini.solver.textsearch.TextSearchSolverServer$.<clinit>(TextSearchSolverServer.scala)
        at org.allenai.aristomini.solver.textsearch.TextSearchSolverServer.main(TextSearchSolverServer.scala)
Caused by: java.lang.ClassNotFoundException: io.netty.channel.RecvByteBufAllocator
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 20 more

Someone else noticed this as well:

http://stackoverflow.com/questions/40282043/elasticsearch-5-java-client-does-not-work-with-groovy-grape

Text search solver: tweak Elasticsearch usage

A few ideas came up earlier about improving the results from Elasticsearch, because the defaults are not great for us:

ES shard/replica config
Use Snowball analyzer
Remove use of DFS_QUERY_THEN_FETCH

For reference, discussion about this is here:

https://github.com/allenai/theo/pull/16#discussion_r82080360

Project(..., settings = ...) in build.sbt is considered bad practice

See this comment. You're not currently using any auto-plugins in your build, but if you start to head that way then the Project(..., settings = ...) constructor can wreak unexpected havoc.

The easy fix is to just change all Project(..., settings = ...) calls to Project(...).settings(...). The latter appends settings to whatever might have already been added.

Enhancement: allow bin/* scripts to be called from anywhere, not just target/universal/stage

After running sbt stage from the root directory, I ignored the README instructions and tried:

./evalui/target/universal/stage/bin/evalui

And got the error:

Exception in thread "main" java.io.FileNotFoundException: File conf/eval-server.yaml not found

Following the README worked:

cd evalui/target/universal/stage
bin/evalui

But requiring the cd seems like a usability problem.

It looks like you've hard-coded the paths to conf/ locations in build.sbt. It should be possible to modify the paths to be relative to the running bin/ script itself, not the directory the script is being called from, so the scripts can be called from anywhere.

Python version in docs

The instructions for Python solvers say to use pip and python. I have two versions of Python installed (/usr/local/bin/python2.7 and /usr/local/bin/python3.5), and pip uses 2.7:

% head -1 /usr/local/bin/pip
#!/usr/local/opt/python/bin/python2.7

As currently written, the instructions didn't work because pip was installing into an old version of Python. To make it work, I had to do this:

Instead of pip ..., I used python3.5 /usr/local/bin/pip ...
Instead of python ..., I used python3.5 ...

I don't know how common this is, but if there's a good chance others will run into this situation, then the instructions should be written to explicitly specify Python 3.5.

Import error

When running "python aristomini/solvers/randomguesser.py" I got the following error:
from aristomini.common.solver import SolverBase
ModuleNotFoundError: No module named 'aristomini'

It seems "aristomini" is a module that includes another module which is "common"
Also if I run python and then import aristomini, this works.

D:\aristo-mini>python aristomini\solvers\randomguesser.py
Traceback (most recent call last):
File "aristomini\solvers\randomguesser.py", line 5, in <module>
from aristomini.common.solver import SolverBase
ModuleNotFoundError: No module named 'aristomini'
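A probable cause: when Python runs a script, the first entry of `sys.path` is the script's own directory (here aristomini\solvers), not the current directory, so the top-level `aristomini` package is not found. An interactive session started from the repository root does have that root on the path, which is why `import aristomini` works there. The README's PYTHONPATH step addresses this; on Windows the equivalent is setting PYTHONPATH to the repository root. As an alternative sketch, a hypothetical helper (not part of the project) could add the root at runtime:

```python
import os
import sys


def add_repo_root(script_path, levels_up=2):
    """Put the directory `levels_up` above the script's folder on sys.path."""
    root = os.path.dirname(os.path.abspath(script_path))
    for _ in range(levels_up):
        root = os.path.dirname(root)
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


# For <repo>/aristomini/solvers/randomguesser.py, the repository root is
# two levels above the script's directory.
root = add_repo_root(os.path.join("aristomini", "solvers", "randomguesser.py"))
print(root in sys.path)
```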