
krantikariqa's Introduction

KrantikariQA

An Information Gain based Question Answering system over knowledge graphs.

  1. chmod +x parallel_data_creation.sh
  2. Download GloVe 42B and save it in the resources folder
  3. mkdir logs
  4. ./parallel_data_creation.sh
  5. python data_creation_step1.py
  6. python reduce_data_creation_step2.py
  7. CUDA_VISIBLE_DEVICES=3 python corechain.py -model slotptr -device cuda -dataset lcquad -pointwise False

Download glove

wget http://nlp.stanford.edu/data/glove.42B.300d.zip, save it to the resources folder, and unzip it.

Use the Anaconda installation (still needs to be tested)

conda env create -f environment.yml

Set up a Redis server (this step is optional; it is only used for caching)

For installation instructions, see https://redis.io/topics/quickstart
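
If caching is enabled, here is a minimal sketch of how Redis might be used to memoise SPARQL results. It uses redis-py and a hypothetical key scheme; it is an illustration only, not the repository's actual caching code.

import json
import redis

# Connect to the local Redis server (default port 6379).
cache = redis.Redis(host='localhost', port=6379)

def cached_query(sparql_query, run_query):
    """Return a cached result for the query if present, otherwise run it and cache it."""
    cached = cache.get(sparql_query)
    if cached is not None:
        return json.loads(cached)
    result = run_query(sparql_query)
    cache.set(sparql_query, json.dumps(result))
    return result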

Set up a DBpedia endpoint and add its URL in utils/dbpedia_interface.py
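
As a hedged sketch of what pointing the code at your DBpedia endpoint amounts to (the endpoint URL http://localhost:8890/sparql and the use of SPARQLWrapper are assumptions for illustration; the actual interface lives in utils/dbpedia_interface.py):

from SPARQLWrapper import SPARQLWrapper, JSON

# Replace with the URL of your own DBpedia endpoint.
sparql = SPARQLWrapper("http://localhost:8890/sparql")
sparql.setQuery("SELECT ?p ?o WHERE { <http://dbpedia.org/resource/James_Bond> ?p ?o } LIMIT 5")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
print(results["results"]["bindings"])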

Setup SPARQL parsing server

@TODO: add code here 
Install Node.js (node, nodejs), then run:
> nodejs app.js

Setup embedding server

 python ei_server.py (keep this running at all times)
 This requires bottle to be installed (pip install bottle)
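
For orientation, a minimal sketch of what a bottle-based embedding endpoint might look like. This is not the actual ei_server.py; the route, port, and payload format are hypothetical.

import numpy as np
from bottle import post, request, run

# Assumes the GloVe matrix prepared as described below.
vectors = np.load('resources/vectors_gl.npy')

@post('/embed')
def embed():
    # Expects a JSON body like {"ids": [3, 17, 42]} and returns the corresponding rows.
    ids = request.json.get('ids', [])
    return {'vectors': vectors[ids].tolist()}

run(host='localhost', port=8080)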

Check that DBpedia, Redis (if caching), the SPARQL parsing server, and the embedding interface are all running.

Setup Qelos-utils

Clone https://github.com/lukovnikov/qelos-util.git, change into the qelos-util directory, run python setup.py build (or python setup.py develop), and then cp qelos ../

Install a few more things

A potential bug is that the glove file datatype may be <U32; if so, convert it to float64 as shown in the snippet below.

An rdftype_lookup.json can be created from the keys of relation.pickle (data/data/common); see the sketch after the snippet below.

import numpy as np

# Load the GloVe matrix, cast it from <U32 (string) to float64, and save it back in place.
mat = np.load('resources/vectors_gl.npy')
mat = mat.astype(np.float64)
np.save('resources/vectors_gl.npy', mat)
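
A hedged sketch of building rdftype_lookup.json from the keys of relation.pickle as mentioned above. The exact structure the code expects is an assumption; here it is a simple key-to-index mapping.

import json
import pickle

# Paths taken from the note above (data/data/common).
with open('data/data/common/relation.pickle', 'rb') as f:
    relations = pickle.load(f)

# Hypothetical structure: map each relation key to an integer id.
# str() is used in case the pickle keys are not plain strings.
lookup = {str(key): idx for idx, key in enumerate(relations.keys())}

with open('data/data/common/rdftype_lookup.json', 'w') as f:
    json.dump(lookup, f)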

#### TODO
Change the embedding dimension in the configs to 300d.

Once the dataset is prepared

To check if all the files are in the correct place, run the following command:

python file_location_check.py

Once the data is in the appropriate place, run the following command:

CUDA_VISIBLE_DEVICES=3 python corechain.py -model slotptr -device cuda -dataset lcquad -pointwise False

krantikariqa's People

Contributors

geraltofrivia, nilesh-c, saist1993


krantikariqa's Issues

the maximum i_batch is 87

Sorry to trouble you again.
In corechain.py, in every epoch I find that the maximum i_batch is 87.
Is there an error on my local setup?

(QQ screenshot attached)

thank you.

Sorry, I met an error when running the code.

Hi, sorry to trouble you. I found an error:
AttributeError: module 'datasetPreparation.entity_subgraph' has no attribute 'CreateSubgraph'

Would you like to help me?

yawei

When running, the error: CUDA error out of memory

Hi, sorry to trouble you again.
When I run: CUDA_VISIBLE_DEVICES=1 python corechain.py -model slotptr -device cuda -dataset lcquad -pointwise True

The error occurs at the line: loss.backward()
My GPU memory: 10 GB.

Thank you for your help.

Getting started issue. (Missing attribute 'CreateSubgraph')

Hello,

I am encountering the same problem mentioned in issue #16 but it is not clear what the solution is.
This is happening at step 5 in the Readme.md, python data_creation_step1.py.

Note that the Python environment krantikari is built according to the provided environment.yml, but there is no file data_creation_step1.py. Therefore, running (with sysargs)

(krantikari) user@host:/KrantikariQA$python data_creator_step1.py 0 -1 lcquad

results in the following output

Traceback (most recent call last):
  File "data_creator_step1.py", line 160, in <module>
    _predicate_blacklist=pb, _relation_file={}, return_data=False, _qald=False)
  File "data_creator_step1.py", line 71, in run
    cd_node = cd.CreateDataNode(_predicate_blacklist=_predicate_blacklist, _relation_file=_relation_file, _qald=_qald)
  File "/app/KrantikariQA/datasetPreparation/create_dataset.py", line 27, in __init__
    self.create_subgraph = es.CreateSubgraph(self.dbp, self.predicate_blacklist, self.relation_file, qald=_qald)
AttributeError: module 'datasetPreparation.entity_subgraph' has no attribute 'CreateSubgraph'

Hopefully this description provides all the details requested in this comment.

I would also point out that there is no CreateSubgraph class in entity_subgraph.py, although it is called in both datasetPreparation/create_dataset.py and server.py, which both do

from datasetPreparation import entity_subgraph as es

and each will attempt the same pattern:

self.create_subgraph = es.CreateSubgraph(self.dbp, self.predicate_blacklist, self.relation_file, qald=_qald)

and

subgraph_maker = es.CreateSubgraph(dbp, predicate_blacklist, {}, qald=False)

respectively.

Any advice?

SPARQL parsing server

Hello,
Thanks for sharing your work. I am trying to set up the system for benchmarking purposes; however, I cannot run the SPARQL parsing server, and I cannot find the file/library it uses in your system. Any guidance would be appreciated.
Thanks

Lemmatization

Lemmatize everything while converting it to IDs (nlutils).
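
A hedged sketch of the lemmatization step described above, using NLTK's WordNetLemmatizer rather than the repository's nlutils helpers (which are assumed to wrap something similar):

# Requires: pip install nltk, then nltk.download('wordnet') once.
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

def lemmatize_tokens(tokens):
    """Lemmatize lower-cased tokens before they are converted to vocabulary IDs."""
    return [lemmatizer.lemmatize(token.lower()) for token in tokens]

print(lemmatize_tokens("Who founded the companies located in California".split()))
# ['who', 'founded', 'the', 'company', 'located', 'in', 'california']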

Bugs

  • Only 2904 questions parsed (check why)
  • False paths include weird/metadata predicates (debate whether to remove them or not)
  • Y labels are 0/1 as of now. (Find partially correct paths and rate them as such; see the sketch after this list.)
  • Pick up false labels with partially correct paths too (as opposed to purely random ones)
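
One hedged way to rate partially correct paths, as suggested in the list above. This assumes paths are represented as lists of predicate URIs; the scoring scheme is only an illustration, not what the code currently does.

def partial_path_score(candidate_path, gold_path):
    """Fraction of gold predicates recovered by the candidate path (1.0 = fully correct)."""
    if not gold_path:
        return 0.0
    gold = set(gold_path)
    return len(gold & set(candidate_path)) / len(gold)

# Example: a candidate sharing one of two gold predicates gets a label of 0.5 instead of 0.
print(partial_path_score(['dbo:spouse', 'dbo:birthPlace'], ['dbo:spouse', 'dbo:country']))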

Handle ask query with variables

We don't handle ASK queries whose answer depends on whether the query fetches any results; we only handle the ones that ask whether THIS specific triple exists.

E.g., a query we don't handle yet:

ASK WHERE {
        res:James_Bond dbo:spouse ?uri . 
}
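
For contrast, a hypothetical example of the kind of ASK query that is handled, where the triple is fully grounded and contains no variable (res:Some_Person is a placeholder, not an entity from the dataset):

ASK WHERE {
        res:James_Bond dbo:spouse res:Some_Person . 
}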
