liberai / NSpM
🤖 Neural SPARQL Machines for Knowledge Graph Question Answering.
Home Page: http://aksw.org/Projects/NeuralSPARQLMachines
License: MIT License
While executing the demo training step from the README, I see these two statements every epoch, so I suppose there may be some unneeded code in train.py:
Table trying to initialize from file ../data/monument_300_model/vocab.en is already initialized.
Table trying to initialize from file ../data/monument_300_model/vocab.en is already initialized.
Trying to solve it.
FailedPreconditionError (see above for traceback): HashTable has different value for same key. Key dbr_Terreiro_da_Luta has 127 and trying to add value 285
How can I solve this problem?
We should add a requirements.txt to specify the packages and the versions required for the project, e.g.:
enum34
numpy
tensorflow==1.2.0
Environment
tensorflow==1.14.0
Log
$ python build_vocab.py data/monument_300/data_300.en > data/monument_300/vocab.en
WARNING:tensorflow:From build_vocab.py:44: VocabularyProcessor.__init__ (from tensorflow.contrib.learn.python.learn.preprocessing.text) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tensorflow/transform or tf.data.
WARNING:tensorflow:From /usr/local/lib/python3.7/site-packages/tensorflow/contrib/learn/python/learn/preprocessing/text.py:154: CategoricalVocabulary.__init__ (from tensorflow.contrib.learn.python.learn.preprocessing.categorical_vocabulary) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tensorflow/transform or tf.data.
WARNING:tensorflow:From /usr/local/lib/python3.7/site-packages/tensorflow/contrib/learn/python/learn/preprocessing/text.py:170: tokenizer (from tensorflow.contrib.learn.python.learn.preprocessing.text) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tensorflow/transform or tf.data.
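Since VocabularyProcessor is deprecated, the vocabulary extraction can also be done without tf.contrib at all. A minimal sketch of the idea (the tokenization and ordering are assumptions based on the warnings above, not the project's exact code):

```python
# Minimal vocabulary builder: one unique token per output line, in order
# of first appearance. This only sketches what build_vocab.py produces;
# the real script may tokenize or order differently.
def build_vocab(lines):
    vocab = {}
    for line in lines:
        for token in line.strip().split(" "):
            if token and token not in vocab:
                vocab[token] = len(vocab)
    return vocab

sentences = ["where is the monument located", "where was the monument built"]
for word in build_vocab(sentences):
    print(word)
```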
Since support for Python 2 is being dropped, it would be great if the README.md could indicate that the code was written in Python 2, so that future developers can set up the appropriate development environment.
Thanks for your excellent work on this interesting problem.
When I'm training with your given monument_300 data, I see output like this:
step 4100 lr 1 step-time 2.35s wps 2.36K ppl 64.10 gN 3.08 bleu 2.74, Sat Dec 8 14:28:50 2018
Can you tell me what ppl and gN mean? And why is the BLEU score so small?
Thank you very much.
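For context: in the TensorFlow NMT training log, ppl is the perplexity (the exponential of the average cross-entropy loss per predicted token) and gN is the global gradient norm used for gradient clipping; a low BLEU score this early in training is common. A small sketch of the perplexity calculation (the numbers are made up for illustration):

```python
import math

def perplexity(total_loss, total_predict_count):
    # exp of the average per-token cross-entropy loss
    return math.exp(total_loss / total_predict_count)

# e.g. a summed loss of 8.32 over 2 predicted tokens
print(perplexity(8.32, 2))
```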
Wrong output of the TensorFlow version check in NSpM/nmt/nmt/utils/misc_utils.py:
EnvironmentError: Tensorflow version must >= 1.2.1
Changes to be made: from
if tf.__version__ < "1.2.1":
    raise EnvironmentError("Tensorflow version must >= 1.2.1")
to
if tf.__version__ < "1.02.1":
    raise EnvironmentError("Tensorflow version must >= 1.02.1")
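The root cause is that tf.__version__ is compared as a string, so "1.12.0" sorts before "1.2.1". A sketch of a numeric comparison that avoids the problem entirely (the helper names here are my own, not the project's):

```python
def version_tuple(v):
    # "1.12.0" -> (1, 12, 0); non-digit characters in a component are dropped
    parts = []
    for piece in v.split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        if digits:
            parts.append(int(digits))
    return tuple(parts)

def check_tensorflow_version(installed, minimum="1.2.1"):
    # Tuples compare element-wise, so 1.12.0 correctly exceeds 1.2.1.
    if version_tuple(installed) < version_tuple(minimum):
        raise EnvironmentError("Tensorflow version must be >= %s" % minimum)

check_tensorflow_version("1.12.0")  # passes, unlike the string comparison
```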
https://github.com/AKSW/NSpM/blob/f33f60dd2b1f423cde079a249328cae2115fdb5f/generator.py#L136
What does this mean? It seems to be an invalid comparison between a list and an integer; it will always return true!
Hey, I don't know if I am just misunderstanding the instructions, but the project kept giving me ValueError: "Can't load save_path when it is None." when I tried to follow the README instructions precisely.
I believe it would work correctly with your readme instructions if you would change this part of ask.sh:
python -m nmt.nmt --vocab_prefix=../$1/vocab --model_dir=../$1_model --inference_input_file=./to_ask.txt --inference_output_file=./output.txt --out_dir=../$1_model --src=en --tgt=sparql | tail -n4
to this:
python -m nmt.nmt --vocab_prefix=../$1/vocab --model_dir=../$1 --inference_input_file=./to_ask.txt --inference_output_file=./output.txt --out_dir=../$1 --src=en --tgt=sparql | tail -n4
(that $1_model suffix makes the word repeat oddly in the folder names)
Please, feel free to correct me if I am wrong, and thank you for your awesome paper.
When trying to install the dependencies with
pip install -r requirements.txt
several errors concerning conflicting dependencies are raised.
Hello, would it perhaps be possible to provide pre-trained models? I wanted to train the network with the dataset https://figshare.com/articles/Question-NSpM_SPARQL_dataset_EN_/6118505, but that simply takes too long on my machine.
I tried running the pipelines, but because of some Python-version-related issues I was getting errors in pipeline 1.
The solution that worked was using
from urllib.request import urlopen
and then urlopen(<url>)
instead of import urllib
and then urllib.request.urlopen(<url>).
Make sure to use Python 3.7
to run the pipelines, as @panchbhai1969's code uses it.
It works because
the urllib and urllib2 modules from Python 2.x have been combined into the urllib module in Python 3,
as mentioned here.
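For anyone hitting the same error, here is a self-contained sketch of the Python 3 usage (a data: URL is used only so the example runs offline; the pipeline would pass a real http(s) URL):

```python
from urllib.request import urlopen  # Python 3: urllib + urllib2 merged here

with urlopen("data:,hello") as response:
    body = response.read()
print(body)  # b'hello'
```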
Also, while setting up the project I realised it would be better to have a requirements.txt
file.
I would like to do that too, as my initial contribution.
While running
python build_vocab.py data/monument_300/data_300.en > data/monument_300/vocab.en
the Python interpreter gives the following warning:
WARNING:tensorflow:From build_vocab.py:43: __init__ (from tensorflow.contrib.learn.python.learn.preprocessing.text) is deprecated and will be removed in a future version.
We could use tensorflow/transform or tf.data instead, to keep up with the recent updates (as the warning suggests).
While splitting the data file into train, dev and test sets by running the following commands given in the README.md
cd data/monument_300/
python ../../split_in_train_dev_test.py --lines $NUMLINES --dataset data.sparql
I run into the following error
Traceback (most recent call last):
File "../../split_in_train_dev_test.py", line 42, in
with open(sparql_file) as original_sparql, open(en_file) as original_en:
IOError: [Errno 2] No such file or directory: 'data.sparql'
which can be solved by renaming the files in monument_300 (data_300.sparql and data_300.en to data.sparql and data.en)
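The rename described above can be scripted; this sketch uses a scratch directory with empty stand-in files, since in the repo you would run the two mv commands inside data/monument_300/ instead:

```shell
# Demonstration in a scratch directory; adapt the path for the real repo.
mkdir -p /tmp/monument_300_demo
cd /tmp/monument_300_demo
touch data_300.sparql data_300.en
mv data_300.sparql data.sparql
mv data_300.en data.en
ls
```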
It may seem trivial, but README.md files are the first point of information that a person refers to when trying to understand a project. The repository's README.md is good, but I have found some errors that need attention.
NUMLINES= $(echo awk '{ print $1}' | cat data/monument_300/data_300.sparql | wc -l)
There shouldn't be any space after =; the line should be:
NUMLINES=$(echo awk '{ print $1}' | cat data/monument_300/data_300.sparql | wc -l)
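Beyond removing the space, the pipeline itself can be simplified: the `echo awk '{ print $1}'` part just feeds a literal string to cat, which ignores its stdin anyway. A sketch, demonstrated on a generated sample file so it runs anywhere:

```shell
# wc -l alone counts the lines; no echo/awk/cat pipeline is needed.
printf 'q1\nq2\nq3\n' > /tmp/sample_data.sparql
NUMLINES=$(wc -l < /tmp/sample_data.sparql)
echo "$NUMLINES"
```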
Other README.md related issues and suggestions brought up by users are #12 #14 #17
On running the ./ask.sh script I'm getting the following error:
Traceback (most recent call last):
File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/content/NSpM/nmt/nmt/nmt.py", line 707, in <module>
tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "/content/NSpM/nmt/nmt/nmt.py", line 700, in main
run_main(FLAGS, default_hparams, train_fn, inference_fn)
File "/content/NSpM/nmt/nmt/nmt.py", line 658, in run_main
save_hparams=(jobid == 0))
File "/content/NSpM/nmt/nmt/nmt.py", line 607, in create_or_load_hparams
hparams = extend_hparams(hparams)
File "/content/NSpM/nmt/nmt/nmt.py", line 493, in extend_hparams
unk=vocab_utils.UNK)
File "/content/NSpM/nmt/nmt/utils/vocab_utils.py", line 137, in check_vocab
raise ValueError("vocab_file '%s' does not exist." % vocab_file)
ValueError: vocab_file '../data/monument_300/vocab.en' does not exist.
# Job id 0
# Devices visible to TensorFlow: [_DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 268435456, 2340982298104704118), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 7897109989793672363)]
# Creating output directory ../data/monument_300_model ...
ANSWER IN SPARQL SEQUENCE:
cat: output.txt: No such file or directory
Can someone please help me with this?
File "nmt/model_helper.py", line 444, in compute_perplexity
perplexity = utils.safe_exp(total_loss / total_predict_count)
ZeroDivisionError: integer division or modulo by zero
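The traceback shows that total_predict_count is 0, which usually means the dev/test files are empty or were not found. A defensive sketch of the computation (the real code lives in nmt/model_helper.py; the guard is my addition, not the project's):

```python
import math

def compute_perplexity_safe(total_loss, total_predict_count):
    # An empty evaluation set would otherwise divide by zero here.
    if total_predict_count == 0:
        raise ValueError("total_predict_count is 0; check that the "
                         "dev/test files exist and are non-empty")
    return math.exp(total_loss / total_predict_count)
```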
The LC-QuAD dataset has 5000 pairs, but when I generated it from the LC-QuAD CSV file in the data path, the result exceeded hundreds of thousands of LC-QuAD sentence pairs. Please, can you help me generate an accurate LC-QuAD dataset?
One of the last TensorFlow updates broke the ./ask.sh script. It probably has to be rewritten from scratch.
Hi, a very small thing: when running the example presented in the README.md, in the "Interpreter Module" section, the argument "--output" is not currently supported. Here is the fixed line of code.
Current:
python nspm/interpreter.py --input data/art_30 --output data/art_30 --query "yuncken freeman has architected in how many cities?"
New:
python nspm/interpreter.py --input data/art_30 --query "yuncken freeman has architected in how many cities?"
The version check in nmt/utils/misc_utils.py has a snippet of code for checking the version of TF installed on the local machine:
def check_tensorflow_version():
    if tf.__version__ < "1.2.1":
        raise EnvironmentError("Tensorflow version must >= 1.2.1")
I have tested the code with both the current version of TF and tf-nightly, as the authors suggest in their README.md. The current version is 1.12, and it is obvious the check fails because it compares the versions as strings rather than numerically.
I have created an issue on the main project of nmt as well, but since this would affect the working of this project as well, I am also opening an issue here.
unzip art_30.zip
Archive: art_30.zip
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of art_30.zip or
art_30.zip.zip, and cannot find art_30.zip.ZIP, period.
Replace the ./nmt/ submodule with an internal library based on the NMT with attention tutorial, compatible with TensorFlow 2.2.0rc4.
Hello everyone,
First of all, thank you for sharing the code of this project.
I was able to train the model and make some predictions, but now I want to find the shortcomings of the model, so I want to analyze on which questions/queries the model performs well.
I found the "analyse.sh" script and the "filter_dataset.py".
Now I would like to ask what the purpose of these files is and how to use them.
Thank you for your time
Kind regards
Nicolas
Space: the final frontier. These are the voyages of the starship Enterprise. Its five-year mission: to explore strange new worlds, to seek out new life and new civilizations, to boldly go where no man has gone before.
Hello!
Firstly thank you very much for your repository and research. This is a very interesting field. I am currently using your monument dataset as the training data in my master thesis.
I noticed you uploaded a new dataset called movies_300.zip several days ago. I intended to try it in my experiments as well, but I found that it has many duplicate lines in the training file (e.g. "how long is the longest movie" appears 227 times in 'train.en').
Could you explain what is the reason for that? Is it appropriate to use this dataset for training or this dataset is just made for other tasks?
Thank you and best regards
Xiaoyu
FailedPreconditionError (see above for traceback): HashTable has different value for same key. Key en has 3 and trying to add value 715
There are some issues in the build_vocab file inside the monument_300 zip file in the data folder; some fixes are needed, commented below.
import sys

import numpy as np
from tensorflow.contrib import learn

# The reload(sys) / sys.setdefaultencoding("utf-8") hack was only needed
# on Python 2, so those lines can be removed.

x_text = list()

# The input file is passed as the first command-line argument
# (sys.argv[0] is the script name itself).
with open(sys.argv[1]) as f:
    for line in f:
        # unicode() is removed: all strings are unicode in Python 3.
        x_text.append(line[:-1])

# x_text = ['This is a cat', 'This must be boy', 'This is a a dog']
max_document_length = max([len(x.split(" ")) for x in x_text])

## Create the VocabularyProcessor object, setting the max length of the documents.
vocab_processor = learn.preprocessing.VocabularyProcessor(max_document_length)

## Transform the documents using the vocabulary.
x = np.array(list(vocab_processor.fit_transform(x_text)))

## Extract the word:id mapping from the object.
vocab_dict = vocab_processor.vocabulary_._mapping

## Sort the vocabulary dictionary on the basis of values (ids).
# sorted_vocab = sorted(vocab_dict.items(), key=operator.itemgetter(1))
sorted_vocab = sorted(vocab_dict.items(), key=lambda x: x[1])

## Treat the ids as indices into a list and create a list of words in
## ascending order of id: the word with id i goes at index i of the list.
vocabulary = list(list(zip(*sorted_vocab))[0])

# print(vocabulary)
# print(x)
for v in vocabulary:
    print(v)
Hi,
I recently utilized the technique discussed in this project for transforming a natural language sentence into a SPARQL query. Based on this, I created an end-to-end question answering system as part of my final year project. The system works well for known resource names; however, for questions that contain out-of-vocabulary words (resource names/words not part of the training data), the system does not predict an accurate query.
In the Neural Machine Translation for Query Construction paper, it says that External pre-trained word embeddings help deal with vocabulary mismatch. I am not sure how this would be implemented, could you provide any insight? I am already finished with the project but I would still like to learn about this.
The project I created is available on GitHub and can be found here if you would like to see. There's also a deployed version of the system and can be found here.
Thanks for the help in advance.
Hello,
I really like your work on using seq2seq to create SPARQL queries. Just one question: was there a specific reason not to include attention while training? As far as I understood the TensorFlow NMT guide, you would have to add something like --attention=scaled_luong
to the options in your train.sh. Did you evaluate whether it works better with or without attention?
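For reference, a hypothetical train.sh variant with attention enabled; the flag names come from the TensorFlow NMT tutorial, and whether the vendored nmt submodule accepts them unchanged is an assumption:

```shell
# Hypothetical invocation; paths mirror the monument_300 layout used
# elsewhere in this project and may need adjusting.
python -m nmt.nmt \
  --attention=scaled_luong \
  --src=en --tgt=sparql \
  --vocab_prefix=../data/monument_300/vocab \
  --train_prefix=../data/monument_300/train \
  --dev_prefix=../data/monument_300/dev \
  --test_prefix=../data/monument_300/test \
  --out_dir=../data/monument_300_model
```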
Greetings!