seominjoon / denspi
Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index (DenSPI)
Home Page: https://nlp.cs.washington.edu/denspi
License: Apache License 2.0
Hello,
I am facing an issue with sparse-first search and hybrid search.
Dense-first search works fine, but when I select either of the other options I get the following error:
KeyError: "Unable to open object (object '3580546' doesn't exist)"
I used the pretrained model and then created a custom phrase index for "dev-v1.1".
ERROR:flask.app:Exception on /api [GET]
Traceback (most recent call last):
File "/root/anaconda3/envs/despi/lib/python3.6/site-packages/flask/app.py", line 2292, in wsgi_app
response = self.full_dispatch_request()
File "/root/anaconda3/envs/despi/lib/python3.6/site-packages/flask/app.py", line 1815, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/root/anaconda3/envs/despi/lib/python3.6/site-packages/flask_cors/extension.py", line 161, in wrapped_function
return cors_after_request(app.make_response(f(*args, **kwargs)))
File "/root/anaconda3/envs/despi/lib/python3.6/site-packages/flask/app.py", line 1718, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/root/anaconda3/envs/despi/lib/python3.6/site-packages/flask/_compat.py", line 35, in reraise
raise value
File "/root/anaconda3/envs/despi/lib/python3.6/site-packages/flask/app.py", line 1813, in full_dispatch_request
rv = self.dispatch_request()
File "/root/anaconda3/envs/despi/lib/python3.6/site-packages/flask/app.py", line 1799, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "open/run_demo.py", line 128, in api
doc_top_k=5)
File "open/run_demo.py", line 94, in search
search_strategy=search_strategy, doc_top_k=5)
File "/root/denspi/open/mips_sparse.py", line 291, in search
doc_top_k=5)
File "/root/denspi/open/mips_sparse.py", line 218, in search_start
(doc_idxs, start_idxs), start_scores = self.search_sparse(query_start, doc_scores, doc_top_k)
File "/root/denspi/open/mips_sparse.py", line 168, in search_sparse
doc_group = self.get_doc_group(doc_idx)
File "/root/denspi/open/mips.py", line 121, in get_doc_group
if len(self.phrase_dumps) == 1:
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "/root/anaconda3/envs/despi/lib/python3.6/site-packages/h5py/_hl/group.py", line 262, in __getitem__
oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object '3580546' doesn't exist)"
ERROR:tornado.access:500 GET /api?strat=sparse_first&query=pharmacy%20department%20and%20specialised%20areas%20 (127.0.0.1) 419.57ms
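The KeyError above usually indicates that the sparse retriever returned a document id (here '3580546') that has no group in the HDF5 phrase dump: the TF-IDF index covers a large corpus, while a dump built only from dev-v1.1 contains just that subset. A minimal sketch of the mismatch, with a plain dict standing in for the open h5py file (names here are illustrative, not the repo's actual variables):

```python
# doc_groups stands in for an h5py.File of phrase dumps keyed by doc id.
doc_groups = {"100": "phrases-100", "200": "phrases-200"}

def get_doc_group(doc_idx):
    """Look up a document's phrase group, failing loudly if the sparse
    retriever returned a doc id that was never dumped."""
    key = str(doc_idx)
    if key not in doc_groups:
        raise KeyError(
            f"doc '{key}' is missing from the phrase dump; the dump "
            "likely covers a smaller corpus than the sparse (TF-IDF) "
            "index used for retrieval"
        )
    return doc_groups[key]
```

If this is the cause, the fix is to make sure the phrase dump and the sparse index were built over the same document collection.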
How could I reproduce the results for SQuAD 1.1 (as shown in Table 1 of the paper) from scratch? I want to train a DenSPI model (with scalar) from scratch, but I do not know how to achieve this. The README file does not explain it.
Enable a one-command routine for indexing, prediction, and evaluation.
This will go into `open/run_index_pred_eval.py`.
The entire evaluation process will then consist of roughly three stages: indexing, prediction, and evaluation.
Hello there,
I am facing an issue setting up this code. Here is what I did:
I downloaded the pretrained model by running: "gsutil cp -r gs://denspi/v1-0/model ."
Then I created the custom phrase index for "dev-v1.1" by running the command below:
python run_piqa.py --do_dump --filter_threshold -2 --save_dir SAVE3_DIR/ --load_dir ROOT_DIR/model --metadata_dir ROOT_DIR/bert --data_dir ROOT_DIR/data/dev-v1.1 --predict_file 0:2 --output_dir ROOT_DIR/your_dump/phrase --dump_file 0-1.hdf5
After that, I serve the API and run the demo with the following commands:
python run_piqa.py --do_serve --load_dir ROOT_DIR/model --metadata_dir ROOT_DIR/bert --do_load --parallel --port 8000
python open/run_demo.py ROOT_DIR/dump ROOT_DIR/wikipedia --api_port 8000 --port 3000 --index_name 64_flat_SQ8 --sparse_type p
But the demo is not working properly. I tested it with questions from the SQuAD 1.1 dataset, but it does not give proper answers; instead it appears to return random answers.
I cannot understand why it is not providing accurate answers. Is there something I have missed or am doing wrong?
Is it compulsory to train the model ourselves, or will the pretrained model provided at "gs://denspi/v1-0/model" work instead of training our own?
After all the installations (faiss, drqa, and the two requirements.txt files from this repo), run_index_pred_eval.py gives an error like the one below:
$ python open/run_index_pred_eval.py /home/jinhyuk/github/kernel-sparse/dense /data_nfs/camist002/data/dev-3.json --para --no_od
sampling from:
/home/jinhyuk/github/kernel-sparse/dense/phrase.hdf5
WARNING clustering 788 points to 256 centroids: please provide at least 9984 training points
Clustering 788 points in 481D to 256 clusters, redo 1 times, 10 iterations
Preprocessing in 0.00 s
INTEL MKL ERROR: /home/jinhyuk/miniconda3/envs/kesper/lib/python3.6/site-packages/faiss/../../../libmkl_avx2.so: undefined symbol: mkl_sparse_optimize_bsr_trsm_i8.
Intel MKL FATAL ERROR: Cannot load libmkl_avx2.so or libmkl_def.so.
Following the recommendation from here, running `conda install nomkl numpy scipy scikit-learn numexpr` shows that there are version conflicts between packages:
$ conda install nomkl numpy scipy scikit-learn numexpr
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: \
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed
UnsatisfiableError: The following specifications were found to be incompatible with each other:
Package libopenblas conflicts for:
scikit-learn -> numpy[version='>=1.11.3,<2.0a0'] -> libopenblas[version='>=0.3.2,<0.3.3.0a0']
Package blas conflicts for:
mkl_fft -> numpy-base[version='>=1.0.6,<2.0a0'] -> blas[version='|1.0',build=openblas]
blas
scikit-learn -> blas[version='||1.0',build='mkl|openblas|mkl|openblas']
nomkl -> blas=[build=openblas]
mkl_fft -> blas[version='|1.0',build=mkl]
numexpr -> blas[version='||1.0',build='mkl|openblas|mkl|openblas']
numpy -> blas[version='||1.0',build='mkl|openblas|mkl|openblas']
mkl_random -> blas[version='|1.0',build=mkl]
faiss-cpu=1.5.2 -> numpy[version='>=1.11'] -> numpy-base==1.16.0=py36hde5b4d6_1 -> blas[version='|1.0',build=openblas]
scipy -> blas[version='||1.0',build='mkl|openblas|mkl|openblas']
mkl_random -> numpy-base[version='>=1.0.2,<2.0a0'] -> blas[version='|1.0',build=openblas]
numpy-base -> blas[version='|*|1.0',build='mkl|openblas|mkl|openblas']
faiss-cpu=1.5.2 -> blas=[build=mkl]
faiss-cpu=1.5.2 -> numpy[version='>=1.11'] -> blas==1.0=mkl
Any idea how to resolve this?
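The `Cannot load libmkl_avx2.so or libmkl_def.so` / `mkl_sparse_optimize_bsr_trsm_i8` failure is a known MKL library-loading clash (often triggered by faiss from conda) rather than a DenSPI bug. When the `nomkl` route hits solver conflicts as above, one commonly reported workaround is to preload MKL's core libraries before launching Python. The paths below assume a standard conda layout and are not from this repo; verify the `.so` files exist in your environment first:

```shell
# Assumed conda layout; check that these files exist in $CONDA_PREFIX/lib.
export LD_PRELOAD="$CONDA_PREFIX/lib/libmkl_core.so:$CONDA_PREFIX/lib/libmkl_sequential.so"
```

With the variable exported, re-run the failing command (e.g. `python open/run_index_pred_eval.py ...`) in the same shell.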
Currently the neg training routine (`--train_neg` in run_piqa.py) is different from what is described in the paper.
In the paper, we use the 'no answer' logit to train on negative examples, so there is no separate neg training routine. In the code, the neg training routine instead attaches a neg example (one whose question embedding is similar) to each positive example after normal training.
In the code, several noise injections are also used.
In practice, the strategy in the current code is better than that in the paper (no answer logit). The paper will be updated soon and this issue will be resolved.
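The "attach a similar negative" step described above can be sketched as follows. This is a minimal illustration using cosine similarity over question embeddings; the function and variable names are mine, not those in run_piqa.py:

```python
import numpy as np

def pick_negatives(pos_q, neg_q):
    """For each positive example's question embedding, return the index
    of the most similar negative example (by cosine similarity)."""
    pos = pos_q / np.linalg.norm(pos_q, axis=1, keepdims=True)
    neg = neg_q / np.linalg.norm(neg_q, axis=1, keepdims=True)
    sims = pos @ neg.T  # (n_pos, n_neg) cosine-similarity matrix
    return np.argmax(sims, axis=1)

pos_q = np.array([[1.0, 0.0], [0.0, 1.0]])
neg_q = np.array([[0.9, 0.1], [0.2, 0.8]])
print(pick_negatives(pos_q, neg_q))  # → [0 1]
```

Each positive example is then trained alongside its selected negative, which is the behavior the current code prefers over the paper's 'no answer' logit.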
How is each float32 value converted to int8? Where is the code for this?
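If you are using the SQ8 faiss index, the float32-to-8-bit conversion happens inside faiss's scalar quantizer when the index is trained and vectors are added, so there is no explicit per-value cast to look for. The underlying idea is plain min-max scalar quantization; a numpy sketch (function names are mine, not faiss's):

```python
import numpy as np

def sq8_encode(x, vmin, vmax):
    """Map float32 values in [vmin, vmax] to 8-bit codes in [0, 255]."""
    scale = (vmax - vmin) / 255.0
    return np.clip(np.round((x - vmin) / scale), 0, 255).astype(np.uint8)

def sq8_decode(codes, vmin, vmax):
    """Recover approximate float32 values from the 8-bit codes."""
    scale = (vmax - vmin) / 255.0
    return vmin + codes.astype(np.float32) * scale

x = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)
codes = sq8_encode(x, -1.0, 1.0)
recon = sq8_decode(codes, -1.0, 1.0)
# reconstruction error is bounded by half a quantization step (~0.004 here)
```

The per-dimension `vmin`/`vmax` bounds are what the quantizer learns during training; decoding multiplies the code back by the step size and adds the offset.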
Hi,
Thanks for your good work. I would like to reproduce the results for SQuAD 1.1 (as shown in Table 1 of the paper), but I am having some trouble.
First, I downloaded the pretrained model from "gs://denspi/v1-0/model" and then tried to evaluate on dev-v1.1 using: "python run_piqa.py --do_predict --output_dir tmp --do_load --load_dir model --predict_file dev-v1.1.json --do_eval --gt_file dev-v1.1.json --metadata_dir bert"
The predicted answers seem to be random spans, resulting in metrics like {"exact_match": 0.47303689687795647, "f1": 4.43806570152543}. 0.47% EM means something is totally wrong.
I wonder whether I did it correctly.
If I want to train a model to reproduce the results myself, since I cannot get the pretrained model to work, is it enough to just run the first step in the training section (i.e. "python run_piqa.py --train_batch_size 12 --do_train --freeze_word_emb --save_dir $SAVE1_DIR")?
Thanks, and I hope to get your advice.
Ques:
Setting:
All the results are obtained using the commands mentioned in README.
Hi,
The demo link mentioned in the README, http://allgood.cs.washington.edu:15001/, is not working. I tried Firefox and Chrome, but it does not open in either.
What's the logic behind the versions of torch in the requirements?
Hi, thanks for open-sourcing the project. Great work!
I have questions about the choice of faiss index; I'd really appreciate it if you could find time to clarify:
Could you please share the detailed procedure of how you index Wikipedia? Is IVF1048576_HNSW32_SQ8, searched with nprobe=64, a precise summary of your choice?
I see that in open/build_index.py there is a function named merge_indexes. Did you build multiple sub-indexes and then merge them, or not? I feel this choice may have some effect on performance.
To make Q1 more specific: the index-building process seems quite complicated in your code. By default, it goes through
https://github.com/uwnlp/denspi/blob/f540b6a547f012823fc6c2bb10077df6bccc13a6/open/run_index.py#L121
https://github.com/uwnlp/denspi/blob/f540b6a547f012823fc6c2bb10077df6bccc13a6/open/run_index.py#L126-L131
https://github.com/uwnlp/denspi/blob/f540b6a547f012823fc6c2bb10077df6bccc13a6/open/run_index.py#L134-L137
then
https://github.com/uwnlp/denspi/blob/f540b6a547f012823fc6c2bb10077df6bccc13a6/open/run_index.py#L148
https://github.com/uwnlp/denspi/blob/f540b6a547f012823fc6c2bb10077df6bccc13a6/open/run_index.py#L164
Can the following two lines encode the same idea?
index = faiss.index_factory(d, "IVF1048576_HNSW32,SQ8")
index.train(data)
thanks!
It's in the installer here, but it's commented out in the Dockerfile.
I haven't seen any other reference to it.