shijx12 / kqapro_baselines

PyTorch implementation of baseline models for KQA Pro, a large-scale dataset for complex question answering over a knowledge base.

Home Page: http://thukeg.gitee.io/kqa-pro/

License: MIT License

Python 100.00%
kbqa sparql dataset bart seq2seq

kqapro_baselines's Introduction

KQA Pro Baselines

KQA Pro is a large-scale dataset for complex question answering over a knowledge base, providing strong supervision in the form of a SPARQL query and a program for each question. Here is its homepage website. The dataset is licensed under Creative Commons Attribution-ShareAlike 4.0 International.

This repo implements several baselines for the dataset:

  • Blind GRU. It predicts the answer from the input question alone, ignoring the knowledge base; we use it to measure dataset bias.
  • KVMemNN (Key-Value Memory Networks)
  • RGCN (Relational Graph Convolutional Networks)
  • SRN (Stepwise Reasoning Networks)
  • RNN seq2seq SPARQL parser
  • RNN seq2seq program parser
  • BART seq2seq SPARQL parser
  • BART seq2seq program parser

Instructions for running each model are given in its README file. Before trying them, you need to download the dataset and unzip it into the folder ./dataset. The file tree should look like

.
+-- dataset
|   +-- kb.json
|   +-- train.json
|   +-- val.json
|   +-- test.json
+-- GRU
|   +-- preprocess.py
|   +-- train.py
|   +-- ...
+-- KVMemNN
+-- RGCN
...

kqapro_baselines's People

Contributors

passenger20, shijx12, shulincao, stellarkey, wangyongshuai88


kqapro_baselines's Issues

Notes on fixing the virtuoso service failing to start

After installing virtuoso following the KQA Pro tutorial [1], it worked fine.

The next day it failed to start with the message: There is no configuration file virtuoso.ini

A forum thread [2] suggests adding the path of virtuoso.ini to /etc/rc.conf.

In my case, the path is:

virtuoso_config="/usr/local/lib/virtuoso/db/virtuoso.ini"

That solved the problem.

[1] https://github.com/shijx12/KQAPro_Baselines/tree/master/Bart_SPARQL
[2] https://forums.freebsd.org/threads/virtuoso-service-fails-to-start.48881/

KeyError: 'rewrite'

In line 38 of Bart_Program/preprocess.py there is "question = item['rewrite']", but the samples don't have the key 'rewrite'. I think it should be 'question'?
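A minimal sketch of the suggested fix, falling back to the 'question' field when 'rewrite' is absent (the surrounding code in preprocess.py may differ):

    # Bart_Program/preprocess.py, around line 38
    # original: question = item['rewrite']
    question = item.get('rewrite', item['question'])  # fall back to 'question' if 'rewrite' is missing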

Details of the Bart program model

In train.py, the model is not validated during training, so I copied the validation code from predict.py.
But early in training, the validation accuracy drops from 0.007 to 0.004 and never increases afterwards.
I use bart-base as the pretrained model.
Can you provide more details about the experiment?
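For reference, a sketch of the kind of validation pass described above (my own code, not the repo's; it assumes the batches carry raw target token ids under a hypothetical 'target_ids' key rather than -100-masked labels):

    import torch

    @torch.no_grad()
    def validate(model, tokenizer, val_loader, device):
        # Greedy-decode each question and count exact-match program accuracy.
        model.eval()
        correct, total = 0, 0
        for batch in val_loader:
            input_ids = batch['input_ids'].to(device)
            outputs = model.generate(input_ids=input_ids, max_length=500)
            preds = tokenizer.batch_decode(outputs, skip_special_tokens=True)
            golds = tokenizer.batch_decode(batch['target_ids'], skip_special_tokens=True)
            correct += sum(p.strip() == g.strip() for p, g in zip(preds, golds))
            total += len(preds)
        model.train()
        return correct / total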

Wrong results using checkpoints for Bart Program baseline

Hi (@shijx12 @ShulinCao @teacherpeterpan @Flitternie @stellarkey),
thank you for your work.

I have some issues using the code base; can you kindly help me?

I don't know whether training works, but when I try to restore the checkpoint, predict.txt is full of "no".

My requirements.txt is the following:

transformers==4.0.0
torch==1.6.0
sentencepiece==0.1.96
nltk==3.7
numpy==1.19.2

The steps that I perform (from the KQAPro_Baseline/ folder [the root one]) are the following:

python3 -m Bart_Program.preprocess --input_dir data/ --output_dir preproc_data/ --model_name_or_path KQAPro_ckpt/program_ckpt/ 
cp data/kb.json preproc_data/
python3 -m Bart_Program.predict --input_dir preproc_data/ --save_dir log_folder --ckpt preproc_data/

Thank you in advance.

Kind regards,
Andrea

Missing question type annotations

It seems that the dataset lacks annotations for the 9 question types and the number of hops per question. Could you please release them?

thx

Converting KoPL to SPARQL and vice versa

Is there a way to directly convert KoPL to SPARQL? I searched the repo for one, but to no avail. Please let me know if there is a script that can perform this conversion deterministically (KoPL <-> SPARQL).

How are special tokens handled?

I believe the entity ids, predicate ids, and keywords in the SPARQL queries are special tokens for the pre-trained models.
May I ask how these special tokens are handled?
Thanks.
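For illustration, a minimal sketch (an assumption about the approach, not the repo's confirmed code) of how such tokens could be registered with a HuggingFace tokenizer so they are not split into subwords; the token strings below are hypothetical examples:

    from transformers import BartForConditionalGeneration, BartTokenizer

    tokenizer = BartTokenizer.from_pretrained('facebook/bart-base')
    model = BartForConditionalGeneration.from_pretrained('facebook/bart-base')

    # Hypothetical special tokens; the actual vocabulary comes from preprocessing.
    extra_tokens = ['<pred:instance_of>', '<pred:name>']
    tokenizer.add_tokens(extra_tokens)
    model.resize_token_embeddings(len(tokenizer))  # grow embeddings to match the new vocab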

bart-large can't be trained because of out-of-memory errors

Hi, while trying to reproduce the bart_program results, it seems the authors used the bart-large model. When I load bart-large and train, it fails with an out-of-memory error. I tried reducing the batch size, but no luck. Could you please help? Thank you.
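One common workaround (my suggestion, not something the repo implements) is gradient accumulation: split each batch into micro-batches so the effective batch size stays the same while peak activation memory drops. A minimal sketch:

    import torch

    def train_epoch(model, optimizer, train_loader, accumulation_steps=4):
        # Effective batch size = loader batch size * accumulation_steps.
        model.train()
        optimizer.zero_grad()
        for step, batch in enumerate(train_loader):
            loss = model(**batch).loss / accumulation_steps  # average over micro-batches
            loss.backward()
            if (step + 1) % accumulation_steps == 0:
                optimizer.step()
                optimizer.zero_grad()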

forward() got an unexpected keyword argument 'lm_labels'


2021-03-04 13:59:21,045 INFO     Checking...
2021-03-04 13:59:21,045 INFO     ===================Dev==================
Traceback (most recent call last):
  File "/home/home1/yhshu/.conda/envs/pytorch16/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/home1/yhshu/.conda/envs/pytorch16/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/home1/yhshu/workspace/FusionQA/KQAPro_Baselines-master/Bart_SPARQL/train.py", line 200, in <module>
    main()
  File "/home/home1/yhshu/workspace/FusionQA/KQAPro_Baselines-master/Bart_SPARQL/train.py", line 196, in main
    train(args)
  File "/home/home1/yhshu/workspace/FusionQA/KQAPro_Baselines-master/Bart_SPARQL/train.py", line 112, in train
    outputs = model(**inputs)
  File "/home/home1/yhshu/.conda/envs/pytorch16/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
TypeError: forward() got an unexpected keyword argument 'lm_labels'

I think KQA Pro is very good work.
However, I ran into some problems while running the experiments.
The repository does not describe how to load the BART models, so I directly used facebook/bart-base as the model name and encountered the error above.
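For context (my diagnosis, which the authors would need to confirm): the lm_labels keyword was deprecated and then removed from the transformers library, whose 4.x releases expect labels instead, so code written against an older transformers raises this TypeError. The rename looks like:

    # transformers < 4.x style (raises TypeError on 4.x):
    # outputs = model(input_ids=input_ids, attention_mask=mask, lm_labels=target_ids)
    # transformers 4.x style:
    outputs = model(input_ids=input_ids, attention_mask=mask, labels=target_ids)
    loss = outputs.loss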

Bart_SPARQL documentation issue

Predict answers of the test set. It will produce a file named predict.txt in the --save_dir, storing the predictions of test questions in order.
python -m SPARQL.predict --input_dir <dir/of/processed/files> --ckpt <dir/of/checkpoint> --save_dir <dir/of/log/files>

should be changed to:
Predict answers of the test set. It will produce a file named predict.txt in the --save_dir, storing the predictions of test questions in order.
python -m Bart_SPARQL.predict --input_dir <dir/of/processed/files> --ckpt <dir/of/checkpoint> --save_dir <dir/of/log/files>

Also, <dir/of/processed/files> needs to contain both the preprocessed and the original data.

About special predicates

Hi there,
special tokens such as

    PRED_INSTANCE = 'pred:instance_of'
    PRED_NAME = 'pred:name'

    PRED_VALUE = 'pred:value'  # link packed value node to its literal value
    PRED_UNIT = 'pred:unit'  # link packed value node to its unit

    PRED_YEAR = 'pred:year'  # link packed value node to its year value, which is an integer
    PRED_DATE = 'pred:date'  # link packed value node to its date value, which is a date

    PRED_FACT_H = 'pred:fact_h'  # link qualifier node to its head
    PRED_FACT_R = 'pred:fact_r'
    PRED_FACT_T = 'pred:fact_t'

It seems that these tokens are not supported by the official Wikidata query service. Is there any document that explains how these tokens work?
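For illustration (my understanding, to be confirmed by the authors): these predicates belong to KQA Pro's own knowledge base, which is served from the local Virtuoso instance described in the Bart_SPARQL README, not from the public Wikidata endpoint. A hypothetical query sketch against such a local endpoint; the port and the predicate IRI form are assumptions:

    from SPARQLWrapper import SPARQLWrapper, JSON

    endpoint = SPARQLWrapper('http://localhost:8890/sparql')  # default Virtuoso port
    endpoint.setQuery('''
        SELECT DISTINCT ?e WHERE {
            ?e <pred:name> "some entity name" .
            ?e <pred:instance_of> ?c .
        }''')
    endpoint.setReturnFormat(JSON)
    results = endpoint.query().convert()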

Missing pgrk.txt file

KQAPro_Baselines-master/SRN/knowledge_graph.py, line 29, in load_pgrk_score
with open(os.path.join(self.args.input_dir, 'pgrk.txt')) as f:

SRN/knowledge_graph.py requires a file pgrk.txt (presumably per-entity PageRank scores, given the function name load_pgrk_score). What is this file? The data preprocessing does not generate it.
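Pending an answer, a hypothetical way to produce such scores with networkx; the kb.json schema access and the output format (one "entity_id<TAB>score" line) are assumptions that would need the authors' confirmation:

    import json
    import networkx as nx

    kb = json.load(open('dataset/kb.json'))
    g = nx.DiGraph()
    for ent_id, ent in kb['entities'].items():
        for rel in ent.get('relations', []):
            g.add_edge(ent_id, rel['object'])  # assumed schema; check kb.json

    scores = nx.pagerank(g)
    with open('dataset/pgrk.txt', 'w') as f:
        for ent_id, score in scores.items():
            f.write(f'{ent_id}\t{score}\n')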

Is it convenient to provide a trained model?

Hi,
I followed the guidelines in SPARQL/README up to the training step,
but my training is very slow because I don't have a GPU.

2021-08-10 14:57:44,425 INFO     input_dir:../output/preprocess
2021-08-10 14:57:44,425 INFO     save_dir:../output/train
2021-08-10 14:57:44,425 INFO     lr:0.001
2021-08-10 14:57:44,425 INFO     weight_decay:1e-05
2021-08-10 14:57:44,425 INFO     num_epoch:100
2021-08-10 14:57:44,425 INFO     batch_size:64
2021-08-10 14:57:44,425 INFO     seed:666
2021-08-10 14:57:44,426 INFO     dim_word:300
2021-08-10 14:57:44,426 INFO     dim_hidden:1024
2021-08-10 14:57:44,426 INFO     max_dec_len:100
2021-08-10 14:57:44,426 INFO     Create train_loader and val_loader.........
#vocab of word/sparql/answer: 48557/45693/81629
2021-08-10 14:57:47,783 INFO     Create model.........
2021-08-10 14:57:48,553 INFO     SPARQLParser(
  (word_embeddings): Embedding(48557, 300)
  (word_dropout): Dropout(p=0.3, inplace=False)
  (question_encoder): GRU(
    (encoder): GRU(300, 1024, num_layers=2, batch_first=True, dropout=0.2)
  )
  (sparql_embeddings): Embedding(45693, 300)
  (decoder): GRU(
    (encoder): GRU(300, 1024, num_layers=2, batch_first=True, dropout=0.2)
  )
  (sparql_classifier): Sequential(
    (0): Linear(in_features=1024, out_features=1024, bias=True)
    (1): ReLU()
    (2): Linear(in_features=1024, out_features=45693, bias=True)
  )
  (att_lin): Linear(in_features=1024, out_features=1024, bias=True)
)
2021-08-10 14:57:48,554 INFO     Start training........
2021-08-10 15:02:59,072 INFO     progress: 0.009  loss: 7.8691 (7.4725)  lr: 0.001000
2021-08-10 15:08:42,599 INFO     progress: 0.019  loss: 1.1711 (4.3432)  lr: 0.001000
2021-08-10 15:17:58,772 INFO     progress: 0.028  loss: 0.8847 (3.1818)  lr: 0.001000
2021-08-10 15:28:47,679 INFO     progress: 0.038  loss: 0.7479 (2.5666)  lr: 0.001000
2021-08-10 15:38:20,867 INFO     progress: 0.047  loss: 0.6800 (2.1879)  lr: 0.001000

I wonder if it would be convenient to provide a trained model? I want to study how the program works.

no module named V2_Bart_Program

predict.py in Bart_Program has the following line:

from V2_Bart_Program.executor_rule import RuleExecutor

However, there is no module named V2_Bart_Program.

Do you mean the executor_rule.py in Bart_Program, or did you forget to push a new version of Bart_Program?

thx
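A likely one-line fix, assuming executor_rule.py inside Bart_Program is the intended module:

    # from V2_Bart_Program.executor_rule import RuleExecutor   # module does not exist
    from Bart_Program.executor_rule import RuleExecutor        # assumed intended import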

Code for the QA dataset generation pipeline

Hello, I'm very happy to see such rigorous and careful work on multi-hop KBQA datasets.
I am also building a QA dataset for my own domain. Would you be able to release the code of the answer-generation module? Or what tools or models could be used to generate questions?
