
dreaminvoker / gain


Source code for EMNLP 2020 paper: Double Graph Based Reasoning for Document-level Relation Extraction

License: MIT License

Python 94.13% Shell 5.87%
dgl document-level-relation-extraction graph-neural-networks natural-language-processing relation-extraction

gain's Introduction

Hi there 👋

DreamInvoker

This is Shuang Zeng [google scholar].

Currently, I am an applied researcher on the Data-Douyin-Comment team at ByteDance.

I received my Master's degree from Peking University under the supervision of Prof. Baobao Chang [google scholar].

My current research interests include Large Vision-Language Models, Retrieval-Augmented Generation, Text2SQL, and Question Answering.


gain's People

Contributors

dreaminvoker


gain's Issues

TypeError: expected Tensor as element 0 in argument 0, but got str

Hi! I'm trying to train the neural network using the default values provided in the script run_GAIN_BERT.sh (I only changed the bert_path from ../PLM/bert-base-uncased to bert-base-uncased), but the training script throws an error shortly after it begins. The error stack trace is as follows:

Traceback (most recent call last):
  File "train.py", line 233, in <module>
    train(opt)
  File "train.py", line 140, in train
    ht_pair_distance=d['ht_pair_distance']
  File "/home/saptakathaa/nre/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/saptakathaa/GAIN/code/models/GAIN.py", line 306, in forward
    encoder_outputs = torch.cat([encoder_outputs, self.entity_type_emb(params['entity_type'])], dim=-1)
TypeError: expected Tensor as element 0 in argument 0, but got str

Could you please help me fix it?

P.S.: The line numbers may be off by 2-3 lines, as I added some print statements while debugging this myself.
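
For what it's worth, a common trigger for this exact message with newer library versions (not verified against this repo): transformers v4+ returns a ModelOutput object by default, and tuple-style unpacking of that object yields its string keys rather than tensors, so encoder_outputs itself ends up being the str 'last_hidden_state'. A minimal demonstration of the footgun, assuming bert-base-uncased is downloadable:

    import torch
    from transformers import BertModel, BertTokenizer

    model = BertModel.from_pretrained('bert-base-uncased')
    tok = BertTokenizer.from_pretrained('bert-base-uncased')
    out = model(**tok("a test", return_tensors='pt'))

    hidden = out[0]        # integer indexing still returns a Tensor
    first, *_ = out        # unpacking iterates over keys: 'last_hidden_state', a str!
    print(type(hidden), first)

Pinning the transformers version from the repo's README, or indexing out.last_hidden_state explicitly, sidesteps this.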

Pretrained models?

Are you able to post or release the pretrained models from the paper?

It would be helpful for those of us who don't have big enough GPUs to train 😆

Question about head/tail entity indices

In path_table, the head and tail entities are stored with their index already shifted by +1; in h_t_pairs, the head and tail entities are likewise stored with index+1. In principle the two should match, both being index+1, but in the model's forward() the head/tail entities from h_t_pairs are incremented by 1 again before the intermediate nodes are fetched from path_table. Have I misunderstood something here, or is there a conversion somewhere in the code that I missed?

Which random seed are you using?

Hi, I'm trying to reproduce the results on BERT-base, but I can only get F1 / ignF1 = 0.5985 / 0.5752 for the full model, and F1 / ignF1 = 0.6010 / 0.5796 for the nomention ablation.

I am using the given .sh, so I suppose the reason is not the hyper-parameters. Could you provide the random seed you used to produce the F1 = 0.6122 result? Thanks!
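
A typical seeding helper for reproducibility experiments like this one (the helper and the seed value 42 are illustrative, not the paper's setup):

    import random
    import numpy as np
    import torch

    def set_seed(seed: int) -> None:
        # Seed every RNG a training run touches.
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        # Pin down cuDNN as well, at some cost in speed.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False

    set_seed(42)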

Some questions about the BERT results

Hello, I'd like to ask you a few questions about BERT.
In the paper you write that BERT's initial learning rate is 1e-5,
but in the code's BERT .sh script it is 1e-3.
Also, in the run that produced the paper's BERT results, was BERT kept frozen, or was it updated as well?
Thanks!

Training cannot be done on GPU

Hi! I'm trying to train the neural network using the default values provided in the script run_GAIN_GLOVE.sh. The code runs fine, but I found that training happens on the CPU. I tried changing the devices in the code to cuda so it would train on the GPU, but that didn't work and started throwing "GPU device not available" errors, although my system has GPUs.

Could you please help me run the code on a GPU?
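
A quick environment probe helps narrow this down before editing device strings in the code (standard PyTorch calls, nothing repo-specific):

    import torch

    print(torch.__version__)           # a '+cpu' suffix means a CPU-only build
    print(torch.version.cuda)          # CUDA toolkit the wheel was built against
    print(torch.cuda.is_available())   # False => CPU-only build or driver mismatch
    print(torch.cuda.device_count())

    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    # Note: model.to(device) alone is not enough; every input tensor must be
    # moved with .to(device) as well.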

About the dataset

Hi, I'd like to ask: if I switch to a different dataset, how should I convert it into the format of your dataset? How did you process it? Looking forward to your reply, many thanks!
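
For orientation, a minimal DocRED-style record looks roughly like this (reconstructed from the public DocRED release; verify the exact fields against the JSON files shipped with this repo):

    # One document in DocRED format: tokenized sentences, one mention list
    # per entity, and relation facts indexing into the entity list.
    example = {
        "title": "Example document",
        "sents": [["Alice", "was", "born", "in", "Paris", "."]],
        "vertexSet": [
            [{"name": "Alice", "sent_id": 0, "pos": [0, 1], "type": "PER"}],
            [{"name": "Paris", "sent_id": 0, "pos": [4, 5], "type": "LOC"}],
        ],
        "labels": [
            {"h": 0, "t": 1, "r": "P19", "evidence": [0]}  # head, tail, relation id
        ],
    }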

Could you explain in detail how the Infer-F1 metric is computed?

I'm quite interested in the Infer-F1 metric, but the paper does not discuss it much. My guess is that it only scores triples that can form a "triangle" (e.g. (A, r1, B), (B, r2, C), (A, r3, C)). Is that understanding correct? Could you open-source the code that computes this metric?
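
A sketch of the reading proposed above, for concreteness (the asker's guess, not the authors' released code): keep only facts whose (head, tail) pair closes a two-hop chain in the gold data, then compute F1 over that subset.

    def inferable_pairs(gold):
        # gold: a set of (head, relation, tail) facts.
        tails = {}
        for h, r, t in gold:
            tails.setdefault(h, set()).add(t)
        # (h, t) is kept if some bridge b gives gold chains h -> b -> t.
        return {(h, t) for h, r, t in gold
                if any(b != t and t in tails.get(b, ()) for b in tails.get(h, ()))}

    def infer_f1(pred, gold):
        keep = inferable_pairs(gold)
        p = {f for f in pred if (f[0], f[2]) in keep}
        g = {f for f in gold if (f[0], f[2]) in keep}
        tp = len(p & g)
        prec = tp / len(p) if p else 0.0
        rec = tp / len(g) if g else 0.0
        return 2 * prec * rec / (prec + rec) if prec + rec else 0.0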

Error in Training

Hi! I'm trying to train the neural network using the default values provided in the script run_GAIN_BERT.sh (I just changed the bert_path from ../PLM/bert-base-uncased to bert-base-uncased), but the training script throws an error shortly after it begins. It seems to be an error with DGL. The error is the following:

Traceback (most recent call last):
  File "train.py", line 231, in <module>
    train(opt)
  File "train.py", line 138, in train
    ht_pair_distance=d['ht_pair_distance']
  File "/Users/carlos.jimenez/PycharmProjects/GAIN/doc_processor/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/carlos.jimenez/PycharmProjects/GAIN/code/models/GAIN.py", line 348, in forward
    graphs = dgl.unbatch_hetero(graph_big)
  File "/Users/carlos.jimenez/PycharmProjects/GAIN/doc_processor/lib/python3.7/site-packages/dgl/batch.py", line 418, in unbatch_hetero
    return batch(*args, **kwargs)
  File "/Users/carlos.jimenez/PycharmProjects/GAIN/doc_processor/lib/python3.7/site-packages/dgl/batch.py", line 167, in batch
    if any(g.is_block for g in graphs):
  File "/Users/carlos.jimenez/PycharmProjects/GAIN/doc_processor/lib/python3.7/site-packages/dgl/batch.py", line 167, in <genexpr>
    if any(g.is_block for g in graphs):
  File "/Users/carlos.jimenez/PycharmProjects/GAIN/doc_processor/lib/python3.7/site-packages/dgl/heterograph.py", line 1968, in __getitem__
    raise DGLError('Invalid key "{}". Must be one of the edge types.'.format(orig_key))
dgl._ffi.base.DGLError: Invalid key "0". Must be one of the edge types.

Could you please help me find what is wrong?
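
The dgl.unbatch_hetero call in GAIN.py matches the pre-0.5 DGL API; from DGL 0.5 onward, heterograph batching was folded into plain dgl.batch / dgl.unbatch, and the deprecated alias no longer behaves as the code expects. Two hedged options:

    import dgl

    print(dgl.__version__)   # the code was written against the 0.4.x API line

    # Option 1: install the DGL 0.4.x build named in the repo's README.
    # Option 2 (untested sketch): on dgl >= 0.5, replace the call in GAIN.py:
    # graphs = dgl.unbatch(graph_big)   # instead of dgl.unbatch_hetero(graph_big)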

Error when training on multiple GPUs

Hello, I tried to convert the model to multi-GPU training with DataParallel, and it throws an error at runtime:

Traceback (most recent call last):
  File "/disks/disk1/remote_src/DocRED/GAIN-master/code/train.py", line 237, in <module>
    train(opt)
  File "/disks/disk1/remote_src/DocRED/GAIN-master/code/train.py", line 130, in train
    predictions = model(words=d['context_idxs'],
  File "/disks/disk1/envs/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/disks/disk1/envs/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 155, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/disks/disk1/envs/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 165, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/disks/disk1/envs/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "/disks/disk1/envs/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
IndexError: Caught IndexError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/disks/disk1/envs/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/disks/disk1/envs/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/disks/disk1/remote_src/DocRED/GAIN-master/code/models/GAIN.py", line 332, in forward
    encoder_output = encoder_outputs[i]  # [slen, bert_hid]
IndexError: index 1 is out of bounds for dimension 0 with size 1
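
A plausible mechanism, shown with a minimal probe (assumes a machine with at least two GPUs): DataParallel slices Tensor arguments along dim 0 across replicas but replicates non-tensor arguments, such as a batched DGLGraph, whole to every replica. Each replica then holds encoder outputs for only part of the batch while graph-driven loops still run over the full batch, which would produce exactly this IndexError.

    import torch
    import torch.nn as nn

    class Probe(nn.Module):
        def forward(self, x, meta):
            # x arrives sliced along dim 0; meta arrives whole on every replica.
            print('tensor slice:', tuple(x.shape), '| non-tensor length:', len(meta))
            return x

    model = nn.DataParallel(Probe().cuda())
    model(torch.zeros(4, 3).cuda(), ['doc'] * 4)
    # With 2 GPUs, each replica prints a (2, 3) slice but a length-4 meta list.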

Head/tail entity index question

In path_table, the head and tail entities are stored with their indices already incremented by 1; in h_t_pairs they are likewise stored with indices+1, so in principle the two should match. Yet in the model's forward(), the head/tail entities from h_t_pairs are incremented by 1 again before being matched against path_table. Is this a bug in the code?

CUDA version

Hello, when I run on the GPU I get the error OSError: libcublas.so.10: cannot open shared object file: No such file or directory.
My CUDA version is 11; is version 10.2 strictly required?
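
A quick check of which CUDA toolkit the installed torch wheel was built against (the missing libcublas.so.10 usually means a CUDA 10.x wheel running on a CUDA 11 system):

    import torch
    print(torch.version.cuda)   # e.g. '10.2' => the wheel expects CUDA 10.x libraries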

Spatial features?

I was wondering whether it would be possible to apply the GAIN framework to document images.

Could we perhaps include spatial features, such as the (x, y) position of each entity, alongside the features already used in the model, like the text embeddings and entity type embeddings? (A speculative sketch follows.)
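
A speculative sketch of that idea, outside anything in GAIN itself (all names below are made up): project normalized (x, y) coordinates and concatenate them with each mention's text-derived features before graph aggregation.

    import torch
    import torch.nn as nn

    class SpatialFusion(nn.Module):
        """Fuse 2-D layout coordinates into mention node features."""
        def __init__(self, text_dim: int, out_dim: int, coord_dim: int = 32):
            super().__init__()
            self.coord_proj = nn.Linear(2, coord_dim)
            self.fuse = nn.Linear(text_dim + coord_dim, out_dim)

        def forward(self, text_emb: torch.Tensor, xy: torch.Tensor) -> torch.Tensor:
            # text_emb: [num_mentions, text_dim]; xy: [num_mentions, 2] in [0, 1]
            coords = torch.relu(self.coord_proj(xy))
            return self.fuse(torch.cat([text_emb, coords], dim=-1))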

Training with custom data

Hello,

I have put my dataset in the DocRED format and created the corresponding files train_annotated.json, dev.json, test.json, ner2id.json, and rel2id.json to train a BERT-type architecture. However, in my dataset the number of relations and entities differs from DocRED. I would like to know which files/parameters I would need to modify in order to train with custom data. (A sketch follows below.)
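
One hedged starting point, mirroring DocRED's file layout (names follow the files listed above; whether further preprocessing must be re-run depends on the repo's data pipeline): regenerate the id maps so their sizes agree with the counts set in config.py.

    import json

    # "Na" is DocRED's no-relation label at id 0; adjust both lists to your schema.
    relations = ["Na", "my_relation"]               # 2 classes => relation_nums = 2
    entity_types = ["None", "PER", "LOC", "ORG"]

    with open("rel2id.json", "w") as f:
        json.dump({r: i for i, r in enumerate(relations)}, f)
    with open("ner2id.json", "w") as f:
        json.dump({t: i for i, t in enumerate(entity_types)}, f)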

Best regards

Size mismatch in training

When I run the code through run_GAIN_BERT.sh, I get a size-mismatch problem in GAIN.py when extracting features through edge_layer.
[screenshot]

About the maximum length of the DocRED data

Hi, when using BERT's tokenizer I found that some DocRED documents yield more than 512 subtokens. When the length exceeds 512, data.py handles it as follows; what is the intent of this code?

    if entity2mention[idx] == []:
        # If entity idx ended up with no mentions (e.g. all of them were
        # truncated away), give it one synthetic mention: find the first
        # token slot not yet assigned to any mention and mark it as a
        # mention of entity idx, with the entity's position and NER type.
        entity2mention[idx].append(mention_idx)
        while mention_id[replace_i] != 0:
            replace_i += 1
        mention_id[replace_i] = mention_idx
        pos_id[replace_i] = idx
        ner_id[replace_i] = ner2id[vertex[0]['type']]
        mention_idx += 1

Question about non-entity words.

Section 3.1 (Encoding Module) of the paper says: "We introduce None entity type and id for those words not belonging to any entity".

However, the proposed model does not seem to use any non-entity words.

In the Mention-level Graph Aggregation Module, the graph contains only entity mentions, not non-entity words.

So my question is: are non-entity words simply dropped from the graph's input, or did I overlook some detail of the model?

Thanks!

Question about running ./eval_GAIN_BERT.sh

Hello!
1. When I run ./eval_GAIN_BERT.sh 0 0.7972 with your parameters, the following happens:
[screenshot]
The input_theta of 0.7972 comes from the best epoch of running ./run_GAIN_BERT.sh 1. Why does this happen?
2. Is the error above also the reason that I cannot obtain the result.json file?

About adapting to other datasets

I converted my own dataset into the required format and changed relation_nums in config.py to 2, but training still fails.
The error report is as follows:

Traceback (most recent call last):
  File "train.py", line 194, in <module>
    train(opt)
  File "train.py", line 106, in train
    loss = torch.sum(BCE(predictions, relation_multi_label) * relation_mask.unsqueeze(2))/(opt.relation_nums * torch.sum(relation_mask))
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/loss.py", line 717, in forward
    reduction=self.reduction)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py", line 2824, in binary_cross_entropy_with_logits
    raise ValueError("Target size ({}) must be the same as input size ({})".format(target.size(), input.size()))
ValueError: Target size (torch.Size([1, 21, 97])) must be the same as input size (torch.Size([1, 21, 2]))
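
The target tensor above still has 97 classes (DocRED's 96 relations plus the Na class) while the model outputs 2, so some part of preprocessing apparently still uses the original relation inventory. A quick consistency check (the rel2id.json path is illustrative):

    import json

    with open("rel2id.json") as f:
        rel2id = json.load(f)
    print(len(rel2id))   # should equal relation_nums in config.py; stale
                         # cached/preprocessed data may also need regenerating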

Test

Just for test.

Error in training

Hi! I'm trying to train the neural network using the default values provided in the script run_GAIN_BERT.sh, but the training script throws an error shortly after it begins. It seems to be an error with DGL. The error is the following:
[screenshot]

Question about text length and the use of other models

First of all, I would like to thank you for your great work and paper!

I have been experimenting with the training scripts you have proposed and they have worked well.

So I would like to ask two questions.

Are there any limits on the length of the sentences/documents that the BERT-encoder variant of the model can process, given that BERT is limited to 512 sub-word units?

And if I would like to experiment with other languages and therefore use other encoders, say bert-base-multilingual-cased or xlm-roberta-base, is it enough to create a folder for these models and download/place the files pytorch_model.bin, vocab.txt, etc. accordingly?

I also imagine it would be necessary to create a GAIN_BERT_MUL training script that points to the folder of the new model and modifies the parameters as required. (A sketch follows.)
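
A hedged sketch of the model-fetching step, assuming the repo loads its encoder through the transformers library (the folder name is illustrative; if the code hard-codes a BertModel class, xlm-roberta-base would also require switching the model class, not just the files):

    from transformers import AutoModel, AutoTokenizer

    name = "bert-base-multilingual-cased"
    # Snapshot the tokenizer and weights locally so bert_path can point here.
    AutoTokenizer.from_pretrained(name).save_pretrained("../PLM/" + name)
    AutoModel.from_pretrained(name).save_pretrained("../PLM/" + name)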

Best regards

Compatibility with CUDA 11

Hello, I am trying to train on a CUDA 11.0 GPU, and I noticed that CUDA 11 is only compatible with dgl>=0.6.
Is there any workaround to train with dgl>=0.6.0? I would also love to hear any insights on how to adapt the code myself to train with the newest dgl version.

dgl_cu111-0.7.1: using this DGL version, I get training errors:

Traceback (most recent call last):
  File "train.py", line 231, in <module>
    train(opt)
  File "train.py", line 125, in train
    predictions = model(words=d['context_idxs'],
  File "/home/anaconda3/envs/qusiyu/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/qusiyu/GAIN-master/code/models/GAIN.py", line 304, in forward
    encoder_outputs = torch.cat([encoder_outputs, self.entity_type_emb(params['entity_type'])], dim=-1)
TypeError: expected Tensor as element 0 in argument 0, but got str

Why does running on GPU fail while CPU runs fine?

After setting up the environment and dependencies required by the README and setting gpu_id to 0 (this machine has a single GPU with id 0), running run_GAIN_BERT.sh puts the following in the log: "RuntimeError: CUDA out of memory. Tried to allocate 14.00 MiB (GPU 0; 6.00 GiB total capacity; 4.30 GiB already allocated; 10.44 MiB free; 4.38 GiB reserved in total by PyTorch)", and training cannot continue.
Checking GPU usage with nvidia-smi shows only about 10% utilization, no other programs are running in the background, and GPU memory should be plentiful. My GPU is an RTX 2060 with 6 GB of memory.
I am using CUDA 10.2 and the GPU build of torch 1.6.0, with all other dependency versions matching the README, and the code runs fine on CPU. Where is the problem?
[screenshot]
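
For what it's worth, nvidia-smi polls too late to catch the peak allocation that triggers the OOM; PyTorch's own counters, printed just before the failing step, are more telling. On a 6 GB card, lowering the batch size in run_GAIN_BERT.sh is the usual first remedy.

    import torch

    # Read the allocator state right before the step that dies.
    print(torch.cuda.memory_allocated() / 2**20, "MiB allocated")
    print(torch.cuda.memory_reserved() / 2**20, "MiB reserved by the caching allocator")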

Some problems with the code, and a summary

First, my environment:
dgl 0.6.1, torch 1.8.0
This code seems to have quite a few problems!
(1) In the DGLREDataloader class, isn't the for-loop that calls zero_() on mapping a bit redundant? Wouldn't it be enough to move the tensor declaration from outside the loop into the loop body? (A sketch of the trade-off follows.)
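
The trade-off behind that pattern, in miniature (toy shapes, not the dataloader's real ones):

    import torch

    # What the dataloader does: one pre-allocated buffer, reset in place each
    # iteration, avoiding a fresh allocation per batch.
    buf = torch.zeros(8, 16)
    for _ in range(3):
        buf.zero_()
        # ... fill buf for this batch ...

    # The proposed alternative: allocate a fresh (already-zeroed) tensor per
    # iteration. Simpler, at the cost of a new allocation each time.
    for _ in range(3):
        buf = torch.zeros(8, 16)
        # ... fill buf for this batch ...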
