bangliu / articlepairmatching


234.0 234.0 60.0 2.05 MB

The code of ACL 2019 paper: Matching Article Pairs with Graphical Decomposition and Convolutions

License: Other

Python 98.43% Shell 1.57%

articlepairmatching's People


zpli0320 lu839684437 zheng5yu9 awyshw yyht xianhuaxizi zhhhzhang xsilverbullet typistzhao cdj0311 colinsongf shasiki chunlinx sduchh helloleejq barryzm mformargarite zx-feishang two222 mysoulmq qianrenjian eternalfeather zuiwufenghua crystalxiaoxiao renz7 hyusheng jialeguo kiminh ammieqi tianhaofu vxacezxcv dancinghui qfxlcyc astrodrew canyuchen yangliuy zhp510730568 marissala berlin-98 stillyuyi wangyujie1176 gg520 njust-taoye cytsinghua laoma023012 nnroy akakaala eecrazy amoto1103 hongzhangmu liny008 guofeng201507 weichengtt isperfee fusion-research shruti-singh winghigh lawrencefc

articlepairmatching's Issues

Dataset question

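Hello, I'd like to ask what criteria were used to annotate the dataset, and whether the following labels might affect the results.
The paper says that an event pair describes the same event, while a story pair consists of related events (for example, events on the same topic).
In the annotated same_story_doc_pair dataset, some related events are not labeled as related, for example: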
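0|9501|14721|“ 出轨后 ” 的宋喆买豪宅母亲背名牌包马蓉这边却惨不忍睹|
马蓉出轨宋喆后一直没露面，这次终于要露面了|
(Translation: "After the affair, Song Zhe buys a mansion and his mother carries designer bags, while Ma Rong's situation is dire" / "Ma Rong has not appeared in public since her affair with Song Zhe, and now she is finally about to show up")
0|10706|10751|详解特朗普就职典礼全程安排具有多个看点|
名流大腕拒绝出席总统就职典礼特朗普：我想要人民|
(Translation: "A detailed look at the full schedule of Trump's inauguration and its highlights" / "Celebrities refuse to attend the presidential inauguration; Trump: I want the people")
0|14176|14297|" 台风 "" 海马 "" 本周或带来严重风雨影响 "|
“ 海马 ” 或直扑闽粤 19日至 23日将带来严重的风雨影响|
(Translation: "Typhoon Haima may bring severe wind and rain this week" / "Haima may hit Fujian and Guangdong directly, bringing severe wind and rain from the 19th to the 23rd")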
Some unrelated events are labeled as related, for example:
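1|13109|13110|肇庆这部分路段封闭施工 , 车主请绕行 !|
肇庆打掉一特大贩毒犯罪团伙缴毒 6000 多克|
(Translation: "This road section in Zhaoqing is closed for construction; drivers please detour!" / "Zhaoqing breaks up a major drug-trafficking ring and seizes more than 6,000 grams of drugs")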

Could you release the complete version of the code?

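Will you release the code for the term-based similarity in the aggregation layer, and for the part that uses BERT as the encoding layer?
Also, did you consider using PyTorch's built-in DataLoader for data loading? Without it, the batch size of the whole model is hard to adjust.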
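One possible approach, sketched under assumptions rather than taken from the repository: keep a per-sample dataset and let `torch.utils.data.DataLoader` form batches of variable-length samples through its `collate_fn` argument. The hypothetical `pad_collate` below right-pads lists of token ids to the longest sample in the batch and returns a 0/1 mask; it is plain Python so it can be tested without PyTorch, and in practice you would wrap its outputs with `torch.tensor`. The graph inputs (adjacency matrices of different sizes) would need similar padding, which this sketch does not handle.

```python
from typing import Dict, List, Sequence, Tuple


def pad_collate(batch: Sequence[Tuple[List[int], int]], pad_idx: int = 0) -> Dict[str, list]:
    """Hypothetical collate_fn: pad the variable-length id lists of one batch.

    Each sample is (token_ids, label). Returns padded ids, a 0/1 mask, and labels.
    """
    max_len = max(len(ids) for ids, _ in batch)
    padded, mask, labels = [], [], []
    for ids, label in batch:
        n_pad = max_len - len(ids)
        padded.append(list(ids) + [pad_idx] * n_pad)   # right-pad with pad_idx
        mask.append([1] * len(ids) + [0] * n_pad)      # 1 = real token, 0 = padding
        labels.append(label)
    return {"ids": padded, "mask": mask, "labels": labels}
```

With a dataset object, the batch size then becomes a single argument, e.g. `DataLoader(dataset, batch_size=8, shuffle=True, collate_fn=pad_collate)`.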

Error when running feature_extractor.py

Traceback (most recent call last):
  File "feature_extractor.py", line 9, in <module>
    from ccig import *
  File "/home/ubuntu/Desktop/CIG-GCN/ArticlePairMatching-master/src/models/CCIG/data/ccig.py", line 13, in <module>
    IDF = load_IDF("event_story")
  File "/home/ubuntu/Desktop/CIG-GCN/ArticlePairMatching-master/src/models/CCIG/data/resource_loader.py", line 18, in load_IDF
    "|", "|", keep_header=False)
  File "/home/ubuntu/Desktop/CIG-GCN/ArticlePairMatching-master/src/models/CCIG/util/pd_utils.py", line 11, in export_columns
    df = pd.read_csv(fin, sep=sep_in)
  File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.6/site-packages/pandas/io/parsers.py", line 688, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.6/site-packages/pandas/io/parsers.py", line 454, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.6/site-packages/pandas/io/parsers.py", line 948, in __init__
    self._make_engine(self.engine)
  File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.6/site-packages/pandas/io/parsers.py", line 1180, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/ubuntu/anaconda3/envs/pytorch/lib/python3.6/site-packages/pandas/io/parsers.py", line 2010, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 382, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 674, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] No such file or directory: '../../../../data/raw/event-story-cluster/event_story_cluster.txt'
Where is this missing file? Does it need to be generated?

Dataset download problem

Hello, I ran into the following problem during git clone:
Error downloading object: data/raw/event-story-cluster/same_event_doc_pair.txt (e5c2482): Smudge error: Error downloading data/raw/event-story-cluster/same_event_doc_pair.txt (e5c2482c410f19418256839d7158d18244e9630466f262b064fb5a69e6f7dddc): batch response: Post https://github.com/BangLiu/ArticlePairMatching.git/info/lfs/objects/batch: dial tcp: lookup github.com: no such host
How can I fix this?

How to download the dataset?

Question about test time

Hello, roughly how long does it take to finish running on the test set?

How to generate the document pair dataset?

I'd like to ask how labels were assigned to document pairs.
Was it crowdsourcing, or some other way?

About graph_draw

Hello, when I draw graphs with graph_draw from graph_tool, the resulting images cannot display Chinese characters.

I see your code uses vertex_font_family="STKaiti". Is this caused by my graph_tool version?
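A plausible cause, not confirmed by the authors: graph-tool renders labels through cairo, so `vertex_font_family` must name a font that is actually installed. "STKaiti" is a Kaiti typeface typically found on macOS and is usually absent on Linux, in which case a fallback font without Chinese glyphs may be used. The sketch below asks fontconfig (`fc-list :lang=zh family`, which requires fontconfig to be installed) for fonts that cover Chinese and picks the first match from a preference list. Both helper names and the candidate font names are assumptions.

```python
import shutil
import subprocess
from typing import Iterable, List, Optional


def installed_cjk_families() -> List[str]:
    """Font families that fontconfig reports as covering Chinese; empty if fc-list is missing."""
    if shutil.which("fc-list") is None:
        return []
    result = subprocess.run(["fc-list", ":lang=zh", "family"],
                            stdout=subprocess.PIPE, stderr=subprocess.DEVNULL,
                            universal_newlines=True, check=False)
    families = []
    for line in result.stdout.splitlines():
        # fc-list may print several comma-separated aliases for one family
        families.extend(name.strip() for name in line.split(",") if name.strip())
    return families


def pick_font(preferred: Iterable[str], installed: Iterable[str]) -> Optional[str]:
    """Return the first preferred family that is installed, or None."""
    installed_set = set(installed)
    for name in preferred:
        if name in installed_set:
            return name
    return None
```

For example, `pick_font(["STKaiti", "Noto Sans CJK SC", "WenQuanYi Micro Hei"], installed_cjk_families())` could be passed as `vertex_font_family` and `edge_font_family`; if it returns None, a CJK font needs to be installed first.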

Installing graph-tool on Ubuntu with Anaconda

$ conda config --add channels conda-forge
$ conda config --add channels ostrokach-forge
$ conda install graph-tool

Why do I keep getting a boost installation error when installing graph-tool?

Question about the time spent generating the training/test dataset

Hi, dear friends. The data generation program has been running for more than 36 hours, but I still don't have the output file /same_event_doc_pair.cd.json.

I would be very grateful if you could share some experience.
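One way to tell whether a long run is on track, offered as a general technique rather than advice from the authors: time the debug call that `feature_extractor.py` already contains (it passes `extract_range=range(2)`), then extrapolate linearly to the full file. The hypothetical helper below does that arithmetic. It assumes every pair costs about the same and that parallel workers scale perfectly, so read the result as an optimistic lower bound.

```python
def estimate_hours(sample_seconds: float, sample_pairs: int, total_pairs: int,
                   workers: int = 1) -> float:
    """Linearly extrapolate total wall-clock hours from a timed sample run."""
    if sample_pairs <= 0 or workers <= 0:
        raise ValueError("sample_pairs and workers must be positive")
    per_pair = sample_seconds / sample_pairs          # seconds per document pair
    return per_pair * total_pairs / workers / 3600.0  # convert seconds to hours
```

For scale, another issue on this page reports about 6 hours for 30k pairs with use_cd=False on an i7-8700K, i.e. roughly 0.72 s per pair; `estimate_hours(72.0, 100, 30000)` reproduces that figure.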

Question about graph-tool

Running feature_extractor.py produces:
python: symbol lookup error: /home/ubuntu/anaconda3/envs/pytorch/lib/python3.6/site-packages/graph_tool/draw/libgraph_tool_draw.so: undefined symbol: _ZN5Cairo7Context16select_font_faceERKSsNS_9FontSlantENS_10FontWeightE
How can I fix this?

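Question about practical use

Suppose I want to use this model for article search, i.e. finding similar articles. I would first extract features with modules such as TextRank and NER. Do I then have to run the model once against every article in the existing library? Wouldn't that be far too slow?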
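A common way to avoid scoring a query against every stored article, sketched here as an assumption rather than something the repository provides, is two-stage retrieval. First, a cheap TF-IDF cosine similarity shortlists the top-k candidates. Then the expensive pair-matching model runs only on that shortlist. The function names are hypothetical, and documents are assumed to be already segmented into words. For brevity the sketch rebuilds all vectors per query; in practice you would precompute and cache the library's vectors once.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def build_idf(docs: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(doc: Sequence[str], idf: Dict[str, float]) -> Dict[str, float]:
    tf = Counter(doc)
    return {t: c * idf.get(t, 0.0) for t, c in tf.items()}  # unseen terms weigh 0


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def shortlist(query: Sequence[str], docs: Sequence[Sequence[str]],
              k: int = 50) -> List[Tuple[int, float]]:
    """(doc_index, score) for the k most similar docs; rerank these with the model."""
    idf = build_idf(docs)
    q = tfidf(query, idf)
    scored = [(i, cosine(q, tfidf(d, idf))) for i, d in enumerate(docs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Only the k shortlisted articles then go through feature extraction and the matching model, so the per-query cost grows with k rather than with the size of the library.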

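Problem when using git clone

Hi, I'm using git-lfs to clone the repository in order to get the dataset, but I ran into some problems while cloning. Could you publish the dataset some other way? Thanks.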

About the repeated calls to dataset2featurefile in the main block of feature_extractor.py

Hello, feature_extractor.py contains the following code:
if __name__ == "__main__":
    # debug with a few lines
    dataset2featurefile(
        "../../../../data/raw/event-story-cluster/same_event_doc_pair.txt",
        "../../../../data/processed/event-story-cluster/same_event_doc_pair.cd.debug.json",
        "label", "category1", "time1", "time2", "content1", "content2",
        ["keywords1", "ner_keywords1"], ["keywords2", "ner_keywords2"],
        col_title1=None, col_title2=None, use_cd=True,
        draw_fig=True, parallel=False, extract_range=range(2), print_fig=True)

    # process data
    dataset2featurefile(
        "../../../../data/raw/event-story-cluster/same_event_doc_pair.txt",
        "../../../../data/processed/event-story-cluster/same_event_doc_pair.cd.json",
        "label", "category1", "time1", "time2", "content1", "content2",
        ["keywords1", "ner_keywords1"], ["keywords2", "ner_keywords2"],
        col_title1="title1", col_title2="title2", use_cd=True,
        draw_fig=False, parallel=True, extract_range=None,
        betweenness_threshold_coef=1.0, max_c_size=6, min_c_size=2)
    dataset2featurefile(
        "../../../../data/raw/event-story-cluster/same_story_doc_pair.txt",
        "../../../../data/processed/event-story-cluster/same_story_doc_pair.cd.json",
        "label", "category1", "time1", "time2", "content1", "content2",
        ["keywords1", "ner_keywords1"], ["keywords2", "ner_keywords2"],
        col_title1="title1", col_title2="title2", use_cd=True,
        draw_fig=False, parallel=True, extract_range=None,
        betweenness_threshold_coef=1.0, max_c_size=6, min_c_size=2)
    dataset2featurefile(
        "../../../../data/raw/event-story-cluster/same_event_doc_pair.txt",
        "../../../../data/processed/event-story-cluster/same_event_doc_pair.no_cd.json",
        "label", "category1", "time1", "time2", "content1", "content2",
        ["keywords1", "ner_keywords1"], ["keywords2", "ner_keywords2"],
        col_title1="title1", col_title2="title2", use_cd=False,
        draw_fig=False, parallel=True, extract_range=None,
        betweenness_threshold_coef=1.0, max_c_size=6, min_c_size=2)
    dataset2featurefile(
        "../../../../data/raw/event-story-cluster/same_story_doc_pair.txt",
        "../../../../data/processed/event-story-cluster/same_story_doc_pair.no_cd.json",
        "label", "category1", "time1", "time2", "content1", "content2",
        ["keywords1", "ner_keywords1"], ["keywords2", "ner_keywords2"],
        col_title1="title1", col_title2="title2", use_cd=False,
        draw_fig=False, parallel=True, extract_range=None,
        betweenness_threshold_coef=1.0, max_c_size=6, min_c_size=2)

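dataset2featurefile is called several times here. Apart from the first call, which sets extract_range=range(2), the later calls all look the same. Is this necessary? When actually running it, can I keep just one call? If so, should I keep the one with extract_range=range(2) or one with extract_range=None? Thank you!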
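A reading of the main block quoted above, offered as an interpretation rather than an answer from the authors: the first call is a debug run on two rows (`extract_range=range(2)`) that also draws figures. The four later calls are not identical: they cover every combination of input file (`same_event_doc_pair` vs `same_story_doc_pair`) and `use_cd` (True vs False), and each writes a different output file. Judging from the `Namespace` printed in another issue on this page, `main.py --data_type "event"` without `--use_cd` reads `same_event_doc_pair.no_cd.json`, so you likely only need the full-run call whose output matches the flags you train with. The helper below is hypothetical; it just selects those arguments.

```python
from typing import Dict

RAW = "../../../../data/raw/event-story-cluster/"
PROCESSED = "../../../../data/processed/event-story-cluster/"


def select_job(data_type: str, use_cd: bool) -> Dict[str, object]:
    """Input/output paths and use_cd flag for one full (non-debug) extraction run.

    data_type is "event" or "story", mirroring main.py's --data_type option.
    """
    if data_type not in ("event", "story"):
        raise ValueError("data_type must be 'event' or 'story'")
    stem = "same_%s_doc_pair" % data_type
    suffix = "cd" if use_cd else "no_cd"
    return {
        "input": RAW + stem + ".txt",
        "output": PROCESSED + "%s.%s.json" % (stem, suffix),
        "use_cd": use_cd,
    }

# Assumed usage inside feature_extractor.py, keeping the other arguments as in the repo:
# job = select_job("event", use_cd=False)
# dataset2featurefile(job["input"], job["output"], ..., use_cd=job["use_cd"], extract_range=None, ...)
```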

ArticlePairMatching/src/models/CCIG/data/resource_loader.py

Hi, in resource_loader.py, 'event_story_cluster.txt' is not provided, and I don't know how this file is generated.

RuntimeError: Legacy autograd function with non-static forward method is deprecated. Please use new-style autograd function with static forward method.

Hi friends, I ran into a problem when running
python main.py --data_type "event" --use_vfeatures --use_siamese --use_gfeatures --use_gcn --use_cd
The error is as follows:

Traceback (most recent call last):
  File "main.py", line 297, in <module>
    train(args, fout)
  File "main.py", line 248, in train
    step = train_epoch(epoch, step)
  File "main.py", line 190, in train_epoch
    output = model(w2v_idxs_l, w2v_idxs_r, v_feature, adj, g_feature, g_vertice)  # what if batch > 1 ?
  File "//users10/yhwu/miniconda/envs/match_article/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/users10/yhwu/Project/match_article/src/models/CCIG/models/se_gcn.py", line 124, in forward
    x_siamese = self.gc_w2v[n_l](x_siamese, adj)
  File "//users10/yhwu/miniconda/envs/match_article/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/users10/yhwu/Project/match_article/src/models/CCIG/models/layers.py", line 60, in forward
    output = SparseMM()(adj, support)
  File "//users10/yhwu/miniconda/envs/match_article/lib/python3.8/site-packages/torch/autograd/function.py", line 159, in __call__
    raise RuntimeError(
RuntimeError: Legacy autograd function with non-static forward method is deprecated. Please use new-style autograd function with static forward method. (Example: https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function)

Searching online, I found that this problem may be due to the PyTorch version (for PyTorch newer than 1.3, subclasses of torch.autograd.Function need a static forward method instead of a non-static one). So I tried to fix it by changing the SparseMM class from non-static to static methods. My modified version is as follows:

class SparseMM(torch.autograd.Function):
    """
    Sparse x dense matrix multiplication with autograd support.

    Implementation by Soumith Chintala:
    https://discuss.pytorch.org/t/
    does-pytorch-support-autograd-on-sparse-matrix/6156/7
    """

    @staticmethod
    def forward(ctx, matrix1, matrix2):
        ctx.save_for_backward(matrix1, matrix2)
        return torch.mm(matrix1, matrix2)

    @staticmethod
    def backward(ctx, grad_output):
        matrix1, matrix2 = ctx.saved_tensors
        grad_matrix1 = grad_matrix2 = None

        if ctx.needs_input_grad[0]:
            grad_matrix1 = torch.mm(grad_output, matrix2.t())

        if ctx.needs_input_grad[1]:
            grad_matrix2 = torch.mm(matrix1.t(), grad_output)

        return grad_matrix1, grad_matrix2

However, this did not solve the problem, and the error message is unchanged.
I don't know what to do next (short of changing the PyTorch version). I hope the author or anyone else can help. Thanks a lot!
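A likely explanation, based on how PyTorch handles new-style autograd functions: turning `forward`/`backward` into static methods is necessary but not sufficient. The traceback shows the call site in `layers.py` still instantiates the class, `SparseMM()(adj, support)`, and calling a `Function` instance is what raises this exact error in recent PyTorch versions, which is why the message did not change. New-style functions are invoked through the class method `apply`. A minimal sketch, requiring PyTorch; `sparse_dense_mm` is a hypothetical wrapper name. Alternatively, `torch.sparse.mm(adj, support)` performs the same product with built-in autograd support for a sparse COO first argument.

```python
import torch


class SparseMM(torch.autograd.Function):
    """Sparse x dense matrix multiplication with autograd support (new-style Function)."""

    @staticmethod
    def forward(ctx, matrix1, matrix2):
        ctx.save_for_backward(matrix1, matrix2)
        return torch.mm(matrix1, matrix2)

    @staticmethod
    def backward(ctx, grad_output):
        matrix1, matrix2 = ctx.saved_tensors
        grad_matrix1 = grad_matrix2 = None
        if ctx.needs_input_grad[0]:
            grad_matrix1 = torch.mm(grad_output, matrix2.t())
        if ctx.needs_input_grad[1]:
            grad_matrix2 = torch.mm(matrix1.t(), grad_output)
        return grad_matrix1, grad_matrix2


def sparse_dense_mm(adj, support):
    # Call through the class, never through an instance:
    # in layers.py, `output = SparseMM()(adj, support)` becomes `output = SparseMM.apply(adj, support)`.
    return SparseMM.apply(adj, support)
```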

No such file or directory

def load_IDF(data):
    if data == "event_story":
        datafile = "../../../../data/raw/event-story-cluster/event_story_cluster.txt"
        contentfile = "../../../../data/processed/event-story-cluster/content.txt"
        idffile = "../../../../data/processed/event-story-cluster/IDF.txt"

I get errors: none of these three .txt files exist (No such file or directory).
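Until the raw file is provided (several issues on this page report the same missing `event_story_cluster.txt`), a quick pre-flight check can at least show which of the paths used by `load_IDF` are missing before a long pipeline starts. A sketch; `missing_paths` and `IDF_INPUTS` are hypothetical names. The relative paths resolve against the current working directory, so run it from the directory you launch `feature_extractor.py` from. Note that `content.txt` and `IDF.txt` sit under `processed/`, so they may be outputs that `load_IDF` builds from the raw file rather than inputs; the snippet above does not say.

```python
import os
from typing import Iterable, List

IDF_INPUTS = [
    "../../../../data/raw/event-story-cluster/event_story_cluster.txt",
    "../../../../data/processed/event-story-cluster/content.txt",
    "../../../../data/processed/event-story-cluster/IDF.txt",
]


def missing_paths(paths: Iterable[str]) -> List[str]:
    """Absolute forms of the paths (relative to the current directory) that do not exist."""
    return [os.path.abspath(p) for p in paths if not os.path.exists(p)]
```

For example, `print(missing_paths(IDF_INPUTS))` lists exactly which files still need to be obtained or generated.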

Solution to install "graph_tool" on Ubuntu 16.04

As mentioned in #2, installing graph_tool can be very troublesome.

Here is how I installed graph_tool on Ubuntu 16.04; I hope it helps those still using Ubuntu 16.04 on their servers.

The official installation guide for Debian and Ubuntu lists only Ubuntu 18.04 (bionic), Ubuntu 18.10 (cosmic) and Ubuntu 19.04 (disco) below the instructions.

I opened the repository URL https://downloads.skewed.de/apt/ in my browser and found that it still contains a "xenial" folder, which is for Ubuntu 16.04.

So I replaced DISTRIBUTION with xenial in the following lines and added them to /etc/apt/sources.list:

deb http://downloads.skewed.de/apt/DISTRIBUTION DISTRIBUTION universe
deb-src http://downloads.skewed.de/apt/DISTRIBUTION DISTRIBUTION universe

Then, following the rest of the official guide, I installed it successfully.

Note: the code may raise an error when importing cairo, in which case graph drawing will not work. Everything else works normally as long as you don't use it for visualization.

How long does data processing with feature_extractor.py take? I have been running it for three days and it is still stuck on the same screen.

FileNotFoundError: [Errno 2] File b'../../../../data/raw/event-story-cluster/event_story_cluster.txt' does not exist

KeyError: ')'

begin loading DATA............../../../data/processed/event-story-cluster/same_event_doc_pair.cd.json
Traceback (most recent call last):
  File "main.py", line 102, in <module>
    labels, idx_train, idx_val, idx_test = load_graph_data(path, word_to_ix, MAX_LEN, args.num_data)
  File "/home/wting/Documents/code/ArticlePairMatching-master/src/models/CCIG/loader.py", line 108, in load_graph_data
    sent_idx = right_pad_zeros_1d([word_to_ix[w.decode("utf-8")] for w in val], max_len)
  File "/home/wting/Documents/code/ArticlePairMatching-master/src/models/CCIG/loader.py", line 108, in <listcomp>
    sent_idx = right_pad_zeros_1d([word_to_ix[w.decode("utf-8")] for w in val], max_len)
KeyError: ')'
device is cpu

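I hit this KeyError and found that the JSON contains: "v_texts_mat": [[") 之所以说圆满，是因为这是各方都能接受的方案笔者认为，这也算另外一种意义的混改吧", ""], ["对于王石为首的万科来说，" (roughly: ") It is called a satisfactory outcome because this is a plan all parties can accept; in the writer's view, this also counts as mixed-ownership reform in another sense", ""], ["For Vanke, headed by Wang Shi, ")
I wonder why the JSON is malformed. Thank you for your explanation.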
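The `KeyError: ')'` means the token `)` is not a key in `word_to_ix`. One hedged workaround, not taken from the repository, is to map out-of-vocabulary tokens to a reserved index instead of indexing the dict directly. Separately, the log in another issue on this page prints `Vocab size: 2` after loading W2V, which suggests the vocabulary may not have loaded correctly at all; if your log shows the same, check that first. `tokens_to_ids` and `unk_idx` are hypothetical, and padding/truncating to `max_len` is an assumption about what `right_pad_zeros_1d` does. Pick an `unk_idx` different from the padding index 0.

```python
from typing import Dict, Iterable, List, Union


def tokens_to_ids(tokens: Iterable[Union[str, bytes]], word_to_ix: Dict[str, int],
                  unk_idx: int, max_len: int) -> List[int]:
    """Map tokens to ids, sending unknown tokens to unk_idx, then right-pad with zeros."""
    ids = []
    for w in tokens:
        if isinstance(w, bytes):  # loader.py decodes bytes; accept str as well
            w = w.decode("utf-8")
        ids.append(word_to_ix.get(w, unk_idx))
    ids = ids[:max_len]  # truncate overly long sequences
    return ids + [0] * (max_len - len(ids))
```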

File "C:\Users\username\Downloads\ArticlePairMatching-master\src\models\CCIG\data\resource_loader.py", line 53, in load_W2V_VOCAB
    W2V_VOCAB = pickle.load(open(W2V_VOCAB_PKL_FILE, 'rb'))
UnpicklingError: invalid load key, 'v'.

I get this error.

AssertionError

WARNING **: 01:45:59.539: Failed to load shared library 'libgdk-3.so.0' referenced by the typelib: libXcursor.so.1: cannot open shared object file: No such file or directory
Traceback (most recent call last):
  File "feature_extractor.py", line 8, in <module>
    from graph_tool.all import *
  File "/opt/conda/lib/python3.6/site-packages/graph_tool/all.py", line 34, in <module>
    from graph_tool.draw import *
  File "/opt/conda/lib/python3.6/site-packages/graph_tool/draw/__init__.py", line 835, in <module>
    from . cairo_draw import graph_draw, cairo_draw,
  File "/opt/conda/lib/python3.6/site-packages/graph_tool/draw/cairo_draw.py", line 1496, in <module>
    from gi.repository import Gtk, Gdk, GdkPixbuf
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 656, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 626, in _load_backward_compatible
  File "/opt/conda/lib/python3.6/site-packages/gi/importer.py", line 144, in load_module
    importlib.import_module('gi.repository.' + dep.split("-")[0])
  File "/opt/conda/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 656, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 626, in _load_backward_compatible
  File "/opt/conda/lib/python3.6/site-packages/gi/importer.py", line 145, in load_module
    dynamic_module = load_overrides(introspection_module)
  File "/opt/conda/lib/python3.6/site-packages/gi/overrides/__init__.py", line 118, in load_overrides
    override_mod = importlib.import_module(override_package_name)
  File "/opt/conda/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/opt/conda/lib/python3.6/site-packages/gi/overrides/Gdk.py", line 83, in <module>
    Color = override(Color)
  File "/opt/conda/lib/python3.6/site-packages/gi/overrides/__init__.py", line 195, in override
    assert g_type != TYPE_NONE
AssertionError
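Reading the traceback: the failure happens while `graph_tool.all` imports `graph_tool.draw`, which pulls in Gtk through `gi`, and the missing `libXcursor.so.1` / `libgdk-3.so.0` libraries make that import fail with an AssertionError rather than an ImportError. Installing the missing system libraries should let the Gtk import succeed. On a headless server, another option (an assumption, not the authors' fix) is to avoid `from graph_tool.all import *`, import the non-drawing parts directly, and treat drawing as optional. The specific imports and the flag names below are examples only.

```python
# Import the core of graph-tool and its drawing module separately, so a broken
# Gtk/cairo stack disables figures instead of crashing feature extraction.
try:
    from graph_tool import Graph                             # example core import
    from graph_tool.centrality import betweenness, pagerank  # example algorithms
    HAS_GRAPH_TOOL = True
except ImportError:
    HAS_GRAPH_TOOL = False

try:
    from graph_tool.draw import graph_draw, sfdp_layout
    HAS_GT_DRAW = True
except Exception:  # ImportError, or the AssertionError raised inside gi as in this traceback
    HAS_GT_DRAW = False


def maybe_draw(g, fig_name, **kwargs) -> bool:
    """Draw g to fig_name if drawing is available; return whether a figure was written."""
    if not HAS_GT_DRAW:
        return False
    pos = sfdp_layout(g)
    graph_draw(g, pos=pos, output=fig_name, **kwargs)
    return True
```

Calls that draw figures (for example those run with `draw_fig=True`) would then go through `maybe_draw`, and everything else keeps working when the GUI libraries are absent.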

KeyError: 'content1'

English

Is there similar work in English?
Is this dataset available for English as well?

Data preprocessing takes a very long time. Could you share a link to the preprocessed data?

How were the named-entity (NER) data in the training set generated? Were the keywords extracted with a TextRank model that you trained yourselves?

Is there any way to increase the speed of preprocessing?

Dr. Liu, thanks for your great work!

When I tried to use it in an application, I found it quite slow and unsuitable for computing the similarity of large numbers of sentence pairs (it takes around 6 hours to process 30k sentence pairs with use_cd=False on an i7-8700K CPU).

I'd appreciate it if you could suggest a way to speed up this process.
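One hedged way to make long preprocessing runs more manageable, although it does not reduce the total work: the debug call in `feature_extractor.py` passes `extract_range=range(2)` to process only a few rows, which suggests `extract_range` selects the rows to handle. Assuming that holds, the dataset could be split into chunks that are processed by separate processes or machines, each writing its own output file, and merged afterwards; a crash then only costs one chunk. `chunk_ranges` and the per-chunk file naming are hypothetical.

```python
from typing import List


def chunk_ranges(n_rows: int, chunk_size: int) -> List[range]:
    """Split row indices 0..n_rows-1 into consecutive ranges of at most chunk_size rows."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [range(start, min(start + chunk_size, n_rows))
            for start in range(0, n_rows, chunk_size)]


def chunk_output_name(base: str, idx: int) -> str:
    """Hypothetical per-chunk file name: 'x.no_cd.json' -> 'x.no_cd.part003.json'."""
    stem, dot, ext = base.rpartition(".")
    return "%s.part%03d.%s" % (stem, idx, ext) if dot else "%s.part%03d" % (base, idx)

# Assumed usage, one chunk per process (other arguments as in feature_extractor.py):
# for i, rows in enumerate(chunk_ranges(30000, 5000)):
#     dataset2featurefile(in_path, chunk_output_name(out_path, i), ..., extract_range=rows)
```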

How to get NER words with TextRank?

Hi! I love your work! I want to use it on another dataset, but I don't know how you obtained the NER words. Could you tell me? Thanks!
The paper says: "Given a document D, we first extract the named entities and keywords by TextRank".
Did you use TextRank to get keywords and then check whether each keyword appears in your own word-type dictionary?
For example, 广东 (Guangdong) appears in your dictionary as "广东-site". So when "广东" comes out as a keyword, do you look it up and find that it is a "site"? Or do you use some other NER tool? I need a good and fast NER method. Could you tell me about that?
Thanks for listening.
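For reference, here is a minimal generic TextRank keyword extractor. It is not the authors' code and it performs no NER; how the repository assigns entity types is exactly what this issue asks. Candidate words (already segmented and, typically, filtered by part of speech) become graph nodes; words appearing within a window of `window` consecutive tokens are linked; and a PageRank-style iteration with damping 0.85 scores the nodes. The window size, damping, and iteration count are conventional choices, not values from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set


def textrank_keywords(words: Sequence[str], window: int = 5, damping: float = 0.85,
                      iterations: int = 50, top_k: int = 5) -> List[str]:
    """Rank words with TextRank over an undirected co-occurrence graph."""
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for i, w in enumerate(words):
        # link w to the words that follow it within the same window
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        # each node shares its score equally among its neighbors
        scores = {
            w: (1 - damping) + damping * sum(scores[v] / len(neighbors[v]) for v in neighbors[w])
            for w in neighbors
        }
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]
```

A dictionary lookup like the one described above (e.g. "广东-site") could then be applied to the returned keywords to attach entity types.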

No such file or directory: '../../../data/processed/event-story-cluster/same_event_doc_pair.no_cd.json'

ub16c9@ub16c9-gpu:/media/ub16c9/fcd84300-9270-4bbd-896a-5e04e79203b7/ub16_prj/ArticlePairMatching/src/models/CCIG$ python3.6 main.py --data_type "event" --use_gfeatures
device is cpu
begin loading W2V............
Company
W2V loaded!
Vocab size: 2, Embedding size: 200
Namespace(adjacent='tfidf', beta1=0.8, beta2=0.999, betweenness_threshold_coef=1.0, combine_type='separate', data_type='event', dropout_siamese=0.1, dropout_vfeat=0.1, ema_decay=0.9999, epochs=10, gcn_type='valina', gfeatures_type='features', hidden_final=16, hidden_siamese=128, hidden_vfeat=16, inputdata='event-story-cluster/same_event_doc_pair.no_cd.json', lr=0.001, lr_warm_up_num=1000, max_c_size=6, max_grad_norm=5.0, min_c_size=2, no_cuda=False, no_grad_clip=False, num_data=1000000000, num_gcn_layers=2, outputresult='event-story-cluster/same_event_doc_pair.no_cd.result.txt', pool_type='mean', seed=42, use_cd=False, use_ema=False, use_gcn=False, use_gfeatures=True, use_siamese=False, use_vfeatures=False, vertice='pagerank')
begin loading DATA............../../../data/processed/event-story-cluster/same_event_doc_pair.no_cd.json
Traceback (most recent call last):
  File "main.py", line 102, in <module>
    labels, idx_train, idx_val, idx_test = load_graph_data(path, word_to_ix, MAX_LEN, args.num_data)
  File "/media/ub16c9/fcd84300-9270-4bbd-896a-5e04e79203b7/ub16_prj/ArticlePairMatching/src/models/CCIG/loader.py", line 81, in load_graph_data
    fin = open(path, "r", encoding="utf-8")
FileNotFoundError: [Errno 2] No such file or directory: '../../../data/processed/event-story-cluster/same_event_doc_pair.no_cd.json'
ub16c9@ub16c9-gpu:/media/ub16c9/fcd84300-9270-4bbd-896a-5e04e79203b7/ub16_prj/ArticlePairMatching/src/models/CCIG$

graph_tool problem

Could you tell me which versions of graph-tool and torch you used?

How does BERT do text-pair matching for long documents?

Hi BangLiu,
I fine-tune BERT by taking the first 256 words of each of the two long documents in a pair, feeding them to BERT, and passing the output to a classification layer.
My batch size is 6 and I train for 2 epochs, but the training result is accuracy = 0.5519683, global_step = 517, loss = 0.72958165, precision = 0.49228394, recall = 0.2029262, which is very different from the results in the paper.
So I'd like to know: how did you use BERT for text matching?
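One detail worth checking, offered as a general note rather than a description of the authors' setup: BERT accepts at most 512 positions including `[CLS]` and two `[SEP]` tokens, and Chinese BERT tokenizes CJK text per character, so two 256-unit segments already exceed the limit (256 + 256 + 3 = 515) and may be cut unevenly downstream. A common strategy is to budget the pair explicitly by trimming the longer side one token at a time (what Hugging Face tokenizers call `longest_first` truncation). A pure-Python sketch on already-tokenized sequences; both function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def truncate_pair(tokens_a: Sequence[str], tokens_b: Sequence[str],
                  max_len: int = 512) -> Tuple[List[str], List[str]]:
    """Trim the longer sequence until [CLS] a [SEP] b [SEP] fits in max_len positions."""
    if max_len < 3:
        raise ValueError("max_len must leave room for [CLS] and two [SEP]")
    a, b = list(tokens_a), list(tokens_b)
    budget = max_len - 3
    while len(a) + len(b) > budget:
        if len(a) >= len(b):
            a.pop()  # drop from the end of the longer (or equal) side
        else:
            b.pop()
    return a, b


def build_input(tokens_a: Sequence[str], tokens_b: Sequence[str],
                max_len: int = 512) -> Tuple[List[str], List[int]]:
    """Assemble the sentence-pair tokens and segment ids (0 for doc A, 1 for doc B)."""
    a, b = truncate_pair(tokens_a, tokens_b, max_len)
    tokens = ["[CLS]"] + a + ["[SEP]"] + b + ["[SEP]"]
    segment_ids = [0] * (len(a) + 2) + [1] * (len(b) + 1)
    return tokens, segment_ids
```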

File "/home/username/ArticlePairMatching/src/models/CCIG/data/ccig.py", line 30, in draw_ccig
    pos = gt.sfdp_layout(g)
AttributeError: module 'graph_tool' has no attribute 'sfdp_layout'

I have changed this

pos = sfdp_layout(g)
graph_draw(g, pos=pos,
           vertex_text=g.vertex_properties["name"],
           vertex_fill_color=c,
           vertex_font_family="STKaiti",
           vertex_font_size=18,
           edge_font_family="STKaiti",
           edge_font_size=10,
           edge_text=g.edge_properties["name"],
           output_size=(1000, 1000),
           output=fig_name)

to this

pos = gt.sfdp_layout(g)
gt.graph_draw(g, pos=pos,
              vertex_text=g.vertex_properties["name"],
              vertex_fill_color=c,
              vertex_font_family="STKaiti",
              vertex_font_size=18,
              edge_font_family="STKaiti",
              edge_font_size=10,
              edge_text=g.edge_properties["name"],
              output_size=(1000, 1000),
              output=fig_name)

I get this error even though I imported everything from graph_tool and also imported graph_tool as gt.
Could someone please help me with this?
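A likely explanation: `sfdp_layout` and `graph_draw` are defined in the `graph_tool.draw` submodule, not at the top level of the `graph_tool` package, so `gt.sfdp_layout` raises this AttributeError even when `from graph_tool.all import *` has made the bare name `sfdp_layout` available. Importing the submodule explicitly avoids depending on either. A sketch; `draw_graph_safely` is a hypothetical name, and the drawing arguments are copied from the snippet above. The import is guarded so the function fails with a clear message when graph-tool's drawing stack is unavailable.

```python
try:
    import graph_tool.draw as gtd  # the submodule that defines sfdp_layout and graph_draw
except Exception:  # graph-tool not installed, or its Gtk/cairo stack is broken
    gtd = None


def draw_graph_safely(g, colors, fig_name):
    """Lay out g with SFDP and write it to fig_name, mirroring the call in ccig.py."""
    if gtd is None:
        raise RuntimeError("graph_tool.draw is unavailable in this environment")
    pos = gtd.sfdp_layout(g)
    gtd.graph_draw(g, pos=pos,
                   vertex_text=g.vertex_properties["name"],
                   vertex_fill_color=colors,
                   vertex_font_family="STKaiti",
                   vertex_font_size=18,
                   edge_font_family="STKaiti",
                   edge_font_size=10,
                   edge_text=g.edge_properties["name"],
                   output_size=(1000, 1000),
                   output=fig_name)
```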

Can't find "same_event_doc_pair.no_cd.json". I used 'git lfs clone' to clone the project.

Traceback (most recent call last):
  File "main.py", line 102, in <module>
    labels, idx_train, idx_val, idx_test = load_graph_data(path, word_to_ix, MAX_LEN, args.num_data)
  File "/content/ArticlePairMatching/src/models/CCIG/loader.py", line 81, in load_graph_data
    fin = open(path, "r", encoding="utf-8")
FileNotFoundError: [Errno 2] No such file or directory: '../../../data/processed/event-story-cluster/same_event_doc_pair.no_cd.json'

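For an English dataset, which tool should be used for tokenization, and should tokens be separated by spaces or by something else? Which methods should be used to extract the features and attributes?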

RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x5 and 0x16). Has anyone run into this? The dimensions don't match.

regressor.0.weight : torch.Size([16, 0]) 0
regressor.0.bias : torch.Size([16]) 16
regressor.2.weight : torch.Size([1, 16]) 16
  File "H:\EXPERIMENTS\ArticlePairMatching-master\src\models\CCIG\models\se_gcn.py", line 151, in forward
    out = self.regressor(x)
  File "C:\Users\ediso\anaconda3\envs\articlepair\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\ediso\anaconda3\envs\articlepair\lib\site-packages\torch\nn\modules\container.py", line 141, in forward
    input = module(input)
  File "C:\Users\ediso\anaconda3\envs\articlepair\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\ediso\anaconda3\envs\articlepair\lib\site-packages\torch\nn\modules\linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
  File "C:\Users\ediso\anaconda3\envs\articlepair\lib\site-packages\torch\nn\functional.py", line 1848, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x5 and 0x16)
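Reading the printed shapes: `regressor.0.weight` is `[16, 0]`, so the first `Linear` layer was built with `in_features=0`, while the tensor reaching it at run time has 5 features. The input size computed in the model's constructor therefore disagrees with what `forward` actually concatenates, plausibly because that size depends on which `--use_*` flags are enabled (an assumption; the constructor is not shown here). Fixing that computation is the real fix. As a stop-gap, PyTorch 1.8 and later provide `nn.LazyLinear`, which infers `in_features` from the first batch. A sketch of that swap; the middle `ReLU` is assumed because only layers 0 and 2 appear in the dump. Run one forward pass before creating the optimizer, since lazy parameters are uninitialized until then.

```python
import torch
import torch.nn as nn

# Hypothetical rebuild of se_gcn.py's regressor: the first layer infers its
# input size from the first batch instead of a precomputed (here wrong) value.
regressor = nn.Sequential(
    nn.LazyLinear(16),  # in_features is set on the first forward call
    nn.ReLU(),          # assumed; the dump only shows regressor.0 and regressor.2
    nn.Linear(16, 1),
)

x = torch.randn(1, 5)   # same shape as mat1 in the error message
out = regressor(x)      # materializes regressor[0].weight with shape [16, 5]
```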

Which word segmentation tool did you use?

Question about Encoding Local Matching Vectors

As proposed in your paper, you generate local matching vectors for each concept. But what if a concept only contains sentences from one document? Do you just ignore such concepts?