Comments (8)
M is a hyper-parameter; you can set M = 2, 3, 4, ... for different tasks.
from aggcn.
OK, thank you. Could you tell me about the relationship between the 'n' of matrix A (n×n) and the hyper-parameter 'N'? Are their values equal?
The hyper-parameter N indicates the number of attention heads.
For example, if you use 3 heads (N=3), 3 attention matrices are generated. Each matrix has size n x n, where n is the length of the sentence (number of tokens). So n and N are independent: n depends on the input sentence, while N is fixed by the configuration.
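To make the shapes concrete, here is a minimal sketch in plain Python, not the repository's code. The function name `attention_matrices` and the random projection weights are hypothetical stand-ins for the learned linear layers; the point is only that N heads give N matrices, each n x n.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_matrices(tokens, num_heads, seed=0):
    """Return num_heads attention matrices, each len(tokens) x len(tokens).

    tokens: list of n token vectors of dimension d_model.
    Random projections stand in for learned linear layers (an assumption).
    """
    rng = random.Random(seed)
    d_model = len(tokens[0])
    d_k = d_model // num_heads  # per-head dimension
    heads = []
    for _ in range(num_heads):
        # Hypothetical per-head query/key projections of shape d_model x d_k.
        w_q = [[rng.gauss(0, 1) for _ in range(d_k)] for _ in range(d_model)]
        w_k = [[rng.gauss(0, 1) for _ in range(d_k)] for _ in range(d_model)]
        q = [[sum(t[i] * w_q[i][j] for i in range(d_model)) for j in range(d_k)]
             for t in tokens]
        k = [[sum(t[i] * w_k[i][j] for i in range(d_model)) for j in range(d_k)]
             for t in tokens]
        # Scaled dot-product scores; no value projection is needed here.
        scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
                  for qi in q]
        heads.append([softmax(row) for row in scores])
    return heads


sentence_length, d_model, N = 5, 6, 3  # n = 5 tokens, N = 3 heads
rng = random.Random(1)
tokens = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(sentence_length)]
matrices = attention_matrices(tokens, num_heads=N)
print(len(matrices), len(matrices[0]), len(matrices[0][0]))  # 3 5 5
```

Changing the sentence length changes n but not the number of matrices; changing N changes the number of matrices but not their size.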
Thank you very much, I got it. Sorry, I have some other questions.
First, why use sublayers here in the GCN (sublayer_first=2, sublayer_second=4)?
Second, how is the number of heads decided, and why is it 3? How does the model choose which nodes in the sentence are head nodes?
For the first question, someone asked a similar one before; see #2.
For the second question, the number of heads is a hyper-parameter. It is not related to head nodes. Instead, "head" is a term from the multi-head attention mechanism. Please refer to the paper "Attention Is All You Need".
OK, thanks for your patience.
One more question about the code: in aggcn.py, the definition of the class MultiHeadAttention takes only a query and a key. Where is the "Value" defined?
    def forward(self, query, key, mask=None):  # inputs: (q, k, mask)
        if mask is not None:
            mask = mask.unsqueeze(1)
        nbatches = query.size(0)
        query, key = [l(x).view(nbatches, -1, self.h, self.d_k).transpose(1, 2)
                      for l, x in zip(self.linears, (query, key))]
        # query = query.view(nbatches, -1, self.h, self.d_k).transpose(1, 2)
        # key = key.view(nbatches, -1, self.h, self.d_k).transpose(1, 2)
        attn = attention(query, key, mask=mask, dropout=self.dropout)
        return attn
The reason is that we only need the attention matrix, which is treated as the adjacency matrix. A GCN requires an adjacency matrix as input. The key motivation of our paper is to use the multi-head attention mechanism to learn the adjacency matrix rather than deriving it directly from the dependency tree.
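As a rough illustration of this idea (a simplified sketch, not the layer used in aggcn.py), the snippet below applies the same graph convolution step, ReLU(A · H · W), once with a hard 0/1 adjacency matrix and once with an attention matrix in its place. The helper names `matmul` and `gcn_layer`, the 3-token tree, and all numeric values are made up for the example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, features, weight):
    """One simplified graph convolution step: ReLU(adj @ features @ weight)."""
    aggregated = matmul(adj, features)      # mix each node with its neighbours
    projected = matmul(aggregated, weight)  # linear transform of the mixed features
    return [[max(0.0, v) for v in row] for row in projected]


# Hard adjacency from a hypothetical 3-token dependency tree (with self-loops).
tree_adj = [[1.0, 1.0, 0.0],
            [1.0, 1.0, 1.0],
            [0.0, 1.0, 1.0]]
# Soft adjacency: one attention matrix, each row sums to 1 (made-up values).
attn_adj = [[0.6, 0.3, 0.1],
            [0.2, 0.5, 0.3],
            [0.1, 0.3, 0.6]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity, so the output shows only the aggregation

hard_out = gcn_layer(tree_adj, features, weight)
soft_out = gcn_layer(attn_adj, features, weight)
print(hard_out)
print(soft_out)
```

With the tree adjacency, token 0 never sees token 2; with the attention matrix every pair of tokens is connected with some weight, which is why only query and key are needed to build it.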
I suggest you go through our paper and the related references carefully. I won't be able to answer every detailed question here.
Thank you very much