Comments (17)
Sure, please check out the following snippet. Three lines are removed (commented out) and five lines are added. After the modification the accuracy is the same, but the memory requirement drops significantly and the speed increases. (h @ self.a1) + (h @ self.a2).transpose(0,1) is logically equivalent to first repeating h into two N-by-N-by-P tensors and then reducing them to N-by-N.
Modified version of layers.py
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphAttentionLayer(nn.Module):
    """
    Simple GAT layer, similar to https://arxiv.org/abs/1710.10903
    """

    def __init__(self, in_features, out_features, dropout, alpha, concat=True):
        super(GraphAttentionLayer, self).__init__()
        self.dropout = dropout
        self.in_features = in_features
        self.out_features = out_features
        self.alpha = alpha
        self.concat = concat
        self.W = nn.Parameter(nn.init.xavier_uniform(torch.Tensor(in_features, out_features).type(torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor), gain=np.sqrt(2.0)), requires_grad=True)
        #self.a = nn.Parameter(nn.init.xavier_uniform(torch.Tensor(2*out_features, 1).type(torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor), gain=np.sqrt(2.0)), requires_grad=True)
        self.a1 = nn.Parameter(nn.init.xavier_uniform(torch.Tensor(out_features, 1).type(torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor), gain=np.sqrt(2.0)), requires_grad=True)
        self.a2 = nn.Parameter(nn.init.xavier_uniform(torch.Tensor(out_features, 1).type(torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor), gain=np.sqrt(2.0)), requires_grad=True)
        self.leakyrelu = nn.LeakyReLU(self.alpha)

    def forward(self, input, adj):
        h = torch.mm(input, self.W)
        N = h.size()[0]
        # a_input = torch.cat([h.repeat(1, N).view(N * N, -1), h.repeat(N, 1)], dim=1).view(N, -1, 2 * self.out_features)
        # e = self.leakyrelu(torch.matmul(a_input, self.a).squeeze(2))
        f_1 = h @ self.a1
        f_2 = h @ self.a2
        e = self.leakyrelu(f_1 + f_2.transpose(0,1))
        zero_vec = -9e15*torch.ones_like(e)
        attention = torch.where(adj > 0, e, zero_vec)
        attention = F.softmax(attention, dim=1)
        attention = F.dropout(attention, self.dropout, training=self.training)
        h_prime = torch.matmul(attention, h)
        if self.concat:
            return F.elu(h_prime)
        else:
            return h_prime

    def __repr__(self):
        return self.__class__.__name__ + ' (' + str(self.in_features) + ' -> ' + str(self.out_features) + ')'
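The equivalence claimed for the commented-out and the added lines can be checked numerically. The sketch below is only an illustration in plain Python (lists instead of tensors, so it runs without PyTorch); the helper names `pairwise_scores_concat` and `pairwise_scores_split` are made up for this example, and the LeakyReLU is left out because it is applied identically in both variants.

```python
import random


def pairwise_scores_concat(h, a):
    """Score every pair (i, j) as a . [h_i || h_j], building each concatenated vector."""
    n = len(h)
    return [
        [sum(w * x for w, x in zip(a, h[i] + h[j])) for j in range(n)]
        for i in range(n)
    ]


def pairwise_scores_split(h, a1, a2):
    """Score every pair as (h_i . a1) + (h_j . a2), projecting each node only once."""
    f1 = [sum(w * x for w, x in zip(a1, row)) for row in h]
    f2 = [sum(w * x for w, x in zip(a2, row)) for row in h]
    # Same role as f_1 + f_2.transpose(0, 1): an outer sum of two per-node vectors.
    return [[f1[i] + f2[j] for j in range(len(h))] for i in range(len(h))]


random.seed(0)
N, P = 5, 3  # toy sizes: 5 nodes, 3 output features
h = [[random.uniform(-1, 1) for _ in range(P)] for _ in range(N)]
a = [random.uniform(-1, 1) for _ in range(2 * P)]
a1, a2 = a[:P], a[P:]  # the first half multiplies h_i, the second half h_j

e_concat = pairwise_scores_concat(h, a)
e_split = pairwise_scores_split(h, a1, a2)
max_diff = max(
    abs(x - y) for r1, r2 in zip(e_concat, e_split) for x, y in zip(r1, r2)
)
print(max_diff)  # differs from zero only by floating-point rounding
```

Note that in the layer above a1 and a2 are initialized separately rather than sliced from one vector, so the match is in the function being computed, not in the exact initial values.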
from pygat.
Hi,
I use the same preprocessing as in GCN (which is the same in GAT). You should compare with the implementation of the official GCN ;-)
For your second question, the difference is the attention mechanism. On similar_impl_tensorflow, the attention is implemented as in the official GAT, i.e. as a simple feed-forward neural network. The master branch is an implementation which takes FxF as input, so every possible combination of inputs is used to compute the attention. Therefore the memory requirement is much bigger!
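To give a rough sense of the memory gap, here is a back-of-the-envelope count. It assumes a Cora-sized graph (2708 nodes), 8 output features per attention head, and float32 storage. It counts only the tensor fed to the attention scoring step and ignores autograd buffers and everything else, so the numbers are indicative only. The function name is hypothetical.

```python
def attention_input_elements(n_nodes, out_features):
    """Element counts materialized by each approach before the LeakyReLU."""
    # Pairing every node with every other node: an N x N x 2F' tensor.
    pairwise = n_nodes * n_nodes * 2 * out_features
    # Per-node projections: two N x 1 vectors plus their N x N broadcast sum.
    broadcast = 2 * n_nodes + n_nodes * n_nodes
    return pairwise, broadcast


BYTES_PER_FLOAT32 = 4
n_nodes, out_features = 2708, 8  # assumed sizes, see the note above

pairwise, broadcast = attention_input_elements(n_nodes, out_features)
print(f"pairwise:  {pairwise * BYTES_PER_FLOAT32 / 2**20:.1f} MiB")
print(f"broadcast: {broadcast * BYTES_PER_FLOAT32 / 2**20:.1f} MiB")
print(f"ratio:     {pairwise / broadcast:.1f}x")
```

The ratio grows with the number of output features, since the pairwise tensor carries 2F' values per node pair while the broadcast version carries one.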
from pygat.
Thank you very much for the reply.
For the logic of attention in the master, would you please help me understand it? I would really appreciate it.
I checked the code in the master branch, and it does what you said. But to me, the logic is the same as the original attention: both compute the attention coefficient for each pair of nodes. The original TensorFlow implementation calculates a1 x_i and a2 x_j first, then adds them together. In the master branch, you replicate the nodes to form each pair of x_i and x_j, then compute a1 x_i and a2 x_j. They should give exactly the same result, since the former just reuses the per-node products instead of repeating the data.
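Written out, the argument is the following identity. It assumes the attention function is a single linear layer whose weight vector a is split into two halves a_1 and a_2 (one for each node of the pair), followed by a LeakyReLU; W is the shared feature transform.

```latex
e_{ij}
  = \mathrm{LeakyReLU}\!\left(\mathbf{a}^{\top}\,[\,\mathbf{W}\mathbf{x}_i \,\Vert\, \mathbf{W}\mathbf{x}_j\,]\right)
  = \mathrm{LeakyReLU}\!\left(\mathbf{a}_1^{\top}\mathbf{W}\mathbf{x}_i + \mathbf{a}_2^{\top}\mathbf{W}\mathbf{x}_j\right),
\qquad \mathbf{a} = [\,\mathbf{a}_1 \,\Vert\, \mathbf{a}_2\,]
```

The two terms on the right depend on only one node each, so each can be computed once per node and combined for all pairs afterwards.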
I did notice the performance difference between the two implementations. But I can not understand why. If you could help me, I will really appreciate it.
Thank you very much!
Best,
Liyu
from pygat.
Hi,
I compared the data split with the one produced by the GCN and GAT loading functions, and they are actually different. First of all, the feature matrix and the adjacency matrix are different, which means the nodes are in a different order. Moreover, in the training set obtained by GCN and GAT the labels are equally distributed, with 20 training examples for each class, but the training examples obtained by your function utils.load_data() are not equally distributed.
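A check of this kind can be sketched as follows. The data here is a toy stand-in, not the Cora labels, and `training_class_counts` is a hypothetical helper; with the tensors returned by a loader, converting the labels and training indices to Python lists first (for example with `.tolist()`) should give the same kind of count.

```python
from collections import Counter


def training_class_counts(labels, idx_train):
    """Count how many training nodes fall into each class."""
    return Counter(labels[i] for i in idx_train)


# Toy stand-ins: 12 nodes in 3 classes.
labels = [0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
balanced_idx = [0, 1, 5, 6, 8, 9]  # two nodes picked per class
first_k_idx = list(range(6))       # simply the first six nodes

balanced_counts = training_class_counts(labels, balanced_idx)
first_k_counts = training_class_counts(labels, first_k_idx)
print(balanced_counts)
print(first_k_counts)
```

A split that takes a fixed index range is balanced only if the node order happens to cooperate, which is why a different node ordering can change the class distribution of the training set.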
Best,
Liyu
from pygat.
I'll have a check when I come back from holidays, this week-end ;-)
from pygat.
So,
First, for the split, the code is identical to https://github.com/tkipf/pygcn/blob/master/pygcn/utils.py (apart from the normalization function, but that shouldn't change anything in terms of ids).
For your other question: at the beginning, I implemented a general attention as in Equation 1 of the paper (https://arxiv.org/pdf/1710.10903.pdf), where a is a function R^F' x R^F' -> R. I simply interpreted it as a mathematical function. In the paper, they specialize the attention a to a simple feed-forward neural network (which doesn't need as much memory as using the Cartesian product). This is the difference between the two branches ;-) For me, this differs from https://github.com/Diego999/pyGAT/blob/similar_impl_tensorflow/layers.py#L39, which explains the difference in performance. By the way, the input "adj" is also different, as the latter cannot use non-integer weights.
from pygat.
Hi Diego,
Thank you for your patience. I just checked out the code from pygcn. It is indeed identical to your code here. However, it is different from https://github.com/tkipf/gcn/blob/master/gcn/utils.py
The latter uses the original split from https://github.com/kimiyoung/planetoid/tree/master/data
The performance reported in the GAT paper was produced with the latter split too.
Obviously, the data splitting in pyGCN is different from tkipf/gcn (the TensorFlow version). I guess the author of pyGCN wanted to reproduce the splitting but forgot the node correspondence issue.
As for the attention coefficient question: I actually changed the logic of that part back to what the "similar_impl_tensorflow" branch does, and the performance is the same after the modification. So we do not need to repeat the node features first. We can calculate aX first, then use broadcasting to create the same NxN matrix.
Thanks for sharing the code. I also wanted to implement GAT with pytorch, and found your project. It helps me a lot.
Thanks,
Liyu
from pygat.
Hi,
Thank you for your investigation. You are indeed right, as https://github.com/tkipf/gcn/blob/master/gcn/utils.py also uses different code to load the data.
I'll update the code next week.
Could you share your code for this logic here, for people who are curious?
from pygat.
Thank you for your answer. Therefore the implementation to the other branch. I was just curious 👍
from pygat.
So, what is @? It is so amazing!
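For anyone else wondering: since Python 3.5, `a @ b` is an infix operator that calls `a.__matmul__(b)`, and array libraries such as NumPy and PyTorch implement it as matrix multiplication. The toy `Matrix` class below is purely illustrative (it is not part of any library) and only shows how the dispatch works.

```python
class Matrix:
    """Tiny list-of-lists matrix that opts into the @ operator via __matmul__."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __matmul__(self, other):
        # Row-by-column product; zip(*other.rows) iterates over the columns.
        return Matrix(
            [sum(x * y for x, y in zip(row, col)) for col in zip(*other.rows)]
            for row in self.rows
        )


h = Matrix([[1, 2], [3, 4], [5, 6]])  # 3 x 2
a1 = Matrix([[10], [1]])              # 2 x 1
f_1 = h @ a1                          # dispatches to Matrix.__matmul__
print(f_1.rows)
```

In the layer code, `h @ self.a1` does the same thing with tensors: an N x P matrix times a P x 1 vector gives one score per node.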
from pygat.
"@" is the python matmul operator: https://legacy.python.org/dev/peps/pep-0465/
from pygat.
Very good solution!
from pygat.
Can anybody share the official setting of data splits?
from pygat.
Data splitting is easy. You have to read the paper for details.
from pygat.
I changed the data split to the official GAT style, but the accuracy on Cora only reaches 0.817. Is there any way to improve the performance?
from pygat.
Did you succeed in reproducing the results? Same issue here: I can only get around 81.7 on Cora and 71.x on Citeseer.
from pygat.
"Sure, please check out the following snippet. [...]"
Same question as above.
from pygat.