diego999 / pygat
Pytorch implementation of the Graph Attention Network model by Veličković et al. (2017, https://arxiv.org/abs/1710.10903)
License: MIT License
I really appreciate you sharing your code, Diego999.
It is easy to understand and to use.
But when I tested the GAT layer on batched data, an error occurred because of the input dimensions.
Did you test this code on batched data?
I just wonder whether this happens only to me.
    def forward(self, input, adj):
        h = torch.mm(input, self.W)   # <-- the error is raised here
        N = h.size()[0]
RuntimeError: matrices expected, got 3D, 2D tensors at /pytorch/aten/src/TH/generic/THTensorMath.c:2028
Line 56 of layers.py uses the repeat_interleave() function, which was introduced in PyTorch 1.1. If you use PyTorch 1.1, you should also modify the compute_test() function at line 105 of train.py: replace data[0] with item().
I see you only implemented a two-layer GAT. Will the accuracy of your model be lower than that of the official implementation? I want to base my experiments on your code. Can you give me some advice on the relationship between the number of layers and the final accuracy or F1 score?
Hi, Diego
Your code is very useful and easy to use. I want to reproduce its performance on other datasets such as Citeseer. Would you mind telling me how to download and process those datasets?
Thanks,
Hao
Lines 31 to 32 in 808e0ff
It seems that these 2 lines implement the argument of LeakyReLU in Equation (3) on page 3 of the paper.
But I don't know how it works. Can you explain it?
Also, I'm getting memory issues on these two lines when trying with other data.
Thanks,
@derek-saal
@Diego999
Hello, when I use this code on my own dataset, which has 10242 nodes,
the following error occurs:
RuntimeError: CUDA out of memory. Tried to allocate 6.25 GiB (GPU 0; 31.75 GiB total capacity; 25.01 GiB already allocated; 4.51 GiB free; 1.18 GiB cached)
Do you know how to solve it?
Someone suggested reducing the batch size, but I haven't found any batch size in the code.
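A rough back-of-the-envelope check may explain the size of that allocation. The sketch below assumes that the dense layer materialises one row of 2F float32 values for every ordered pair of nodes (as the torch.cat of Wh_repeated_in_chunks and Wh_repeated_alternating quoted in another report on this page suggests) and that F is the default --hidden value of 8. The function name is hypothetical and not part of pyGAT.

```python
def pairwise_attention_input_gib(num_nodes, out_features, bytes_per_value=4):
    """Size in GiB of a dense (N*N, 2*F) tensor, one row per ordered node pair."""
    values = num_nodes * num_nodes * 2 * out_features
    return values * bytes_per_value / 2**30

# Assumptions: 10242 nodes (from the report), F = 8 hidden units, float32 values.
size = pairwise_attention_input_gib(10242, 8)
print(f"{size:.2f} GiB")  # 6.25 GiB
```

Under these assumptions the cost grows with the square of the node count, so there is no batch size to reduce. A smaller graph or the sparse variant selected by the --sparse flag may be more practical.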
Hello, when I use the following code to output the labels:
pred = model(features, adj)
output = pred.cuda().data.cpu().numpy()
label_dict = {0: "0", 1: "1", 2: "2", 3: "3", 4: "4", 5: "5", 6: "6"}  # define the label color dictionary
with open("./embeddings.txt", "w") as fe, open("./labels.txt", 'w') as fl:
    for i in range(len(output)):
        fl.write(label_dict[int(list(output[i]).index(1.))]+"\n")
the following error occurs:
fl.write(label_dict[int(list(output[i]).index(1.))]+"\n")
ValueError: 1.0 is not in list
the output:
output: [[-1.847805 -1.6629431 -2.0786197 ... -1.9947618 -2.0082192 -2.061518 ]
[-1.9058554 -1.8536501 -1.9879545 ... -1.942745 -1.9345684 -1.9945679]
[-1.9354665 -1.8630134 -2.0195217 ... -1.8244724 -1.9596565 -2.0232475]
...
[-1.9140366 -1.8670493 -2.0020766 ... -1.9355989 -1.9930208 -2.0063093]
[-1.936709 -1.8693893 -1.9146458 ... -1.9425946 -1.9571905 -2.0101976]
[-1.8989682 -1.7643857 -2.0006032 ... -1.9852941 -1.9770277 -2.0285738]] (2708, 7) float32 <class 'numpy.ndarray'>
I just want to ask: each node has positive and negative scores for the 7 categories, so which number indicates the category with the highest probability?
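As a hedged note on the question above: the GAT model quoted on this page ends with F.log_softmax, so each row most likely holds log-probabilities. These are never positive and no entry equals 1.0, which would explain the ValueError. Under that assumption the predicted category is simply the index of the largest (least negative) value in the row. The sketch uses plain Python lists and a hypothetical helper name. For a NumPy array, output.argmax(axis=1) gives the same indices.

```python
import math

def predicted_classes(log_probs):
    """Return, for each row of log-probabilities, the index of the largest entry."""
    return [max(range(len(row)), key=row.__getitem__) for row in log_probs]

# Illustrative rows only (3 classes instead of 7); not taken from the real output.
rows = [
    [-1.85, -1.66, -2.08],
    [-1.99, -1.94, -1.91],
]
labels = predicted_classes(rows)
print(labels)  # [1, 2]

# exp() turns a log-probability back into a probability.
print(round(math.exp(rows[0][1]), 3))  # 0.19
```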
Hi,
I would like to try this model out in a regression scenario. What should be changed in order to do so? Is using a different loss function (one for regression) enough?
Thanks.
The official documentation for subclasses of torch.autograd.Function contains the notice 'Each function object is meant to be used only once (in the forward pass).'
In SpGraphAttentionLayer, you use the SpecialSpmmFunction object (self.special_spmm) twice, once for e_rowsum and once for h_prime.
Is this the right way to use a subclass of torch.autograd.Function?
hello, Diego
I would like to ask, what method did you use to visualize the result in the end?
Do you mind sharing the source code of the visualization?
Looking forward to your reply
Hi,
I found that the symmetric adjacency matrix is built with adj = adj + adj.T.multiply(adj.T > adj) - adj.multiply(adj.T > adj).
I have no idea what adj.T > adj means. Would you mind saying a few words about it?
Thanks!
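One way to read that expression: adj.T > adj is an element-wise comparison that marks the entries where the transposed matrix holds the larger value. At those entries the formula gives adj + adj.T - adj = adj.T, and everywhere else it leaves adj unchanged, so the result is the element-wise maximum of adj and adj.T. The sketch below checks this on a toy directed graph using nested lists instead of SciPy sparse matrices. The helper name is hypothetical.

```python
def symmetrize(adj):
    """Apply adj + adj.T * (adj.T > adj) - adj * (adj.T > adj) entry by entry."""
    n = len(adj)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            mask = 1 if adj[j][i] > adj[i][j] else 0  # (adj.T > adj)[i][j]
            out[i][j] = adj[i][j] + adj[j][i] * mask - adj[i][j] * mask
    return out

# Directed toy graph: edges 0 -> 1 and 2 -> 1 only.
adj = [
    [0, 1, 0],
    [0, 0, 0],
    [0, 1, 0],
]
sym = symmetrize(adj)
print(sym)  # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```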
As explained in https://en.wikipedia.org/wiki/Laplacian_matrix, the symmetric normalized Laplacian is I - D^(-1/2) A D^(-1/2). However, the function normalize_adj() in utils.py seems to compute only D^(-1/2) A D^(-1/2).
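To make the difference concrete, here is a small sketch of both quantities on a toy graph. It uses nested lists rather than the SciPy matrices in utils.py, and the function names are hypothetical. The two results differ only by the identity matrix and a sign. Graph convolution style layers commonly propagate features with the normalized adjacency rather than with the Laplacian itself, which may be why the function stops there.

```python
import math

def normalized_adjacency(adj):
    """Return D^(-1/2) A D^(-1/2) for a dense adjacency matrix given as nested lists."""
    degrees = [sum(row) for row in adj]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in degrees]
    n = len(adj)
    return [[inv_sqrt[i] * adj[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]

def normalized_laplacian(adj):
    """Return I - D^(-1/2) A D^(-1/2)."""
    norm = normalized_adjacency(adj)
    n = len(adj)
    return [[(1.0 if i == j else 0.0) - norm[i][j] for j in range(n)] for i in range(n)]

# Path graph 0 - 1 - 2 (illustrative only).
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
norm = normalized_adjacency(adj)
lap = normalized_laplacian(adj)
```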
hello
My ability is limited, so I will ask again: how can I output the labels predicted by this algorithm? I only saw lbl_in (1, 2708, 7) and lab_resh (2708, 7).
If a batch size is added, what does the process look like? Can you give me an example? I have been working on the program for several days but still haven't got it working.
Hello! I ran the GAT network on the Citeseer dataset, but the accuracy reached only 70.3 rather than 72.5. How did you set the parameters to get such a high accuracy?
Hi,
Thanks for sharing your codes in Pytorch for us.
I note that your paper also covers the inductive learning setting,
while the released code does not support it.
Would you be willing to release code that supports the inductive setting, or give us some advice?
Thanks
Best Regards.
Xu.
Hi, I am trying to build an implementation that takes multiple graphs as input for a node classification task. All the examples I've seen so far for this case were for graph classification. Although I've seen people build a block-diagonal adjacency matrix, I'm not sure whether it is meant for graph classification or node classification. I also don't understand whether I should create a block-diagonal matrix for the feature matrix and the labels too.
Let's suppose I have 20 different graphs (with different numbers of nodes, edges, and features), and each node of every graph is labeled.
All the nodes in the first 10 graphs are for training, all the nodes of the next 5 graphs are for validation, and all the nodes of the last 5 graphs are for testing. What I'm trying to do is predict the labels of the nodes in the test-set graphs. How can I input multiple graphs into GAT (or any other GNN) under these conditions for node classification (not graph classification)? If the solution is a block-diagonal adjacency matrix, should I do the same for the labels and the feature matrix too?
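A minimal sketch of the block-diagonal idea, assuming every graph uses the same feature dimension (graphs with different feature sizes would first need padding or projecting, which is not shown). Only the adjacency matrix becomes block-diagonal. The feature rows and labels are simply stacked in the same node order, so each node keeps its own label. All names and values below are made up for illustration.

```python
def block_diagonal(adjs):
    """Place each graph's adjacency matrix on the diagonal of one large matrix."""
    total = sum(len(a) for a in adjs)
    big = [[0] * total for _ in range(total)]
    offset = 0
    for a in adjs:
        for i, row in enumerate(a):
            for j, value in enumerate(row):
                big[offset + i][offset + j] = value
        offset += len(a)
    return big

# Two toy graphs: a single edge (2 nodes) and a path (3 nodes).
g1 = [[0, 1], [1, 0]]
g2 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
adj = block_diagonal([g1, g2])

# Features and labels are stacked in the same node order (values are made up).
features = [[1.0, 0.0], [0.0, 1.0]] + [[0.5, 0.5], [1.0, 1.0], [0.0, 0.0]]
labels = [0, 1] + [1, 0, 1]

print(len(adj), len(features), len(labels))  # 5 5 5
```

Because no edges connect the blocks, attention in one graph never sees nodes of another, and the training, validation, and test graphs can be selected by index ranges over the stacked nodes.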
Hi Diego,
Thank you so much for sharing your code! But I am confused about the adj matrix.
In layers.py, "attention = torch.where(adj > 0, e, zero_vec)" seems to use the normalized adj only as an adjacency matrix that shows whether there is an edge between two nodes. If that is true, why should it be normalized? From my understanding, adj(normalized)>0 and adj(unnormalized)>0 are the same.
Hope for your reply.
Thx!
Hi,
thank you for your code. I have noticed that you use the concatenation, while the paper uses the average method in the multi-head attention mechanism because the authors think concatenation is no longer sensible. What's the difference between the two methods?
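For reference, the paper concatenates the heads in hidden layers and averages them in the final prediction layer, where the output size has to equal the number of classes. The sketch below only illustrates the shape difference between the two operations. The helper names and numbers are hypothetical and are not taken from pyGAT.

```python
def concat_heads(head_outputs):
    """Concatenate per-head feature vectors: K heads of size F give K*F features."""
    return [value for head in head_outputs for value in head]

def average_heads(head_outputs):
    """Average per-head feature vectors element-wise: the result keeps size F."""
    k = len(head_outputs)
    return [sum(values) / k for values in zip(*head_outputs)]

# One node, K = 2 heads, F = 3 features per head (made-up numbers).
heads = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
print(concat_heads(heads))   # [1.0, 2.0, 3.0, 3.0, 4.0, 5.0]
print(average_heads(heads))  # [2.0, 3.0, 4.0]
```

In the GAT class quoted on this page, the output layer out_att is a single attention layer with concat=False, and averaging over one head leaves its output unchanged.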
Hello, thanks for your work! I have a problem. I have 2000 training samples, and each sample consists of 8 different feature vectors representing eight different objects. So I want to construct a fully connected graph of 8 nodes, and my goal is to classify the nodes. How should I implement this idea with this code? Or is my idea wrong?
I hope you can give me some advice. Thank you.
Hi,
It looks like the code is processing a single adjacency matrix that includes information about connections between graph nodes in training, development and testing sets. I was wondering where in the code the adjacency matrix is masked such that it is not using the parts that reflect the training set during testing?
Many thanks,
Elena
Loading cora dataset...
Traceback (most recent call last):
  File "train.py", line 105, in <module>
    loss_values.append(train(epoch))
  File "train.py", line 79, in train
    'loss_train: {:.4f}'.format(loss_train.data[0]),
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number
I am curious about the 1 minute running time version, so I want to get this branch working.
The architecture is the same, but the results show such a huge gap.
Any suggestions?
Thanks Diego for sharing the code.
I have a question about running the code in parallel, since the multi-head attention mechanism should be able to run in parallel efficiently.
But I can't figure out how to do it.
Can anyone help? Many thanks.
I switched the dataset to Citeseer and trained for 100 epochs. The training log for the 99th epoch is:
Epoch: 0099 loss_train: 0.9922 acc_train: 0.7000 loss_val: 1.0769 acc_val: 0.7100 time: 38.2587s
However, the final test result is:
Test set results: loss= 1.2956 accuracy= 0.5350
The accuracy is pretty low, unlike the 72% reported in the paper.
How can we use the codes for other datasets with batch sizes bigger than 1?
I wrote a simple test.py that is the same as the code at the end of train.py. It loads the *.pkl successfully but doesn't work well: the accuracy is only 0.4530 (or even worse, such as 0.02).
Here is the code.
from __future__ import division
from __future__ import print_function
import os
import glob
import time
import random
import argparse
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable
from utils import load_data, accuracy
from models import GAT, GATest, SpGAT
parser = argparse.ArgumentParser()
parser.add_argument('--no-cuda', action='store_true', default=False, help='Disables CUDA training.')
parser.add_argument('--fastmode', action='store_true', default=False, help='Validate during training pass.')
parser.add_argument('--sparse', action='store_true', default=False, help='GAT with sparse version or not.')
parser.add_argument('--seed', type=int, default=72, help='Random seed.')
parser.add_argument('--epochs', type=int, default=100, help='Number of epochs to train.')
parser.add_argument('--lr', type=float, default=0.005, help='Initial learning rate.')
parser.add_argument('--weight_decay', type=float, default=5e-4, help='Weight decay (L2 loss on parameters).')
parser.add_argument('--hidden', type=int, default=8, help='Number of hidden units.')
parser.add_argument('--nb_heads', type=int, default=8, help='Number of head attentions.')
parser.add_argument('--dropout', type=float, default=0.6, help='Dropout rate (1 - keep probability).')
parser.add_argument('--alpha', type=float, default=0.2, help='Alpha for the leaky_relu.')
parser.add_argument('--patience', type=int, default=100, help='Patience')
args = parser.parse_args(['--no-cuda'])
args.cuda = not args.no_cuda and torch.cuda.is_available()
random.seed(args.seed)
np.random.seed(args.seed)
torch.manual_seed(args.seed)
if args.cuda:
    torch.cuda.manual_seed(args.seed)
def compute_test():
    model.eval()
    output = model(features, adj)
    loss_test = F.nll_loss(output[idx_test], labels[idx_test])
    acc_test = accuracy(output[idx_test], labels[idx_test])
    print("Test set results:",
          "loss= {:.4f}".format(loss_test.data.item()),
          "accuracy= {:.4f}".format(acc_test.data.item()))
adj, features, labels, idx_train, idx_val, idx_test = load_data()
print("ok")
model = GAT(nfeat=features.shape[1],
nhid=8,
nclass=int(labels.max()) + 1,
dropout=0,
nheads=8,
alpha=0.2)
model.load_state_dict(torch.load('gantmwindows.pth'))
compute_test()
Hi, thank you for the great work!
I'm curious about the way you initialized the model parameters. In the official TensorFlow implementation, if I understand correctly, the authors used the default parameter initialization of tf.layers.conv1d, which according to the source code uses glorot_uniform_initializer with a default gain=1.0. In your implementation, you used glorot_uniform_initializer with gain=1.414 for GraphAttentionLayer as in L21 and glorot_normal_initializer with gain=1.414 for SpGraphAttentionLayer as in L90. Is there a particular reason for doing so? Thank you.
Hi, Diego! Thank you for sharing the code.
I have read and run the code in the similar_impl_tensorflow branch. There are some points I am confused about.
The GAT paper says that the authors replace concatenation with averaging of the heads in the second (prediction) layer of the multi-head attention. But I only find
x = self.out_att(x, adj)
return F.log_softmax(x, dim=1)
in models.py. This is the case in both the similar_impl_tensorflow branch and the master branch. Could you tell me whether pyGAT contains this part? Thanks.
Besides, the paper's output is h', which represents the new features of the entities, but the output of pyGAT's second layer is the classification result. I believe the outputs of both the first and the second layer are new features h', and the biggest difference between them is that the dimension of the latter equals nClass. Am I right?
Thanks again,
Jason
Dear authors,
Thank you for sharing your code. When I ran the source code, I encountered the following problem.
Traceback (most recent call last):
  File "train.py", line 45, in <module>
    model = GAT(nfeat=features.shape[1], nhid=args.hidden, nclass=labels.max() + 1, dropout=args.dropout, nheads=args.nb_heads, alpha=args.alpha)
  File "/home/wanyao/www/Dropbox/ghproj/pyGAT/models.py", line 16, in __init__
    self.out_att = GraphAttentionLayer(nhid * nheads, nclass, dropout=dropout, alpha=alpha, concat=False)
  File "/home/wanyao/www/Dropbox/ghproj/pyGAT/layers.py", line 22, in __init__
    self.a = nn.Parameter(nn.init.xavier_uniform(torch.Tensor(2*out_features, 1).type(torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor), gain=np.sqrt(2.0)), requires_grad=True)
TypeError: new() received an invalid combination of arguments - got (Variable, int), but expected one of:
This happens because, in line 45 of train.py, the value passed as nclass (labels.max() + 1) should be an int, not a torch.LongTensor.
model = GAT(nfeat=features.shape[1], nhid=args.hidden, nclass=labels.max() + 1, dropout=args.dropout, nheads=args.nb_heads, alpha=args.alpha)
After I modified "labels.max()" to be "int(labels.max())", the problem is solved.
@Diego999 I am getting the following error:
all_combinations_matrix = torch.cat([Wh_repeated_in_chunks, Wh_repeated_alternating], dim=1)
RuntimeError: CUDA out of memory. Tried to allocate 896.00 MiB (GPU 0; 6.00 GiB total capacity; 4.12 GiB already allocated; 162.50 MiB free; 4.48 GiB reserved in total by PyTorch)
Any insights on this?
Thanks in advance.
Hello,
In your implementation of SpGAT,
there is this line:
edge_e = torch.exp(-self.leakyrelu(self.a.mm(edge_h).squeeze()))
However, I cannot understand why you added the minus sign in front of the LeakyReLU operation.
Is that right?
Hi, when I run the code on a GPU, it runs out of memory.
Could you look into this problem?
Hi! When we use t-SNE to plot the output features of the model, do we use the label predicted by the model or the true label of the sample?
I feel confused about inductive learning on PPI.
How to utilize the trained model to get the node embedding on the unseen graph?
Thanks a lot in advance for your help!
If we set self.dropout=True, the probability of an element being zeroed is one, so all elements of the output tensor are zero.
I am confused about this.
class GAT(nn.Module):
    def __init__(self, nfeat, nhid, nclass, dropout, alpha, nheads):
        """Dense version of GAT."""
        super(GAT, self).__init__()
        self.dropout = dropout
        self.attentions = [GraphAttentionLayer(nfeat, nhid, dropout=dropout, alpha=alpha, concat=True) for _ in range(nheads)]
        for i, attention in enumerate(self.attentions):
            self.add_module('attention_{}'.format(i), attention)
        self.out_att = GraphAttentionLayer(nhid * nheads, nclass, dropout=dropout, alpha=alpha, concat=False)
    def forward(self, x, adj):
        x = F.dropout(x, self.dropout, training=self.training)
        x = torch.cat([att(x, adj) for att in self.attentions], dim=1)
        x = F.dropout(x, self.dropout, training=self.training) # if dropout is true,x is zero
        x = F.elu(self.out_att(x, adj))
        return F.log_softmax(x, dim=1)
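A note on the question above: in this class, self.dropout stores whatever value is passed to the constructor, and the argument list quoted elsewhere on this page uses a probability (--dropout, default 0.6), not a boolean. Python treats True as the integer 1, so passing True would indeed mean "drop everything". The sketch below is a pure-Python stand-in for inverted dropout under that reading. It is not PyTorch's implementation, and the function is hypothetical.

```python
import random

def dropout(values, p, training=True, rng=random):
    """Inverted dropout: zero each value with probability p, rescale survivors by 1/(1-p)."""
    if not training or p == 0:
        return list(values)
    if p == 1:
        return [0.0 for _ in values]
    return [0.0 if rng.random() < p else v / (1 - p) for v in values]

x = [1.0, 2.0, 3.0]
print(True == 1)                        # True: a bool passed as p behaves like 1
print(dropout(x, True))                 # [0.0, 0.0, 0.0]
print(dropout(x, 0.6, training=False))  # [1.0, 2.0, 3.0]
```

With a probability such as 0.6 only part of the input is zeroed during training, and nothing is zeroed in evaluation mode.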
Hello,
I am using my own dataset with the sparse GAT, and I hit an AssertionError in the layers:
Traceback (most recent call last):
  File "F:/googledownload/pyGAT-master/train_data.py", line 157, in <module>
    loss_values.append(train(epoch))
  File "F:/googledownload/pyGAT-master/train_data.py", line 108, in train
    output=model(x_train[i], adj_index)
  File "D:\anaconda3.4\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "F:\googledownload\pyGAT-master\models_data.py", line 49, in forward
    x = torch.cat([att(x, adj) for att in self.attentions], dim=1)
  File "F:\googledownload\pyGAT-master\models_data.py", line 49, in <listcomp>
    x = torch.cat([att(x, adj) for att in self.attentions], dim=1)
  File "D:\anaconda3.4\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "F:\googledownload\pyGAT-master\layers_data.py", line 164, in forward
    assert not torch.isnan(h_prime).any()
AssertionError
Maybe there is a NaN value, but I don't know how to solve it.
Hi,
Thank you very much for sharing the code. I noticed that you use the original Cora dataset rather than the processed one used in GCN and GAT. I was also thinking of using the original one, but I found that the processed data discards the paper id, so I need to find a way to build the correspondence between nodes.
In your code, you just use the first 140 nodes for training, etc. Is this split the same as in the original GAT and GCN papers?
By the way, what are the differences between the master branch and similar_impl_tensorflow?
Thanks again,
Liyu
I have the following errors in running:
Wh_repeated_in_chunks = Wh.repeat_interleave(N, dim=0)
AttributeError:
'Tensor' object has no attribute 'repeat_interleave'
I have tried PyTorch 0.4.1 and 1.0, and this error occurred with both. How can I solve this problem?
Good luck
Hi,
When I ran this code, the following error appeared:
    attention = torch.where(adj > 0, e, zero_vec)
  File "/home/sean/anaconda3/lib/python3.8/site-packages/torch/tensor.py", line 28, in wrapped
    return f(*args, **kwargs)
RuntimeError: Could not run 'aten::gt.Scalar' with arguments from the 'SparseCUDATensorId' backend. 'aten::gt.Scalar' is only available for these backends: [CPUTensorId, CUDATensorId, QuantizedCPUTensorId, VariableTensorId].
I do not know what happened. Could you please give some suggestions? Is my PyTorch version incorrect? Thanks.
Hi,
Nice job and thanks for sharing! I am working on a graph classification problem (different graphs with the same #nodes and #features). Can I use GAT for graph classification? Does that mean the different graphs should share the same weight matrix W? And how can I modify the model for batch-wise training?
Hope for your suggestions!
Hi Dear Author,
This implementation is wonderful. I hope to use the attention coefficient matrix in place of a normal adjacency matrix for some downstream tasks. For the dense version I can clearly see where the coefficients are located, but the sparse version seems to work at a higher level, so I am not sure how to extract or rebuild such a matrix. Could you give me a hint? I would appreciate it.
Steps to reproduce:
Error:
Collecting mkl-fft==1.0.4 (from -r requirements.txt (line 3))
  Could not find a version that satisfies the requirement mkl-fft==1.0.4 (from -r requirements.txt (line 3)) (from versions: )
No matching distribution found for mkl-fft==1.0.4 (from -r requirements.txt (line 3))
Out of the box, I get the following error during eval:
IndexError: invalid index of a 0-dim tensor. Use `tensor.item()` in Python or `tensor.item<T>()` in C++ to convert a 0-dim tensor to a number
Following an unrelated thread here, I'm able to fix it by changing the following code (lines 110-112):
print("Test set results:",
"loss= {:.4f}".format(loss_test.data[0]),
"accuracy= {:.4f}".format(acc_test.data[0]))
to
print("Test set results:",
"loss= {:.4f}".format(loss_test.data),
"accuracy= {:.4f}".format(acc_test.data))
Just wanted to give you a heads-up. I checked and didn't see anything, but sorry if this was already reported. Below is my conda env in case that's relevant to anyone:
(pygat) -bash-4.2$ conda list
# packages in environment at /home/user/envs/pygat:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
blas 1.0 mkl
ca-certificates 2020.6.24 0
certifi 2020.6.20 py38_0
cudatoolkit 10.1.243 h6bb024c_0
freetype 2.10.2 h5ab3b9f_0
intel-openmp 2020.1 217
jpeg 9b h024ee3a_2
ld_impl_linux-64 2.33.1 h53a641e_7
libedit 3.1.20191231 h7b6447c_0
libffi 3.3 he6710b0_1
libgcc-ng 9.1.0 hdf63c60_0
libgfortran-ng 7.3.0 hdf63c60_0
libpng 1.6.37 hbc83047_0
libstdcxx-ng 9.1.0 hdf63c60_0
libtiff 4.1.0 h2733197_1
lz4-c 1.9.2 he6710b0_0
mkl 2020.1 217
mkl-service 2.3.0 py38he904b0f_0
mkl_fft 1.1.0 py38h23d657b_0
mkl_random 1.1.1 py38h0573a6f_0
ncurses 6.2 he6710b0_1
ninja 1.9.0 py38hfd86e86_0
numpy 1.18.5 py38ha1c710e_0
numpy-base 1.18.5 py38hde5b4d6_0
olefile 0.46 py_0
openssl 1.1.1g h7b6447c_0
pillow 7.1.2 py38hb39fc2d_0
pip 20.1.1 py38_1
python 3.8.3 hcff3b4d_0
pytorch 1.5.1 py3.8_cuda10.1.243_cudnn7.6.3_0 pytorch
readline 8.0 h7b6447c_0
scipy 1.5.0 py38h0b6359f_0
setuptools 47.3.1 py38_0
six 1.15.0 py_0
sqlite 3.32.3 h62c20be_0
tk 8.6.10 hbc83047_0
torchvision 0.6.1 py38_cu101 pytorch
wheel 0.34.2 py38_0
xz 5.2.5 h7b6447c_0
zlib 1.2.11 h7b6447c_3
zstd 1.4.4 h0b5b093_3
Hello, thanks for sharing your work. In your implementation, I noticed that you split the dataset as "train:range(140), val:range(200, 500), test:range(500, 1500)". I would like to know why you split the dataset like this.
Is this a good split for evaluating model performance?
Can you explain the following code?
And what is the meaning of -9e15 and adj?
zero_vec = -9e15*torch.ones_like(e)
attention = torch.where(adj > 0, e, zero_vec)
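A possible reading of these two lines, assuming a softmax over each row follows them (as in the softmax of Equation (3) in the paper): adj marks which node pairs are connected, and every score for an unconnected pair is replaced by -9e15, a very large negative number whose exponential underflows to zero. After the softmax, non-neighbours therefore get an attention weight of exactly 0. The sketch below reproduces this for a single row in plain Python. The function name and values are illustrative only.

```python
import math

def masked_softmax(scores, adj_row, mask_value=-9e15):
    """Softmax over one row of scores, ignoring positions where adj_row is 0."""
    masked = [s if a > 0 else mask_value for s, a in zip(scores, adj_row)]
    peak = max(masked)                      # subtract the max for numerical stability
    exps = [math.exp(m - peak) for m in masked]
    total = sum(exps)
    return [e / total for e in exps]

# Node with neighbours at positions 0 and 2 only (illustrative values).
weights = masked_softmax([0.5, 2.0, 0.5], [1, 0, 1])
print(weights)  # [0.5, 0.0, 0.5]
```

The middle score is the largest of the three, yet it receives no weight because the adjacency row marks that position as not connected.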