diego999 / pygat

PyTorch implementation of the Graph Attention Network model by Veličković et al. (2017, https://arxiv.org/abs/1710.10903)

License: MIT License

Python 100.00%

graph-attention-networks attention-mechanism self-attention pytorch neural-networks python

pygat's Issues

Is it working on batched data?

Thank you for sharing your code, Diego999.

It is easy to understand and to use.

But when I tested the GAT layer on batched data, an error occurred because of the input dimensions.

Did you test this code on batched data?

I just wonder whether this happens only to me.

    def forward(self, input, adj):
        h = torch.mm(input, self.W)  # <-- the error is raised on this line
        N = h.size()[0]

RuntimeError: matrices expected, got 3D, 2D tensors at /pytorch/aten/src/TH/generic/THTensorMath.c:2028

code question

Line 56 of layers.py uses the repeat_interleave() function, but this function was only introduced in PyTorch 1.1. And if you use PyTorch 1.1, you should also modify the compute_test() function at line 105 of train.py: replace data[0] with item().

Only two layers are implemented

I see you only implemented a two-layer GAT. Will the accuracy of your model be lower than that of the official implementation? I want to base my experiments on your code. Can you give me some advice on the relationship between the number of layers and the final accuracy or F1 score?

Could you please give me a guide on finding other datasets in your paper

Hi, Diego

Your code is very useful and easy to use. I want to reproduce its performance on other datasets such as Citeseer. Would you mind telling me how to download and process those datasets?

Thanks,
Hao

I am confused about the calculation order for the Sparse version

Hi, I read the code of the class SpGraphAttentionLayer, as follows:

In the original paper, I think operation 2 comes before operation 1.
Can you tell me why you changed the order?

I'm having trouble understanding these lines

pyGAT/layers.py

Lines 31 to 32 in 808e0ff

    
    a_input = torch.cat([h.repeat(1, N).view(N * N, -1), h.repeat(N, 1)], dim=1).view(N, -1, 2 * self.out_features)
    e = self.leakyrelu(torch.matmul(a_input, self.a).squeeze(2))

It seems that these 2 lines are implementing the argument of LeakyReLU in Equation (3) from the paper page 3.

But I don't know how it works. Can you explain it?

Also, I'm getting memory issues on these two lines when trying with other data.

Thanks,
@derek-saal
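
One way to see what the two lines build is to mirror them in NumPy on a tiny example. This is only a sketch: np.tile and reshape stand in for Tensor.repeat and view, and the sizes N and F are made-up toy values. Entry [i, j] of the result is the concatenation of h_i and h_j, which is what the attention vector a is applied to in Equation (3). Since the full array has N x N x 2F entries, its memory footprint grows quadratically with the number of nodes.

```python
import numpy as np

N, F = 3, 2                       # toy sizes (assumed): 3 nodes, 2 features each
h = np.arange(N * F, dtype=np.float32).reshape(N, F)

# Mirrors h.repeat(1, N).view(N * N, -1): each row of h repeated N times in a row.
left = np.tile(h, (1, N)).reshape(N * N, -1)
# Mirrors h.repeat(N, 1): the whole matrix h stacked N times.
right = np.tile(h, (N, 1))

# Mirrors torch.cat(..., dim=1).view(N, -1, 2 * F).
a_input = np.concatenate([left, right], axis=1).reshape(N, -1, 2 * F)

print(a_input.shape)              # (N, N, 2 * F)
print(a_input[1, 2])              # features of node 1 followed by features of node 2
```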

RuntimeError: CUDA out of memory.

@Diego999
Hello, when I run this code on my own dataset, which has 10242 nodes, the following error occurs:
RuntimeError: CUDA out of memory. Tried to allocate 6.25 GiB (GPU 0; 31.75 GiB total capacity; 25.01 GiB already allocated; 4.51 GiB free; 1.18 GiB cached)

Do you know how to solve it?
Someone suggested reducing the batch size, but I haven't found any batch size in the code.
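
A rough back-of-the-envelope check suggests where the 6.25 GiB may come from. The sketch below assumes that the dense layer materializes an N x N x 2F tensor of float32 values (as the a_input line quoted from layers.py does) and that F is 8 hidden units per head. Both are assumptions about this particular run, not confirmed settings.

```python
# Rough size of the dense attention input for N nodes (assumed values below).
N = 10242                 # number of nodes reported in this issue
F = 8                     # hidden units per head (assumed)
BYTES_PER_FLOAT32 = 4

elements = N * N * 2 * F  # one concatenated [h_i || h_j] vector per node pair
size_gib = elements * BYTES_PER_FLOAT32 / 2**30

print(f"{size_gib:.2f} GiB")
```

Under these assumptions the tensor grows with N squared, so there is no batch size to reduce. Trying the sparse layer (the --sparse flag) or a smaller graph may help.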

output the label

Hello, when I use the following code to output the label:
    pred = model(features, adj)
    output = pred.cuda().data.cpu().numpy()
    label_dict = {0: "0", 1: "1", 2: "2", 3: "3", 4: "4", 5: "5", 6: "6"}  # define the label color dictionary
    with open("./embeddings.txt", "w") as fe, open("./labels.txt", 'w') as fl:
        for i in range(len(output)):
            fl.write(label_dict[int(list(output[i]).index(1.))] + "\n")

the error message occurs:
fl.write(label_dict[int(list(output[i]).index(1.))]+"\n")
ValueError: 1.0 is not in list

the output:
output: [[-1.847805 -1.6629431 -2.0786197 ... -1.9947618 -2.0082192 -2.061518 ]
[-1.9058554 -1.8536501 -1.9879545 ... -1.942745 -1.9345684 -1.9945679]
[-1.9354665 -1.8630134 -2.0195217 ... -1.8244724 -1.9596565 -2.0232475]
...
[-1.9140366 -1.8670493 -2.0020766 ... -1.9355989 -1.9930208 -2.0063093]
[-1.936709 -1.8693893 -1.9146458 ... -1.9425946 -1.9571905 -2.0101976]
[-1.8989682 -1.7643857 -2.0006032 ... -1.9852941 -1.9770277 -2.0285738]] (2708, 7) float32 <class 'numpy.ndarray'>

I just want to ask: each node has a score (positive or negative) for each of the 7 categories, so which number marks the category with the highest probability?
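
The model ends with log_softmax, so each row holds log-probabilities. These are never positive, and no entry will be exactly 1.0, which is why index(1.) fails. A minimal sketch of reading off the predicted class with argmax follows; the array is hypothetical and only imitates the shape of the real output.

```python
import numpy as np

# Hypothetical log-probabilities for 2 nodes and 7 classes (illustrative values only).
output = np.log(np.array([
    [0.05, 0.60, 0.05, 0.10, 0.10, 0.05, 0.05],
    [0.30, 0.10, 0.10, 0.10, 0.10, 0.10, 0.20],
], dtype=np.float32))

pred_class = output.argmax(axis=1)   # index of the largest (least negative) value per row
probabilities = np.exp(output)       # back to probabilities; each row sums to about 1

print(pred_class)
```

In the loop above, label_dict[int(output[i].argmax())] could then replace the list(...).index(1.) lookup.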

Using GAT for regression

Hi,

I would like to try this model in a regression scenario. What should be changed in order to do so? Is using a different loss function (one for regression) enough?

Thanks.

Confusion about SpecialSpmmFunction

In the official documentation, there is a notice for subclasses of torch.autograd.Function: 'Each function object is meant to be used only once (in the forward pass).'
In SpGraphAttentionLayer, you use the SpecialSpmmFunction object (self.special_spmm) twice, once for e_rowsum and once for h_prime.
Is this the right usage for a subclass of torch.autograd.Function?

visualization question

hello, Diego
I would like to ask what method you used to visualize the final result.
Would you mind sharing the source code of the visualization?

Looking forward to your reply

About the operation of coo_matrix

Hi,

I found an operation that builds the symmetric adjacency matrix: adj = adj + adj.T.multiply(adj.T > adj) - adj.multiply(adj.T > adj)

I have no idea what adj.T > adj means. Would you mind saying a few words about it?

Thanks!
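
A small dense NumPy sketch may help here. adj.T > adj is an element-wise comparison that is True wherever the transposed entry is larger, so the expression copies each missing or smaller entry over from the other triangle. The result is the element-wise maximum of adj and adj.T, which is symmetric. The example uses a dense array and * in place of the sparse .multiply; that substitution is an assumption made for readability.

```python
import numpy as np

# Directed toy graph: edges 0->1 and 2->1 only.
adj = np.array([[0, 1, 0],
                [0, 0, 0],
                [0, 1, 0]], dtype=np.float32)

mask = adj.T > adj                       # True where the reverse entry is larger
sym = adj + adj.T * mask - adj * mask    # dense analogue of the expression quoted above

print(sym)
```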

normalize_adj is confusing

As explained in https://en.wikipedia.org/wiki/Laplacian_matrix, the symmetric normalized Laplacian is I - D^(-1/2) A D^(-1/2). However, the function normalize_adj() in utils.py seems to compute only D^(-1/2) A D^(-1/2).
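
For reference, the two quantities differ only by the identity: the first is the Laplacian, the second the normalized adjacency matrix. A minimal NumPy sketch on a toy graph follows. The function name sym_normalize is hypothetical and is not the one in utils.py.

```python
import numpy as np

def sym_normalize(adj):
    """Return D^(-1/2) A D^(-1/2) for a dense adjacency matrix with no isolated nodes."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]], dtype=np.float64)

norm_adj = sym_normalize(adj)        # normalized adjacency matrix
laplacian = np.eye(3) - norm_adj     # symmetric normalized Laplacian
```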

Out of memory after several epochs

AssertionError in assert not torch.isnan(h_prime).any()

How to output labels updated after this algorithm?

Hello,
I am sorry to ask again; my ability is limited. How can I output the labels updated by this algorithm? I only saw lbl_in (1, 2708, 7) and lab_resh (2708, 7).

Hi Diego,

If a batch size is added, what does the process look like? Can you give me an example? I have been working on the program for several days but still haven't got it to work.

Citeseer data set accuracy

Hello! I ran the GAT network on the Citeseer dataset, but the accuracy could not reach 72.5; it only reached 70.3. How did you set the parameters to get such a high score?

How to conduct the inductive learning setting?

Hi,

Thanks for sharing your PyTorch code with us.

I note that your paper also covers the inductive learning setting, while the released code does not support it.

Would you be willing to release code that supports the inductive setting? Or give us some advice?

Thanks

Best Regards.

Xu.

Node Classification with Multiple Graphs

Hi, I am trying to build an implementation that takes multiple graphs as input for a node classification task. All the examples I've seen so far for this case were for graph classification. Although I've seen the approach of building a block diagonal adjacency matrix, I'm not sure whether it is meant for graph classification or node classification. I also don't understand whether I should build block diagonal matrices for the feature matrix and the labels too.

Let's suppose I have 20 different graphs (with different numbers of nodes, edges, and features), and each node of every graph is labeled.
All the nodes in the first 10 graphs are for training, all the nodes of the next 5 graphs are for validation, and all the nodes of the last 5 graphs are for testing. What I'm trying to do is predict the labels of the nodes for the graphs in the test set. How can I input multiple graphs into GAT (or any other GNN) under these conditions for a node classification task (not graph classification)? If the solution is a block diagonal adjacency matrix, should I do the same for the labels and the feature matrix too?
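
One common way to feed several graphs through a single forward pass is to treat them as one large disconnected graph: a block diagonal adjacency matrix, with the feature matrices and label vectors simply stacked row-wise (not block diagonal). The sketch below assumes that all graphs share the same feature dimension, which differs from the setup described above. The helper name merge_graphs is hypothetical.

```python
import numpy as np

def merge_graphs(adjs, feats, labels):
    """Merge graphs into one disconnected graph (assumes equal feature dimension)."""
    sizes = [a.shape[0] for a in adjs]
    total = sum(sizes)
    big_adj = np.zeros((total, total), dtype=adjs[0].dtype)
    start = 0
    for a, n in zip(adjs, sizes):
        big_adj[start:start + n, start:start + n] = a   # one diagonal block per graph
        start += n
    return big_adj, np.vstack(feats), np.concatenate(labels)

# Two toy graphs with 2 and 3 nodes, 4 features per node.
adjs = [np.ones((2, 2)), np.ones((3, 3))]
feats = [np.zeros((2, 4)), np.ones((3, 4))]
labels = [np.array([0, 1]), np.array([1, 0, 1])]

big_adj, big_feats, big_labels = merge_graphs(adjs, feats, labels)
print(big_adj.shape, big_feats.shape, big_labels.shape)
```

Since no edges cross the blocks, attention never mixes nodes from different graphs. Training, validation, and test graphs can be merged separately, or merged together and selected with index masks.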

Is adj normalization essential?

Hi Diego,

Thank you so much for sharing your code! But I am confused about the adj matrix.
In layers.py, in "attention = torch.where(adj > 0, e, zero_vec)", it seems that the (normalized) adj is used only as an adjacency matrix that shows whether there is an edge between two nodes. If that is true, why should it be normalized? From my understanding, adj(normalized) > 0 and adj(unnormalized) > 0 are the same.

Hope for your reply.
Thx!
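
A quick check on a toy matrix supports this reading: symmetric normalization rescales the nonzero entries by positive factors, so the mask adj > 0 stays the same. The sketch assumes non-negative edge weights and no isolated nodes.

```python
import numpy as np

adj = np.array([[1, 1, 0],
                [1, 1, 1],
                [0, 1, 1]], dtype=np.float64)    # toy graph with self-loops

d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
norm_adj = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

same_mask = np.array_equal(adj > 0, norm_adj > 0)
print(same_mask)
```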

The difference between your code and paper

Hi,
Thank you for your code. I have noticed that you use concatenation, while the paper uses averaging in the multi-head attention mechanism because the authors think concatenation is no longer sensible. What's the difference between the two methods?
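
The shapes make the difference concrete. With K heads each producing F' features per node, concatenation yields K * F' features, while averaging keeps F'. A toy sketch with made-up sizes and random head outputs:

```python
import numpy as np

N, K, F_out = 4, 3, 2                          # nodes, heads, features per head (toy values)
rng = np.random.default_rng(0)
head_outputs = [rng.normal(size=(N, F_out)) for _ in range(K)]

concatenated = np.concatenate(head_outputs, axis=1)   # shape (N, K * F_out)
averaged = np.mean(head_outputs, axis=0)              # shape (N, F_out)

print(concatenated.shape, averaged.shape)
```

For a final classification layer, F' equals the number of classes, so averaging keeps one score per class, whereas concatenation would produce K scores per class.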

How can my data set be implemented in this code?

Hello, thanks for your work! Now I have some problems. I have 2000 training samples, and each sample consists of 8 different feature vectors representing eight different objects. So I want to construct a fully connected graph of 8 nodes, and my goal is to classify the nodes. My question is: how should I implement this idea in this code? Or is my idea wrong?
I hope you can give me some advice. Thank you.

Masking of adjacency matrix

Hi,

It looks like the code is processing a single adjacency matrix that includes information about connections between graph nodes in training, development and testing sets. I was wondering where in the code the adjacency matrix is masked such that it is not using the parts that reflect the training set during testing?

Many thanks,
Elena

The similar_impl_tensorflow branch doesn't run well

Loading cora dataset...
Traceback (most recent call last):
  File "train.py", line 105, in <module>
    loss_values.append(train(epoch))
  File "train.py", line 79, in train
    'loss_train: {:.4f}'.format(loss_train.data[0]),
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

I am curious about the 1 minute running time version, so I want to get this branch working.

What's the difference between the TensorFlow version and the PyTorch version?

Same architecture, but their results show such a huge gap.
Any suggestions?

How to run pyGAT in parallel?

Thanks Diego for sharing the code.

I have a question about running the code in parallel, since the multi-head attention mechanism should be able to run in parallel efficiently.

But I can't figure out how to do it.

Can anyone help? Many thanks.

The test accuracy is pretty low (53.5%) for citeseer dataset

I switched the dataset to Citeseer and trained for 100 epochs. The training log of the 99th epoch is:
Epoch: 0099 loss_train: 0.9922 acc_train: 0.7000 loss_val: 1.0769 acc_val: 0.7100 time: 38.2587s

However, the final test result is:
Test set results: loss= 1.2956 accuracy= 0.5350

The accuracy is pretty low, unlike the 72% shown in the paper.

Batch Size bigger than 1

How can we use the code for other datasets with batch sizes bigger than 1?

model load successfully but not working

I wrote a simple test.py that is the same as the code at the end of train.py. It loads the *.pkl successfully, but it doesn't work well: the accuracy is only 0.4530 (or even worse, such as 0.02).
Here is the code.

from __future__ import division
from __future__ import print_function

import os
import glob
import time
import random
import argparse
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable

from utils import load_data, accuracy
from models import GAT, GATest, SpGAT

parser = argparse.ArgumentParser()
parser.add_argument('--no-cuda', action='store_true', default=False, help='Disables CUDA training.')
parser.add_argument('--fastmode', action='store_true', default=False, help='Validate during training pass.')
parser.add_argument('--sparse', action='store_true', default=False, help='GAT with sparse version or not.')
parser.add_argument('--seed', type=int, default=72, help='Random seed.')
parser.add_argument('--epochs', type=int, default=100, help='Number of epochs to train.')
parser.add_argument('--lr', type=float, default=0.005, help='Initial learning rate.')
parser.add_argument('--weight_decay', type=float, default=5e-4, help='Weight decay (L2 loss on parameters).')
parser.add_argument('--hidden', type=int, default=8, help='Number of hidden units.')
parser.add_argument('--nb_heads', type=int, default=8, help='Number of head attentions.')
parser.add_argument('--dropout', type=float, default=0.6, help='Dropout rate (1 - keep probability).')
parser.add_argument('--alpha', type=float, default=0.2, help='Alpha for the leaky_relu.')
parser.add_argument('--patience', type=int, default=100, help='Patience')

args = parser.parse_args(['--no-cuda'])
args.cuda = not args.no_cuda and torch.cuda.is_available()

random.seed(args.seed)
np.random.seed(args.seed)
torch.manual_seed(args.seed)
if args.cuda:
    torch.cuda.manual_seed(args.seed)

def compute_test():
    model.eval()
    output = model(features, adj)
    loss_test = F.nll_loss(output[idx_test], labels[idx_test])
    acc_test = accuracy(output[idx_test], labels[idx_test])
    print("Test set results:",
          "loss= {:.4f}".format(loss_test.data.item()),
          "accuracy= {:.4f}".format(acc_test.data.item()))

adj, features, labels, idx_train, idx_val, idx_test = load_data()
print("ok")
model = GAT(nfeat=features.shape[1],
            nhid=8,
            nclass=int(labels.max()) + 1,
            dropout=0,
            nheads=8,
            alpha=0.2)
model.load_state_dict(torch.load('gantmwindows.pth'))
compute_test()

Any particular reason for the parameter initialization you used?

Hi, thank you for the great work!

I'm curious about the way you initialized the model parameters. In the official TensorFlow implementation, if I understand correctly, the authors used the default parameter initialization of tf.layers.conv1d, which according to the source code uses glorot_uniform_initializer with a default gain=1.0.

In your implementation, you used glorot_uniform_initializer with gain=1.414 for GraphAttentionLayer as in L21 and glorot_normal_initializer with gain=1.414 for SpGraphAttentionLayer as in L90. Is there a particular reason for doing so? Thank you.

Some questions about the layers

Hi, Diego! Thank you for sharing the code.

I have read and run the code in the similar_impl_tensorflow branch. There are some points I am confused about.

The GAT paper says that the authors replace concatenation with averaging of the heads in the second (prediction) layer of the multi-head setup. But I only find
    x = self.out_att(x, adj)
    return F.log_softmax(x, dim=1)
in models.py. This is the case not only in the similar_impl_tensorflow branch but also in the master branch. Could you tell me whether pyGAT contains this part? Thanks.

Also, the paper's output is h', which represents the new features of the entities, but the output of pyGAT's second layer is the classification result. I believe the outputs of both the first and the second layer are new features h', and the biggest difference between them is that the dimension of the latter equals the number of classes (nClass). Am I right?

Thanks again,

Jason

labels should be int in train.py, line 45

Dear authors,

Thank you for sharing your code. When I ran the source code, I encountered the following problem.

Traceback (most recent call last):
File "train.py", line 45, in
model = GAT(nfeat=features.shape[1], nhid=args.hidden, nclass=labels.max() + 1, dropout=args.dropout, nheads=args.nb_heads, alpha=args.alpha)
File "/home/wanyao/www/Dropbox/ghproj/pyGAT/models.py", line 16, in init
self.out_att = GraphAttentionLayer(nhid * nheads, nclass, dropout=dropout, alpha=alpha, concat=False)
File "/home/wanyao/www/Dropbox/ghproj/pyGAT/layers.py", line 22, in init
self.a = nn.Parameter(nn.init.xavier_uniform(torch.Tensor(2*out_features, 1).type(torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor), gain=np.sqrt(2.0)), requires_grad=True)
TypeError: new() received an invalid combination of arguments - got (Variable, int), but expected one of:

(int device)
(tuple of ints size, int device)
didn't match because some of the arguments have invalid types: (Variable, int)
(torch.Storage storage)
(Tensor other)
(object data, int device)
didn't match because some of the arguments have invalid types: (Variable, int)

This happens because, on line 45 of train.py, the value passed as nclass (labels.max() + 1) should be an int, not a torch.LongTensor.
model = GAT(nfeat=features.shape[1], nhid=args.hidden, nclass=labels.max() + 1, dropout=args.dropout, nheads=args.nb_heads, alpha=args.alpha)

After I changed "labels.max()" to "int(labels.max())", the problem was solved.

Memory error !!!

@Diego999 I am getting the following error:
all_combinations_matrix = torch.cat([Wh_repeated_in_chunks, Wh_repeated_alternating], dim=1)
RuntimeError: CUDA out of memory. Tried to allocate 896.00 MiB (GPU 0; 6.00 GiB total capacity; 4.12 GiB already allocated; 162.50 MiB free; 4.48 GiB reserved in total by PyTorch)
Any insights on this?

Thanks in advance

Minus sign in front of leaky relu

Hello,
In your implementation of SpGAT,

there is this line:
edge_e = torch.exp(-self.leakyrelu(self.a.mm(edge_h).squeeze()))

However, I cannot understand why you added the minus sign in front of the leaky ReLU operation.

Is that right?

out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58

Hi, when I run the code on a GPU, it runs out of memory.
Could you look into this problem?

t-SNE

Hi! When we use t-SNE to plot the output features of the model, do we use the label predicted by the model or the true label of each sample?

How to perform inductive learning on PPI?

I am confused about inductive learning on PPI.
How can the trained model be used to get node embeddings on an unseen graph?
Thanks a lot in advance for your help!

Dropout

If we set self.dropout=True, the probability of an element being zeroed is one, so all elements of the output tensor are zero.
I am confused about this.

class GAT(nn.Module):
    def __init__(self, nfeat, nhid, nclass, dropout, alpha, nheads):
        """Dense version of GAT."""
        super(GAT, self).__init__()
        self.dropout = dropout

        self.attentions = [GraphAttentionLayer(nfeat, nhid, dropout=dropout, alpha=alpha, concat=True) for _ in range(nheads)]
        for i, attention in enumerate(self.attentions):
            self.add_module('attention_{}'.format(i), attention)

        self.out_att = GraphAttentionLayer(nhid * nheads, nclass, dropout=dropout, alpha=alpha, concat=False)

    def forward(self, x, adj):
        x = F.dropout(x, self.dropout, training=self.training)
        x = torch.cat([att(x, adj) for att in self.attentions], dim=1)
        x = F.dropout(x, self.dropout, training=self.training) # if dropout is true,x is zero
        x = F.elu(self.out_att(x, adj))
        return F.log_softmax(x, dim=1)
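
In this model, self.dropout appears to hold a probability (the --dropout argument is a float with default 0.6), not a boolean flag. F.dropout(x, p, training) zeroes each element with probability p, and only while training is True. A short sketch of the three cases, with illustrative values:

```python
import torch
import torch.nn.functional as F

x = torch.ones(1000)

eval_out = F.dropout(x, p=0.6, training=False)   # evaluation: input returned unchanged
train_out = F.dropout(x, p=0.6, training=True)   # training: about 60% zeroed, rest scaled by 1 / (1 - p)
all_zero = F.dropout(x, p=1.0, training=True)    # p = 1 zeroes every element

print(eval_out.sum().item(), (train_out == 0).float().mean().item(), all_zero.sum().item())
```

So the output becomes all zeros only if the probability itself is set to 1 (which is what passing True amounts to), not because dropout is switched on.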

Assert Error

Hello,
I use my own dataset with the sparse GAT. Now I have hit an AssertionError in layers:
Traceback (most recent call last):
  File "F:/googledownload/pyGAT-master/train_data.py", line 157, in <module>
    loss_values.append(train(epoch))
  File "F:/googledownload/pyGAT-master/train_data.py", line 108, in train
    output=model(x_train[i], adj_index)
  File "D:\anaconda3.4\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "F:\googledownload\pyGAT-master\models_data.py", line 49, in forward
    x = torch.cat([att(x, adj) for att in self.attentions], dim=1)
  File "F:\googledownload\pyGAT-master\models_data.py", line 49, in <listcomp>
    x = torch.cat([att(x, adj) for att in self.attentions], dim=1)
  File "D:\anaconda3.4\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "F:\googledownload\pyGAT-master\layers_data.py", line 164, in forward
    assert not torch.isnan(h_prime).any()
AssertionError

Maybe there is a NaN value, but I don't know how to solve it.

train-validation-test data split

Hi,

Thank you very much for sharing the code. I noticed that you use the original cora dataset, rather than the processed one in GCN and GAT. I was also thinking of using the original one, but I found that the processed data discard the paper id, so I need to find a way to build the correspondence of nodes.

In your code, you just use the first 140 nodes for training, etc. Is this split the same as in the original GAT and GCN papers?

By the way, what are the differences between the master branch and similar_impl_tensorflow?

Thanks again,

Liyu

code question

I get the following error when running:
Wh_repeated_in_chunks = Wh.repeat_interleave(N, dim=0)
AttributeError: 'Tensor' object has no attribute 'repeat_interleave'
I have tried PyTorch 0.4.1 and 1.0, and the error occurs with both. How can I solve this problem?
Good luck

Running Error.

Hi,

When I ran this code, the following error appeared:
    attention = torch.where(adj > 0, e, zero_vec)
  File "/home/sean/anaconda3/lib/python3.8/site-packages/torch/tensor.py", line 28, in wrapped
    return f(*args, **kwargs)
RuntimeError: Could not run 'aten::gt.Scalar' with arguments from the 'SparseCUDATensorId' backend. 'aten::gt.Scalar' is only available for these backends: [CPUTensorId, CUDATensorId, QuantizedCPUTensorId, VariableTensorId].
I do not know what happened. Could you please give some suggestions? Is my PyTorch version incorrect? Thanks.

AssertionError in assert not torch.isnan(h_prime).any()

I am training a GAT network. After 2 epochs I got the AssertionError as follows:

I wonder why I get NaN in these two assertions.

python version: 3.6
pytorch version: 0.4
os: mac os

Can anyone help? Thanks!

Batchwise Training

Hi,

Nice job and thanks for sharing! I am working on a graph classification problem (different graphs with the same number of nodes and features). I'm wondering whether I can use GAT for graph classification. Does that mean different graphs should share the same weight matrix W? And how can I modify the model for batch-wise training?

Hope for your suggestions!

Extract coefficient matrix in sparse version

Hi Dear Author,

This implementation is wonderful, and I hope to use the coefficient matrix in place of a normal adjacency matrix for some downstream tasks. For the dense version I can clearly see where the coefficients are located, but the sparse version seems to work at a higher level, so I am not sure how to extract such a matrix or rebuild it. Could I get a hint from you? I would appreciate it.
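
Judging from the names mentioned elsewhere on this page (edge_e, e_rowsum, h_prime), the sparse layer appears to keep one unnormalized score per edge and to divide by the per-row sum. If so, the coefficient for the k-th edge (i, j) would be edge_e[k] / e_rowsum[i]. The NumPy sketch below rebuilds a dense coefficient matrix under that assumption. The edge list and scores are made-up, and the variable layout is a guess rather than the layer's actual internals.

```python
import numpy as np

N = 3
# Hypothetical edge list (first row: target node i, second row: neighbor j) and per-edge scores.
edge = np.array([[0, 0, 1, 2, 2],
                 [0, 1, 1, 0, 2]])
edge_e = np.array([1.0, 3.0, 2.0, 1.0, 1.0])

# Row sums of the unnormalized scores (the role e_rowsum seems to play).
e_rowsum = np.zeros(N)
np.add.at(e_rowsum, edge[0], edge_e)

# Normalized coefficient per edge, scattered into a dense N x N matrix.
alpha = edge_e / e_rowsum[edge[0]]
attention = np.zeros((N, N))
attention[edge[0], edge[1]] = alpha

print(attention)
```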

Installation version issues: mkl-fft==1.0.4

Steps to reproduce:

1. Clone repo; cd to repo
2. Create env
   - conda create -n pyGAT python=3.5
   - conda activate pyGAT
3. pip install -r requirements.txt

Error:
Collecting mkl-fft==1.0.4 (from -r requirements.txt (line 3))
  Could not find a version that satisfies the requirement mkl-fft==1.0.4 (from -r requirements.txt (line 3)) (from versions: )
No matching distribution found for mkl-fft==1.0.4 (from -r requirements.txt (line 3))

IndexError

Out of the box, I get the following error during eval:
IndexError: invalid index of a 0-dim tensor. Use `tensor.item()` in Python or `tensor.item<T>()` in C++ to convert a 0-dim tensor to a number

Following an unrelated thread here, I was able to fix it by changing the following code (lines 110-112):

    print("Test set results:",
          "loss= {:.4f}".format(loss_test.data[0]),
          "accuracy= {:.4f}".format(acc_test.data[0]))

to:

    print("Test set results:",
          "loss= {:.4f}".format(loss_test.data),
          "accuracy= {:.4f}".format(acc_test.data))

Just wanted to give you a heads-up. I checked and didn't see anything, but sorry if this was already reported. Below is my conda env in case that's relevant to anyone:

(pygat) -bash-4.2$ conda list
# packages in environment at /home/user/envs/pygat:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main
blas                      1.0                         mkl
ca-certificates           2020.6.24                     0
certifi                   2020.6.20                py38_0
cudatoolkit               10.1.243             h6bb024c_0
freetype                  2.10.2               h5ab3b9f_0
intel-openmp              2020.1                      217
jpeg                      9b                   h024ee3a_2
ld_impl_linux-64          2.33.1               h53a641e_7
libedit                   3.1.20191231         h7b6447c_0
libffi                    3.3                  he6710b0_1
libgcc-ng                 9.1.0                hdf63c60_0
libgfortran-ng            7.3.0                hdf63c60_0
libpng                    1.6.37               hbc83047_0
libstdcxx-ng              9.1.0                hdf63c60_0
libtiff                   4.1.0                h2733197_1
lz4-c                     1.9.2                he6710b0_0
mkl                       2020.1                      217
mkl-service               2.3.0            py38he904b0f_0
mkl_fft                   1.1.0            py38h23d657b_0
mkl_random                1.1.1            py38h0573a6f_0
ncurses                   6.2                  he6710b0_1
ninja                     1.9.0            py38hfd86e86_0
numpy                     1.18.5           py38ha1c710e_0
numpy-base                1.18.5           py38hde5b4d6_0
olefile                   0.46                       py_0
openssl                   1.1.1g               h7b6447c_0
pillow                    7.1.2            py38hb39fc2d_0
pip                       20.1.1                   py38_1
python                    3.8.3                hcff3b4d_0
pytorch                   1.5.1           py3.8_cuda10.1.243_cudnn7.6.3_0    pytorch
readline                  8.0                  h7b6447c_0
scipy                     1.5.0            py38h0b6359f_0
setuptools                47.3.1                   py38_0
six                       1.15.0                     py_0
sqlite                    3.32.3               h62c20be_0
tk                        8.6.10               hbc83047_0
torchvision               0.6.1                py38_cu101    pytorch
wheel                     0.34.2                   py38_0
xz                        5.2.5                h7b6447c_0
zlib                      1.2.11               h7b6447c_3
zstd                      1.4.4                h0b5b093_3

data split issue

Hello, thanks for sharing your work. In your implementation, I noticed that you split the dataset as "train:range(140), val:range(200, 500), test:range(500, 1500)". I would like to know why you split the dataset like this.
Is this a good split for evaluating model performance?

code question

Can you explain the following code?

And what is the meaning of -9e15 and adj?
zero_vec = -9e15*torch.ones_like(e)
attention = torch.where(adj > 0, e, zero_vec)

	a_input = torch.cat([h.repeat(1, N).view(N * N, -1), h.repeat(N, 1)], dim=1).view(N, -1, 2 * self.out_features)
	e = self.leakyrelu(torch.matmul(a_input, self.a).squeeze(2))
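
As far as the snippet shows, adj is the adjacency matrix and -9e15 acts as a stand-in for minus infinity: scores of node pairs without an edge are replaced by a huge negative number. Assuming a row-wise softmax is applied next (the paper normalizes attention scores over each node's neighbors), those pairs end up with numerically zero weight. A NumPy sketch of the same masking, where np.where mirrors torch.where and the scores are made-up:

```python
import numpy as np

e = np.array([[0.5, 2.0, 1.0],
              [1.5, 0.1, 0.3],
              [0.2, 0.4, 0.9]])                  # made-up raw attention scores
adj = np.array([[1, 1, 0],
                [1, 1, 0],
                [0, 0, 1]])                      # 1 where an edge (or self-loop) exists

zero_vec = -9e15 * np.ones_like(e)
masked = np.where(adj > 0, e, zero_vec)          # keep scores only on existing edges

# Row-wise softmax (assumed next step); subtracting the row max keeps exp() stable.
weights = np.exp(masked - masked.max(axis=1, keepdims=True))
attention = weights / weights.sum(axis=1, keepdims=True)

print(attention.round(3))
```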
