
chandlerbang / pro-gnn

Implementation of the KDD 2020 paper "Graph Structure Learning for Robust Graph Neural Networks"

Home Page: https://arxiv.org/abs/2005.10203

Python 80.44% Shell 19.56%
adversarial-attacks attack-defense deep-learning defense graph-mining graph-neural-networks graph-structure-recovery machine-learning pytorch semi-supervised-learning

pro-gnn's People

Contributors: chandlerbang

pro-gnn's Issues

About the attacked graph

Thanks for sharing the code. I'd like to ask about the details of the attacked graph.
I find that the perturbed_adj matrices provided here (generated by the mettack or nettack methods) are symmetric.
But when I generate perturbed_adj by running the adversarial attacks (metattack and nettack) from the deeprobust library myself, the result is asymmetric.
What causes this difference? Did you preprocess the perturbed_adj?
Looking forward to your reply.
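
(For anyone hitting the same asymmetry: below is a minimal sketch of the kind of symmetrization preprocessing that could explain the difference, assuming perturbed_adj is a scipy sparse matrix. This is a guess at the preprocessing, not the authors' confirmed pipeline.)

    import numpy as np
    import scipy.sparse as sp

    def symmetrize(adj):
        # Keep an undirected edge wherever either direction exists,
        # then clip entries back to {0, 1}.
        adj = adj + adj.T
        adj.data = np.minimum(adj.data, 1)
        return sp.csr_matrix(adj)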

cora dataset node classification performance under mettack

Hi,

Thanks for sharing your paper's code.

  1. I tried to reproduce the cora test results for classification accuracy under mettack, as shown in Table 2 of your paper, but the performance I get is much lower (e.g. accuracy 0.8033 for ptb_rate 0.05, 0.7425 for ptb_rate 0.1, 0.6831 for ptb_rate 0.15).
    I used the default settings and ran the command python train.py --dataset cora --attack meta --ptb_rate 0.1 --epoch 1000 (changing ptb_rate as required).
    Could you tell me which settings you used, or how I can reproduce similar results?

  2. The "Run the code" example in your README is broken: python train.py --dataset polblogs --attack meta --ptb_rate 0.15 --epoch 1000 fails with AssertionError: ProGNN splits only cora, citeseer, pubmed, cora_ml.
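
(A possible workaround for the polblogs assertion, sketched under the assumption that deeprobust's Dataset accepts a setting argument and that polblogs is simply not in the 'prognn' split list:)

    from deeprobust.graph.data import Dataset

    # Load polblogs with the default GCN-style splits instead of the
    # 'prognn' splits, which only cover cora/citeseer/pubmed/cora_ml.
    data = Dataset(root='/tmp/', name='polblogs', setting='gcn')
    adj, features, labels = data.adj, data.features, data.labels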

How to compute the rank of the adjacency matrix

Hi, I have a question about computing the rank of the adjacency matrix. The paper says the rank of the perturbed adjacency matrix becomes larger. However, when I use torch.matrix_rank() to compute the ranks, I find they are the same.

So did you use another function or preprocess the adjacency matrix? Thanks!
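
(Worth noting: the numerical rank depends heavily on the tolerance used. A small sketch with standard PyTorch calls, using an explicit relative tolerance instead of the default threshold:)

    import torch

    def numerical_rank(adj, tol=1e-5):
        # Singular values of the (dense) adjacency matrix.
        s = torch.linalg.svdvals(adj.float())
        # Count singular values above a relative tolerance.
        return int((s > tol * s.max()).sum())

    # Comparing numerical_rank(clean_adj, tol) with
    # numerical_rank(perturbed_adj, tol) at several tolerances can
    # reveal differences that a default threshold hides.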

about the details of the metattack experiment

Hi, thanks for sharing the code! I would like to ask a question related to the metattack experiment.
In subsection 5.3 you investigate the weights of normal and adversarial edges in the learned adjacency matrix S. How can we tell which edges are normal and which are adversarial in the adjacency matrix? I don't find a key for adversarial edges in the properties of the meta datasets, or am I missing it?
I'm doing experiments with the meta datasets and am stuck on this question. I would appreciate a reply!
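
(One way to recover the labels, assuming both the clean and perturbed adjacency matrices are available as scipy sparse matrices; this diff-based reconstruction is my assumption, not a documented key in the datasets:)

    import scipy.sparse as sp

    def split_edges(clean_adj, perturbed_adj):
        diff = perturbed_adj - clean_adj
        added = sp.find(diff > 0)     # adversarial: inserted by the attack
        removed = sp.find(diff < 0)   # adversarial: deleted by the attack
        normal = sp.find(perturbed_adj.multiply(clean_adj))  # unchanged edges
        return normal, added, removed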

About reconstructed graph data

Hi, I have a question about train.py: where is the reconstructed graph stored? And is it possible to use the reconstructed data to train another target model, to verify the defense effect?
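
(A sketch of how the learned graph could be exported, assuming a fitted DeepRobust ProGNN instance; the best_graph attribute name is my reading of prognn.py and should be checked against your version:)

    import numpy as np

    # prognn is assumed to be a ProGNN instance after calling
    # prognn.fit(features, perturbed_adj, labels, idx_train, idx_val).
    learned_adj = prognn.best_graph.detach().cpu().numpy()
    np.save('reconstructed_adj.npy', learned_adj)
    # The saved matrix can then be fed to any other target model
    # (e.g. a plain GCN) to check the defense effect.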

About the hyperparameter

Really wonderful work, thank you for your contribution! Recently I have been trying to reproduce the results in your paper using deeprobust, but the default hyperparameters do not directly give the best results. Could you provide the exact hyperparameters for each run? I would be grateful for your help!

About Accuracy rate

Hello, I want to run some comparative tests with the models provided by DeepRobust, but I find that the results differ considerably from those in the paper. For example, when I attack the dataset with metattack at a 0.25 perturbation rate, Pro-GNN reaches only 51%, GCN only 43%, and GCN-Jaccard only 55%. The other datasets also differ substantially. I don't know whether the problem is on my side or in the parameter settings.

A question about generating nettacked data

Hi,

Really nice work! I've got a question about how the nettacked data (in the nettack folder of this repo) were generated. In this folder, only one perturbed adj is provided for each graph under each ptb_rate, and it seems that each perturbed adj covers attacks on all target nodes.

However, the target_node argument of Nettack.attack in DeepRobust takes only a single node, and I have no idea how to pass all target nodes to Nettack simultaneously. I can't find the related details in the paper either, so I'm a bit confused about how the nettacked data were generated.

In fact, I'm trying to apply nettack to other graphs but can't figure out how. It would be very helpful if you could provide some details about how you generated these nettacked data.

Thanks!
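
(One plausible generation procedure: attack each target node in turn and carry the modified adjacency forward. This is my reconstruction with DeepRobust's Nettack, not the authors' confirmed script; surrogate, target_nodes, and ptb_rate are assumed to be prepared beforehand.)

    from deeprobust.graph.targeted_attack import Nettack

    modified_adj = adj  # clean scipy-sparse adjacency
    for target in target_nodes:
        model = Nettack(surrogate, nnodes=adj.shape[0],
                        attack_structure=True, attack_features=False,
                        device='cpu')
        # Under this reading, ptb_rate 1.0-5.0 would mean 1-5
        # perturbations per target node.
        model.attack(features, modified_adj, labels, target,
                     n_perturbations=int(ptb_rate))
        modified_adj = model.modified_adj  # accumulate perturbations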

about the details of the nettack experiment

Hi! Thanks for sharing the code. I'd like to ask about the details of the nettack experiment.
I noticed in the paper that you selected only 10% of the target nodes as the test set when conducting the nettack experiment on the pubmed dataset. So if I get the pubmed dataset from the deeprobust repository and set 'ptb_rate'=1.0, there will be 186 targeted nodes, and I just need to sample 10% of them, i.e. 18 nodes, as my test set. Is that right?
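
(If that reading is right, the sampling itself is one line. A sketch assuming the attacked node list is stored the same way as cora's, under an attacked_test_nodes key; the pubmed filename here is hypothetical:)

    import json
    import numpy as np

    with open('pubmed_nettacked_nodes.json') as f:
        target_nodes = json.load(f)['attacked_test_nodes']
    rng = np.random.default_rng(seed=15)
    test_nodes = rng.choice(target_nodes, size=len(target_nodes) // 10,
                            replace=False)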

GPU OOM on Pubmed

Hi,

It seems that we have to pass a dense perturbed_adj to prognn.fit()? I'm currently hitting an out-of-memory error on the GPU when running the script on the Pubmed dataset (both Cora and Citeseer work for me). I guess the dense adjacency matrix for Pubmed consumes a lot of GPU memory and causes the OOM (my GPU has around 10 GB of memory)? Running the script on the CPU would be too slow. Are there any solutions? Thanks in advance.

The code requires so much memory

Hi,
When I run it on Google Colab with a Tesla T4, I get a CUDA out-of-memory error on the Pubmed dataset: Tried to allocate 1.45 GiB (GPU 0; 14.73 GiB total capacity; 13.08 GiB already allocated; 785.88 MiB free; 13.08 GiB reserved in total by PyTorch).
I hope you can kindly answer the following questions.

Q1: Does the program really need this much memory? Is there any way to work around it?
Q2: On the pubmed dataset, I cannot reach the GCN accuracy reported in the paper. Can you share the parameters or some suggestions?

Thanks.

About the usage of the validation set

Hi,

Really nice work! When I run the code, I find that a validation set is used when training the prognn model. Is the validation set considered when conducting metattack? If we don't use the validation set to pick the best model, will the accuracy drop?

By the way, I ran into trouble when trying to reproduce the results of vanilla GCN using the code in this repo. For example, I can only achieve ~50% accuracy under 0.2 metattack on the cora dataset. Could you please give me some tips about the hyperparameters used to produce the result reported in the paper (~59%)?

Thanks a lot!

A question about experimental accuracy

Hello, I'm running the performance experiment on the pubmed dataset under the meta attack, and I find that GCN's accuracy is only about 65% under 10% perturbation and only about 50% under 20% perturbation. How were the 80%+ accuracies reported in the paper obtained? Could you share the hyperparameters?

about GCN-Jaccard results

We reproduced GCN-Jaccard using the code provided in the deeprobust library. However, we find the results are much higher than those reported in your paper. For example, we get a test accuracy of 0.80+ on citeseer for ptb_rate=5.0 and threshold=0.1. We want to know whether there are some mistakes in the GCN-Jaccard code. Thanks!

About pubmed dataset

Hi, thank you so much for sharing the code. However, when I run large datasets such as Pubmed, I run out of CUDA memory. How did you solve this?

Problem of experiments results on Polblogs dataset

Hi, I've found a problem with the polblogs dataset: I cannot fully reproduce the experimental results under the same random seed. I tested with the metattack model under a perturbation rate of 10%, but I cannot reproduce all the results consistently.

When I set the random seed to 10,
GCN: 0.8680981595092024
RGCN: 0.8629856850715747
ProGNN: 0.82719836400818
ProGNN-fs: 0.8384458077709612

ProGNN and ProGNN-fs are consistent with your paper.

When I set the random seed to 15,
GCN: 0.7198364008179959
RGCN: 0.7157464212678937
ProGNN: 0.7147239263803682
ProGNN-fs: 0.7157464212678937

GCN and RGCN are consistent with your paper.

The ProGNN parameter settings on the polblogs dataset (all code is based on DeepRobust):
args.epochs = 1200
args.gamma = 1
args.alpha = 5e-4
args.beta = 1.5
args.lambda_ = 0
args.lr = 5e-4

If I am not wrong, I suppose you ran the experiments with different random seeds. Could you help me check this when you are available?

Thanks in advance!

Questions about Nettack

Hi, thanks for sharing this great work! I have a question about the experimental details of the nettack attack.

It looks like nettack [1] in deeprobust can only perturb the graph structure for one given target node at a time. But I noticed that for each dataset and each perturbation rate there is only one adversarial adjacency matrix in the "nettack" folder covering many target nodes, e.g., "cora_nettack_adj_2.0.npz" for the attacked_test_nodes in "cora_nettacked_nodes.json".

I am wondering how you did this to save disk space (since I also want to perturb other datasets not included in your experiments), or do I have some misunderstanding of the targeted attack? I would really appreciate it if you could spare your valuable time to answer.

[1] https://deeprobust.readthedocs.io/en/latest/source/deeprobust.graph.targeted_attack.html

The code is running extremely slowly

Hi,

I've made sure that GPU boosting is on; however, it takes about 150 minutes to train Pro-GNN on a Tesla T4 (Yelp dataset with 3.9k nodes). In comparison, a vanilla GCN can be trained on the same dataset and device within 9 seconds (300 epochs).
I'm willing to follow your work and hope you can kindly answer the following questions. : )

Q1: Is there any method (or anything I may have done wrong) to speed up the training process?
Q2: Which parameters are the most important to grid search for Pro-GNN if I focus on improving classification performance on the original (not the perturbed/attacked) graph? Grid searching all the parameters would take months.

The problem of nettack

Hi, sorry to bother you. I need to verify how combining some of my own algorithms with defense algorithms performs against attacks on graphs. Due to time constraints, I only used the provided meta and nettack perturbed structures. The meta attack works as expected, but nettack seems to have little effect. Also, do the values 1.0-5.0 indicate the perturbation level? I ran it and the attack seems to have no effect. Is this a problem with my usage? If you have time, I would appreciate a reply. Thanks!

Code for training using GAT

Thanks for your wonderful work. I am wondering whether the code for training with GATs could be provided?

Regards

About the dataset

Hello, I want to test the model's performance on other datasets, but this project loads data in the cora.npz format, which is different from the ind.cora.x / ind.cora.y ... format used by other models. I now want to change the dataset; can you give me some advice?
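
(A sketch of one way around this, under the assumption that the training code ultimately only consumes a sparse adjacency, a feature matrix, labels, and the index splits; the wrapper class here is hypothetical:)

    import numpy as np
    import scipy.sparse as sp

    class MyData:
        # Minimal stand-in for the loaded dataset object.
        def __init__(self, adj, features, labels,
                     idx_train, idx_val, idx_test):
            self.adj = sp.csr_matrix(adj)
            self.features = sp.csr_matrix(features)
            self.labels = np.asarray(labels)
            self.idx_train = idx_train
            self.idx_val = idx_val
            self.idx_test = idx_test

    # data = MyData(adj, features, labels, idx_train, idx_val, idx_test)
    # Then use data.adj, data.features, ... exactly as train.py does.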

Questions about node selection in Nettack

Hi! Thanks for sharing the code. I have a question about the selection of target nodes in Nettack.
I notice that in test_nettack.py, about 40 nodes are chosen as target nodes, following the original paper,
whereas in Pro-GNN the target nodes are the test-set nodes whose degree is larger than 10.
Did I misunderstand the experiment settings?
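
(For reference, the Pro-GNN rule as stated is easy to reproduce; a sketch assuming a scipy-sparse adj and an idx_test array:)

    import numpy as np

    # Target nodes: test-set nodes with degree larger than 10.
    degrees = np.asarray(adj.sum(axis=0)).flatten()
    target_nodes = [int(n) for n in idx_test if degrees[n] > 10]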

About the GCN module

I have a question about this line in deeprobust.graph.defense.gcn.py:

output = torch.spmm(adj, support)

I traced the code and found that in the train_gcn part of prognn.py, adj is a dense matrix after normalization, yet torch.spmm is used to multiply adj and support. Why is torch.spmm used on a dense matrix?
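
(For what it's worth, the two paths can be made explicit; a small sketch, not the repository's actual fix, where adj and support are assumed to be dense torch tensors from the GCN layer:)

    import torch

    # If adj is dense after normalization, either multiply densely...
    output = torch.mm(adj, support)
    # ...or convert to a sparse tensor before calling torch.spmm:
    output = torch.spmm(adj.to_sparse(), support)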
