
mims-harvard / gnnguard

57 stars · 15 forks · 12.26 MB

Defending graph neural networks against adversarial attacks (NeurIPS 2020)

Home Page: https://zitniklab.hms.harvard.edu/projects/GNNGuard

License: Other

Python 100.00%
adversarial-attacks deep-learning graph-convolutional-networks graph-neural-networks robust-learning

gnnguard's People

Contributors

marinkaz, xiangzhang1015


gnnguard's Issues

Question about edge pruning

Hi,

Thanks for the great work! I have a question about this line of code:

sim[sim<0.1] = 0
, which drops edges based on the similarity score from Equation 3 of the paper. However, the paper only says that edges are dropped based on Equation 5, not Equation 3. Could you kindly explain why this line is in the code, or is there something I'm missing?

Thanks!
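For context, the thresholding being asked about can be sketched as follows. This is a minimal numpy reconstruction of the idea, not the repository's exact code; `prune_low_similarity` is a hypothetical helper name, and `edge_index` is assumed to be a `(row, col)` pair of endpoint arrays.

```python
import numpy as np

def prune_low_similarity(features, edge_index, threshold=0.1):
    """Zero out edges whose endpoint features have cosine similarity
    below `threshold`, mirroring `sim[sim < 0.1] = 0`."""
    row, col = edge_index
    # L2-normalize node features so dot products become cosine similarities
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    # Per-edge cosine similarity between the two endpoints (Eq. 3)
    sim = (normed[row] * normed[col]).sum(axis=1)
    # Hard-prune low-similarity edges, as the questioned code line does
    sim[sim < threshold] = 0.0
    return sim
```

Under this reading, the `sim < 0.1` cutoff is an extra hard filter applied to the Eq. 3 similarities before the Eq. 5 pruning that the paper describes.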

A question about the pruning procedure

Good paper! But I have a question. As described in the paper, GNNGuard prunes graph edges according to Equation (5), but I cannot find any code that does this. Could you point out the location of the pruning code?

Questions regarding parameters for experiments in the paper

I have some questions regarding the parameters you used for the experiments in the paper:

  • In Appendix E of the paper, you mention setting P_0 = 0.5. I assume this is the same P_0 as in Eq. (5), used for pruning edges after importance estimation? However, in your code (as you replied in another issue), P_0 seems to be 0.1. Which value, 0.1 or 0.5, did you use for the experiments in the paper?
  • I saw in line 116 of your GCN implementation that the layer-wise graph memory seems to be disabled. Is that also the case for the experiments in the paper? In other words, what $$\beta$$ value did you use for Eq. (7)?
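The layer-wise graph memory being asked about can be sketched as follows, assuming the standard reading of Eq. (7) as a convex combination of the previous and newly estimated edge weights (`layerwise_graph_memory` is an illustrative name, not from the repository):

```python
import numpy as np

def layerwise_graph_memory(adj_prev, adj_new, beta):
    # Eq. (7): keep a fraction beta of the previous layer's edge weights
    # and blend in the freshly estimated ones. beta = 0 means no memory,
    # which is what the commented-out line in gcn.py corresponds to.
    return beta * adj_prev + (1.0 - beta) * adj_new
```

With `beta = 0` this reduces to `adj_new`, i.e. the "without memory" behavior in the current code.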

Can you run Mettack.py?

It seems type-casting errors (between sparse, torch-sparse, and torch tensor types) are everywhere in the model implementation. Simply running python Mettack.py gives me a runtime error.
There seem to be so many bugs in the code that I can't fix them myself. Could you provide cleaner code, or tell me how to run Mettack.py without errors?

Support for GPU.

I had some issues running with a GPU. The code should be changed as follows (only for GCN as the base model).

  1. In the gcn_conv.py file that replaces the one in torch_geometric, add the following code block before class GCN(MessagePassing):
@torch.jit._overload
def gcn_norm(edge_index, edge_weight=None, num_nodes=None, improved=False,
             add_self_loops=True, dtype=None):
    # type: (Tensor, OptTensor, Optional[int], bool, bool, Optional[int]) -> PairTensor  # noqa
    pass


@torch.jit._overload
def gcn_norm(edge_index, edge_weight=None, num_nodes=None, improved=False,
             add_self_loops=True, dtype=None):
    # type: (SparseTensor, OptTensor, Optional[int], bool, bool, Optional[int]) -> SparseTensor  # noqa
    pass


def gcn_norm(edge_index, edge_weight=None, num_nodes=None, improved=False,
             add_self_loops=True, dtype=None):

    fill_value = 2. if improved else 1.

    if isinstance(edge_index, SparseTensor):
        adj_t = edge_index
        if not adj_t.has_value():
            adj_t = adj_t.fill_value(1., dtype=dtype)
        if add_self_loops:
            adj_t = fill_diag(adj_t, fill_value)
        deg = sum(adj_t, dim=1)
        deg_inv_sqrt = deg.pow_(-0.5)
        deg_inv_sqrt.masked_fill_(deg_inv_sqrt == float('inf'), 0.)
        adj_t = mul(adj_t, deg_inv_sqrt.view(-1, 1))
        adj_t = mul(adj_t, deg_inv_sqrt.view(1, -1))
        return adj_t

    else:
        num_nodes = maybe_num_nodes(edge_index, num_nodes)

        if edge_weight is None:
            edge_weight = torch.ones((edge_index.size(1), ), dtype=dtype,
                                     device=edge_index.device)

        if add_self_loops:
            edge_index, tmp_edge_weight = add_remaining_self_loops(
                edge_index, edge_weight, fill_value, num_nodes)
            assert tmp_edge_weight is not None
            edge_weight = tmp_edge_weight

        row, col = edge_index[0], edge_index[1]
        deg = scatter_add(edge_weight, col, dim=0, dim_size=num_nodes)
        deg_inv_sqrt = deg.pow_(-0.5)
        deg_inv_sqrt.masked_fill_(deg_inv_sqrt == float('inf'), 0)
        return edge_index, deg_inv_sqrt[row] * edge_weight * deg_inv_sqrt[col]
  2. Add the following line into 'gcn_conv.py':
@staticmethod
def norm(edge_index, num_nodes, edge_weight=None, improved=False,
         dtype=None):
    if edge_weight is None:
        edge_weight = torch.ones((edge_index.size(1), ), dtype=dtype,
                                 device=edge_index.device)

    # Add this line
    edge_weight = edge_weight.to(edge_index.device)

    fill_value = 1 if not improved else 2
  3. In defense/gcn.py, add the following lines:
def forward(self, x, adj):
    """we don't change the edge_index, just update the edge_weight;
    some edge_weight are regarded as removed if it equals to zero"""
    x = x.to_dense()

    """GCN and GAT"""
    if self.attention:
        adj = self.att_coef(x, adj, i=0)
    # Add this line
    edge_index = adj._indices().to(self.device)

    x = self.gc1(x, edge_index, edge_weight=adj._values())
    x = F.relu(x)
    # x = self.bn1(x)
    if self.attention:  # if attention=True, use attention mechanism
        adj_2 = self.att_coef(x, adj, i=1)
        adj_memory = adj_2.to_dense()  # without memory
        # adj_memory = self.gate * adj.to_dense() + (1 - self.gate) * adj_2.to_dense()
        row, col = adj_memory.nonzero()[:,0], adj_memory.nonzero()[:,1]
        edge_index = torch.stack((row, col), dim=0)
        adj_values = adj_memory[row, col]
    else:
        edge_index = adj._indices()
        adj_values = adj._values()
    # Add this line
    edge_index = edge_index.to(self.device)
    adj_values = adj_values.to(self.device)

    x = F.dropout(x, self.dropout, training=self.training)
    x = self.gc2(x, edge_index, edge_weight=adj_values)

I haven't run other base models on GPU, but hopefully the code above will help those who are using one. Cheers!

A question on hyperparameters

Hi,

Thanks for your great work!

I have trouble reproducing your results with GIN on Cora under 20% Metattack. I used the pre-perturbed data provided by DeepRobust with Pro-GNN splits (data and splits). However, I only got 58.22±4.04 (over 10 runs), which is far from the 72.2 reported in your paper. Even if the dataset split in Pro-GNN differs from yours, I don't think the results should differ this much.

So I wonder if I made a mistake with the hyperparameters. I followed those given in your paper and your code:

    epochs = 200,
    patience = 10,
    lr = 0.01,
    weight_decay = 5e-4,
    hidden = 16,
    dropout = 0.5,
    modelname = 'GIN',
    GNNGuard = True,
    seed = 15,

Could you take a look and see which hyperparameter values I should use? Thanks.

P.S. To save memory, I changed cosine_similarity in defense/gin.py:

GNNGuard/defense/gin.py

Lines 157 to 159 in 33f5390

sim_matrix = cosine_similarity(X=fea_copy, Y=fea_copy) # try cosine similarity
# sim_matrix = torch.from_numpy(sim_matrix)
sim = sim_matrix[row, col]

to the following:

from sklearn.preprocessing import normalize

def paired_cosine_similarity(X, i, j):
    X_normed = normalize(X)
    return (X_normed[i] * X_normed[j]).sum(1)

sim = paired_cosine_similarity(fea_copy, row, col)

I think this code should be equivalent to yours.
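The claimed equivalence can be checked numerically with a numpy-only sketch, using plain L2 normalization in place of sklearn's normalize (function names here are illustrative):

```python
import numpy as np

def full_pairwise_cosine(X):
    # Dense N x N cosine-similarity matrix (what sklearn's
    # cosine_similarity computes), later indexed by edge endpoints.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T

def paired_cosine(X, i, j):
    # Compute only the per-edge similarities, avoiding the N x N matrix.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return (Xn[i] * Xn[j]).sum(1)

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
row = rng.integers(0, 20, size=30)
col = rng.integers(0, 20, size=30)
# The paired version agrees with indexing the full matrix
assert np.allclose(full_pairwise_cosine(X)[row, col], paired_cosine(X, row, col))
```

The paired version trades an O(N²) memory footprint for O(|E|), which is exactly the memory saving described above.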

Question upon running dataset ogbn-arxiv

Hi Xiang,

Thanks for your impressive work!
I have a question about running the ogbn-arxiv dataset. It seems that directly running this large dataset under the current framework on a single GPU is not possible. Could you provide any tips for efficiently conducting robust evaluation on ogbn-arxiv? Specifically:

  • Do you use any sampling techniques when training the GNN models? If so, what sampling method would you recommend?
  • Do you run ogbn-arxiv on a single GPU or on CPU? If it can be done on a single GPU, how much memory does it need?
  • It seems it would take weeks for Mettack and Nettack to finish attacking such a large network directly; if so, this would be infeasible, especially in the poisoning setting. I am not sure whether this observation is correct.

Thanks in advance!
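On the sampling question: a common way to make large graphs like ogbn-arxiv fit in memory is GraphSAGE-style uniform neighbor sampling. The sketch below is generic illustration, not code from this repository; `adj_list`, `seeds`, and `fanout` are illustrative names.

```python
import random

def sample_neighbors(adj_list, seeds, fanout, seed=0):
    # Uniformly keep at most `fanout` neighbors per seed node, which
    # bounds the size of the subgraph processed in each mini-batch.
    rng = random.Random(seed)
    out = {}
    for v in seeds:
        nbrs = adj_list.get(v, [])
        out[v] = list(nbrs) if len(nbrs) <= fanout else rng.sample(nbrs, fanout)
    return out
```

Stacking this over L hops caps per-batch cost at roughly fanout^L neighbors per seed instead of the full receptive field.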
