
sgformer's Introduction

SGFormer: Simplified Graph Transformers

The official implementation for the NeurIPS 2023 paper "SGFormer: Simplifying and Empowering Transformers for Large-Graph Representations".

Related material: [Paper], [Blog], [Video]

SGFormer is a graph encoder backbone that efficiently computes all-pair interactions with one-layer attentive propagation.

SGFormer builds upon our previous works on scalable graph Transformers with linear complexity: NodeFormer (NeurIPS 2022, spotlight) and DIFFormer (ICLR 2023, spotlight).

What's new

[2023.10.28] We release the code for the model on large-graph benchmarks. More details will be added soon.

[2023.12.20] We add more details on how to run the code.

[2024.05.05] We add code for measuring training time and memory usage in ./medium/time_test.py.

Model and Results

The model adopts a simple architecture, comprising a one-layer global attention module and a shallow GNN.

[Figure: SGFormer architecture, a one-layer global attention combined with a shallow GNN]
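To make the design concrete, here is a minimal PyTorch sketch of this architecture. It is illustrative only: the class, the normalization, and the parameter names (e.g. alpha for the mixing weight, loosely corresponding to the --graph_weight flag used elsewhere on this page) are assumptions, not the repository's actual API.

    import torch
    import torch.nn as nn

    class SimplifiedGraphTransformer(nn.Module):
        """Sketch: one-layer linear global attention mixed with a shallow GNN."""
        def __init__(self, in_dim, hidden_dim, num_classes, gnn, alpha=0.5):
            super().__init__()
            self.q = nn.Linear(in_dim, hidden_dim)
            self.k = nn.Linear(in_dim, hidden_dim)
            self.v = nn.Linear(in_dim, hidden_dim)
            self.gnn = gnn          # any shallow message-passing module mapping in_dim -> hidden_dim
            self.alpha = alpha      # weight on the GNN branch (cf. --graph_weight)
            self.out = nn.Linear(hidden_dim, num_classes)

        def forward(self, x, edge_index):
            q, k, v = self.q(x), self.k(x), self.v(x)
            # Normalize queries/keys (a stand-in for the paper's exact normalization).
            q = q / (q.norm(dim=-1, keepdim=True) + 1e-6)
            k = k / (k.norm(dim=-1, keepdim=True) + 1e-6)
            # All-pair propagation in linear time: compute K^T V (a d x d matrix)
            # first, so the cost is O(N d^2) rather than O(N^2 d).
            kv = k.t() @ v          # [d, d]
            z = q @ kv              # [N, d]
            h = (1 - self.alpha) * z + self.alpha * self.gnn(x, edge_index)
            return self.out(h)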

The following tables present the results for standard node classification tasks on medium-sized and large-sized graphs.

[Tables: node classification results on medium-sized and large-sized graphs]

Requirements

For all datasets except ogbn-papers100M, we used the environment with the package versions listed in ./large/requirement.txt. For ogbn-papers100M, PyG >= 2.0 is required to run the code.
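For example, a typical setup (assuming a working PyTorch installation; adjust packages to your CUDA version as needed):

    pip install -r ./large/requirement.txt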

Dataset

One can download the datasets (Planetoid, Deezer, Pokec, Actor/Film) from the Google Drive link below:

https://drive.google.com/drive/folders/1rr3kewCBUvIuVxA6MJ90wzQuF-NnCRtf?usp=drive_link

For Chameleon and Squirrel, we use the new splits that filter out the overlapping nodes.

The OGB datasets will be downloaded automatically when the code is run.

Run the code

Please refer to the bash script run.sh in each folder to run the training and evaluation pipeline; a typical invocation is shown below.
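For example (assuming the folder layout referenced above, with medium/ and large/ subdirectories):

    cd medium && bash run.sh    # medium-sized graph benchmarks
    cd large && bash run.sh     # large-scale benchmarks (ogbn-*)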

Citation

If you find our code and model useful, please cite our work. Thank you!

      @inproceedings{wu2023sgformer,
        title={SGFormer: Simplifying and Empowering Transformers for Large-Graph Representations},
        author={Qitian Wu and Wentao Zhao and Chenxiao Yang and Hengrui Zhang and Fan Nie and Haitian Jiang and Yatao Bian and Junchi Yan},
        booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
        year={2023}
      }


sgformer's Issues

RuntimeError: shape '[10, 18, 7, 18, 7, 32]' is invalid for input of size 5242880

Thank you for your contribution to science. I am having the following problem reproducing your code:
Traceback (most recent call last):
File "/tmp/pycharm_project_772/tools/train.py", line 195, in
main()
File "/tmp/pycharm_project_772/tools/train.py", line 184, in main
train_detector(
File "/tmp/pycharm_project_772/mmdet/apis/train.py", line 186, in train_detector
runner.run(data_loaders, cfg.workflow)
File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 136, in run
epoch_runner(data_loaders[i], **kwargs)
File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 53, in train
self.run_iter(data_batch, train_mode=True, **kwargs)
File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 31, in run_iter
outputs = self.model.train_step(data_batch, self.optimizer,
File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/mmcv/parallel/data_parallel.py", line 77, in train_step
return self.module.train_step(*inputs[0], **kwargs[0])
File "/tmp/pycharm_project_772/mmdet/models/detectors/base.py", line 247, in train_step
losses = self(**data)
File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 146, in new_func
output = old_func(*new_args, **new_kwargs)
File "/tmp/pycharm_project_772/mmdet/models/detectors/base.py", line 181, in forward
return self.forward_train(img, img_metas, **kwargs)
File "/tmp/pycharm_project_772/mmdet/models/detectors/two_stage.py", line 142, in forward_train
x = self.extract_feat(img)
File "/tmp/pycharm_project_772/mmdet/models/detectors/two_stage.py", line 82, in extract_feat
x = self.backbone(img)
File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/tmp/pycharm_project_772/mmdet/models/backbones/sgformer.py", line 484, in forward
x, mask = blk(x, H, W, mask)
File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in call_impl
return forward_call(*args, **kwargs)
File "/tmp/pycharm_project_772/mmdet/models/backbones/sgformer.py", line 263, in forward
x
, mask = self.attn(self.norm1(x), H, W, mask)
File "/opt/anaconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/tmp/pycharm_project_772/mmdet/models/backbones/sgformer.py", line 150, in forward
q2, k2, v2 = window_partition(q2, q_window, H, W), window_partition(k2, window_size, H, W),
File "/tmp/pycharm_project_772/mmdet/models/backbones/sgformer.py", line 24, in window_partition
x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
RuntimeError: shape '[10, 18, 7, 18, 7, 32]' is invalid for input of size 5242880

I entered an image size of 512x512.
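For reference, the arithmetic in the error message points at the window size: the requested view 10 x (18*7) x (18*7) x 32 holds 10 x 126 x 126 x 32 = 5,080,320 elements, while the input tensor has 5,242,880 = 10 x 128 x 128 x 32 elements, i.e. a 128 x 128 feature map (consistent with a 512x512 input at 1/4 resolution) whose sides are not divisible by the window size 7. Note also that the traceback references mmdet/models/backbones/sgformer.py, a vision detection backbone, which appears to be a different SGFormer model from the graph Transformer in this repository.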

About Proof for Theorem 2

Hi~ I'm not very good at maths; I want to ask about the proof of equation (18) in the appendix of the paper.
[Screenshot: equation (18) from the appendix]

Reproducing results on proteins

Hi,

I ran the following command to reproduce the ogbn-proteins results:

python main-batch.py --method sgformer  --dataset ogbn-proteins --metric rocauc --lr 0.01 --hidden_channels 64 \
    --gnn_num_layers 2  --gnn_dropout 0. --gnn_weight_decay 0. --gnn_use_residual --gnn_use_weight --gnn_use_bn --gnn_use_act \
    --trans_num_layers 1 --trans_dropout 0. --trans_weight_decay 0. --trans_use_residual --trans_use_weight --trans_use_bn \
    --use_graph --graph_weight 0.5 \
    --batch_size 10000 --seed 123 --runs 5 --epochs 1000 --eval_step 9 --device 1

This is the result I am getting:

Chosen epoch: 65 Final Train: 81.17 Final Test: 72.85.

I haven't completed all 5 runs because this already seems very far from the reported number.

Is the command provided in the run.sh right?

RuntimeError: CUDA error: an illegal memory access was encountered

Hi,

Thanks for open-sourcing your implementation. I am trying to run your examples as pointed to in the run.sh file, but it throws the following error.

Traceback (most recent call last):
  File "/localscratch/SGFormer/large/main-batch.py", line 143, in <module>
    out_i = model(x_i, edge_index_i)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/localscratch/SGFormer/large/ours.py", line 268, in forward
    x2 = self.graph_conv(x, edge_index)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/localscratch/SGFormer/large/ours.py", line 86, in forward
    x = conv(x, edge_index, layer_[0])
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/localscratch/SGFormer/large/ours.py", line 33, in forward
    adj = SparseTensor(row=col, col=row, value=value, sparse_sizes=(N, N))
  File "/usr/local/lib/python3.10/dist-packages/torch_sparse/tensor.py", line 26, in __init__
    self.storage = SparseStorage(
  File "/usr/local/lib/python3.10/dist-packages/torch_sparse/storage.py", line 69, in __init__
    assert trust_data or int(row.max()) < M
RuntimeError: CUDA error: an illegal memory access was encountered

This usually happens because of incorrect SparseTensor construction, i.e. it expects the rows/columns to be sorted.

The error occurred with this command:

python main-batch.py --method sgformer  --dataset ogbn-proteins --metric rocauc --lr 0.01 --hidden_channels 64     --gnn_num_layers 2  --gnn_dropout 0. --gnn_weight_decay 0. --gnn_use_residual --gnn_use_weight --gnn_use_bn --gnn_use_act     --trans_num_layers 1 --trans_dropout 0. --trans_weight_decay 0. --trans_use_residual --trans_use_weight --trans_use_bn     --use_graph --graph_weight 0.5     --batch_size 10000 --seed 123 --runs 5 --epochs 1000 --eval_step 9 --device 1

Can you also clarify why the sparse adj needs to be reconstructed for every layer per forward pass?

    def forward(self, x, edge_index, x0):
        N = x.shape[0]
        row, col = edge_index
        # Symmetric normalization: each edge gets weight 1 / sqrt(d_in * d_out).
        d = degree(col, N).float()
        d_norm_in = (1. / d[col]).sqrt()
        d_norm_out = (1. / d[row]).sqrt()
        value = torch.ones_like(row) * d_norm_in * d_norm_out
        value = torch.nan_to_num(value, nan=0.0, posinf=0.0, neginf=0.0)
        # The normalized adjacency is rebuilt from edge_index on every call.
        adj = SparseTensor(row=col, col=row, value=value, sparse_sizes=(N, N))
        x = matmul(adj, x)  # [N, D]

        if self.use_init:
            x = torch.cat([x, x0], 1)  # concatenate the initial features
            x = self.W(x)
        elif self.use_weight:
            x = self.W(x)

        return x

Why can't you just cache the normalized adj matrix and reuse it?
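One possible pattern, sketched below under the assumption that edge_index is fixed across layers within a forward pass (this is not the repository's code; build_norm_adj and the layer loop are hypothetical):

    import torch
    from torch_geometric.utils import degree
    from torch_sparse import SparseTensor

    def build_norm_adj(edge_index, N):
        # Symmetrically normalized adjacency, built once per (mini-batch) graph.
        row, col = edge_index
        d = degree(col, N).float()
        d_norm_in = (1.0 / d[col]).sqrt()
        d_norm_out = (1.0 / d[row]).sqrt()
        value = torch.nan_to_num(d_norm_in * d_norm_out, nan=0.0, posinf=0.0, neginf=0.0)
        return SparseTensor(row=col, col=row, value=value, sparse_sizes=(N, N))

    # In the parent module's forward, before the layer loop (hypothetical):
    # adj = build_norm_adj(edge_index, x.shape[0])
    # for conv in self.convs:
    #     x = conv(x, adj, x0)  # each layer then calls matmul(adj, x) directly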

For amazon2m, I am getting a similar error:

Traceback (most recent call last):
  File "/localscratch/SGFormer/large/main-batch.py", line 143, in <module>
    out_i = model(x_i, edge_index_i)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/localscratch/SGFormer/large/ours.py", line 268, in forward
    x2 = self.graph_conv(x, edge_index)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/localscratch/SGFormer/large/ours.py", line 86, in forward
    x = conv(x, edge_index, layer_[0])
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/localscratch/SGFormer/large/ours.py", line 37, in forward
    x = torch.cat([x, x0], 1)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Unable to read pokec.mat

Hi,

Thanks for your quick responses to my questions over the last couple of days.
I have been trying to load pokec.mat to train on Pokec, but I am having the following problem:

  File "/localscratch/specformer-dev/large/dataset.py", line 383, in load_pokec_mat
    fulldata = scipy.io.loadmat(f'{data_dir}/pokec/pokec.mat')
  File "/opt/conda/envs/sgformer/lib/python3.9/site-packages/scipy/io/matlab/_mio.py", line 226, in loadmat
    MR, _ = mat_reader_factory(f, **kwargs)
  File "/opt/conda/envs/sgformer/lib/python3.9/site-packages/scipy/io/matlab/_mio.py", line 74, in mat_reader_factory
    mjv, mnv = _get_matfile_version(byte_stream)
  File "/opt/conda/envs/sgformer/lib/python3.9/site-packages/scipy/io/matlab/_miobase.py", line 249, in _get_matfile_version
    raise ValueError('Unknown mat file type, version {}, {}'.format(*ret))
ValueError: Unknown mat file type, version 32, 99

Which scipy version are you using? I tried different ones but the error persists. Can you provide some guidance?

Thanks

About the medium datasets

Where can I download the medium-sized datasets that can be used with this code directly? Looking forward to the datasets!

Could you share the slides (PPT)?

Last night I came across your three works: NodeFormer, DIFFormer, and SGFormer. They are truly excellent pieces of work that connect into a single line of research, each building on the last. I watched your explanatory video on Bilibili, and the slides were very well made, but the GitHub link has expired. Could you update the link to the slides for these three works? 😶‍🌫️

About time complexity

Thank you very much for your outstanding contribution to the field of graph Transformers. I have a question about SGFormer: shouldn't the time complexity of equation (3) be $O(N \times N)$ because of the product of $K^\top$ and $V$? Is there anything wrong with my understanding? I want to figure it out! Thank you!
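For reference, the usual resolution for attention of this softmax-free form is associativity: materializing $(QK^\top)V$ costs $O(N^2 d)$, but computing $K^\top V$ first yields a $d \times d$ matrix in $O(N d^2)$, and applying $Q$ to it is again $O(N d^2)$, so the total is linear in $N$. A minimal numerical illustration (shapes and values are arbitrary; float64 is used so both orders agree to default tolerances):

    import torch

    N, d = 2_000, 64
    Q = torch.randn(N, d, dtype=torch.float64)
    K = torch.randn(N, d, dtype=torch.float64)
    V = torch.randn(N, d, dtype=torch.float64)

    # Quadratic order: (Q K^T) V materializes an N x N matrix -- O(N^2 d) time, O(N^2) memory.
    out_quadratic = (Q @ K.t()) @ V

    # Linear order: Q (K^T V) only ever builds a d x d matrix -- O(N d^2) time, O(d^2) extra memory.
    out_linear = Q @ (K.t() @ V)

    print(torch.allclose(out_quadratic, out_linear))  # True: same result, different cost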

Running out of memory on ogbn-arxiv

Hi,

I am currently trying to run the code in the script for ogbn-arxiv, but I am running out of memory. Right now, it attempts a matrix multiply between two tensors that are each 169k x 256, and it allocates 100 GB of VRAM to do so.

I am wondering what settings I need to change in the script to get it to run?

Specifically, the error occurs at this line:

    attention_num = torch.sigmoid(torch.einsum("nhm,lhm->nlh", qs, ks))  # [N, L, H]
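For reference, a back-of-the-envelope check is consistent with the reported allocation: this einsum materializes the full N x L (x H) attention-score tensor, and for ogbn-arxiv N = L = 169,343 (assuming float32 scores and a single head):

    N = 169_343                              # nodes in ogbn-arxiv
    score_bytes = N * N * 4                  # one float32 score per node pair
    print(f"{score_bytes / 1e9:.0f} GB")     # ~115 GB for a single head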

Am I supposed to use main-batch.py instead of main.py?

Thanks!

GN Block Impact on SGFormer Performance

Hello Qitian,

Firstly, I would like to express my appreciation for your inspiring work and the dedication evident in your code! My primary interest lies in understanding how the GN block influences the performance of SGFormer. To this end, I adjusted the graph_weight parameter within the range of [0, 0.5, 0.8, 1.0] and conducted experiments on medium-sized benchmarks. Below, I present both the reproduced results and those reported in your paper.
[Screenshot: reproduced vs. reported results across graph_weight settings]

Observations and Questions:

  1. Overall, I am quite satisfied as I achieved similar or nearly identical performance to what was reported in the paper for most datasets. However, there were some exceptions: notably lower results in the squirrel dataset and an unexpectedly higher score in chameleon. Could you please shed some light on why this might be the case?
  2. Furthermore, when setting the graph_weight to 1, SGFormer essentially transformed into a simple GCN model. In this scenario, the performance of the GCN model significantly surpassed what was reported in your paper, especially in the datasets cora, citeseer, pubmed, and squirrel. This observation seems to diminish the relative improvements of SGFormer over the standard GCN model.

Reproducing Data & Environment:

  • I used the datasets and splits from the provided shared link.
  • The following are details of my environment setup:
    • python=3.8.18=h955ad1f_0

    • pytorch=2.1.1=py3.8_cuda11.8_cudnn8.7.0_0

    • pytorch-cluster=1.6.3=py38_torch_2.1.0_cu118

    • pytorch-scatter=2.1.2=py38_torch_2.1.0_cu118

    • pytorch-sparse=0.6.18=py38_torch_2.1.0_cu118

Your insights or suggestions regarding these observations would be highly valuable.
