vijaydwivedi75 / lrgb

Long Range Graph Benchmark, NeurIPS 2022 Track on D&B

License: MIT License

Python 35.43% Jupyter Notebook 60.48% Shell 4.09%
graph-benchmark long-range-dependence graph-neural-networks graph-transformer long-range graph-datasets graph-representation-learning

lrgb's Introduction

LRGB: Long Range Graph Benchmark

We present the Long Range Graph Benchmark (LRGB) with 5 graph learning datasets that arguably require long-range reasoning to achieve strong performance in a given task.

  • PascalVOC-SP
  • COCO-SP
  • PCQM-Contact
  • Peptides-func
  • Peptides-struct

In this repo, we provide the source code to load the proposed datasets and run baseline experiments. The repo is based on GraphGPS, which is built using PyG and GraphGym from PyG2.

Update: Reassessment of LRGB

For a reassessment of the baselines that were initially evaluated on LRGB, we refer to this paper and thank @toenshoff for the PR on PCQM-Contact's evaluation metric.

Overview of Datasets

Dataset | Domain | Task | Node Feat. (dim) | Edge Feat. (dim) | Perf. Metric
PascalVOC-SP | Computer Vision | Node Prediction | Pixel + Coord (14) | Edge Weight (1 or 2) | macro F1
COCO-SP | Computer Vision | Node Prediction | Pixel + Coord (14) | Edge Weight (1 or 2) | macro F1
PCQM-Contact | Quantum Chemistry | Link Prediction | Atom Encoder (9) | Bond Encoder (3) | Hits@K, MRR
Peptides-func | Chemistry | Graph Classification | Atom Encoder (9) | Bond Encoder (3) | AP
Peptides-struct | Chemistry | Graph Regression | Atom Encoder (9) | Bond Encoder (3) | MAE

Statistics of Datasets

Dataset | # Graphs | # Nodes | μ Nodes | μ Deg. | # Edges | μ Edges | μ Short. Path | μ Diameter
PascalVOC-SP | 11,355 | 5,443,545 | 479.40 | 5.65 | 30,777,444 | 2,710.48 | 10.74±0.51 | 27.62±2.13
COCO-SP | 123,286 | 58,793,216 | 476.88 | 5.65 | 332,091,902 | 2,693.67 | 10.66±0.55 | 27.39±2.14
PCQM-Contact | 529,434 | 15,955,687 | 30.14 | 2.03 | 32,341,644 | 61.09 | 4.63±0.63 | 9.86±1.79
Peptides-func | 15,535 | 2,344,859 | 150.94 | 2.04 | 4,773,974 | 307.30 | 20.89±9.79 | 56.99±28.72
Peptides-struct | 15,535 | 2,344,859 | 150.94 | 2.04 | 4,773,974 | 307.30 | 20.89±9.79 | 56.99±28.72
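
As a quick sanity check on the statistics above, the μ Deg. column can be recovered by dividing μ Edges by μ Nodes. The sketch below only re-uses numbers from the table; the helper name `mean_degree` is made up for illustration, and reading the edge counts as counting both directions of each undirected edge is an assumption that happens to be consistent with the ratios, not something stated in the table.

```python
# Per-graph means copied from the statistics table above.
stats = {
    "PascalVOC-SP": {"mean_nodes": 479.40, "mean_edges": 2710.48, "mean_deg": 5.65},
    "COCO-SP": {"mean_nodes": 476.88, "mean_edges": 2693.67, "mean_deg": 5.65},
    "PCQM-Contact": {"mean_nodes": 30.14, "mean_edges": 61.09, "mean_deg": 2.03},
    "Peptides-func": {"mean_nodes": 150.94, "mean_edges": 307.30, "mean_deg": 2.04},
}


def mean_degree(mean_edges, mean_nodes):
    """Hypothetical helper: ratio of mean edge count to mean node count."""
    return mean_edges / mean_nodes


for name, row in stats.items():
    derived = mean_degree(row["mean_edges"], row["mean_nodes"])
    # Compare the derived ratio with the value reported in the table.
    print(f"{name}: derived {derived:.2f}, table {row['mean_deg']:.2f}")
```

The derived ratios agree with the reported column to two decimal places for every dataset listed.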

Python environment setup with Conda

conda create -n lrgb python=3.9
conda activate lrgb

conda install pytorch=1.9 torchvision torchaudio -c pytorch -c nvidia
conda install pyg=2.0.2 -c pyg -c conda-forge
conda install pandas scikit-learn

# RDKit is required for OGB-LSC PCQM4Mv2 and datasets derived from it.  
conda install openbabel fsspec rdkit -c conda-forge

# Check https://www.dgl.ai/pages/start.html to install DGL based on your CUDA requirements
pip install dgl-cu111 dglgo -f https://data.dgl.ai/wheels/repo.html

pip install performer-pytorch
pip install torchmetrics==0.7.2
pip install ogb
pip install wandb

conda clean --all

Running GraphGPS

conda activate lrgb

# Running GCN baseline for Peptides-func.
python main.py --cfg configs/GCN/peptides-func-GCN.yaml  wandb.use False

# Running SAN baseline for PascalVOC-SP.
python main.py --cfg configs/SAN/vocsuperpixels-SAN.yaml  wandb.use False

The scripts for all experiments are located in the run directory.

W&B logging

To use W&B logging, set wandb.use True and make sure a gtransformers entity is set up in your W&B account (or point wandb.entity at an entity of your own).
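
The trailing `wandb.use False` in the commands above is a dotted KEY VALUE override appended after the YAML config. The sketch below illustrates how such pairs could be folded into a nested config; it is a stand-alone illustration and not GraphGym's actual implementation. The function `apply_overrides`, the starting values in `cfg`, and the entity name `my-team` are all placeholders invented for this example.

```python
import ast


def apply_overrides(cfg, opts):
    """Apply a flat [KEY, VALUE, KEY, VALUE, ...] list to a nested dict."""
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    for key, raw in zip(opts[0::2], opts[1::2]):
        node = cfg
        *parents, leaf = key.split(".")
        for part in parents:
            # Walk (or create) the nested sections named by the dotted key.
            node = node.setdefault(part, {})
        try:
            value = ast.literal_eval(raw)  # "True" -> True, "3" -> 3
        except (ValueError, SyntaxError):
            value = raw  # anything that is not a Python literal stays a string
        node[leaf] = value
    return cfg


# Placeholder starting config; the real defaults live in the YAML files.
cfg = {"wandb": {"use": False, "entity": "gtransformers"}}
apply_overrides(cfg, ["wandb.use", "True", "wandb.entity", "my-team"])
print(cfg)
```

Keeping overrides as flat pairs means a single YAML file can be reused across runs while only the differing keys are passed on the command line.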

License Information

Dataset | Derived from | Original License | LRGB Release License
PascalVOC-SP | Pascal VOC 2011 | Custom* | Custom*
COCO-SP | MS COCO | CC BY 4.0 | CC BY 4.0
PCQM-Contact | PCQM4Mv2 | CC BY 4.0 | CC BY 4.0
Peptides-func | SATPdb | CC BY-NC 4.0 | CC BY-NC 4.0
Peptides-struct | SATPdb | CC BY-NC 4.0 | CC BY-NC 4.0

*Custom License for Pascal VOC 2011 (respecting Flickr terms of use)

Leaderboards

The leaderboards of various models' performance on the LRGB datasets are hosted on paperswithcode.

Currently reported results (last update on Aug 10th, 2023)

PascalVOC-SP (Node Classification)

Model | Test F1 (higher is better) | Reference | #params
Exphormer | 0.3975±0.0037 | Shirzad, Velingker, Venkatachalam, et al, ICML 2023 | 509k
GraphGPS | 0.3748±0.0109 | Rampášek et al, NeurIPS 2022 | 510k
Cache-GNN+LapPE | 0.3462±0.0085 | Ma et al, KDD 2023 | 500k
DRew-GatedGCN+LapPE | 0.3314±0.0024 | Gutteridge et al, ICML 2023 | 502k
SAN+LapPE | 0.3230±0.0039 | Dwivedi et al, NeurIPS 2022 | 531k
SAN+RWSE | 0.3216±0.0027 | Dwivedi et al, NeurIPS 2022 | 468k
GatedGCN+LapPE+virtual node | 0.3103±0.0068 | Cai et al, ICML 2023 | 502k
GatedGCN | 0.2873±0.0219 | Dwivedi et al, NeurIPS 2022 | 502k
GatedGCN+LapPE | 0.2860±0.0085 | Dwivedi et al, NeurIPS 2022 | 502k
Transformer+LapPE | 0.2694±0.0098 | Dwivedi et al, NeurIPS 2022 | 501k
GCNII | 0.1698±0.0080 | Dwivedi et al, NeurIPS 2022 | 492k
GCN | 0.1268±0.0060 | Dwivedi et al, NeurIPS 2022 | 496k
GINE | 0.1265±0.0076 | Dwivedi et al, NeurIPS 2022 | 505k

COCO-SP (Node Classification)

Model | Test F1 (higher is better) | Reference | #params
Exphormer | 0.3455±0.0009 | Shirzad, Velingker, Venkatachalam, et al, ICML 2023 | 499k
GraphGPS | 0.3412±0.0044 | Rampášek et al, NeurIPS 2022 | 516k
Cache-GNN+LapPE | 0.2793±0.0033 | Ma et al, KDD 2023 | 500k
GatedGCN | 0.2641±0.0045 | Dwivedi et al, NeurIPS 2022 | 509k
Transformer+LapPE | 0.2618±0.0031 | Dwivedi et al, NeurIPS 2022 | 508k
SAN+LapPE | 0.2592±0.0158 | Dwivedi et al, NeurIPS 2022 | 536k
GatedGCN+LapPE | 0.2574±0.0034 | Dwivedi et al, NeurIPS 2022 | 509k
SAN+RWSE | 0.2434±0.0156 | Dwivedi et al, NeurIPS 2022 | 474k
GCNII | 0.1404±0.0011 | Dwivedi et al, NeurIPS 2022 | 505k
GINE | 0.1339±0.0044 | Dwivedi et al, NeurIPS 2022 | 515k
GCN | 0.0841±0.0010 | Dwivedi et al, NeurIPS 2022 | 509k

Peptides-func (Graph Classification)

Model | Test AP (higher is better) | Reference | #params
DRew-GCN+LapPE | 0.7150±0.0044 | Gutteridge et al, ICML 2023 | 502k
GRIT | 0.6988±0.0082 | Ma, Lin, et al, ICML 2023 | 443k
Graph MLP-Mixer | 0.6970±0.0080 | He et al, ICML 2023 | 397k
Graph ViT | 0.6942±0.0075 | He et al, ICML 2023 | 692k
MGT+WavePE | 0.6817±0.0064 | Ngo, Hy, et al, 2023 | 499k
PathNN | 0.6816±0.0026 | Michel, Nikolentzos et al, ICML 2023 | 510k
GatedGCN+RWSE+virtual node | 0.6685±0.0062 | Cai et al, ICML 2023 | 506k
Cache-GNN+LapPE | 0.6671±0.0056 | Ma et al, KDD 2023 | 500k
Graph Diffuser | 0.6651±0.0010 | Glickman & Yahav, 2023 | 509k
CIN++ | 0.6569±0.0117 | Giusti et al, 2023 | ~500k
GraphGPS | 0.6535±0.0041 | Rampášek et al, NeurIPS 2022 | 504k
Exphormer | 0.6527±0.0043 | Shirzad, Velingker, Venkatachalam, et al, ICML 2023 | 446k
SAN+RWSE | 0.6439±0.0075 | Dwivedi et al, NeurIPS 2022 | 500k
SAN+LapPE | 0.6384±0.0121 | Dwivedi et al, NeurIPS 2022 | 493k
Transformer+LapPE | 0.6326±0.0126 | Dwivedi et al, NeurIPS 2022 | 488k
GatedGCN+RWSE | 0.6069±0.0035 | Dwivedi et al, NeurIPS 2022 | 506k
GCN | 0.5930±0.0023 | Dwivedi et al, NeurIPS 2022 | 508k
GatedGCN | 0.5864±0.0077 | Dwivedi et al, NeurIPS 2022 | 509k
GCNII | 0.5543±0.0078 | Dwivedi et al, NeurIPS 2022 | 505k
GINE | 0.5498±0.0079 | Dwivedi et al, NeurIPS 2022 | 476k

Peptides-struct (Graph Regression)

Model | Test MAE (lower is better) | Reference | #params
Cache-GNN+LapPE | 0.2358±0.0013 | Ma et al, KDD 2023 | 500k
Graph ViT | 0.2449±0.0016 | He et al, ICML 2023 | 561k
MGT+WavePE | 0.2453±0.0025 | Ngo, Hy, et al, 2023 | 499k
GRIT | 0.2460±0.0012 | Ma, Lin, et al, ICML 2023 | 439k
Graph Diffuser | 0.2461±0.0010 | Glickman & Yahav, 2023 | 509k
Exphormer | 0.2481±0.0007 | Shirzad, Velingker, Venkatachalam, et al, ICML 2023 | 426k
GCN+virtual node | 0.2488±0.0021 | Cai et al, ICML 2023 | 508k
Graph MLP-Mixer | 0.2494±0.0007 | He et al, ICML 2023 | 397k
GraphGPS | 0.2500±0.0005 | Rampášek et al, NeurIPS 2022 | 504k
CIN++ | 0.2523±0.0013 | Giusti et al, 2023 | ~500k
Transformer+LapPE | 0.2529±0.0016 | Dwivedi et al, NeurIPS 2022 | 488k
DRew-GCN+LapPE | 0.2536±0.0015 | Gutteridge et al, ICML 2023 | 495k
SAN+RWSE | 0.2545±0.0012 | Dwivedi et al, NeurIPS 2022 | 500k
PathNN | 0.2545±0.0032 | Michel, Nikolentzos et al, ICML 2023 | 469k
NPQ+GATv2 | 0.2589±0.0031 | Jain et al, KLR Workshop at ICML, 2023 | NA
SAN+LapPE | 0.2683±0.0043 | Dwivedi et al, NeurIPS 2022 | 493k
GatedGCN+RWSE | 0.3357±0.0006 | Dwivedi et al, NeurIPS 2022 | 506k
GatedGCN | 0.3420±0.0013 | Dwivedi et al, NeurIPS 2022 | 509k
GCNII | 0.3471±0.0010 | Dwivedi et al, NeurIPS 2022 | 505k
GCN | 0.3496±0.0013 | Dwivedi et al, NeurIPS 2022 | 508k
GINE | 0.3547±0.0045 | Dwivedi et al, NeurIPS 2022 | 476k

PCQM-Contact (Link Prediction)

Model | Test MRR (higher is better) | Test Hits@1 | Test Hits@3 | Test Hits@10 | Reference | #params
Exphormer | 0.3637±0.0020 | - | - | - | Shirzad, Velingker, Venkatachalam, et al, ICML 2023 | 396k
Cache-GNN+RWSE | 0.3488±0.0008 | 0.1463±0.0011 | 0.4102±0.0008 | 0.8693±0.0008 | Ma et al, KDD 2023 | 500k
DRew-GCN | 0.3444±0.0017 | - | - | - | Gutteridge et al, ICML 2023 | 515k
Graph Diffuser | 0.3388±0.0011 | 0.1369±0.0012 | 0.4053±0.0011 | 0.8592±0.0007 | Glickman & Yahav, 2023 | 521k
SAN+LapPE | 0.3350±0.0003 | 0.1355±0.0017 | 0.4004±0.0021 | 0.8478±0.0044 | Dwivedi et al, NeurIPS 2022 | 499k
SAN+RWSE | 0.3341±0.0006 | 0.1312±0.0016 | 0.4030±0.0008 | 0.8550±0.0024 | Dwivedi et al, NeurIPS 2022 | 509k
GraphGPS | 0.3337±0.0006 | - | - | - | Rampášek et al, NeurIPS 2022 | 513k
GatedGCN+RWSE | 0.3242±0.0008 | 0.1288±0.0013 | 0.3808±0.0006 | 0.8517±0.0005 | Dwivedi et al, NeurIPS 2022 | 524k
GCN | 0.3234±0.0006 | 0.1321±0.0007 | 0.3791±0.0004 | 0.8256±0.0006 | Dwivedi et al, NeurIPS 2022 | 504k
GatedGCN | 0.3218±0.0011 | 0.1279±0.0018 | 0.3783±0.0004 | 0.8433±0.0011 | Dwivedi et al, NeurIPS 2022 | 527k
GINE | 0.3180±0.0027 | 0.1337±0.0013 | 0.3642±0.0043 | 0.8147±0.0062 | Dwivedi et al, NeurIPS 2022 | 517k
Transformer+LapPE | 0.3174±0.0020 | 0.1221±0.0011 | 0.3679±0.0033 | 0.8517±0.0039 | Dwivedi et al, NeurIPS 2022 | 502k
GCNII | 0.3161±0.0004 | 0.1325±0.0009 | 0.3607±0.0003 | 0.8116±0.0009 | Dwivedi et al, NeurIPS 2022 | 501k
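
For readers unfamiliar with the PCQM-Contact metrics, the sketch below shows the textbook definitions of MRR and Hits@K computed from the rank of each true link among its candidates. It is a generic illustration with made-up ranks; it does not reproduce the benchmark's exact ranking or filtering protocol, and the function names are hypothetical.

```python
def mrr(ranks):
    """Mean reciprocal rank: average of 1/rank over all true links (rank 1 is best)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of true links ranked within the top k candidates."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Made-up ranks of five true links among their candidate sets.
ranks = [1, 3, 2, 10, 50]

print(f"MRR     = {mrr(ranks):.4f}")
for k in (1, 3, 10):
    print(f"Hits@{k:<2} = {hits_at_k(ranks, k):.2f}")
```

MRR rewards placing the true link near the top of the ranking, while Hits@K only asks whether it appears within the first K candidates, which is why the two can order models differently in the table above.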

Citation

If you find this work useful, please cite our paper:

@inproceedings{dwivedi2022LRGB,
  title={Long Range Graph Benchmark}, 
  author={Dwivedi, Vijay Prakash and Rampášek, Ladislav and Galkin, Mikhail and Parviz, Ali and Wolf, Guy and Luu, Anh Tuan and Beaini, Dominique},
  booktitle={Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2022},
  url={https://openreview.net/forum?id=in7XC5RcjEn}
}

lrgb's People

Contributors

migalkin, rampasek, toenshoff, vijaydwivedi75

lrgb's Issues

Questions about PascalVOC-SP and COCO-SP

I found that the feature vectors in these two datasets are 14-dimensional, but the paper says: "The initial feature of each superpixel node is 12 dimensional RGB feature value (mean, std, max, min)".
So I don't understand why the features are actually 14-dimensional. Could you please explain? Thanks for your kindness.
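
One reading that is consistent with the "Pixel + Coord (14)" entry in the dataset overview is 12 colour statistics plus 2 coordinates. The sketch below builds such a vector for a toy superpixel. It is an illustration of that reading only, not the authors' preprocessing code: the function name, the feature ordering, the use of the population standard deviation, and the use of the centroid as the coordinate are all assumptions.

```python
from statistics import mean, pstdev


def superpixel_features(pixels):
    """pixels: list of (x, y, (r, g, b)) tuples belonging to one superpixel."""
    feats = []
    # 4 statistics x 3 colour channels = 12 values.
    for stat in (mean, pstdev, max, min):
        for channel in range(3):
            feats.append(float(stat([rgb[channel] for _, _, rgb in pixels])))
    # 2 coordinate values (here: the centroid of the superpixel).
    feats.append(float(mean([x for x, _, _ in pixels])))
    feats.append(float(mean([y for _, y, _ in pixels])))
    return feats


# Toy superpixel with three pixels.
pixels = [(0, 0, (10, 20, 30)), (2, 0, (30, 40, 50)), (1, 3, (20, 30, 40))]
features = superpixel_features(pixels)
print(len(features))
```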

PASCAL VOC Data Preparation Error

Hello,

I am getting the following assertion error when I run the jupyter notebook to generate PASCAL VOC superpixel dataset. Can you please advise on how to resolve this error? I just ran the code without any modifications.

assert n_sp_extracted == np.max(superpixels) + 1, ('superpixel indices', np.unique(superpixels))

Best regards,
Hani

Bug in main.py for OptimConfig in new versions of PyG

My package details are:
PyG: 2.5.2
torch: 2.1.0
CUDA Version: 12.2
Python: 3.9.19

In main.py, I only imported OptimConfig and SchedulerConfig from torch_geometric.graphgym.optim.
It gives the following error:
Traceback (most recent call last):
  File "/home/iplab/garv_iit-bhu/GraphTransformer/lrgb/main.py", line 178, in <module>
    optimizer = create_optimizer(model.parameters(),
  File "/home/iplab/anaconda3/envs/lrgb/lib/python3.9/site-packages/torch_geometric/graphgym/optim.py", line 38, in create_optimizer
    return from_config(func)(params, cfg=cfg)
  File "/home/iplab/anaconda3/envs/lrgb/lib/python3.9/site-packages/torch_geometric/graphgym/config.py", line 578, in wrapper
    raise ValueError(f"'cfg.{arg_name}' undefined")
ValueError: 'cfg.optimizer_config' undefined

Please help me out, and let me know if you need any more details.

question about Peptides-struct

Hello,
Thank you for the nice code and timely dataset. Since the label has dimension 11, I wonder whether you predict all 11 targets jointly, or predict each target individually and then average the MAE. Thank you!
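
As background for this question (not an answer from the authors): when every graph carries all targets, the MAE taken over all entries at once equals the mean of the per-target MAEs, so the two ways of reporting give the same number. The sketch below checks this on toy data with 2 targets instead of 11; the function names and values are made up for illustration.

```python
def mae_all_entries(pred, target):
    """MAE over every (graph, target) entry at once."""
    total = sum(abs(p - t) for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    count = sum(len(row) for row in target)
    return total / count


def mae_mean_of_tasks(pred, target):
    """MAE computed per target column, then averaged over columns."""
    n_tasks = len(target[0])
    per_task = []
    for j in range(n_tasks):
        errs = [abs(pr[j] - tr[j]) for pr, tr in zip(pred, target)]
        per_task.append(sum(errs) / len(errs))
    return sum(per_task) / n_tasks


# Toy data: 3 graphs, 2 regression targets each, no missing labels.
target = [[1.0, 2.0], [0.5, 4.0], [3.0, 1.0]]
pred = [[1.5, 2.0], [0.0, 3.0], [2.0, 2.5]]

print(mae_all_entries(pred, target), mae_mean_of_tasks(pred, target))
```

The two quantities would diverge only if some targets were missing for some graphs, because the per-target means would then be taken over different numbers of entries.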

Link broken for PCQM-Contact

Hi.
I found that the link for downloading PCQM-Contact seems to be broken,
which is
self.url = 'https://datasets-public-research.s3.us-east-2.amazonaws.com/PCQM4M/pcqm4m-contact.tsv.gz'
at line 294 in pcqm4mv2_contact.py. Could you please check that? Thanks.

I also opened the same issue in the GraphGPS repo, where you can double-check it.
Thanks.

Ordering of pascal data in torch_geometric.data.lrgb.datasets

I am trying to work with the Pascal superpixel dataset. To avoid reprocessing the data, I used the Dataset class provided in torch_geometric ( https://pytorch-geometric.readthedocs.io/en/latest/_modules/torch_geometric/datasets/lrgb.html ).

I am trying to overlay the graphs on top of their corresponding images in the original dataset, which I downloaded from http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/semantic_contours/benchmark.tgz . However, I cannot figure out which images in this dataset correspond to which graphs in the other dataset.

For example, the first image in the Pascal validation dataset has dimensions (333, 500), but the maximum (x, y) coordinates in the graph of the first image from the torch_geometric Dataset are (492, 367). Do these correspond to different images, or am I misunderstanding how the coordinates are calculated?

Apologies if this is the wrong place to ask this! I appreciate any help you can offer.

Adding LRGB to the HuggingFace hub

Hi!
@migalkin suggested on Twitter adding your datasets to the HuggingFace hub, which I think is a super cool idea, so I'm opening this issue to see if you need any help with that!

Here is the step by step tutorial on how to do so.
Ping me if you need anything in the process 🤗

Support for PyG later than 2.0.2

Hello,

Any plans for supporting later releases of PyG (e.g. 2.1)? It's a fairly new benchmark and it's not compatible with the newest PyG releases out of the box.

Best regards,
Hani

ModuleNotFoundError: No module named 'lrgb'

Hi,

I wanted to import this repo as a module, so I installed it in my conda environment with pip install git+https://github.com/vijaydwivedi75/lrgb.git.
I've done the same with the GraphGPS repo.

When I run this simple piece of code:
import graphgps
import lrgb

I get ModuleNotFoundError: No module named 'lrgb'.

Notice that the graphgps module imports without problems which leads me to believe the problem is in the lrgb repo and for the life of me I cannot figure it out.

I'd be very happy to get any suggestion. Thanks!
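
A pip install from a Git URL only makes importable the packages that the repository itself declares, and these need not share the repository's name. One possible explanation for the error above, stated here as an assumption rather than a confirmed diagnosis, is that this repo ships a `graphgps` package and no `lrgb` package. A minimal way to check which top-level modules are importable in the current environment is sketched below; the helper name `importable` is made up.

```python
from importlib.util import find_spec


def importable(names):
    """Map each top-level module name to whether Python can locate it."""
    return {name: find_spec(name) is not None for name in names}


# Prints a dict of booleans; the result depends on what is installed locally.
print(importable(["graphgps", "lrgb"]))
```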

Identifier in peptides dataset

Hello, I found that there are identifier attributes in the peptide dataset, but I cannot find them described in the paper. Could you please tell me what they are?
Best regards.

Publish datasets in DGL format?

Hi LRGB authors,

Nice work! Curious if there is a plan to publish those datasets in the format that can be loaded by DGL? Or provide APIs similar to OGB for DGL users?

-Minjie
