snap-research / graphless-neural-networks

[ICLR 2022] Code for Graph-less Neural Networks: Teaching Old MLPs New Tricks via Distillation (GLNN)

License: MIT License

Python 93.09% Shell 6.91%

deep-learning distillation efficient-inference graph-algorithm graph-neural-networks knowledge-distillation pytorch gnn scalability

graphless-neural-networks's Introduction

Graph-less Neural Networks (GLNN)

Code for Graph-less Neural Networks: Teaching Old MLPs New Tricks via Distillation by Shichang Zhang, Yozen Liu, Yizhou Sun, and Neil Shah.

Overview

Distillation framework

Accuracy vs. inference time on the `ogbn-products` dataset

Getting Started

Setup Environment

We use conda for environment setup. You can use

bash ./prepare_env.sh

which will create a conda environment named glnn and install relevant requirements (from requirements.txt). For simplicity, we use CPU-based torch and dgl versions in this guide, as specified in requirements. To run experiments with CUDA, please install torch and dgl with proper CUDA support, remove them from requirements.txt, and properly set the --device argument in the scripts. See https://pytorch.org/ and https://www.dgl.ai/pages/start.html for more installation details.

Be sure to activate the environment with

conda activate glnn

before running experiments as described below.

Preparing datasets

To run experiments on the datasets used in the paper, please download them from the following links and put them under data/ (see below for instructions on organizing the datasets).

CPF data (cora, citeseer, pubmed, a-computer, and a-photo): Download the '.npz' files from here. Rename amazon_electronics_computers.npz and amazon_electronics_photo.npz to a-computer.npz and a-photo.npz respectively.
OGB data (ogbn-arxiv and ogbn-products): Datasets will be automatically downloaded when running the load_data function in dataloader.py. More details here.
BGNN data (house_class and vk_class): Follow the instructions here and download dataset pre-processed in DGL format from here.
NonHom data (penn94 and pokec): Follow the instructions here to download the penn94 dataset and its splits. The pokec dataset will be automatically downloaded when running the load_data function in dataloader.py.
Your favourite datasets: download and add to the load_data function in dataloader.py.
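
The renaming step for the CPF files can be scripted. The following is a minimal sketch, assuming the two Amazon files were downloaded into a local data/ directory; the helper name rename_cpf_files is hypothetical and not part of the repository.

```python
from pathlib import Path

# Downloaded CPF file names mapped to the names given in the instructions above.
RENAMES = {
    "amazon_electronics_computers.npz": "a-computer.npz",
    "amazon_electronics_photo.npz": "a-photo.npz",
}


def rename_cpf_files(data_dir="data"):
    """Rename downloaded CPF files inside data_dir and return the new names."""
    data_dir = Path(data_dir)
    renamed = []
    for old_name, new_name in RENAMES.items():
        src = data_dir / old_name
        dst = data_dir / new_name
        # Skip files that are missing or whose target name is already taken.
        if src.exists() and not dst.exists():
            src.rename(dst)
            renamed.append(new_name)
    return renamed


if __name__ == "__main__":
    print(rename_cpf_files())
```

Running the script twice is harmless: the second run finds nothing left to rename and returns an empty list.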

Usage

To quickly train a teacher model you can run train_teacher.py by specifying the experiment setting, i.e. transductive (tran) or inductive (ind), teacher model, e.g. GCN, and dataset, e.g. cora, as per the example below.

python train_teacher.py --exp_setting tran --teacher GCN --dataset cora

To quickly train a student model with a pretrained teacher you can run train_student.py by specifying the experiment setting, teacher model, student model, and dataset like the example below. Make sure you train the teacher using the train_teacher.py first and have its result stored in the correct path specified by --out_t_path.

python train_student.py --exp_setting ind --teacher SAGE --student MLP --dataset citeseer --out_t_path outputs

For more examples, and to reproduce results in the paper, please refer to scripts in experiments/ as below.

bash experiments/sage_cpf.sh

To extend GLNN to your own model, you may do one of the following.

Add your favourite model architectures to the Model class in model.py. Then follow the examples above.
Train a teacher model and store its output (log-probabilities). Then train the student with train_student.py, passing the correct --out_t_path.
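
To illustrate how stored teacher log-probabilities can be used, here is a dependency-free sketch of a per-node distillation objective: a weighted sum of cross-entropy on the true label and KL divergence from the teacher's soft labels. The function names, the weight lam, and its default value are assumptions for illustration; this is not the code in train_student.py.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def distillation_loss(student_logits, teacher_log_probs, label, lam=0.5):
    """lam * CE(student, label) + (1 - lam) * KL(teacher || student) for one node.

    teacher_log_probs are the stored teacher outputs (log-probabilities).
    lam=0.5 is an arbitrary illustrative default, not a recommended setting.
    """
    s_log = log_softmax(student_logits)
    # Cross-entropy with the ground-truth class index.
    ce = -s_log[label]
    # KL(teacher || student) = sum_k p_t[k] * (log p_t[k] - log p_s[k]).
    kl = sum(math.exp(t) * (t - s) for t, s in zip(teacher_log_probs, s_log))
    return lam * ce + (1.0 - lam) * kl


if __name__ == "__main__":
    teacher = log_softmax([2.0, 0.5, -1.0])  # stand-in for a stored teacher output
    print(distillation_loss([1.0, 1.0, 1.0], teacher, label=0))
```

With lam = 0 the student is trained only to match the teacher's soft labels; with lam = 1 it reduces to ordinary supervised training of the MLP.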

Results

GraphSAGE vs. MLP vs. GLNN under the production setting described in the paper (transductive and inductive combined). Delta_MLP (Delta_GNN) is the difference between GLNN and the MLP (GNN), given as an absolute accuracy difference with the relative change in parentheses. Results show classification accuracy (higher is better); Delta_GNN > 0 indicates GLNN outperforms GNN. We observe that GLNNs always improve over MLPs by large margins and achieve results competitive with GNNs on 6/7 datasets. Please see Table 3 in the paper for more details.

Datasets	GNN(SAGE)	MLP	GLNN	Delta_MLP	Delta_GNN
Cora	79.29	58.98	78.28	19.30 (32.72%)	-1.01 (-1.28%)
Citeseer	68.38	59.81	69.27	9.46 (15.82%)	0.89 (1.30%)
Pubmed	74.88	66.80	74.71	7.91 (11.83%)	-0.17 (-0.22%)
A-computer	82.14	67.38	82.29	14.90 (22.12%)	0.15 (0.19%)
A-photo	91.08	79.25	92.38	13.13 (16.57%)	1.30 (1.42%)
Arxiv	70.73	55.30	65.09	9.79 (17.70%)	-5.64 (-7.97%)
Products	76.60	63.72	75.77	12.05 (18.91%)	-0.83 (-1.09%)
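
The two delta columns can be recomputed from the three accuracy columns. The sketch below copies the rounded accuracies from the table; because it starts from rounded values, a recomputed entry may differ from the table in the last digit (for example, A-computer gives a Delta_MLP of 14.91 rather than 14.90), presumably because the table was derived from unrounded accuracies.

```python
# Accuracies copied from the table above: (GNN, MLP, GLNN).
RESULTS = {
    "Cora": (79.29, 58.98, 78.28),
    "Citeseer": (68.38, 59.81, 69.27),
    "Pubmed": (74.88, 66.80, 74.71),
    "A-computer": (82.14, 67.38, 82.29),
    "A-photo": (91.08, 79.25, 92.38),
    "Arxiv": (70.73, 55.30, 65.09),
    "Products": (76.60, 63.72, 75.77),
}


def deltas(gnn, mlp, glnn):
    """Return (Delta_MLP, relative % vs MLP, Delta_GNN, relative % vs GNN)."""
    d_mlp = glnn - mlp
    d_gnn = glnn - gnn
    return d_mlp, 100.0 * d_mlp / mlp, d_gnn, 100.0 * d_gnn / gnn


for name, accs in RESULTS.items():
    d_mlp, r_mlp, d_gnn, r_gnn = deltas(*accs)
    print(f"{name:<11}{d_mlp:6.2f} ({r_mlp:5.2f}%)  {d_gnn:6.2f} ({r_gnn:5.2f}%)")

# Datasets where GLNN is strictly better than the GNN (Delta_GNN > 0).
glnn_wins = [name for name, (gnn, _, glnn) in RESULTS.items() if glnn > gnn]
print(glnn_wins)
```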

Citation

If you find our work useful, please cite the following:

@inproceedings{zhang2021graphless,
      title={Graph-less Neural Networks: Teaching Old MLPs New Tricks via Distillation}, 
      author={Shichang Zhang and Yozen Liu and Yizhou Sun and Neil Shah},
      booktitle={International Conference on Learning Representations},
      year={2022},
      url={https://arxiv.org/abs/2110.08727}
}

Contact Us

Please open an issue or contact [email protected] if you have any questions.

graphless-neural-networks's Issues

The problem of inference time

@ShichangZh Hello, I would like to ask how the inference time in the paper is calculated. I did not find the relevant part of the code. Thank you!

Failed to build environment

The packages in requirements.txt are incomplete, and bash ./prepare_env.sh fails to install some packages.

Undefined function

There is an undefined function in your code (dataloader.py, line 257, rand_train_test_idx). I can't find this function in your code or in the imported packages. What is it?

OGB data

speed comparison

As mentioned in the paper, the inductive inference time of GLNN is compared to other GNN inference acceleration methods on 10 randomly chosen nodes, but the code does not include these experiments. Could you please provide more details about how the inference time is measured and compared?
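
For reference, a generic way to time inference on a small batch is sketched below. It is not the authors' measurement procedure, which the repository does not show; the function time_inference, its parameters, and the dummy predict callable are all hypothetical.

```python
import statistics
import time


def time_inference(predict, batch, warmup=3, repeats=20):
    """Return (mean, stdev) of wall-clock seconds per call of predict(batch)."""
    # Warm-up calls keep one-time costs (caching, lazy initialization) out of the timings.
    for _ in range(warmup):
        predict(batch)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict(batch)
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), statistics.stdev(timings)


if __name__ == "__main__":
    # Stand-in for a model's forward pass on 10 nodes; replace with a real callable.
    def dummy_predict(nodes):
        return [sum(range(1000)) for _ in nodes]

    mean_s, std_s = time_inference(dummy_predict, list(range(10)))
    print(f"{mean_s * 1e3:.3f} ms +/- {std_s * 1e3:.3f} ms")
```

When timing a PyTorch model on a GPU, the device would also need to be synchronized (torch.cuda.synchronize()) before each clock reading, since CUDA kernels run asynchronously.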

Error Unpickling the Cora.npz data (and others)

Hi! I am trying to start running the code but I have encountered the following error I can't figure out when trying to load the .npz cora file.
Using backend: pytorch
WARNING:root:The OGB package is out of date. Your version is 1.3.3, while the latest version is 1.3.5.
Traceback (most recent call last):
  File "/home/aaron/anaconda3/envs/glnn/lib/python3.6/site-packages/numpy/lib/npyio.py", line 460, in load
    return pickle.load(fid, **pickle_kwargs)
_pickle.UnpicklingError: invalid load key, '\x0a'.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train_teacher.py", line 346, in <module>
    main()
  File "train_teacher.py", line 329, in main
    score = run(args)
  File "train_teacher.py", line 210, in run
    labelrate_val=args.labelrate_val,
  File "/home/aaron/graphless-neural-networks/dataloader.py", line 49, in load_data
    kwargs["labelrate_val"],
  File "/home/aaron/graphless-neural-networks/dataloader.py", line 85, in load_cpf_data
    data = load_npz_to_sparse_graph(data_path)
  File "/home/aaron/graphless-neural-networks/dataloader.py", line 526, in load_npz_to_sparse_graph
    with np.load(file_name, allow_pickle=True) as loader:
  File "/home/aaron/anaconda3/envs/glnn/lib/python3.6/site-packages/numpy/lib/npyio.py", line 463, in load
    "Failed to interpret file %s as a pickle" % repr(file))
OSError: Failed to interpret file PosixPath('/home/aaron/graphless-neural-networks/data/cora.npz') as a pickle

I think it has something to do with how the file is saved with different versions of numpy? I have used the same exact requirements.txt file for the conda environment.
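
One thing worth ruling out for an error like this is that the file on disk is not a real .npz archive at all, for example because a web page was saved in place of the binary file. An .npz file is a zip archive, so a quick check is possible with the standard library. This is only a diagnostic sketch and may not be the actual cause here; the helper name describe_npz is hypothetical.

```python
import zipfile
from pathlib import Path


def describe_npz(path):
    """Return a short diagnosis of whether `path` looks like a real .npz archive."""
    path = Path(path)
    if not path.exists():
        return "missing"
    if zipfile.is_zipfile(str(path)):
        return "zip archive (plausible .npz)"
    # Show the first bytes so that text content (such as HTML) is easy to spot.
    head = path.read_bytes()[:40]
    return f"not a zip archive; file starts with {head!r}"


if __name__ == "__main__":
    print(describe_npz("data/cora.npz"))
```

If the file turns out not to be a zip archive, downloading it again as a raw binary file would be the first thing to try.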

Thanks!

A paper that copies your paper

paper: https://openreview.net/forum?id=Cs3r5KLdoj
code: https://github.com/meettyj/NOSMOG

It heavily overclaims: it presents min-cut as its own proposal, and it presents several GNN-to-MLP issues as its own discoveries.

The function graph_split() seems to contradict the inductive scenarios.

I have a question about the function graph_split in the file utils.py.

According to the code, the tensors idx_test_ind and obs_idx_train may overlap.

def graph_split(idx_train, idx_val, idx_test, rate, seed):
    idx_test_ind, idx_test_tran = idx_split(idx_test, rate, seed)

    idx_obs = torch.cat([idx_train, idx_val, idx_test_tran])
    N1, N2 = idx_train.shape[0], idx_val.shape[0]
    obs_idx_all = torch.arange(idx_obs.shape[0])
    obs_idx_train = obs_idx_all[:N1]
    obs_idx_val = obs_idx_all[N1 : N1 + N2]
    obs_idx_test = obs_idx_all[N1 + N2 :]

    return obs_idx_train, obs_idx_val, obs_idx_test, idx_obs, idx_test_ind

For example, let V = [0,1,2,3,4,5] be all nodes in the graph and idx_train = [1,2], idx_val = [3,4], idx_test = [0, 5].

Suppose that idx_test_ind = [0] and idx_test_tran = [5] after the function idx_split(). Then we have idx_obs = [1,2,3,4,5], N1=2, N2 = 2, and obs_idx_all = [0,1,2,3,4]. Hence, the resulting observed sets are obs_idx_train = [0,1], obs_idx_val = [2,3], obs_idx_test = [4].

This means that idx_test_ind and obs_idx_train both have the element 0, which contradicts the inductive scenario.
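
The example above can be traced in code. The sketch below is a list-based rewrite of graph_split with the result of idx_split passed in directly; it is an illustration, not the repository's function. It also shows one possible reading of the return values: if obs_idx_train holds positions within idx_obs rather than original node IDs (an assumption about how the rest of the code uses it), then mapping those positions back through idx_obs yields no overlap with idx_test_ind.

```python
def graph_split_lists(idx_train, idx_val, idx_test_ind, idx_test_tran):
    """List-based version of graph_split with the idx_split result passed in."""
    idx_obs = idx_train + idx_val + idx_test_tran
    n1, n2 = len(idx_train), len(idx_val)
    obs_idx_all = list(range(len(idx_obs)))
    obs_idx_train = obs_idx_all[:n1]
    obs_idx_val = obs_idx_all[n1 : n1 + n2]
    obs_idx_test = obs_idx_all[n1 + n2 :]
    return obs_idx_train, obs_idx_val, obs_idx_test, idx_obs, idx_test_ind


# Same numbers as in the example above.
obs_train, obs_val, obs_test, idx_obs, idx_test_ind = graph_split_lists(
    [1, 2], [3, 4], [0], [5]
)

# Treat obs_train as positions into idx_obs and map back to original node IDs.
train_nodes = [idx_obs[i] for i in obs_train]

print(obs_train)    # [0, 1]  positions within idx_obs
print(train_nodes)  # [1, 2]  original node IDs
print(set(train_nodes) & set(idx_test_ind))  # set()
```

Under this reading, the shared value 0 refers to two different things: position 0 of idx_obs (original node 1) in obs_idx_train, and original node 0 in idx_test_ind.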

About min-cut

Hello, I tried to re-run citeseer under the transductive setting.
The seeds are 1, 2, 3, 4, and 5.

I get an average of 71.22, which suggests that my experimental setup is correct.

However, for min-cut I get:
0.7159
0.6828
0.7444
0.9163
0.5613

It is highly unstable.

Meanwhile, for GLNN, I get:
0.9457
0.9499
0.9519
0.9670
0.9278

So maybe min-cut only works well for GLNN and fails to capture the graph topology.

Cannot reproduce the results even with the same random seed

Thanks for sharing the code! The random seed in train_teacher.py does not seem to work: every run of python train_teacher.py --exp_setting tran --teacher SAGE --dataset cora generates different results, even with the same seed. As a result, we cannot reproduce the exact results stated in the paper when running bash experiments/sage_cpf.sh. This seems to be a bug, since the point of a random seed is to make results reproducible. Could you please fix this?
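
For context, a common seeding pattern for PyTorch-based projects is sketched below. It is not the repository's seeding code, and the helper name set_seed is hypothetical. Even with all of these seeds set, bit-identical results are not guaranteed: some GPU operations are non-deterministic, and DGL keeps its own random state, which this sketch does not touch.

```python
import random


def set_seed(seed):
    """Seed the random number generators that are importable in this environment."""
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch

        torch.manual_seed(seed)
        # Silently ignored when CUDA is not available.
        torch.cuda.manual_seed_all(seed)
        # Prefer deterministic cuDNN kernels over auto-tuned ones.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass


if __name__ == "__main__":
    set_seed(0)
    first = random.random()
    set_seed(0)
    second = random.random()
    print(first == second)
```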

The two different results with the same random seed 0 (python train_teacher.py --exp_setting tran --teacher SAGE --dataset cora):

The results of bash experiments/sage_cpf.sh, which differ from the paper: