cornell-zhang / graphzoom

GraphZoom: A Multi-level Spectral Approach for Accurate and Scalable Graph Embedding

License: BSD 3-Clause "New" or "Revised" License

Languages: Shell 1.26%, Python 33.57%, C 20.57%, Batchfile 0.21%, C++ 20.59%, Makefile 0.04%, Java 23.76%
Topics: graph-learning

graphzoom's Introduction

GraphZoom

GraphZoom is a framework that aims to improve both the performance and the scalability of graph embedding techniques. As shown in the following figure, GraphZoom consists of four kernels: Graph Fusion, Spectral Coarsening, Graph Embedding, and Embedding Refinement. More details are available in our paper: https://openreview.net/forum?id=r1lGO0EKDH

[Figure: Overview of the GraphZoom framework]
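Coarsening and refinement both revolve around a node-to-supernode mapping. The sketch below is a generic illustration under our own assumptions, not the repository's code: a hypothetical `assignment` array gives the supernode of each fine node, `mapping_matrix` turns it into a sparse n × m matrix P, the coarse adjacency is taken as PᵀAP with self-loops dropped, and coarse embeddings are copied back to the fine nodes with P·E. The paper's refinement kernel additionally smooths the projected embeddings with a graph filter; that step is omitted here.

```python
import numpy as np
import scipy.sparse as sp


def mapping_matrix(assignment, num_coarse):
    """Hypothetical helper: sparse n x m matrix P with P[i, assignment[i]] = 1."""
    n = len(assignment)
    rows = np.arange(n)
    data = np.ones(n)
    return sp.csr_matrix((data, (rows, assignment)), shape=(n, num_coarse))


def coarsen_adjacency(A, P):
    """Coarse adjacency P^T A P: supernode edge weights are sums of fine edge weights."""
    A_c = sp.csr_matrix(P.T @ A @ P)
    # Edges inside a supernode land on the diagonal; drop them (no self-loops).
    A_c = A_c - sp.diags(A_c.diagonal())
    A_c.eliminate_zeros()
    return A_c


def project_embedding(E_coarse, P):
    """Each fine node inherits the embedding row of its supernode."""
    return P @ E_coarse
```

For a path graph 0-1-2-3 with assignment [0, 0, 1, 1], the coarse graph is a single edge between the two supernodes, and each fine node receives its supernode's embedding row.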

Citation

If you use GraphZoom in your research, please cite our preliminary work published in ICLR'20.

@inproceedings{deng2020graphzoom,
title={GraphZoom: A Multi-level Spectral Approach for Accurate and Scalable Graph Embedding},
author={Chenhui Deng and Zhiqiang Zhao and Yongyu Wang and Zhiru Zhang and Zhuo Feng},
booktitle={International Conference on Learning Representations},
year={2020},
url={https://openreview.net/forum?id=r1lGO0EKDH}
}

Spectral Coarsening Options

  • lamg-based coarsening: the spectral coarsening algorithm used in the original paper; it requires the MATLAB Compiler Runtime (MCR).
  • simple coarsening: a simpler spectral coarsening algorithm implemented in Python that does not require MCR. It follows a similar spectrum-preserving idea, but may perform worse than lamg-based coarsening, especially in run-time speedup.
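One way to picture the spectrum-preserving idea is sketched below. This is our own illustration, not the repository's simple coarsening code: random vectors are smoothed with weighted Jacobi sweeps on L x = 0 (ω = 0.5), which damps the high-frequency components of the graph Laplacian, and each node is then greedily merged with the unmatched neighbor whose smoothed vector is most similar. The function names, `num_vecs`, `iters`, and the matching rule are hypothetical choices.

```python
import numpy as np
import scipy.sparse as sp


def smoothed_test_vectors(A, num_vecs=8, iters=20, seed=0):
    """Smooth random vectors with weighted Jacobi sweeps (omega = 0.5) on L x = 0.

    Each sweep computes x <- 0.5 * x + 0.5 * D^{-1} A x, which damps the
    high-frequency components of the graph Laplacian L = D - A.
    """
    A = sp.csr_matrix(A)
    rng = np.random.default_rng(seed)
    deg = np.asarray(A.sum(axis=1)).ravel().astype(float)
    deg[deg == 0] = 1.0  # isolated nodes: avoid division by zero
    X = rng.standard_normal((A.shape[0], num_vecs))
    for _ in range(iters):
        X = 0.5 * X + 0.5 * (A @ X) / deg[:, None]
    return X


def match_once(A, X):
    """Greedily pair each node with its most similar unmatched neighbor.

    Returns (assignment, num_supernodes) where assignment[i] is the
    supernode id of node i; every supernode holds one or two nodes.
    """
    A = sp.csr_matrix(A)
    norms = np.linalg.norm(X, axis=1) + 1e-12
    n = A.shape[0]
    assignment = -np.ones(n, dtype=int)
    next_id = 0
    for i in range(n):
        if assignment[i] >= 0:
            continue
        best, best_sim = -1, -np.inf
        for j in A.indices[A.indptr[i]:A.indptr[i + 1]]:
            if j == i or assignment[j] >= 0:
                continue
            # Absolute cosine similarity of the smoothed vectors.
            sim = abs(X[i] @ X[j]) / (norms[i] * norms[j])
            if sim > best_sim:
                best, best_sim = j, sim
        assignment[i] = next_id
        if best >= 0:
            assignment[best] = next_id
        next_id += 1
    return assignment, next_id
```

Because each supernode holds at most two nodes, one call at most halves the node count; repeating it on the coarsened graph produces further levels.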

Requirements

  • MATLAB Compiler Runtime (MCR) 2018a (Linux), a standalone set of shared libraries that enables the execution of compiled MATLAB applications and does not require a license to install (only required if you run lamg-based coarsening)
  • Python 3.5/3.6/3.7 (we suggest Conda to manage package dependencies)
  • numpy
  • networkx
  • scipy
  • scikit-learn
  • gensim (only required by DeepWalk and node2vec)
  • tensorflow (only required by GraphSAGE)
  • torch, ogb, pytorch_geometric (only required by the Open Graph Benchmark (OGB) examples)

Installation

  • Download and install MCR 2018a (only required if you run lamg-based coarsening)
    1. wget https://ssd.mathworks.com/supportfiles/downloads/R2018a/deployment_files/R2018a/installers/glnxa64/MCR_R2018a_glnxa64_installer.zip
    2. unzip MCR_R2018a_glnxa64_installer.zip -d YOUR_SAVE_PATH
    3. cd YOUR_SAVE_PATH
    4. ./install -mode silent -agreeToLicense yes -destinationFolder YOUR_MCR_PATH
  • Install PyTorch Geometric (only required if you run the OGB examples)
  • Create a virtual environment (optional)
    1. conda create -n graphzoom python=3.6
    2. conda activate graphzoom
  • Install the packages for GraphZoom
    pip install -r requirements.txt

Directory Structure

GraphZoom/
│   README.md
│   requirements.txt
│   ... 
│
└───graphzoom/
│   │   graphzoom.py
│   │   cora.sh
│   │   ...  
│   │ 
│   └───dataset/
│   │   │    cora
│   │   │    citeseer
│   │   │    pubmed
│   │  
│   └───embed_methods/
│       │    DeepWalk
│       │    node2vec
│       │    GraphSAGE
│ 
└───mat_coarsen/
│   │   make.m
│   │   LamgSetup.m
│   │   ...  
│
└───ogb/
│   │   ...
│   └───ogbn-arxiv/ 
│   │    │   main.py
│   │    │   mlp.py
│   │    │   arxiv.sh   
│   │    │   ...  
│   │    
│   └───ogbn-products/ 
│        │   main.py
│        │   mlp.py
│        │   products.sh  
│        │   ...
│

Usage

Note: If you run lamg-based coarsening, you must pass the root directory of the MATLAB Compiler Runtime to the --mcr_dir argument when running graphzoom.py.

Example Usage

  1. cd graphzoom

  2. python graphzoom.py --mcr_dir YOUR_MCR_PATH --dataset citeseer --search_ratio 12 --num_neighs 10 --embed_method deepwalk --coarse lamg

--coarse: coarsening algorithm to use, one of [lamg, simple]

--reduce_ratio: reduction ratio for the lamg-based coarsening method

--level: coarsening level for the simple coarsening method

--mcr_dir: root directory of the MATLAB Compiler Runtime

--dataset: input dataset; currently only the "json" format is supported

--embed_method: basic embedding algorithm to use

--search_ratio: controls the search space of graph fusion

--num_neighs: controls the number of edges in the feature graph
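The graph fusion step combines the topology graph with a k-nearest-neighbor graph built from node features; the paper writes the fused adjacency as A_fusion = A_topo + β·A_feat. The sketch below assumes that --num_neighs plays the role of k, chooses neighbors and edge weights by cosine similarity for brevity, and uses a brute-force O(n²) search; the approximate search controlled by --search_ratio is not modeled. `knn_feature_graph`, `fuse_graphs`, and `beta` are hypothetical names, not the repository's API.

```python
import numpy as np
import scipy.sparse as sp


def knn_feature_graph(features, k):
    """Symmetric k-nearest-neighbor graph weighted by cosine similarity.

    Brute force over all pairs: fine for a sketch, not for large graphs.
    """
    X = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    S = X @ X.T
    np.fill_diagonal(S, -np.inf)  # a node is never its own neighbor
    n = S.shape[0]
    nbrs = np.argsort(-S, axis=1)[:, :k]  # k most similar nodes per row
    rows = np.repeat(np.arange(n), k)
    cols = nbrs.ravel()
    vals = np.clip(S[rows, cols], 0.0, None)  # keep edge weights non-negative
    W = sp.csr_matrix((vals, (rows, cols)), shape=(n, n))
    return W.maximum(W.T)  # symmetrize: keep an edge if either endpoint chose it


def fuse_graphs(A_topo, features, k=10, beta=1.0):
    """Fused adjacency A_topo + beta * A_feat."""
    return sp.csr_matrix(A_topo) + beta * knn_feature_graph(features, k)
```

A larger beta lets feature similarity pull together nodes that are far apart in the topology, which in turn influences which nodes the coarsening step merges.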

Full Command List: the full list of command-line options is available via python graphzoom.py --help

Highlight in Flexibility

You can easily plug a new unsupervised graph embedding model into GraphZoom: implement a new function in graphzoom/embed_methods that takes a graph as input and outputs an embedding matrix.
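As an illustration of the "graph in, embedding matrix out" contract, the sketch below uses Laplacian eigenmaps as a simple stand-in model. The function name, the `dim` parameter, and the row order (row i corresponds to the i-th node of `sorted(G.nodes())`) are assumptions; the exact signature GraphZoom expects should be checked against the existing DeepWalk, node2vec, and GraphSAGE wrappers in graphzoom/embed_methods.

```python
import networkx as nx
import numpy as np


def laplacian_eigenmap_embedding(G, dim=8):
    """Hypothetical plug-in model: networkx graph in, (num_nodes x dim) matrix out.

    Uses a dense eigendecomposition, so it is only suitable for small graphs.
    """
    nodes = sorted(G.nodes())
    A = nx.to_numpy_array(G, nodelist=nodes, weight="weight")
    L = np.diag(A.sum(axis=1)) - A  # combinatorial graph Laplacian
    _, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    # Skip the first (constant) eigenvector; keep the next `dim` smooth ones.
    return eigvecs[:, 1:dim + 1]
```

Because GraphZoom runs the basic model only on the coarsest graph, even an expensive model like this one sees a much smaller input than the original graph.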

The current version of GraphZoom can support the following basic models:

  • DeepWalk
  • node2vec
  • GraphSAGE

Dataset

  • Cora
  • Citeseer
  • Pubmed

You can add your own dataset by following the json format of the existing datasets in graphzoom/dataset.
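The exact file layout expected by graphzoom.py should be checked against the existing files in graphzoom/dataset. The sketch below assumes a GraphSAGE-style layout: a networkx node-link JSON file for the graph plus a `.npy` feature matrix, consistent with the `dataset/citeseer/citeseer-feats.npy` path that appears in one of the issues below. The `<prefix>-G.json` file name and the `save_dataset` helper are hypothetical, and any extra files (for example, class labels) are not covered.

```python
import json
import os

import networkx as nx
import numpy as np
from networkx.readwrite import json_graph


def save_dataset(G, features, out_dir, prefix):
    """Write a graph as node-link JSON plus a NumPy feature matrix (assumed layout)."""
    n = G.number_of_nodes()
    # Row i of `features` must describe node i, so require node ids 0..n-1.
    if sorted(G.nodes()) != list(range(n)):
        raise ValueError("relabel nodes to 0..n-1, e.g. nx.convert_node_labels_to_integers(G)")
    if features.shape[0] != n:
        raise ValueError("features must have one row per node")
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, f"{prefix}-G.json"), "w") as f:
        json.dump(json_graph.node_link_data(G), f)
    np.save(os.path.join(out_dir, f"{prefix}-feats.npy"), features)
```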

Experimental Results

Here we evaluate GraphZoom on the Cora dataset with DeepWalk as the basic embedding model and lamg-based coarsening. GraphZoom-i denotes GraphZoom applied with i coarsening levels.

Method        Accuracy (%)   Speedup   Graph Size (#nodes)
DeepWalk      71.4           1x        2708
GraphZoom-1   76.9           2.5x      1169
GraphZoom-2   77.3           6.3x      519
GraphZoom-3   75.1           40.8x     218
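The graph sizes in the Cora table can be turned into reduction factors with a few lines of arithmetic. The dictionary and the `reduction_ratios` helper are ours and only restate the table: the cumulative factors relative to the original graph are about 2.32, 5.22, and 12.42, and each level shrinks the previous graph by about 2.25 to 2.38. The reported speedups (2.5x, 6.3x, 40.8x) grow faster than the node reduction at deeper levels.

```python
# Node counts of the Cora graph at each level, copied from the table.
sizes = {"original": 2708, "GraphZoom-1": 1169, "GraphZoom-2": 519, "GraphZoom-3": 218}


def reduction_ratios(sizes):
    """Return (cumulative, per-level) node reduction factors, rounded to 2 decimals."""
    counts = list(sizes.values())
    cumulative = [round(counts[0] / c, 2) for c in counts[1:]]
    per_level = [round(a / b, 2) for a, b in zip(counts, counts[1:])]
    return cumulative, per_level


cumulative, per_level = reduction_ratios(sizes)
print("cumulative:", cumulative)
print("per level:", per_level)
```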

We also evaluate GraphZoom on the ogbn-arxiv and ogbn-products datasets with lamg-based coarsening; GraphZoom-1 achieves higher accuracy with far fewer parameters than the node2vec baseline.

ogbn-arxiv

Method        Accuracy (%)    #Params
Node2vec      70.07 ± 0.13    21,818,792
GraphZoom-1   71.18 ± 0.18    8,963,624

ogbn-products

Method        Accuracy (%)    #Params
Node2vec      72.49 ± 0.10    313,612,207
GraphZoom-1   74.06 ± 0.26    120,251,183

LAMG Coarsening Code

The MATLAB source code of the lamg-based spectral coarsening is available in mat_coarsen/.

graphzoom's People

Contributors

cdeng36439, chenhui1016, dependabot[bot], xiuyu-li

graphzoom's Issues

Inquiry about the embedding space

Dear Zhang,

I am now running the experiments with your code. I take the embedding representation and compute the L2 distance as the kernel for node classification and link prediction, but the performance is poor.
Could you recommend which embedding space to use, please?

Thanks
Wei

can graphzoom handle directed graph data?

Hi, I'm in a situation where I have a directed graph data for input. And I was wondering if your graphzoom can handle directed graph data. If yes, which part of your code should I rewrite? Thank you.

question about gensim and numpy version

Hello,
Thank you for the code. I installed the dependencies from requirements.txt, but I am not able to run the code. Can you confirm that the required versions are gensim==3.4.0 and numpy==1.16.4?

When I used gensim==3.4.0, I got this error
[Screenshot of the error, 2022-05-07]

However, if I update the gensim to latest version 4.2.0, the numpy will also be updated to 1.21.6 and I get an error loading the data (dataset/citeseer/citeseer-feats.npy).

  File "graphzoom.py", line 105, in main
    feature = np.load(feature_path)
  File "/home/chen/anaconda3/envs/graphzoom/lib/python3.7/site-packages/numpy/lib/npyio.py", line 441, in load
    pickle_kwargs=pickle_kwargs)
  File "/home/chen/anaconda3/envs/graphzoom/lib/python3.7/site-packages/numpy/lib/format.py", line 787, in read_array
    array.shape = shape
ValueError: cannot reshape array of size 4082800 into shape (3327,3703)

Could you clarify the required versions of numpy and gensim? Thank you!

about projections of cora nodes

Hi, thanks for your great work.
I have a question about the projections stored in graphzoom/reduction_results/Projection_1.mtx
It seems that the 2708 nodes are projected onto 1169 nodes, but there is no explanation of what the number 1169 indicates. Could you explain what it means and how you created these projections?
Thank you in advance.

What's reduction and its relationship with levels?

Hi, I read your scripts, and it seems that we need to control the coarsening level via reduction_ratio in the bash script. What is their relationship, and how can I determine the coarsening level from that variable?

When I try reduction_ratio=3 or 4 on Cora, the final coarsening level is the same (level=2) and the size of the coarsened graph remains the same, so I am curious.

How to generate reduction_results/Mapping.mtx for a new dataset?

Dear Dr. Zhang,

I am interested in your algorithm and am running the code in my experiments. For a new dataset, how can I generate a new Mapping.mtx in the reduction_results directory? graphzoom.py only reads mapping_path = "{}Mapping.mtx".format(reduce_results), and I cannot find where this file is generated in graphzoom.py or utils.py. Without Mapping.mtx, the code does not run.

I would appreciate it very much if you could help me.

Thanks
Wei

questions about spectral_coarsening algorithm

Hi, I have a question about pseudo code of spectral_coarsening algorithm in your paper, p.15.
Which part of the code do lines 8 to 12 of the pseudocode correspond to? I looked through your code but could not find the corresponding part. Also, could you explain what these lines mean? Thank you in advance.

what to modify in order to be able to run on Mac OS?

Hi, I am trying to run the experiments on my Mac, and I have downloaded the corresponding MATLAB Runtime for macOS.
However, coarsening does not seem to execute properly. I guess something is wrong with LD_LIBRARY_PATH in the run_coarsening.sh file.
My question is: if I would like to run it with a Mac system, how should I modify the files?

graph fusion coefficient beta

Hi, thanks for your great work. I have a question about the graph fusion coefficient beta: where can I find it in your code?
Thank you in advance for your help.

How to modify and compile LamgSetup.m?

Hi, thanks for your great work!
I have been reading your code. Regarding the code in mat_coarsen, I wonder whether I can modify it and compile the binary file myself.
Could you provide instructions for this?
Thanks ;)

About graph coarsening algorithm LAMG

About the graph coarsening step, in the paper, it is said that

In this work, a similarity-aware spectral sparsification tool “GRASS” (Feng, 2018) has been adopted for achieving a desired graph sparsity at the coarsest level.

which refers to this paper:
[1] Zhuo Feng. Similarity-aware spectral sparsification by edge filtering. Design Automation Conference (DAC), pp. 1–6, 2018.

In the README of this repo, it is said that LAMG algorithm is used. And I have not found much information about LAMG in the above paper [1].

Where can I find more information about this LAMG algorithm used in the program?

question on running graphzoom

Hi, I am trying to run this program on my Windows PC, but PyCharm shows 'AttributeError: Can't pickle local object 'DeepWalk_Original.generate_walks.<locals>.rnd_walk_workers''.
What should I do to fix this error? Or should I run it on Linux?
Thank you!
