ibm / evolvegcn Goto Github PK

Code for EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs

License: Apache License 2.0

Python 99.54% Dockerfile 0.46%

evolvegcn's Introduction

EvolveGCN

This repository contains the code for EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs, published in AAAI 2020.

Data

Seven datasets were used in the paper:

stochastic block model: See the 'data' folder. Untar the file for use.
bitcoin OTC: Downloadable from http://snap.stanford.edu/data/soc-sign-bitcoin-otc.html
bitcoin Alpha: Downloadable from http://snap.stanford.edu/data/soc-sign-bitcoin-alpha.html
uc_irvine: Downloadable from http://konect.uni-koblenz.de/networks/opsahl-ucsocial
autonomous systems: Downloadable from http://snap.stanford.edu/data/as-733.html
reddit hyperlink network: Downloadable from http://snap.stanford.edu/data/soc-RedditHyperlinks.html
elliptic: A preprocessed version of https://www.kaggle.com/ellipticco/elliptic-data-set is provided in the following link: ~~https://ibm.box.com/s/j04m8lwoqktjixke2gj7lgllrvvdidme.~~ Untar the file in the 'data' folder for use.

Update on elliptic: The Box link is no longer valid. Please see the instructions for manually preparing the preprocessed version.

Place downloaded data sets in the 'data' folder.

Requirements

PyTorch 1.0 or higher
Python 3.6

Set up with Docker

This Dockerfile describes a container in which the experiments can be run on any Unix-based machine. A GPU is recommended for training the models; if none is available, set the use_cuda flag in parameters.yaml to false.

Requirements

Installation

1. Build the image

From this folder, build the image:

sudo docker build -t gcn_env:latest docker-set-up/

2. Start the container

Mount the current folder into the container and start it:

sudo docker run -ti --gpus all -v $(pwd):/evolveGCN gcn_env:latest

This will start a bash session in the container.

Usage

Pass a yaml configuration file via --config_file to run the experiments. For example:

python run_exp.py --config_file ./experiments/parameters_example.yaml

Most of the parameters in the yaml configuration file are self-explanatory. For hyperparameter tuning, a parameter can be set to 'None' together with a min and a max value; each run then picks a random value within those bounds (for example: 'learning_rate', 'learning_rate_min' and 'learning_rate_max'). The 'experiments' folder contains one file for each result reported in the EvolveGCN paper.
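The None/min/max convention can be pictured with a minimal sketch. It assumes uniform sampling over a flat dictionary of parameters; the helper name `resolve_random_params` and the choice of distribution are illustrative assumptions, not necessarily what run_exp.py does.

```python
import random


def resolve_random_params(params, rng=None):
    """Return a copy of params in which every entry set to None (or the
    string 'None') is replaced by a draw between '<name>_min' and '<name>_max'.

    Hypothetical helper: uniform sampling is an assumption.
    """
    rng = rng or random.Random()
    resolved = dict(params)
    for name, value in params.items():
        if value not in (None, "None"):
            continue  # fixed value, nothing to sample
        low, high = params.get(f"{name}_min"), params.get(f"{name}_max")
        if low is None or high is None:
            continue  # no bounds given, leave the parameter untouched
        resolved[name] = rng.uniform(low, high)
    return resolved


config = {
    "learning_rate": None,
    "learning_rate_min": 0.0001,
    "learning_rate_max": 0.1,
    "num_epochs": 100,
}
print(resolve_random_params(config, random.Random(0)))
```

Integer-valued parameters would need an integer draw instead of `rng.uniform`; the sketch leaves that out for brevity.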

Setting 'use_logfile' to True in the configuration yaml writes a file to the 'log' directory containing information about the experiment and the validation metrics for each epoch. The file can be analyzed manually; alternatively, 'log_analyzer.py' can parse a log file automatically and retrieve the evaluation metrics at the best validation epoch. For example:

python log_analyzer.py log/filename.log

Reference

[1] Aldo Pareja, Giacomo Domeniconi, Jie Chen, Tengfei Ma, Toyotaro Suzumura, Hiroki Kanezashi, Tim Kaler, Tao B. Schardl, and Charles E. Leiserson. EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs. AAAI 2020.

BibTeX entry

Please cite the paper if you use this code in your work:

@INPROCEEDINGS{egcn,
  AUTHOR = {Aldo Pareja and Giacomo Domeniconi and Jie Chen and Tengfei Ma and Toyotaro Suzumura and Hiroki Kanezashi and Tim Kaler and Tao B. Schardl and Charles E. Leiserson},
  TITLE = {{EvolveGCN}: Evolving Graph Convolutional Networks for Dynamic Graphs},
  BOOKTITLE = {Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence},
  YEAR = {2020},
}


evolvegcn's Issues

Is the deletion of nodes and edges supported?

Docker image has been unavailable

Hi team, the Dockerfile requires the image nvidia/cuda:10.1-base-ubuntu16.04. However, this image is no longer available on DockerHub, and CUDA 10.1 does not support newer GPUs such as the Nvidia RTX 3080 Ti.

Any ideas?

Paper added to PyTorch Geometric Temporal

The models are available here:

https://github.com/benedekrozemberczki/pytorch_geometric_temporal

I can't get the preprocessed version of elliptic; the page shows "The shared file or folder link has been removed or is unavailable to you."

Evaluation

Recommendation for Dynamic GNN project where new nodes are added per time-step

Hi,
I am currently trying to find the best example from this repository with application on graphs where new nodes are added at each time-step. For example, I would like to predict the number of new nodes added at future time steps, along with their positions (node features). Which example from this repository would be best to try out?

dimension problem

Hello~
I have recently been reading your paper carefully. In the EGCU model, I'm confused about how the dimensions of the parameter matrix change. Can you explain it?
Looking forward to your reply.

I have constructed the preprocessed version of the Elliptic data set.

Following the instructions, I have constructed the preprocessed version of the Elliptic data set and uploaded it to Baidu Netdisk:
Link: https://pan.baidu.com/s/1hyKewH_MxzCArGZ1FCU_1g Extraction code: 8888

The parameter "adj_mat_time_window" cannot be set to None.

When setting the parameter adj_mat_time_window: None in the parameters_example.yaml file, the experiment crashes with the error:

Traceback (most recent call last):
  File "/evolveGCN/run_exp.py", line 220, in <module>
    tasker = build_tasker(args,dataset)
  File "/evolveGCN/run_exp.py", line 122, in build_tasker
    return lpt.Link_Pred_Tasker(args,dataset)
  File "/evolveGCN/link_pred_tasker.py", line 38, in __init__
    self.get_node_feats = self.build_get_node_feats(args,dataset)
  File "/evolveGCN/link_pred_tasker.py", line 98, in build_get_node_feats
    max_deg,_ = tu.get_max_degs(args,dataset)
  File "/evolveGCN/taskers_utils.py", line 60, in get_max_degs
    cur_adj = get_sp_adj(edges = dataset.edges,
  File "/evolveGCN/taskers_utils.py", line 96, in get_sp_adj
    subset = subset * (idx[:,ECOLS.time] > (time - time_window))
TypeError: unsupported operand type(s) for -: 'int' and 'str'

The explanation from the same file is: "# Time window to create the adj matrix for each timestep. Use None to use all the history (from 0 to t)". Is there another way to use all the history from 0 to t?
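The TypeError shows that time_window reached the subtraction as a string. That is consistent with how YAML treats the bare word None: it is read as the string 'None', because YAML spells its null value as null or ~. A minimal sketch of a guard follows; `normalize_none` is a hypothetical helper, and the value 5 merely stands in for the current time step.

```python
def normalize_none(value):
    """Map the string 'None' (what a YAML loader returns for a bare None)
    to Python's None; leave every other value unchanged."""
    if isinstance(value, str) and value.strip().lower() == "none":
        return None
    return value


# What a YAML loader would hand over for `adj_mat_time_window: None`.
raw = {"adj_mat_time_window": "None"}

try:
    5 - raw["adj_mat_time_window"]  # same kind of operation as in get_sp_adj
except TypeError as err:
    print("reproduced:", err)

time_window = normalize_none(raw["adj_mat_time_window"])
# A None window still has to be handled explicitly, here by dropping the lower bound.
lower_bound = None if time_window is None else 5 - time_window
print(lower_bound)
```

Writing null or ~ in the yaml file makes the loader return a real None, although the subtraction would then still need an explicit None branch like the one above.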

elliptic (not temporal) is missing

Running an experiment on the non-temporal elliptic data set, that is, setting the 'data' parameter in the yaml file to elliptic, doesn't work.

Also the 'elliptic_dl' script is missing which is used in the 'run_exp.py'. Is it possible to upload the python script which handles the non temporal elliptic dataset?

On another note, will the preprocessed version of the elliptic data set be made publicly available? I'm trying to reproduce results, and the following link is not publicly accessible:

https://ibm.box.com/s/j04m8lwoqktjixke2gj7lgllrvvdidme

I am trying to reproduce the results reported in the paper 'Anti-Money Laundering in Bitcoin: Experimenting with Graph Convolutional Networks for Financial Forensics'.

Use of mask

The mask is used when selecting the top-K nodes, but it is applied by addition rather than by multiplication, which is the more common way of using a mask. Could you please explain this, or give an example of how the mask influences the scores in the top-K selection?
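One common reason for adding a mask before a top-K selection can be sketched as follows. The sketch assumes the mask holds 0 for valid nodes and negative infinity for nodes that must be excluded; that convention is an assumption here, not something confirmed for this repository. Adding negative infinity pushes an excluded node below every valid score, whereas multiplying by 0 gives it a score of 0, which still outranks valid nodes with negative scores.

```python
import math


def top_k_indices(scores, k):
    """Indices of the k largest scores, highest first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


scores = [-0.8, -0.2, 0.5, -0.4]             # node 2 stands for a node to exclude
additive_mask = [0.0, 0.0, -math.inf, 0.0]   # assumed convention: -inf marks excluded nodes
multiplicative_mask = [1.0, 1.0, 0.0, 1.0]   # the 0/1 alternative

added = [s + m for s, m in zip(scores, additive_mask)]
multiplied = [s * m for s, m in zip(scores, multiplicative_mask)]

print(top_k_indices(added, 2))       # node 2 cannot be selected
print(top_k_indices(multiplied, 2))  # node 2 scores 0 and is still selected
```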

A role of mask

Hi, this might be a silly question, but I'm curious about the mask used for summarization.

What is the role of the mask?
I think top-K selection alone could be enough.
Is there a reason for removing some nodes when generating the summarized features?

Thanks

What does the adjacency matrix window size mean?

As per the title, what exactly does adj_mat_time_window mean (in the yaml parameter files)? I know there is a clarifying note, but I am still struggling to grasp what it adjusts.
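One plausible reading, based on the comparison idx[:,ECOLS.time] > (time - time_window) quoted in another issue on this page, is that the adjacency matrix at step t is built only from edges stamped within the last time_window steps. The sketch below illustrates that reading; the function `edges_in_window`, the upper bound t <= time, and the toy edge list are assumptions made for illustration.

```python
def edges_in_window(edges, time, time_window=None):
    """Select (source, target, t) edges used to build the adjacency matrix at `time`.

    Assumed semantics: keep edges with time - time_window < t <= time;
    with time_window=None keep the whole history from 0 to time.
    """
    selected = []
    for source, target, t in edges:
        if t > time:
            continue  # future edges never enter the snapshot
        if time_window is not None and t <= time - time_window:
            continue  # too old for the window
        selected.append((source, target, t))
    return selected


edges = [(0, 1, 0), (1, 2, 1), (2, 3, 2), (0, 3, 3)]  # toy data
print(edges_in_window(edges, time=2, time_window=1))     # only edges stamped at step 2
print(edges_in_window(edges, time=2, time_window=2))     # steps 1 and 2
print(edges_in_window(edges, time=2, time_window=None))  # steps 0, 1 and 2
```

Under this reading, a window of 1 gives one graph per time step, while larger windows accumulate recent history into each adjacency matrix.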

AttributeError: Can't pickle local object 'Link_Pred_Tasker.build_get_node_feats.<locals>.get_node_feats'

How could I handle this error?
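Python's pickle module raises this kind of error when it is asked to serialize a function defined inside another function, which typically happens when an object is handed to a worker process. One common workaround, sketched here under the assumption that the nested function only needs a few values from its enclosing scope, is to replace the closure with an instance of a module-level class. `NodeFeatureGetter` and its degree-based features are hypothetical and not taken from link_pred_tasker.py.

```python
import pickle


class NodeFeatureGetter:
    """Picklable stand-in for a nested get_node_feats closure (hypothetical)."""

    def __init__(self, max_deg):
        self.max_deg = max_deg  # state the closure would have captured

    def __call__(self, degrees):
        # One-hot encode each node degree, clipped to max_deg.
        feats = []
        for deg in degrees:
            row = [0] * (self.max_deg + 1)
            row[min(deg, self.max_deg)] = 1
            feats.append(row)
        return feats


def build_get_node_feats(max_deg):
    # Returning an instance instead of a local function keeps the result picklable.
    return NodeFeatureGetter(max_deg)


def roundtrip(obj):
    """Serialize and restore obj, as happens when it is sent to a worker process."""
    return pickle.loads(pickle.dumps(obj))


get_node_feats = build_get_node_feats(max_deg=2)
print(get_node_feats([0, 2, 5]))
```

Calling roundtrip(get_node_feats) from a script returns a working copy, whereas the same call on a nested function fails with an error of the kind quoted in the title. If the error originates from data-loading worker processes, loading data in the main process (for PyTorch's DataLoader, num_workers=0) avoids the pickling step altogether.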

Can edges have features in this implementation or is it just nodes that can have features?

I have some AMLSim data and I am trying to analyse it using EvolveGCN. The way I want to model this is make all accounts vertices and transactions between accounts to be edges. I want to include information in the transaction as edge features and consider that for edge classification. Is that possible with EvolveGCN?

Build prepare node features

In the code for build_prepare_node_features in node_cls_tasker.py, the code says
else:
    def prepare_node_feats(node_feats):
        return node_feats[0]  # I'll have to check this up

So what exactly should be returned?

Cannot output node embedding for link prediction task

I tried to output node embeddings for the link prediction task with the example data set sbm. However, I got an error message saying that the sbm_dataset object has no attribute 'contID_to_origID'. What is contID? How can I fix this problem?

Dynamic network data processing

I am working on dynamic social network analysis.
I found the bitcoin trading network data that you use on the SNAP website. I know this is a dynamic network, but I don't know which records belong to the same time step or how to segment the data. The timestamps include the year, month, and day.
Can you tell me how to analyze this dynamic bitcoin network data?
Thank you!
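A common way to turn a timestamped edge list into a dynamic graph is to bucket the edges into fixed-width time intervals, with each bucket becoming one snapshot. The sketch below assumes rows of the form (source, target, rating, timestamp) with the timestamp in seconds; the helper `split_into_snapshots`, the made-up rows, and the one-week bucket width are illustrative assumptions, not the segmentation used in the paper.

```python
from collections import defaultdict


def split_into_snapshots(rows, step_seconds):
    """Group (source, target, rating, timestamp) rows into consecutive time steps.

    Step 0 starts at the earliest timestamp; every step spans step_seconds.
    """
    if not rows:
        return {}
    start = min(row[3] for row in rows)
    snapshots = defaultdict(list)
    for source, target, rating, timestamp in rows:
        step = int((timestamp - start) // step_seconds)
        snapshots[step].append((source, target, rating))
    return dict(sorted(snapshots.items()))


DAY = 24 * 60 * 60
T0 = 1_000_000  # arbitrary start time for the made-up rows below
rows = [
    (0, 1, 4, T0),
    (1, 2, -1, T0 + 500),
    (2, 0, 3, T0 + 8 * DAY),
    (0, 3, 2, T0 + 15 * DAY),
]
snapshots = split_into_snapshots(rows, step_seconds=7 * DAY)
for step, edges in snapshots.items():
    print(step, edges)
```

Edges that share a step index form the graph for that time step; a narrower or wider bucket trades the number of snapshots against how many edges each one contains.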

batch size for the models

Hi,

Thanks for making this work public. I wanted to know if a batch size of more than 1 is supported. I saw a note in the example yaml not to change the batch size. I tried the reddit yaml example, and using a batch size larger than 1 causes the program to crash. Is there a way to fix this and enable larger batches? Thank you.

What is the "elliptic_bitcoin_dataset_cont.tar.gz" in parameters_ellipitc_egnc_h.yaml line 4?

It isn't mentioned in https://github.com/IBM/EvolveGCN/blob/master/elliptic_construction.md

About the datasplit settings

Hi, I'm trying to reproduce the results in the article, but I found that the settings are different. splitter.py contains special handling for the training split, so the training set has args.num_hist_steps fewer time steps than the number reported in the article. Were your results produced by this code or by the commented-out version?
#only the training one requires special handling on start, the others are fine with the split IDX.
start = tasker.data.min_time + args.num_hist_steps #-1 + args.adj_mat_time_window
I tried the commented-out code, but it raises index errors. Please help.
