
mhgcn's Introduction

MHGCN

This repository provides a reference implementation of MHGCN as described in the paper:

Multiplex Heterogeneous Graph Convolutional Network

Pengyang Yu, Chaofan Fu, Yanwei Yu, Chao Huang, Zhongying Zhao, Junyu Dong.

KDD'22

Available at https://doi.org/10.1145/3534678.3539482

Dependencies

The following Python 3 packages are required, at the pinned versions listed:

  • numpy==1.21.2
  • torch==1.9.1
  • scipy==1.7.1
  • scikit-learn==0.24.2
  • pandas==0.25.0

Datasets

Link

The datasets used in the paper are available at:

Preprocess

We compress each dataset into a .mat file containing the following fields.

  • edges: array of sub-networks after decoupling; each element of the array is one sub-network.
  • features: attribute matrix of the nodes in the network.
  • labels: labels of the labeled nodes.
  • train: node indices of the training set for node classification.
  • valid: node indices of the validation set for node classification.
  • test: node indices of the test set for node classification.

In addition, we sample positive and negative edges from the network and split them into three text files (train, valid, and test) for link prediction.
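The .mat field layout above can be inspected with scipy.io. The sketch below is illustrative only: it builds a tiny synthetic file (the file name, node count, and field sizes are made up) and reloads it the same way the real dataset files would be loaded.

```python
import numpy as np
from scipy.io import loadmat, savemat

# Build a tiny synthetic stand-in for a released file such as imdb_1_10.mat.
# Real files follow the field layout described above; sizes here are toy values.
n = 6
data = {
    "edges": np.stack([np.eye(n), np.ones((n, n))]),  # one adjacency per sub-network
    "features": np.random.rand(n, 4),                 # node attribute matrix
    "labels": np.array([0, 1, 0, 1, 2, 2]),           # labels of the labeled nodes
    "train": np.array([0, 1, 2]),                     # node indices for training
    "valid": np.array([3]),
    "test": np.array([4, 5]),
}
savemat("toy_dataset.mat", data)

# Loading works the same way for the real dataset files.
mat = loadmat("toy_dataset.mat")
print(mat["edges"].shape)  # (2, 6, 6): two sub-networks, 6 nodes each
print(mat["labels"])       # note: loadmat returns 1-D arrays as shape (1, 6)
```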

Usage

First, choose the dataset. For node classification, modify the dataset path in Node_Classification.py; for link prediction, modify the dataset path in Link_Prediction.py.

Second, set the number of weights in Model.py. It must equal the number of sub-networks after decoupling.

Finally, specify the sub-networks and their number in Decoupling_matrix_aggregation.py.
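The aggregation these settings control can be sketched as follows. This is a simplified, hypothetical version for illustration: the actual variable names in Model.py and Decoupling_matrix_aggregation.py may differ, and the toy adjacency matrices here stand in for real decoupled sub-networks.

```python
import torch

# Hypothetical sketch: aggregate decoupled sub-networks with learnable weights.
# num_subnetworks must match the number of weights configured in Model.py.
num_subnetworks = 3
n = 5

# One toy adjacency matrix per decoupled sub-network.
adjs = torch.stack([torch.eye(n) for _ in range(num_subnetworks)])

# One learnable weight per sub-network, as described in the usage notes above.
weights = torch.nn.Parameter(torch.ones(num_subnetworks))

# Weighted sum over the sub-network dimension yields the aggregated adjacency.
aggregated = (weights.view(-1, 1, 1) * adjs).sum(dim=0)
print(aggregated.shape)  # torch.Size([5, 5])
```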

Execute the following command to run the node classification task:

  • python Node_Classification.py

Execute the following command to run the link prediction task:

  • python Link_Prediction.py

Citing

If you find MHGCN useful in your research, please cite the following paper:

@inproceedings{yu2022multiplex,
  title={Multiplex Heterogeneous Graph Convolutional Network},
  author={Yu, Pengyang and Fu, Chaofan and Yu, Yanwei and Huang, Chao and Zhao, Zhongying and Dong, Junyu},
  booktitle={Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
  pages={2377--2387},
  year={2022}
} 

mhgcn's People

Contributors

nsssjss, tureful


mhgcn's Issues

I would like to point out a simple error in the readme

While reproducing the work, I found that the usage instruction in the README for running Node_Classification.py is wrong: it should be "python Node_Classification.py", not "python python Node_Classification.py". You did a great job, and this is a great article. Sorry to be pedantic, but I noticed you updated the GitHub repo without fixing this small bug.

About the Compressed Datasets

Is it possible to provide more compressed datasets (e.g., AMiner, Alibaba, IMDB) or their preprocessing codes? I can't reproduce the results of MHGCN in Table 3 except for DBLP.

CPU or GPU?

Were the experiments conducted using a CPU or a GPU? I noticed that the code repository specifies the device as CPU, and the paper reports the use of a GPU with 8GB of memory. However, when I attempted to run the provided DBLP dataset directly from the repository, it resulted in an Out of Memory (OOM) error. So, should the model be run using a CPU?

NEED HELP!

I cloned the code with git and set up the required environment, but after running the link prediction task on the IMDB dataset I only get ROC 0.56, F1 0.56, and PR 0.57. Did I do something wrong, or is the data insufficient?

Why can't some edges in the test data be found in the original dataset?

I downloaded the IMDB dataset, which contains three txt files: test, train, and valid. I compared the test data with the original dataset (given by imdb_1_10.mat) and found that some links regarded as positive in the test data don't exist in the original dataset at all.
Do you have any idea what happened?

The IMDB dataset lacks several files

Hello, I have recently been reading your article. Your work is excellent, and I want to reproduce the link prediction results on the IMDB dataset.
I downloaded the imdb_1_10.mat file, but when I ran the code it reported that the following files are missing:
data/IMDB/train.txt, data/IMDB/valid.txt, data/IMDB/test.txt
Could you upload these files? Thanks in advance.
