
mhgcn's Introduction

MHGCN

This repository provides a reference implementation of MHGCN as described in the paper:

Multiplex Heterogeneous Graph Convolutional Network

Pengyang Yu, Chaofan Fu, Yanwei Yu, Chao Huang, Zhongying Zhao, Junyu Dong.

KDD'22

Available at https://doi.org/10.1145/3534678.3539482

Dependencies

The following Python 3 packages are required, at the pinned versions listed:

  • numpy==1.21.2
  • torch==1.9.1
  • scipy==1.7.1
  • scikit-learn==0.24.2
  • pandas==0.25.0

Datasets

Link

The datasets used in the paper are available at:

Preprocess

We compress each dataset into a .mat file containing the following fields.

  • edges: array of sub-networks after decoupling; each element of the array is one sub-network.
  • features: attribute matrix of the nodes in the network.
  • labels: labels of the labeled nodes.
  • train: node indices of the training set for node classification.
  • valid: node indices of the validation set for node classification.
  • test: node indices of the test set for node classification.

In addition, we sample positive and negative edges from the network and split them into three text files (train, valid, and test) for link prediction.
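The .mat field layout above can be inspected with scipy.io. The sketch below is illustrative only: it builds a tiny synthetic file (the file name, node count, and field sizes are made up) and reloads it the same way the real dataset files would be loaded.

```python
import numpy as np
from scipy.io import loadmat, savemat

# Build a tiny synthetic stand-in for a released file such as imdb_1_10.mat.
# Real files follow the field layout described above; sizes here are toy values.
n = 6
data = {
    "edges": np.stack([np.eye(n), np.ones((n, n))]),  # one adjacency per sub-network
    "features": np.random.rand(n, 4),                 # node attribute matrix
    "labels": np.array([0, 1, 0, 1, 2, 2]),           # labels of the labeled nodes
    "train": np.array([0, 1, 2]),                     # node indices for training
    "valid": np.array([3]),
    "test": np.array([4, 5]),
}
savemat("toy_dataset.mat", data)

# Loading works the same way for the real dataset files.
mat = loadmat("toy_dataset.mat")
print(mat["edges"].shape)  # (2, 6, 6): two sub-networks, 6 nodes each
print(mat["labels"])       # note: loadmat returns 1-D arrays as shape (1, 6)
```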

Usage

First, choose the dataset. For node classification, modify the dataset path in Node_Classification.py; for link prediction, modify the dataset path in Link_Prediction.py.

Second, set the number of weights in Model.py. It must equal the number of sub-networks after decoupling.

Finally, specify the sub-networks and their number in Decoupling_matrix_aggregation.py.
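The aggregation these settings control can be sketched as follows. This is a simplified, hypothetical version for illustration: the actual variable names in Model.py and Decoupling_matrix_aggregation.py may differ, and the toy adjacency matrices here stand in for real decoupled sub-networks.

```python
import torch

# Hypothetical sketch: aggregate decoupled sub-networks with learnable weights.
# num_subnetworks must match the number of weights configured in Model.py.
num_subnetworks = 3
n = 5

# One toy adjacency matrix per decoupled sub-network.
adjs = torch.stack([torch.eye(n) for _ in range(num_subnetworks)])

# One learnable weight per sub-network, as described in the usage notes above.
weights = torch.nn.Parameter(torch.ones(num_subnetworks))

# Weighted sum over the sub-network dimension yields the aggregated adjacency.
aggregated = (weights.view(-1, 1, 1) * adjs).sum(dim=0)
print(aggregated.shape)  # torch.Size([5, 5])
```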

Execute the following command to run the node classification task:

  • python Node_Classification.py

Execute the following command to run the link prediction task:

  • python Link_Prediction.py

Citing

If you find MHGCN useful in your research, please cite the following paper:

@inproceedings{yu2022multiplex,
  title={Multiplex Heterogeneous Graph Convolutional Network},
  author={Yu, Pengyang and Fu, Chaofan and Yu, Yanwei and Huang, Chao and Zhao, Zhongying and Dong, Junyu},
  booktitle={Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
  pages={2377--2387},
  year={2022}
} 

mhgcn's People

Contributors

nsssjss, tureful


mhgcn's Issues

I would like to point out a simple error in the readme

While reproducing the work, I found that the usage instruction in the README for running Node_Classification.py is wrong: it should be "python Node_Classification.py", not "python python Node_Classification.py". You did a great job, and this is a great article. Sorry to be pedantic, but I noticed you updated the GitHub repo without fixing this small bug.

About the Compressed Datasets

Is it possible to provide more compressed datasets (e.g., AMiner, Alibaba, IMDB) or their preprocessing codes? I can't reproduce the results of MHGCN in Table 3 except for DBLP.

CPU or GPU?

Were the experiments conducted using a CPU or a GPU? I noticed that the code repository specifies the device as CPU, and the paper reports the use of a GPU with 8GB of memory. However, when I attempted to run the provided DBLP dataset directly from the repository, it resulted in an Out of Memory (OOM) error. So, should the model be run using a CPU?

NEED HELP!

I cloned the code with git and set up the required environment, but after running the link prediction task on the IMDB dataset I only get ROC 0.56, F1 0.56, and PR 0.57. Did I do something wrong, or is the data insufficient?

Why can't some edges in the test data be found in the original dataset?

I downloaded the IMDB dataset, which contains three txt files: test, train, and valid. I compared the test data with the original dataset (given by imdb_1_10.mat) and found that some links regarded as positive in the test data don't exist in the original dataset at all.
Do you have any idea what happened?

The IMDB dataset lacks several files

Hello, I have recently been reading your article. Your work is excellent, and I want to reproduce the link prediction results on the IMDB dataset.
I downloaded the imdb_1_10.mat file, but when I ran the code it reported that the following files are missing:
data/IMDB/train.txt, data/IMDB/valid.txt, data/IMDB/test.txt
Could you upload these files? Thanks in advance.
