DINES

This is the official implementation of DINES (Disentangled Neural Networks for Signed Digraph). The paper has been submitted to Information Sciences and is currently under review:

  • Learning Disentangled Representations in Signed Directed Graphs without Social Assumptions
    Geonwoo Ko and Jinhong Jung
    Information Sciences (submitted)

Overview

Signed graphs are complex systems that represent trust relationships or preferences in various domains. Learning node representations in such graphs is crucial for many mining tasks. Although real-world signed relationships can be influenced by multiple latent factors, most existing methods often oversimplify the modeling of signed relationships by relying on social theories and treating them as simplistic factors. This limits their expressiveness and their ability to capture the diverse factors that shape these relationships.

In this paper, we propose DINES, a novel method for learning disentangled node representations in signed directed graphs without social assumptions. We adopt a disentangled framework that separates each embedding into distinct factors, which allows the model to capture multiple latent factors. We also explore lightweight graph convolutions that focus solely on sign and direction, without depending on social theories. Additionally, we propose a decoder that effectively classifies an edge's sign by considering correlations between the factors. To further enhance disentanglement, we jointly train a self-supervised factor discriminator with our encoder and decoder.
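The plain-Python sketch below gives a rough illustration of two of these ideas: factor-wise embeddings, and convolutions that look only at sign and direction. It splits an embedding into equal-sized factor chunks and sums neighbor vectors separately for the four (direction, sign) neighbor groups. It is not the DINES implementation. The equal-sized split, the sum aggregation, the encoding of signs as 1/-1, and the helper names `split_factors` and `sign_direction_sum` are all illustrative assumptions. The layers described in the paper may add learnable weights and normalization, and may combine the groups differently.

```python
from collections import defaultdict


def split_factors(embedding, num_factors):
    """Split a flat embedding into `num_factors` equal-sized chunks, one per latent factor."""
    if len(embedding) % num_factors != 0:
        raise ValueError("embedding size must be divisible by num_factors")
    size = len(embedding) // num_factors
    return [embedding[k * size:(k + 1) * size] for k in range(num_factors)]


def sign_direction_sum(edges, features):
    """Sum neighbor vectors separately for the four (direction, sign) neighbor groups.

    edges: iterable of (src, dst, sign) with sign in {1, -1} (assumed encoding).
    features: dict mapping node -> list of floats (e.g., one factor's vectors).
    Returns: dict node -> {(direction, sign): summed vector}, direction in {"out", "in"}.
    """
    dim = len(next(iter(features.values())))

    def empty_groups():
        return {(d, s): [0.0] * dim for d in ("out", "in") for s in (1, -1)}

    groups = defaultdict(empty_groups)
    for src, dst, sign in edges:
        # src sees dst as an outgoing neighbor; dst sees src as an incoming neighbor
        for node, nbr, direction in ((src, dst, "out"), (dst, src, "in")):
            acc = groups[node][(direction, sign)]
            for i, x in enumerate(features[nbr]):
                acc[i] += x
    return dict(groups)
```

With an output dimension of 64 and 8 factors (the defaults of out-dim and num-factors in the training options), each factor would have 8 dimensions. In a disentangled model, an aggregation like this would be applied to each factor's vectors separately.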

Prerequisites

The packages used in this repository are as follows:

python==3.9.16
numpy==1.24.3
pytorch==2.0.1
pytorch-cuda==11.7
pytorch-scatter==2.1.1
scikit-learn==1.2.2
scipy==1.10.1
fire==0.5.0
loguru==0.7.0
torchmetrics==0.8.1
tqdm==4.65.0

You can create a conda environment with these packages by running the following commands in your terminal:

conda env create --file environment.yml
conda activate DINES

Datasets

We provide the datasets used in the paper for reproducibility. Each raw dataset is stored in the ./data/${DATASET} folder as a file named edges.csv, where ${DATASET} is one of BC_ALPHA, BC_OTC, WIKI_RFA, SLASHDOT, and EPINIONS. This file lists the signed edges, one per line, as a tuple (src, dst, sign). The dataset statistics are summarized in the following table:

Dataset       $|\mathcal{V}|$  $|\mathcal{E}|$  $|\mathcal{E}^{+}|$  $|\mathcal{E}^{-}|$  $p$(+)
BitcoinAlpha  3,783            24,186           22,650               1,536                93.6
BitcoinOTC    5,881            35,592           32,029               3,563                90.0
Wiki-RFA      11,258           178,096          138,473              38,623               78.3
Slashdot      79,120           515,397          392,326              123,255              76.1
Epinions      131,828          841,372          717,667              123,705              85.3
  • $|\mathcal{V}|$: the number of nodes
  • $|\mathcal{E}|$: the number of edges
  • $|\mathcal{E}^{+}|$ and $|\mathcal{E}^{-}|$: the numbers of positive and negative edges, respectively
  • $p$(+): the ratio of positive edges, in percent
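The statistics above can be recomputed from an edges.csv file. The sketch below assumes a comma-separated layout with no header row and a numeric sign whose value is positive for positive edges. These assumptions are not confirmed by the repository, so check them against a provided file before relying on the output. `edge_stats` is a hypothetical helper, and it counts only nodes that appear in at least one edge, so it may differ from $|\mathcal{V}|$ if a dataset includes isolated nodes.

```python
import csv
import io


def edge_stats(lines):
    """Compute |V|, |E|, |E+|, |E-| and p(+) (in %) from (src, dst, sign) rows.

    Assumptions: comma-separated rows, no header, sign > 0 means a positive edge.
    |V| counts only nodes that appear in at least one edge.
    """
    nodes, pos, neg = set(), 0, 0
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        src, dst, sign = row[0].strip(), row[1].strip(), float(row[2])
        nodes.update((src, dst))
        if sign > 0:
            pos += 1
        else:
            neg += 1
    total = pos + neg
    return {
        "num_nodes": len(nodes),
        "num_edges": total,
        "num_pos": pos,
        "num_neg": neg,
        "p_pos": round(100.0 * pos / total, 1) if total else 0.0,
    }


sample = io.StringIO("0,1,1\n1,2,1\n2,0,-1\n0,2,1\n")
print(edge_stats(sample))
# {'num_nodes': 3, 'num_edges': 4, 'num_pos': 3, 'num_neg': 1, 'p_pos': 75.0}
```

On a real file, `with open("./data/BC_ALPHA/edges.csv", newline="") as f: print(edge_stats(f))` would apply the same computation. If the file turns out to have a header row, skip it first with `next(f)`.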

Demo

You can run the simple demo by typing the following command in your terminal:

bash demo.sh

This trains DINES on the BC_ALPHA dataset with the hyperparameters stored in ./pretrained/BC_ALPHA/config.json. After training completes, the trained model is saved as encoder.pt and decoder.pt in the folder ./output/BC_ALPHA. The script then evaluates the trained model on the link sign prediction task in terms of AUC and Macro-F1.
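The two metrics can be illustrated with a dependency-free sketch. The repository may compute them with scikit-learn or torchmetrics, both of which are listed as prerequisites. The versions below only show the definitions: AUC as the fraction of (positive, negative) pairs that are ranked correctly, and Macro-F1 as the unweighted mean of the per-class F1 scores. `auc_score`, `macro_f1`, and the 0.5 decision threshold are assumptions for illustration.

```python
def auc_score(labels, scores):
    """ROC-AUC via the Mann-Whitney U statistic: P(score_pos > score_neg), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative labels")
    wins = 0.0
    for p in pos:  # O(|pos| * |neg|); fine for an illustration, slow for large test sets
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_f1(labels, preds):
    """Unweighted mean of the F1 scores of class 0 (negative sign) and class 1 (positive sign)."""
    f1s = []
    for c in (0, 1):
        tp = sum(1 for y, p in zip(labels, preds) if y == c and p == c)
        fp = sum(1 for y, p in zip(labels, preds) if y != c and p == c)
        fn = sum(1 for y, p in zip(labels, preds) if y == c and p != c)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


labels = [1, 1, 0, 0]            # 1 = positive edge, 0 = negative edge
scores = [0.9, 0.4, 0.6, 0.1]    # predicted probability that the sign is positive
preds = [1 if s >= 0.5 else 0 for s in scores]
print(auc_score(labels, scores), macro_f1(labels, preds))  # 0.75 0.5
```

Because positive edges make up 76% to 94% of each dataset, plain accuracy would reward always predicting "positive". Macro-F1 instead weights the negative class as heavily as the positive one.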

Pre-trained DINES

We provide pre-trained DINES models for each dataset in the ./pretrained/${DATASET} folder, stored as encoder.pt and decoder.pt. The hyperparameters used to train them are reported in the Appendix of the paper and are saved in ./pretrained/${DATASET}/config.json.

Results of Pre-trained DINES

The results of the pre-trained models are as follows:

Dataset   AUC    Macro-F1
BC_ALPHA  0.937  0.789
BC_OTC    0.950  0.860
WIKI_RFA  0.914  0.786
SLASHDOT  0.927  0.831
EPINIONS  0.967  0.895

All experiments were conducted on an RTX 3090 (24GB) with CUDA 12.0, and the results above were produced with the random seed 1 (seed=1).

How to Reproduce the Above Results with the Pre-trained Models

You can reproduce the results with the following command, which evaluates a pre-trained model on the test dataset:

python ./src/run_evaluate.py --input-dir ./pretrained --dataset ${DATASET} --gpu-id ${GPU_ID}

The pre-trained models were generated by the following command:

python ./src/run_train.py --load-config --output-dir ./pretrained --dataset ${DATASET} --seed 1
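To evaluate all five pre-trained models in one go, a small driver can build the evaluation command above once per dataset. This is a sketch, not a script shipped with the repository. `evaluate_commands` and `run_all` are hypothetical names. The flags are the ones documented for run_evaluate.py, and omitting --gpu-id keeps its default of None, which selects the CPU.

```python
import shlex
import subprocess

DATASETS = ["BC_ALPHA", "BC_OTC", "WIKI_RFA", "SLASHDOT", "EPINIONS"]


def evaluate_commands(gpu_id=None, input_dir="./pretrained"):
    """Build one run_evaluate.py command (as an argv list) per dataset."""
    commands = []
    for dataset in DATASETS:
        cmd = ["python", "./src/run_evaluate.py",
               "--input-dir", input_dir, "--dataset", dataset]
        if gpu_id is not None:  # leaving --gpu-id out keeps its default (None), i.e., the CPU
            cmd += ["--gpu-id", str(gpu_id)]
        commands.append(cmd)
    return commands


def run_all(gpu_id=None):
    """Run the evaluation for every dataset; stops at the first command that fails."""
    for cmd in evaluate_commands(gpu_id):
        print(shlex.join(cmd))
        subprocess.run(cmd, check=True)
```

Calling `run_all(gpu_id=0)` from the repository root would evaluate the datasets in sequence on GPU 0. Passing argv lists to `subprocess.run` avoids shell quoting issues, and `shlex.join` is used only to print a readable command line.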

Detailed Usage and Options

You can train and evaluate DINES with your own datasets or custom hyperparameters using run_train.py and run_evaluate.py.
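A custom dataset presumably needs to follow the layout of the provided ones: an edges.csv file of (src, dst, sign) rows inside <data-dir>/<dataset>. The sketch below writes such a file under loudly stated assumptions: comma-separated values, no header, integer node IDs, and signs encoded as 1 or -1. Compare the result with a provided file such as ./data/BC_ALPHA/edges.csv before training. `format_signed_edges`, `write_signed_edges`, and the dataset name `MY_DATASET` are hypothetical.

```python
import csv
import io
import os


def format_signed_edges(edges):
    """Render (src, dst, sign) tuples as CSV text (assumed format: no header, sign in {1, -1})."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for src, dst, sign in edges:
        if sign not in (1, -1):
            raise ValueError(f"sign must be 1 or -1, got {sign!r}")
        writer.writerow([int(src), int(dst), int(sign)])
    return buf.getvalue()


def write_signed_edges(edges, dataset, data_dir="./data"):
    """Write the edges to <data_dir>/<dataset>/edges.csv and return the file path."""
    folder = os.path.join(data_dir, dataset)
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, "edges.csv")
    with open(path, "w", newline="") as f:
        f.write(format_signed_edges(edges))
    return path
```

For example, `write_signed_edges([(0, 1, 1), (1, 2, -1)], "MY_DATASET")` would create ./data/MY_DATASET/edges.csv. Training could then be started with `python src/run_train.py --dataset MY_DATASET`.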

Training

You can perform the training process of DINES with the following command:

python src/run_train.py [--<argument name> <argument value>] [...]

We describe the detailed options of src/run_train.py in the following table:

Option        Description                                                    Default
load-config   whether to load the configuration used in a pre-trained model  False
dataset       dataset name                                                   BC_ALPHA
data-dir      data directory path                                            ./data
output-dir    output directory path                                          ./output
test-ratio    ratio of test edges                                            0.2
gpu-id        GPU ID; if None, the CPU is used                               None
seed          random seed; if None, the seed is not fixed                    None
in-dim        input feature dimension                                        64
out-dim       output embedding dimension                                     64
num-epochs    number of epochs                                               100
lr            learning rate $\eta$ of the optimizer                          0.005
weight-decay  strength $\lambda_{\texttt{reg}}$ of L2 regularization         0.005
num-factors   number $K$ of factors                                          8
num-layers    number $L$ of layers                                           2
lambda-disc   strength $\lambda_{\texttt{disc}}$ of the discriminative loss  0.1
aggr-type     aggregator type (sum, max, mean, attn)                         sum
  • Note that several PyTorch operations, such as torch.index_add_, run non-deterministically on a GPU (see the reproducibility notes in the PyTorch documentation). Thus, results on a GPU can differ slightly from run to run even with a fixed random seed, although the differences are not statistically significant.
  • For strict reproducibility, the code can also run on the CPU: leaving --gpu-id unset (its default is None) runs the code on the CPU and makes the procedure deterministic by calling torch.use_deterministic_algorithms(True). If you want PyTorch to use its non-deterministic algorithms on the CPU, remove this function call from the code.
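Seeding and deterministic mode are usually combined in a single helper. The sketch below shows one common pattern. `set_seed` is a hypothetical name, and this is not necessarily how run_train.py does it. NumPy and PyTorch are imported only if available so the sketch also runs without them. `torch.manual_seed` seeds the RNGs of all devices, and `torch.use_deterministic_algorithms(True)` makes PyTorch raise an error when an operation without a deterministic implementation is called.

```python
import random


def set_seed(seed, deterministic=True):
    """Seed Python, NumPy and PyTorch RNGs; optionally force deterministic PyTorch kernels."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is optional in this sketch
    try:
        import torch
    except ImportError:
        return  # PyTorch is optional in this sketch
    torch.manual_seed(seed)  # seeds the RNGs of all devices (CPU and CUDA)
    if deterministic:
        # As described above, the repository applies this only for CPU runs.
        torch.use_deterministic_algorithms(True)
```

Setting the flag only for CPU runs mirrors the trade-off described above. On a GPU, deterministic mode can make some operations fail or run more slowly, whereas the residual GPU non-determinism is small.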

Evaluation

We provide a script that evaluates a trained DINES model and reports AUC and Macro-F1 scores on the test dataset. It uses encoder.pt, decoder.pt, and config.json, so first check that these files were generated correctly by ./src/run_train.py. Note that the script reuses the random seed that ./src/run_train.py saved in config.json. This regenerates the same train/test split, so the test edges are ones the model did not see during training and the evaluation is valid.

python src/run_evaluate.py [--<argument name> <argument value>] [...]

We describe the detailed options of src/run_evaluate.py in the following table:

Option     Description                                               Default
dataset    dataset name                                              BC_ALPHA
input-dir  directory path where a pre-trained DINES model is stored  ./output
gpu-id     GPU ID; if None, the CPU is used                          None
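The evaluation script depends on the saved seed to regenerate the same train/test split. The sketch below illustrates why this works: a local RNG seeded with the same value produces the same shuffle, so the held-out edges are identical in both runs. `split_edges` is hypothetical. The repository's actual split may use NumPy or PyTorch RNGs or stratify by sign, and is not reproduced here. With a different seed, most "test" edges would have been seen during training, which would inflate the reported scores.

```python
import random


def split_edges(edges, test_ratio=0.2, seed=1):
    """Shuffle edges with a seeded RNG and hold out `test_ratio` of them for testing."""
    rng = random.Random(seed)  # local RNG: the split depends only on `seed`
    shuffled = list(edges)
    rng.shuffle(shuffled)
    num_test = round(len(shuffled) * test_ratio)
    return shuffled[num_test:], shuffled[:num_test]  # (train edges, test edges)
```

Using a dedicated `random.Random(seed)` instance, rather than the global RNG, keeps the split independent of any other random calls made before it.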
