ReDKG

Reinforcement learning on Dynamic Knowledge Graphs (ReDKG) is a toolkit for deep reinforcement learning on dynamic knowledge graphs. It is designed to encode static and dynamic knowledge graphs (KGs) by constructing vector representations of their entities and relations. A reinforcement learning (RL) algorithm then uses these graph representations to train recommendation models or decision support models.

Installation

Python >= 3.9 is required

As a first step, PyTorch 1.12 and PyTorch Geometric must be installed.

PyTorch 1.12

# CUDA 10.2
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=10.2 -c pytorch
# CUDA 11.3
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
# CUDA 11.6
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.6 -c pytorch -c conda-forge
# CPU Only
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cpuonly -c pytorch

Once PyTorch is installed, clone this repository and run the following inside the repository directory:

pip install .

Download test data

Download ratings.csv to the /data/ folder. The data folder should contain the following files:

ratings.csv - raw rating file;
attributes.csv - raw attributes file;
kg.txt - knowledge graph file;
item_index2enity_id.txt - the mapping from item indices in the raw rating file to entity IDs in the KG file;
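For orientation, the KG and mapping files can be read with plain Python. This is a sketch assuming a tab-separated layout (`head<TAB>relation<TAB>tail` per line in kg.txt and `item_index<TAB>entity_id` per line in the mapping file), as in common KGCN-style datasets; the helper names are hypothetical and not part of redkg, and values are kept as strings because relation names in such files are often text.

```python
import io
from typing import Dict, Iterable, List, Tuple


def read_kg_triples(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Parse 'head<TAB>relation<TAB>tail' lines into triples (assumed format)."""
    triples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        head, relation, tail = line.split("\t")
        triples.append((head, relation, tail))
    return triples


def read_item_mapping(lines: Iterable[str]) -> Dict[str, str]:
    """Parse 'item_index<TAB>entity_id' lines into a dict (assumed format)."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        item_index, entity_id = line.split("\t")
        mapping[item_index] = entity_id
    return mapping


# In-memory example; with real data, pass open("data/kg.txt") and so on instead
kg = read_kg_triples(io.StringIO("0\tfilm.film.writer\t2\n2\tfilm.film.genre\t3\n"))
items = read_item_mapping(io.StringIO("10\t0\n11\t2\n"))
print(kg)     # [('0', 'film.film.writer', '2'), ('2', 'film.film.genre', '3')]
print(items)  # {'10': '0', '11': '2'}
```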

Example of Using KGE Models

Preprocess the data

from redkg.config import Config
from redkg.preprocess import DataPreprocessor

config = Config()
preprocessor = DataPreprocessor(config)
preprocessor.process_data()

Train KGE model

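from redkg.models.kge import KGEModel

# info, evaluator, train_pars, train_triples and valid_triples
# are assumed to be prepared beforehand
kge_model = KGEModel(
    model_name="TransE",
    nentity=info['nentity'],
    nrelation=info['nrelation'],
    hidden_dim=128,
    gamma=12.0,
    double_entity_embedding=True,
    double_relation_embedding=True,
    evaluator=evaluator
)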

training_logs, test_logs = train_kge_model(kge_model, train_pars, info, train_triples, valid_triples)
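TransE, the model selected above, treats a relation as a translation in embedding space. Assuming the common margin-based formulation (as in the RotatE/KnowledgeGraphEmbedding codebase, which the `gamma` and `double_*_embedding` arguments resemble), a triple is scored as gamma minus the L1 distance between head + relation and tail, so plausible triples score higher. A minimal pure-Python sketch of that score, not redkg's implementation:

```python
from typing import Sequence


def transe_score(head: Sequence[float], relation: Sequence[float],
                 tail: Sequence[float], gamma: float = 12.0) -> float:
    """Return gamma - ||head + relation - tail||_1 (higher means more plausible)."""
    if not (len(head) == len(relation) == len(tail)):
        raise ValueError("embeddings must have the same dimension")
    distance = sum(abs(h + r - t) for h, r, t in zip(head, relation, tail))
    return gamma - distance


# A triple whose tail equals head + relation gets the maximum score, gamma
print(transe_score([1.0, 2.0], [0.5, -1.0], [1.5, 1.0]))  # 12.0
print(transe_score([1.0, 2.0], [0.5, -1.0], [0.0, 0.0]))  # 9.5
```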

Example of Using GCN, GAT, GraphSAGE Models

These models implement an algorithm for predicting links in a knowledge graph.
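A common way for such encoders to predict links is to score a candidate edge (u, v) by the dot product of the two node embeddings and turn that score into a probability with a sigmoid. The sketch below assumes this decoder and uses plain Python lists instead of tensors; the helper names are hypothetical, and the decoder used inside redkg may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple


def edge_probability(z_u: Sequence[float], z_v: Sequence[float]) -> float:
    """Sigmoid of the dot product of two node embeddings."""
    logit = sum(a * b for a, b in zip(z_u, z_v))
    return 1.0 / (1.0 + math.exp(-logit))


def rank_candidates(embeddings: Dict[int, List[float]], source: int,
                    candidates: List[int]) -> List[Tuple[int, float]]:
    """Rank candidate target nodes for `source` by predicted link probability."""
    scored = [(c, edge_probability(embeddings[source], embeddings[c])) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


emb = {0: [1.0, 0.0], 1: [2.0, 0.0], 2: [-1.0, 0.0]}
print(rank_candidates(emb, 0, [1, 2]))  # node 1 is ranked above node 2
```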

Additional information about the training steps can be found in the basic_link_prediction.ipynb example.

Loading Test Data

The test dataset can be downloaded via the jd_data2.json link and placed in the /data/ directory.

Data Preprocessing

To preprocess the data, read it from the file and convert it into PyTorch Geometric format.

import json
import torch
from torch_geometric.data import Data

# Read data from the file
with open('jd_data2.json', 'r') as f:
    graph_data = json.load(f)

# Extract the list of nodes and convert it to a dictionary for quick lookup
node_list = [node['id'] for node in graph_data['nodes']]
node_mapping = {node_id: i for i, node_id in enumerate(node_list)}
node_index = {index: node for node, index in node_mapping.items()}

# Create a list of edges in PyTorch Geometric format
edge_index = [[node_mapping[link['source']], node_mapping[link['target']]] for link in graph_data['links']]
edge_index = torch.tensor(edge_index, dtype=torch.long).t().contiguous()
features = torch.randn(len(node_list), 1)
labels = torch.tensor(list(range(len(graph_data['nodes']))), dtype=torch.long)

large_dataset = Data(x=features, edge_index=edge_index, y=labels, node_mapping=node_mapping, node_index=node_index)
torch.save(large_dataset, 'large_dataset.pth')
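# Move the data to the GPU if one is available (CPU-only installs skip this)
if torch.cuda.is_available():
    large_dataset = large_dataset.cuda()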

Next, generate subgraphs for training the model:

import json
import os
from redkg.generate_subgraphs import generate_subgraphs

# Generate a dataset of 1000 subgraphs, each containing between 3 and 15 nodes
if not os.path.isfile('subgraphs.json'):
    subgraphs = generate_subgraphs(graph_data, num_subgraphs=1000, min_nodes=3, max_nodes=15)
    with open('subgraphs.json', 'w') as f:
        json.dump(subgraphs, f)
else:
    with open('subgraphs.json', 'r') as f:
        subgraphs = json.load(f)
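To illustrate what a subgraph generator of this kind does, here is a hypothetical sketch (`sample_subgraph` is not a redkg function) that grows a connected subgraph by breadth-first search from a random seed node of the node-link JSON used above (`nodes` with `id`, `links` with `source`/`target`), stopping at a random size between `min_nodes` and `max_nodes`. The actual `generate_subgraphs` may sample differently.

```python
import random
from collections import deque
from typing import Dict, List


def sample_subgraph(graph_data: Dict, min_nodes: int, max_nodes: int,
                    rng: random.Random) -> Dict[str, List]:
    """Grow a connected subgraph by BFS from a random seed node (illustrative only).

    If the seed's connected component is small, fewer than min_nodes may be returned.
    """
    # Build an undirected adjacency list from the node-link structure
    adjacency: Dict = {node["id"]: set() for node in graph_data["nodes"]}
    for link in graph_data["links"]:
        adjacency[link["source"]].add(link["target"])
        adjacency[link["target"]].add(link["source"])

    target_size = rng.randint(min_nodes, max_nodes)
    seed = rng.choice(list(adjacency))
    visited = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue and len(visited) < target_size:
        current = queue.popleft()
        for neighbour in sorted(adjacency[current]):
            if neighbour not in seen and len(visited) < target_size:
                seen.add(neighbour)
                visited.append(neighbour)
                queue.append(neighbour)

    # Keep only links whose endpoints are both inside the subgraph
    links = [link for link in graph_data["links"]
             if link["source"] in seen and link["target"] in seen]
    return {"nodes": [{"id": n} for n in visited], "links": links}


toy_graph = {"nodes": [{"id": i} for i in range(6)],
             "links": [{"source": i, "target": i + 1} for i in range(5)]}
subgraph = sample_subgraph(toy_graph, min_nodes=3, max_nodes=4, rng=random.Random(0))
print(len(subgraph["nodes"]), len(subgraph["links"]))
```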

Next, convert the subgraphs into PyTorch Geometric format:

from redkg.generate_subgraphs import generate_subgraphs_dataset

dataset = generate_subgraphs_dataset(subgraphs, large_dataset)
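Converting a subgraph into PyTorch Geometric format generally involves relabeling its global node IDs to contiguous local indices 0..n-1, so that `edge_index` points at the subgraph's own feature rows. The hypothetical helper below shows that step with plain lists (the two-row `[sources, targets]` layout mirrors PyG's `edge_index`); it is not the code of `generate_subgraphs_dataset`.

```python
from typing import Dict, List, Tuple


def relabel_subgraph(subgraph: Dict, node_mapping: Dict) -> Tuple[List[int], List[List[int]]]:
    """Return (global feature-row indices, edge_index with local indices) for one subgraph."""
    node_ids = [node["id"] for node in subgraph["nodes"]]
    local = {node_id: i for i, node_id in enumerate(node_ids)}
    # Rows of the large graph's feature matrix that belong to this subgraph
    global_rows = [node_mapping[node_id] for node_id in node_ids]
    sources = [local[link["source"]] for link in subgraph["links"]]
    targets = [local[link["target"]] for link in subgraph["links"]]
    return global_rows, [sources, targets]


mapping = {"a": 0, "b": 1, "c": 2, "d": 3}
sub = {"nodes": [{"id": "c"}, {"id": "a"}],
       "links": [{"source": "c", "target": "a"}]}
print(relabel_subgraph(sub, mapping))  # ([2, 0], [[0], [1]])
```

With PyTorch, the returned rows could select features via `large_dataset.x[global_rows]`, and the edge list could be wrapped in `torch.tensor(..., dtype=torch.long)`.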

Model Training

Initialize the model in training mode and the optimizer:

from redkg.models.graphsage import GraphSAGE
from torch.optim import Adam

# Train the GraphSAGE model (GCN or GAT can also be used)
#   input and output dimensions - large_dataset.num_node_features
#   hidden dimension - 64
model = GraphSAGE(large_dataset.num_node_features, 64, large_dataset.num_node_features)
model.train()

# Use the Adam optimizer
#   learning rate - 0.0001
#   weight decay - 1e-5
optimizer = Adam(model.parameters(), lr=0.0001, weight_decay=1e-5)
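GraphSAGE updates each node by combining its own features with an aggregate of its neighbours' features. The sketch below shows one layer with the mean aggregator, h_v' = ReLU(W_self h_v + W_neigh mean(h_u)), in plain Python with fixed weight matrices. It only illustrates the idea and is not the redkg.models.graphsage implementation, which presumably builds on PyTorch Geometric layers.

```python
from typing import Dict, List

Matrix = List[List[float]]


def matvec(weights: Matrix, vector: List[float]) -> List[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def sage_mean_layer(features: Dict[int, List[float]], neighbours: Dict[int, List[int]],
                    w_self: Matrix, w_neigh: Matrix) -> Dict[int, List[float]]:
    """One GraphSAGE layer with mean aggregation followed by ReLU."""
    dim = len(next(iter(features.values())))
    updated = {}
    for node, h in features.items():
        nbrs = neighbours.get(node, [])
        if nbrs:
            mean = [sum(features[u][k] for u in nbrs) / len(nbrs) for k in range(dim)]
        else:
            mean = [0.0] * dim  # isolated node: no neighbour signal
        combined = [a + b for a, b in zip(matvec(w_self, h), matvec(w_neigh, mean))]
        updated[node] = [max(0.0, value) for value in combined]  # ReLU
    return updated


identity = [[1.0, 0.0], [0.0, 1.0]]
feats = {0: [1.0, 0.0], 1: [0.0, 2.0], 2: [0.0, -4.0]}
adj = {0: [1, 2], 1: [0], 2: [0]}
print(sage_mean_layer(feats, adj, identity, identity))
# {0: [1.0, 0.0], 1: [1.0, 2.0], 2: [1.0, 0.0]}
```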

Start training the model for 2 epochs:

from redkg.train import train_gnn_model
from redkg.negative_samples import generate_negative_samples

# Model training
loss_values = []
for epoch in range(2):
    for subgraph in dataset:
        positive_edges = subgraph.edge_index.t().tolist()
        negative_edges = generate_negative_samples(subgraph.edge_index, subgraph.num_nodes, len(positive_edges))
        if len(negative_edges) == 0:
            continue
        loss = train_gnn_model(model, optimizer, subgraph, positive_edges, negative_edges)
        loss_values.append(loss)
        print(f"Epoch: {epoch}, Loss: {loss}")
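`generate_negative_samples` supplies node pairs that are not edges of the subgraph, which the training step contrasts with the positive edges. A minimal sketch of uniform negative sampling under that assumption follows; the hypothetical `sample_negative_edges` works on an edge list rather than a tensor and is not redkg's implementation.

```python
import random
from typing import List, Tuple


def sample_negative_edges(positive_edges: List[List[int]], num_nodes: int,
                          num_samples: int, rng: random.Random,
                          max_attempts: int = 10_000) -> List[Tuple[int, int]]:
    """Uniformly sample node pairs (u, v), u != v, that are not existing edges."""
    existing = {(u, v) for u, v in positive_edges}
    negatives = set()
    attempts = 0
    while len(negatives) < num_samples and attempts < max_attempts:
        attempts += 1
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        if u != v and (u, v) not in existing:
            negatives.add((u, v))
    # On dense graphs this may return fewer samples than requested, possibly none;
    # the training loop above skips subgraphs that yield no negatives
    return sorted(negatives)


print(sample_negative_edges([[0, 1], [1, 2]], num_nodes=3, num_samples=2, rng=random.Random(42)))
```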

Architecture Overview

ReDKG is a framework implementing strong AI algorithms for deep reinforcement learning on dynamic knowledge graphs for decision support tasks. The figure below shows the general structure of the component. It includes four main modules:

Modules encoding graphs into vector representations (encoder):
- KGE, implemented using the KGEModel class in redkg.models.kge
- GCN, implemented using the GCN class in redkg.models.gcn
- GAT, implemented using the GAT class in redkg.models.gat
- GraphSAGE, implemented using the GraphSAGE class in redkg.models.graphsage
State representation module (state representation), implemented using the GCNGRU class in redkg.models.gcn_gru_layers
Candidate object selection module (action selection)
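As a rough illustration of how these modules fit together in a decision loop: the encoder produces item embeddings, the state module summarises the user's history into a state vector, and the action-selection module scores candidate items. The sketch below assumes a dot-product Q-value and epsilon-greedy exploration, a common choice in DQN-style recommenders; `select_action` is hypothetical and not a redkg class or function.

```python
import random
from typing import Dict, List, Sequence


def select_action(state: Sequence[float], item_embeddings: Dict[int, List[float]],
                  candidates: List[int], epsilon: float, rng: random.Random) -> int:
    """Epsilon-greedy choice among candidates using q(s, a) = <state, item_a>."""
    if not candidates:
        raise ValueError("no candidate items to choose from")
    if rng.random() < epsilon:
        return rng.choice(candidates)  # explore: random candidate
    # Exploit: pick the candidate with the highest Q-value
    return max(candidates,
               key=lambda item: sum(s * e for s, e in zip(state, item_embeddings[item])))


items = {1: [0.2, 1.0], 2: [0.9, 0.0]}
print(select_action([1.0, 0.0], items, [1, 2], epsilon=0.0, rng=random.Random(0)))  # 2
```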

Project Structure

The latest stable release of ReDKG is in the main branch.

The repository includes the following directories:

Package redkg contains the main classes and scripts;
Package examples includes several how-to-use cases where you can start to discover how ReDKG works;
Directory data should contain the data for modeling;
All unit and integration tests can be found in the test directory;
The sources of the documentation are in the docs directory.

Cases and examples

To learn representations with default argument values from the command line, use:

python kg_run

To learn representations in your own project, use:

from redkg.models.kge import KGEModel
from edge_predict import Evaluator

evaluator = Evaluator()

# info is assumed to hold the dataset statistics ('nentity', 'nrelation')
kge_model = KGEModel(
    model_name="TransE",
    nentity=info['nentity'],
    nrelation=info['nrelation'],
    hidden_dim=128,
    gamma=12.0,
    double_entity_embedding=True,
    double_relation_embedding=True,
    evaluator=evaluator
)

Train KGQR model

To train the KGQR model on your own data:

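from torch import optim

from redkg.config import Config
from redkg.models.gcn_gru_layers import GCNGRU

# TrainDataset, calc_state_kg, train_kge_model and train must also be imported
# from the corresponding redkg modules; triples, entity_vocab, relation_vocab,
# item_vocab, model_kge_model, train_pars and info are prepared from your data
negative_sample_size = 128
nentity = len(entity_vocab.keys())
train_count = calc_state_kg(triples)

dataset = TrainDataset(triples,
                       nentity,
                       len(relation_vocab.keys()),
                       negative_sample_size,
                       "mode",
                       train_count)

conf = Config()

# Building the network
model = GCNGRU(conf, entity_vocab, relation_vocab, 50)

# Embedding pretraining with TransE
train_kge_model(model_kge_model, train_pars, info, triples, None)

# Training using RL
optimizer = optim.Adam(model.parameters(), lr=0.001)
train(conf, item_vocab, model, optimizer)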
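KGQR-style recommenders can narrow the action space to items that lie close, in the knowledge graph, to what the user has already interacted with. The hypothetical helper below (`k_hop_candidates` is not a redkg function) sketches such candidate selection: a breadth-first search over the KG triples, treating edges as undirected, that collects item entities within k hops of the clicked ones.

```python
from collections import defaultdict, deque
from typing import Iterable, Set, Tuple


def k_hop_candidates(triples: Iterable[Tuple[int, int, int]], clicked: Iterable[int],
                     items: Set[int], k: int) -> Set[int]:
    """Return items within k hops of any clicked entity, excluding the clicked ones."""
    adjacency = defaultdict(set)
    for head, _relation, tail in triples:
        adjacency[head].add(tail)
        adjacency[tail].add(head)  # traverse edges in both directions

    clicked_set = set(clicked)
    depth = {entity: 0 for entity in clicked_set}
    queue = deque(clicked_set)
    while queue:
        entity = queue.popleft()
        if depth[entity] == k:
            continue  # do not expand beyond k hops
        for neighbour in adjacency[entity]:
            if neighbour not in depth:
                depth[neighbour] = depth[entity] + 1
                queue.append(neighbour)
    return {entity for entity in depth if entity in items and entity not in clicked_set}


kg_triples = [(1, 0, 10), (10, 1, 2), (2, 0, 11), (11, 1, 3)]
print(k_hop_candidates(kg_triples, clicked=[1], items={1, 2, 3}, k=2))  # {2}
```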

Documentation

Detailed information about and a description of the ReDKG framework are available in the Documentation.

Contribution

To contribute to this library, follow the current code and documentation conventions. The project runs linters and tests on each pull request; to install the linters and testing packages locally, run:

pip install -r requirements-dev.txt

To avoid unnecessary commits, please fix any linting and testing errors after running each of the following:

pflake8 .
black .
isort .
mypy stable_gnn
pytest tests

Contacts

Contact development team
Natural System Simulation Team https://itmo-nss-team.github.io/

Supported by

The study is supported by the Research Center Strong Artificial Intelligence in Industry of ITMO University as part of the plan of the center's program: Development and testing of an experimental sample of the library of algorithms of strong AI in terms of deep reinforcement learning on dynamic knowledge graphs for decision support tasks

Citation

@article{EGOROVA2022284,
title = {Customer transactional behaviour analysis through embedding interpretation},
author = {Elena Egorova and Gleb Glukhov and Egor Shikov},
journal = {Procedia Computer Science},
volume = {212},
pages = {284-294},
year = {2022},
doi = {https://doi.org/10.1016/j.procs.2022.11.012},
url = {https://www.sciencedirect.com/science/article/pii/S1877050922017033}
}
