ReDKG

Reinforcement learning on Dynamic Knowledge Graphs (ReDKG) is a toolkit for deep reinforcement learning on dynamic knowledge graphs. It encodes static and dynamic knowledge graphs (KGs) by constructing vector representations of their entities and relationships. A reinforcement learning (RL) algorithm then uses these vector representations to train recommendation models or models for decision support systems.

Installation

Python >= 3.9 is required.

As a first step, install PyTorch 1.12.1 and PyTorch Geometric.

PyTorch 1.12

# CUDA 10.2
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=10.2 -c pytorch
# CUDA 11.3
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
# CUDA 11.6
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.6 -c pytorch -c conda-forge
# CPU Only
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cpuonly -c pytorch

Once PyTorch is installed, clone this repository and run the following inside the repository directory:

pip install . 
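Before moving on, it can help to confirm that the interpreter and the installed packages match the requirements above. The following is a minimal sketch using only the standard library; the distribution names `torch-geometric` and `redkg` are assumptions about how the packages are registered with pip, and the helper names are illustrative.

```python
import sys
from importlib import metadata

REQUIRED_PYTHON = (3, 9)
# Distribution names are assumptions: "redkg" is taken to be the name
# registered by `pip install .`, "torch-geometric" the PyG distribution.
PACKAGES = ("torch", "torch-geometric", "redkg")


def python_is_supported(version_info=sys.version_info):
    """True when the interpreter satisfies the Python >= 3.9 requirement."""
    return tuple(version_info[:2]) >= REQUIRED_PYTHON


def installed_versions(packages=PACKAGES):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

A reported `torch` version other than 1.12.1 suggests that the conda command for a different release was used.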

Download test data

Download ratings.csv to the /data/ folder. The data folder should contain the following files:

  • ratings.csv - raw rating file;
  • attributes.csv - raw attributes file;
  • kg.txt - knowledge graph file;
  • item_index2enity_id.txt - the mapping from item indices in the raw rating file to entity IDs in the KG file;
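A quick way to confirm that the data folder is complete before preprocessing is to check for each of the files listed above. This is a small sketch, not part of ReDKG; the helper name `missing_data_files` is hypothetical and the file names are copied verbatim from the list.

```python
from pathlib import Path

# File names copied from the list above.
EXPECTED_FILES = (
    "ratings.csv",
    "attributes.csv",
    "kg.txt",
    "item_index2enity_id.txt",
)


def missing_data_files(data_dir):
    """Return the expected file names that are absent from data_dir."""
    data_dir = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (data_dir / name).is_file()]
```

Calling `missing_data_files("data")` returns an empty list when all four files are present.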

Example of Using KGE Models

Preprocess the data

from redkg.config import Config
from redkg.preprocess import DataPreprocessor

config = Config()
preprocessor = DataPreprocessor(config)
preprocessor.process_data()

Train KG model

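from redkg.models.kge import KGEModel

# `info`, `evaluator`, `train_pars`, `train_triples` and `valid_triples` are
# assumed to have been prepared beforehand; `train_kge_model` must also be
# imported from the redkg package.
kge_model = KGEModel(
    model_name="TransE",
    nentity=info['nentity'],
    nrelation=info['nrelation'],
    hidden_dim=128,
    gamma=12.0,
    double_entity_embedding=True,
    double_relation_embedding=True,
    evaluator=evaluator
)

training_logs, test_logs = train_kge_model(kge_model, train_pars, info, train_triples, valid_triples)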
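For orientation, TransE treats a relation as a translation in embedding space, so a plausible triple satisfies head + relation ≈ tail. The sketch below assumes the commonly used margin-based score, gamma minus the L1 distance; whether KGEModel uses exactly this norm is an assumption, and `transe_score` is an illustrative helper rather than a ReDKG function.

```python
def transe_score(head, relation, tail, gamma=12.0):
    """Margin minus the L1 distance between (head + relation) and tail."""
    distance = sum(abs(h + r - t) for h, r, t in zip(head, relation, tail))
    return gamma - distance


# A triple that fits the translation exactly gets the maximum score (gamma).
good = transe_score([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])
# A corrupted tail lies further from head + relation, so it scores lower.
bad = transe_score([1.0, 0.0], [0.0, 1.0], [3.0, -1.0])
print(good, bad)
```

Under this formulation, a larger `gamma` shifts all scores upward without changing their ranking.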

Example of Using GCN, GAT, GraphSAGE Models

These models implement an algorithm for predicting links in a knowledge graph.

Additional information about the training steps can be found in the basic_link_prediction.ipynb example.
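As a rough illustration of link prediction, a GNN encoder produces one embedding per node, and a decoder turns a pair of embeddings into a probability that an edge exists. The sketch below assumes a dot-product decoder with a sigmoid, which is a common choice but not necessarily the one ReDKG uses; the embeddings are made-up values and both function names are hypothetical.

```python
import math


def link_probability(emb_u, emb_v):
    """Sigmoid of the dot product of two node embeddings."""
    dot = sum(a * b for a, b in zip(emb_u, emb_v))
    # Numerically stable sigmoid for both signs of `dot`.
    if dot >= 0:
        return 1.0 / (1.0 + math.exp(-dot))
    exp_dot = math.exp(dot)
    return exp_dot / (1.0 + exp_dot)


def rank_candidates(source, candidates, embeddings):
    """Order candidate target nodes from most to least likely link."""
    return sorted(
        candidates,
        key=lambda node: link_probability(embeddings[source], embeddings[node]),
        reverse=True,
    )


# Made-up embeddings, standing in for the output of a trained encoder.
embeddings = {"a": [1.0, 2.0], "b": [0.5, 1.0], "c": [-1.0, -2.0]}
ranking = rank_candidates("a", ["c", "b"], embeddings)
```

Here node "b" points in the same direction as "a" and is ranked above "c", which points the opposite way.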

Loading Test Data

The test dataset can be obtained from the link jd_data2.json and placed in the /data/ directory.

Data Preprocessing

For preprocessing, it is necessary to read data from the file and convert it into PyTorch Geometric format.

import json
import torch
from torch_geometric.data import Data

# Read data from the file
with open('jd_data2.json', 'r') as f:
    graph_data = json.load(f)

# Extract the list of nodes and convert it to a dictionary for quick lookup
node_list = [node['id'] for node in graph_data['nodes']]
node_mapping = {node_id: i for i, node_id in enumerate(node_list)}
node_index = {index: node for node, index in node_mapping.items()}

# Create a list of edges in PyTorch Geometric format
edge_index = [[node_mapping[link['source']], node_mapping[link['target']]] for link in graph_data['links']]
edge_index = torch.tensor(edge_index, dtype=torch.long).t().contiguous()
features = torch.randn(len(node_list), 1)
labels = torch.tensor(list(range(len(graph_data['nodes']))), dtype=torch.long)

large_dataset = Data(x=features, edge_index=edge_index, y=labels, node_mapping=node_mapping, node_index=node_index)
torch.save(large_dataset, 'large_dataset.pth')
# Move the data to the GPU when one is available
if torch.cuda.is_available():
    large_dataset = large_dataset.cuda()
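The transposition step in the preprocessing code is easier to follow on a toy graph. PyTorch Geometric expects `edge_index` with shape [2, num_edges], one row of source indices and one row of target indices. The sketch below reproduces the same mapping and transposition in plain Python; the three-node graph is invented for illustration.

```python
# Toy graph in the same node-link layout that the preprocessing code reads.
graph_data = {
    "nodes": [{"id": "n1"}, {"id": "n2"}, {"id": "n3"}],
    "links": [
        {"source": "n1", "target": "n2"},
        {"source": "n1", "target": "n3"},
        {"source": "n3", "target": "n2"},
    ],
}

# Map each node ID to a consecutive integer index.
node_mapping = {node["id"]: i for i, node in enumerate(graph_data["nodes"])}

# One [source, target] pair per edge: shape (num_edges, 2).
edge_pairs = [
    [node_mapping[link["source"]], node_mapping[link["target"]]]
    for link in graph_data["links"]
]

# Transpose to shape (2, num_edges): first row sources, second row targets.
edge_index_rows = [list(row) for row in zip(*edge_pairs)]
print(edge_index_rows)
```

The call to `.t()` on the tensor in the preprocessing code performs the same transposition as `zip(*edge_pairs)` does here.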

Next, it is necessary to generate subgraphs for training the model. This can be done using the following code:

import json
import os
from redkg.generate_subgraphs import generate_subgraphs

# Generate a dataset of 1000 subgraphs, each containing between 3 and 15 nodes
if not os.path.isfile('subgraphs.json'):
    subgraphs = generate_subgraphs(graph_data, num_subgraphs=1000, min_nodes=3, max_nodes=15)
    with open('subgraphs.json', 'w') as f:
        json.dump(subgraphs, f)
else:
    with open('subgraphs.json', 'r') as f:
        subgraphs = json.load(f)
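To give an idea of what subgraph generation involves, the sketch below samples one connected subgraph by growing a node set from a random start node. It is not the implementation of `generate_subgraphs`; the functions `build_adjacency` and `sample_connected_subgraph` are hypothetical, and treating links as undirected is an assumption.

```python
import random


def build_adjacency(links):
    """Undirected adjacency sets built from node-link style edge records."""
    adjacency = {}
    for link in links:
        adjacency.setdefault(link["source"], set()).add(link["target"])
        adjacency.setdefault(link["target"], set()).add(link["source"])
    return adjacency


def sample_connected_subgraph(adjacency, min_nodes, max_nodes, rng):
    """Grow a connected node set; return None if it stays below min_nodes."""
    target_size = rng.randint(min_nodes, max_nodes)
    start = rng.choice(sorted(adjacency))
    selected = {start}
    frontier = set(adjacency[start]) - selected
    while frontier and len(selected) < target_size:
        node = rng.choice(sorted(frontier))
        selected.add(node)
        # New neighbours join the frontier; already selected nodes leave it.
        frontier |= adjacency[node]
        frontier -= selected
    return selected if len(selected) >= min_nodes else None


# Example on a 10-node path graph 0 - 1 - ... - 9.
links = [{"source": i, "target": i + 1} for i in range(9)]
subgraph_nodes = sample_connected_subgraph(
    build_adjacency(links), min_nodes=3, max_nodes=5, rng=random.Random(0)
)
```

Because every added node is a neighbour of an already selected node, the result is connected by construction.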

Next, convert the subgraphs into PyTorch Geometric format:

from redkg.generate_subgraphs import generate_subgraphs_dataset

dataset = generate_subgraphs_dataset(subgraphs, large_dataset)

Model Training

Let's initialize the optimizer and the model in training mode:

from redkg.models.graphsage import GraphSAGE
from torch.optim import Adam

# Train the GraphSAGE model (GCN or GAT can also be used)
#   number of input and output features matches the number of nodes in the large graph - 177
#   number of layers - 64
model = GraphSAGE(large_dataset.num_node_features, 64, large_dataset.num_node_features)
model.train()

# Use the Adam optimizer
#   learning rate - 0.0001
#   weight decay - 1e-5
optimizer = Adam(model.parameters(), lr=0.0001, weight_decay=1e-5)

Start training the model for 2 epochs:

from redkg.train import train_gnn_model
from redkg.negative_samples import generate_negative_samples

# Model training
loss_values = []
for epoch in range(2):
    for subgraph in dataset:
        positive_edges = subgraph.edge_index.t().tolist()
        negative_edges = generate_negative_samples(subgraph.edge_index, subgraph.num_nodes, len(positive_edges))
        if len(negative_edges) == 0:
            continue
        loss = train_gnn_model(model, optimizer, subgraph, positive_edges, negative_edges)
        loss_values.append(loss)
        print(f"Epoch: {epoch}, Loss: {loss}")
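The training loop pairs every batch of existing (positive) edges with the same number of non-existing (negative) edges. The sketch below shows one simple way to draw such negatives by rejection sampling; it is an illustration under assumptions, not the code behind `generate_negative_samples`, and `sample_negative_edges` is a hypothetical name.

```python
import random


def sample_negative_edges(positive_edges, num_nodes, num_samples, rng, max_tries=1000):
    """Draw up to num_samples node pairs that are not positive edges."""
    positives = {tuple(edge) for edge in positive_edges}
    negatives = set()
    tries = 0
    while len(negatives) < num_samples and tries < max_tries:
        tries += 1
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        # Reject self-loops and pairs that are real edges.
        if u != v and (u, v) not in positives:
            negatives.add((u, v))
    return [list(edge) for edge in sorted(negatives)]


negatives = sample_negative_edges([[0, 1], [1, 2]], num_nodes=4, num_samples=2,
                                  rng=random.Random(0))
```

In a very small, fully connected subgraph no valid negative pair exists and the result is empty, which is the situation the `len(negative_edges) == 0` check in the loop skips.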

Architecture Overview

ReDKG is a framework implementing strong AI algorithms for deep reinforcement learning on dynamic knowledge graphs for decision support tasks. The figure below shows the general structure of the component. It includes the following main modules:

  • Module for encoding graphs into vector representations (encoder):
    • KGE, implemented using the KGEModel class in redkg.models.kge
    • GCN, implemented using the GCN class in redkg.models.gcn
    • GAT, implemented using the GAT class in redkg.models.gat
    • GraphSAGE, implemented using the GraphSAGE class in redkg.models.graphsage
  • State representation module (state representation), implemented using the GCNGRU class in redkg.models.gcn_gru_layers
  • Candidate object selection module (action selection)
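To make the role of the action selection module concrete, the sketch below shows epsilon-greedy selection over a set of candidate items scored by estimated value. This is a generic RL technique used here purely for illustration; the source does not state which selection strategy ReDKG uses, and `select_action` and the item names are hypothetical.

```python
import random


def select_action(q_values, epsilon, rng):
    """Epsilon-greedy choice over candidates.

    q_values maps each candidate item to its estimated value.
    """
    candidates = sorted(q_values)
    # Explore: with probability epsilon pick a random candidate.
    if rng.random() < epsilon:
        return rng.choice(candidates)
    # Exploit: otherwise pick the candidate with the highest estimate.
    return max(candidates, key=lambda item: q_values[item])


q_values = {"item_a": 0.1, "item_b": 0.9, "item_c": 0.4}
greedy_choice = select_action(q_values, epsilon=0.0, rng=random.Random(0))
```

With `epsilon=0.0` the choice is always the highest-valued candidate; larger values trade some of that exploitation for exploration.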

Project Structure

The latest stable release of ReDKG is in the main branch.

The repository includes the following directories:

  • Package redkg contains the main classes and scripts;
  • Package examples includes several how-to use cases that show how ReDKG works;
  • Directory data should contain the data for modeling;
  • All unit and integration tests can be found in the test directory;
  • The sources of the documentation are in the docs directory.

Cases and examples

To learn representations from the command line with the default argument values, use:

python kg_run

To learn representations in your own project, use:

from kge import KGEModel
from edge_predict import Evaluator

evaluator = Evaluator()

# `info` is assumed to hold the entity and relation counts of your graph
kge_model = KGEModel(
    model_name="TransE",
    nentity=info['nentity'],
    nrelation=info['nrelation'],
    hidden_dim=128,
    gamma=12.0,
    double_entity_embedding=True,
    double_relation_embedding=True,
    evaluator=evaluator
)

Train KGQR model

To train the KGQR model on your own data:

from torch import optim

# `entity_vocab`, `relation_vocab`, `item_vocab`, `triples`, `train_pars`, `info`
# and `model_kge_model` are assumed to have been prepared beforehand.
negative_sample_size = 128
nentity = len(entity_vocab.keys())
train_count = calc_state_kg(triples)

dataset = TrainDataset(triples,
                       nentity,
                       len(relation_vocab.keys()),
                       negative_sample_size,
                       "mode",
                       train_count)

conf = Config()

# Build the network
model = GCNGRU(conf, entity_vocab, relation_vocab, 50)

# Pretrain the embeddings with TransE
train_kge_model(model_kge_model, train_pars, info, triples, None)

# Train using RL
optimizer = optim.Adam(model.parameters(), lr=0.001)
train(conf, item_vocab, model, optimizer)
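The snippet above takes `entity_vocab` and `relation_vocab` as given. One plausible way to obtain them from raw triples is sketched below; the dictionary layout (label to consecutive integer ID) is an assumption, since the source only shows that the vocabularies support `.keys()`, and `build_vocabs` and the sample triples are hypothetical.

```python
def build_vocabs(triples):
    """Map each entity and relation label to a consecutive integer ID."""
    entity_vocab, relation_vocab = {}, {}
    for head, relation, tail in triples:
        for entity in (head, tail):
            # setdefault keeps the first ID assigned to a label.
            entity_vocab.setdefault(entity, len(entity_vocab))
        relation_vocab.setdefault(relation, len(relation_vocab))
    return entity_vocab, relation_vocab


# Invented (head, relation, tail) triples for illustration.
triples = [
    ("user_1", "likes", "item_1"),
    ("item_1", "has_genre", "genre_1"),
    ("user_1", "likes", "item_2"),
]
entity_vocab, relation_vocab = build_vocabs(triples)
nentity = len(entity_vocab.keys())
```

With these vocabularies, `nentity` and `len(relation_vocab.keys())` give the counts passed to the dataset constructor.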

Documentation

Detailed information about the ReDKG framework and its description are available in the Documentation.

Contribution

To contribute to this library, follow the current code and documentation conventions. The project runs linters and tests on each pull request. To install the linters and testing packages locally, run:

pip install -r requirements-dev.txt

To avoid unnecessary commits, please fix any linting and testing errors after running each of the following tools:

  • pflake8 .
  • black .
  • isort .
  • mypy stable_gnn
  • pytest tests

Contacts

Supported by

The study is supported by the Research Center Strong Artificial Intelligence in Industry of ITMO University as part of the plan of the center's program: Development and testing of an experimental sample of the library of algorithms of strong AI in terms of deep reinforcement learning on dynamic knowledge graphs for decision support tasks.

Citation

@article{EGOROVA2022284,
title = {Customer transactional behaviour analysis through embedding interpretation},
author = {Elena Egorova and Gleb Glukhov and Egor Shikov},
journal = {Procedia Computer Science},
volume = {212},
pages = {284-294},
year = {2022},
doi = {https://doi.org/10.1016/j.procs.2022.11.012},
url = {https://www.sciencedirect.com/science/article/pii/S1877050922017033}
}

Contributors

shikovegor, bda82, kryksh, evilfreelancer, mangaboba, danwhale
