OpenKE

An Open-source Framework for Knowledge Embedding.

More information is available on our website http://openke.thunlp.org/

Overview

This is an Efficient implementation based on TensorFlow for knowledge representation learning (KRL). We use C++ to implement some underlying operations such as data preprocessing and negative sampling. For each specific model, it is implemented by TensorFlow with Python interfaces so that there is a convenient platform to run models on GPUs. OpenKE composes 3 repositories:

OpenKE: the main project based on TensorFlow, which provides the optimized and stable framework for knowledge graph embedding models.

TensorFlow-TransX: light and simple version of OpenKE based on TensorFlow, including TransE, TransH, TransR and TransD.

Fast-TransX: efficient lightweight C++ inferences for TransE and its extended models utilizing the framework of OpenKE, including TransH, TransR, TransD, TranSparse and PTransE.

Installation

Install TensorFlow
Clone the OpenKE repository:

$ git clone https://github.com/thunlp/OpenKE

$ cd OpenKE
Compile C++ files

$ bash make.sh

Data

Datasets are required in the following format, containing at least three files for training:

triple2id.txt: training file, the first line is the number of triples for training. Then the follow lines are all in the format (e1, e2, rel).

entity2id.txt: all entities and corresponding ids, one per line. The first line is the number of entities.

relation2id.txt: all relations and corresponding ids, one per line. The first line is the number of relations.

Quickstart

To compute a knowledge graph embedding, first import datasets and set configure parameters for training, then train models and export results. For instance, we write a example.py to train TransE:

import config
import models
import tensorflow as tf

con = config.Config()

con.set_in_path("benchmarks/FB15K/")
con.set_out_path("benchmarks/FB15K/")
con.set_work_threads(1)

con.set_export_files("res/model.vec")
con.set_export_steps(10)

con.set_train_times(500)
con.set_nbatches(100)
con.set_alpha(0.5)
con.set_bern(0)
con.set_dimension(200)
con.set_margin(1)
con.set_ent_neg_rate(1)
con.set_rel_neg_rate(0)
con.set_optimizer("SGD")

con.init()
con.set_model(models.TransE)
con.run()

Step 1: Import datasets

con.set_in_path("benchmarks/FB15K/")
con.set_out_path("benchmarks/FB15K/")

We import knowledge graphs from benchmarks/FB15K/ folder. The data consists of three essential files mentioned before:

triple2id.txt
entity2id.txt
relation2id.txt

Validation and test files are required and used to evaluate the training results, However, they are not indispensable for training.

con.set_work_threads(1)

We can allocate several threads to sample positive and negative cases.

Step 2: Set configure parameters for training.

con.set_train_times(500)
con.set_nbatches(100)
con.set_alpha(0.5)
con.set_dimension(200)
con.set_margin(1)

We set essential parameters, including the data traversing rounds, learning rate, batch size, and dimensions of entity and relation embeddings.

con.set_bern(0)
con.set_ent_neg_rate(1)
con.set_rel_neg_rate(0)

For negative sampling, we can corrupt entities and relations to construct negative triples. set_bern(0) will use the traditional sampling method, and set_bern(1) will use the method in (Wang et al. 2014) denoted as "bern".

con.set_optimizer("SGD")

We can select a proper gradient descent optimization algorithm to train models.

Step 3: Train models

con.init()
con.set_model(models.TransE)
con.run()

positive We set the knowledge graph embedding model and start the training process.

Step 4: Export results

con.set_export_files("res/model.vec")

con.set_export_steps(10)

The results will be automatically exported to the given files every few rounds.

Interfaces

Config

class Config(object):
		
	#To set the learning rate
	def set_alpha(alpha = 0.001)
	
	#To set the degree of the regularization on the parameters
	def set_lmbda(lmbda = 0.0)
	
	#To set the gradient descent optimization algorithm (SGD, Adagrad, Adadelta, Adam)
	def set_optimizer(optimizer = "SGD")
	
	#To set the data traversing rounds
	def set_train_times(self, times)
	
	#To split the training triples into several batches, nbatches is the number of batches
	def set_nbatches(nbatches = 100)
	
	#To set the margin for the loss function
	def set_margin(margin = 1.0)
	
	#To set the dimensions of the entities and relations at the same time
	def set_dimension(dim)
	
	#To set the dimensions of the entities
	def set_ent_dimension(self, dim)
	
	#To set the dimensions of the relations
	def set_rel_dimension(self, dim)
	
	#To set the input folder		
	def set_in_path(path = "./")
	
	#To set the output folder
	def set_out_path(path = "./")
	
	#To allocate threads for each batch sampling
	def set_work_threads(threads = 1)
	
	#To set negative sampling algorithms, unif (bern = 0) or bern (bern = 1)
	def set_bern(bern = 1)
	
	#For each positive triple, we construct rate negative triples by corrupt the entity
	def set_ent_neg_rate(rate = 1)
	
	#For each positive triple, we construct rate negative triples by corrupt the relation
	def set_rel_neg_rate(rate = 0)
	
	#To sample a batch of training triples, including positive and negative ones.
	def sampling()
	
	#To set the import files, all parameters can be restored from the import files
	def set_import_files(path = None)
	
	#To set the export files
	def set_export_files(path = None)
	
	#To export results every several rounds
	def set_export_steps(steps = 1)
	
	#To set the knowledge embedding model
	set_model(model)
	
	#The framework will print loss values during training if flag = 1
	def set_log_on(flag = 1)

Model

class Model(object)

	# return config which saves the training parameters.
	get_config(self)
	
	# in_batch = True, return [positive_head, positive_tail, positive_relation]
	# The shape of positive_head is [batch_size, 1]
	# in_batch = False, return [positive_head, positive_tail, positive_relation]
	# The shape of positive_head is [batch_size]
	get_positive_instance(in_batch = True)
	
	# in_batch = True, return [negative_head, negative_tail, negative_relation]
	# The shape of positive_head is [batch_size, negative_ent_rate + negative_rel_rate]
	# in_batch = False, return [negative_head, negative_tail, negative_relation]
	# The shape of positive_head is [(negative_ent_rate + negative_rel_rate) * batch_size]		
	get_negative_instance(in_batch = True)

	# in_batch = True, return all training instances with the shape [batch_size, (1 + negative_ent_rate + negative_rel_rate)]
	# in_batch = False, return all training instances with the shape [(negative_ent_rate + negative_rel_rate + 1) * batch_size]
	def get_all_instance(in_batch = False)

	# in_batch = True, return all training labels with the shape [batch_size, (1 + negative_ent_rate + negative_rel_rate)]
	# in_batch = False, return all training labels with the shape [(negative_ent_rate + negative_rel_rate + 1) * batch_size]
	# The positive triples are labeled as 1, and the negative triples are labeled as -1
	def get_all_labels(in_batch = False)
	
	# To define containers for training triples
	def input_def()
	
	# To define embedding parameters for knowledge embedding models
	def embedding_def()

	# To define loss functions for knowledge embedding models
	def loss_def()
	
	def __init__(config)

#The implementation for TransE
class TransE(Model)

#The implementation for TransH	
class TransH(Model)

#The implementation for TransR
class TransR(Model)

#The implementation for TransD
class TransD(Model)

#The implementation for RESCAL
class RESCAL(Model)

#The implementation for DistMult
class DistMult(Model)

#The implementation for HolE
class HolE(Model)					

#The implementation for ComplEx
class ComplEx(Model)

0xqq / openke Goto Github PK

openke's Introduction

OpenKE

Overview

Installation

Data

Quickstart

Step 1: Import datasets

Step 2: Set configure parameters for training.

Step 3: Train models

Step 4: Export results

Interfaces

Config

Model

openke's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent