Dist-KGE: A knowledge graph embedding library for multi-GPU and multi-machine training

This is the code and configuration accompanying the paper "Parallel Training of Knowledge Graph Embedding Models: A Comparison of Techniques". The code extends the knowledge graph embedding library LibKGE; for documentation on LibKGE, refer to the LibKGE repository. We provide the hyper-parameter settings for the experiments in the corresponding configuration files.

Table of contents

  1. Quick start
  2. Dataset preparation for parallel training
  3. Single Machine Multi-GPU Training
  4. Multi-GPU Multi-Machine Training
  5. Folder structure of experiment results
  6. Results and Configurations

Quick start

# retrieve and install project in development mode
git clone https://github.com/AdrianKs/dist-kge.git
cd dist-kge
pip install -e .

# download and preprocess datasets
cd data
sh download_all.sh
cd ..

# train an example model on the toy dataset (you can omit '--job.device cpu' if you have a GPU)
kge start examples/toy-complex-train.yaml --job.device cpu

This example trains on a toy dataset in a sequential setup on the CPU.

Dataset preparation for parallel training

NOTE: The Freebase dataset already comes with multiple partition settings to save preprocessing time.

To partition the data, run the following commands (you only need to do this once):

Random Partitioning

For random partitioning no further preparation is needed.

Relation Partitioning

cd data
python partition_relation.py <dataset-name> -n <num-partitions>
cd ..

Stratification

cd data 
python partition_stratification.py <dataset-name> -n <num-partitions>
cd ..

Graph-Cut

cd data
python partition_graph_cut.py <dataset-name> -n <num-partitions>
cd ..
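
Each partitioning script takes the dataset name and the desired number of partitions. For example, to create four graph-cut partitions of the fb15k dataset (dataset name and partition count here are only illustrative):

cd data
python partition_graph_cut.py fb15k -n 4
cd ..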

Single Machine Multi-GPU Training

Run the following example to train on two GPUs with random partitioning (two workers per GPU):

python -m kge start examples/fb15k-complex-parallel.yaml

The most important configuration options for multi-GPU training are:

import:
  - complex
  - distributed_model
model: distributed_model
distributed_model:
  base_model: complex
job:
  distributed:
    num_partitions: 4
    num_workers: 4
    partition_type: random
    master_port: 8888  # change in case this port is used on your machine
  device_pool:
    - cuda:0
    - cuda:1
train:
  type: distributed_negative_sampling
  optimizer:
    default:
      type: dist_adagrad
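
As with the other options, these settings can also be overridden on the command line instead of editing the configuration file. For example, to train with eight partitions and eight workers (the values are chosen only for illustration):

python -m kge start examples/fb15k-complex-parallel.yaml --job.distributed.num_partitions 8 --job.distributed.num_workers 8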

Multi-GPU Multi-Machine Training

Parameter Server

For multi-machine training we use the parameter server Lapse. To install Lapse and the corresponding Python bindings, run the following commands:

git clone https://github.com/alexrenz/lapse-ps.git
cd lapse-ps
make ps KEY_TYPE=int64_t CXX11_ABI=$(python bindings/lookup_torch_abi.py) DEPS_PATH=$(pwd)/deps_bindings
cd bindings 
python setup.py install --user
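
To check that the bindings were installed correctly, you can try importing them (this assumes the bindings are exposed as a Python module named lapse; consult the Lapse bindings documentation if the import fails):

python -c "import lapse"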

For further documentation on the Python bindings, refer to the Lapse bindings documentation.

In case you cannot use Lapse, we provide a very inefficient parameter server intended for debugging only. To use this debugging parameter server, set the option --job.distributed.parameter_server torch.
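
For example, to use the debugging parameter server in the multi-machine example shown below, append the option to the command on each machine (machine 0 shown here):

python -m kge start examples/fb15k_complex_distributed.yaml --job.distributed.machine_id 0 --job.distributed.master_ip <ip_of_machine_0> --job.distributed.parameter_server torch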

Interface

As we use the gloo backend to communicate between the master and worker nodes, you need to determine the network interface that connects your machines and pass its name as --job.distributed.gloo_socket_ifname. You can find the names of your interfaces with the command

ip address
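
Then pass the interface name when starting the job, either in the configuration file (see below) or on the command line. The interface name eth0 is only an example; use the name reported by ip address on your machines:

python -m kge start examples/fb15k_complex_distributed.yaml --job.distributed.gloo_socket_ifname eth0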

Example

Run the following example to train on two machines with one GPU each (1@2) with random partitioning:

Command for machine 1

python -m kge start examples/fb15k_complex_distributed.yaml --job.distributed.machine_id 0 --job.distributed.master_ip <ip_of_machine_0>

Command for machine 2

python -m kge start examples/fb15k_complex_distributed.yaml --job.distributed.machine_id 1 --job.distributed.master_ip <ip_of_machine_0>

Important options for distributed training in addition to the options specified in the single-machine setting are:

job:
  distributed:
    master_ip: '<ip_of_machine_0>'  # ip address of the machine with machine_id 0
    num_machines: 2
    num_workers_machine: 2
    gloo_socket_ifname: bond0  # name of the interface to use. Use command 'ip address' to find names
    parameter_server: lapse

Folder structure of experiment results

  • By default, each experiment creates a new folder in local/experiments/<timestamp>-<config-name>.
  • This folder can be changed with the command-line argument --folder path/to/folder.
  • For multi-machine training, a folder is created on each machine. If you work on a shared filesystem, specify a separate folder name for each machine.
  • Each worker has its own subfolder in which partition-processing times are logged (see the example layout below).
  • The complete epoch time over all partitions is logged in the main kge.log file.
  • Hardware information is logged to hardware_monitor.log and gpu_monitor.log.
  • Evaluation is performed by worker-0 on machine-0; evaluation results are therefore logged to the folder <experiment-folder-machine-0>/worker-0/ in the files kge.log and trace.yaml.
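
Putting this together, the experiment folder on machine 0 might look roughly as follows (timestamp, config name, and number of workers are illustrative):

local/experiments/20240101-120000-fb15k-complex-parallel/
├── kge.log                  # complete epoch times over all partitions
├── hardware_monitor.log     # hardware information
├── gpu_monitor.log          # GPU information
├── worker-0/
│   ├── kge.log              # partition-processing times; evaluation results (machine 0, worker 0)
│   └── trace.yaml           # evaluation traces
└── worker-1/                # one subfolder per worker with partition-processing times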

Results and Configurations

  • All ranking metrics are filtered with train, valid, and test triples unless mentioned otherwise.
  • The configuration files for the experiments can be found here.

Partitioning techniques (best-performing variant)

  • Best-performing variant in terms of the time needed to reach 0.95 of the MRR achieved in the sequential setting.

FB15k

ComplEx

| Setup | Partitioning Technique | Epoch Time | Time to 0.95 MRR | MRR | MRR unfiltered | Hits@1 | Hits@10 | Hits@100 | config |
|-------|------------------------|------------|------------------|-----|----------------|--------|---------|----------|--------|
| . | Sequential (GPU memory) | 5.9s | 3.9min | 0.778 | 0.245 | 0.729 | 0.862 | 0.932 | config |
| . | Sequential (main memory) | 7.7s | 5.1min | 0.778 | 0.245 | 0.729 | 0.862 | 0.932 | config |
| 2@1 | Random (R) | 2.6s | 2.0min | 0.775 | 0.243 | 0.726 | 0.859 | 0.931 | config |
| 1@2 | Random (R) | 2.9s | 2.2min | 0.775 | 0.243 | 0.726 | 0.859 | 0.931 | config |
| 4@2 | Random (R) | 1.3s | 1.3min | 0.766 | 0.241 | 0.712 | 0.858 | 0.929 | config |

RotatE

| Setup | Partitioning Technique | Epoch Time | Time to 0.95 MRR | MRR | MRR unfiltered | Hits@1 | Hits@10 | Hits@100 | config |
|-------|------------------------|------------|------------------|-----|----------------|--------|---------|----------|--------|
| . | Sequential (GPU memory) | 9.5s | 11.9min | 0.705 | 0.232 | 0.630 | 0.834 | 0.928 | config |
| . | Sequential (main memory) | 11.4s | 14.3min | 0.705 | 0.232 | 0.630 | 0.834 | 0.928 | config |
| 2@1 | Stratification (CARL) | 4.6s | 5.8min | 0.725 | 0.239 | 0.664 | 0.835 | 0.926 | config |
| 1@2 | Stratification (CARL) | 5.9s | 7.4min | 0.725 | 0.239 | 0.664 | 0.835 | 0.926 | config |

Yago3-10

ComplEx

| Setup | Partitioning Technique | Epoch Time | Time to 0.95 MRR | MRR | MRR unfiltered | Hits@1 | Hits@10 | Hits@100 | config |
|-------|------------------------|------------|------------------|-----|----------------|--------|---------|----------|--------|
| . | Sequential (GPU memory) | 24.3s | 35.5min | 0.542 | 0.111 | 0.468 | 0.675 | 0.791 | config |
| . | Sequential (main memory) | 42.6s | 67.5min | 0.542 | 0.111 | 0.468 | 0.675 | 0.791 | config |
| 2@1 | Relation | 19.0s | 33.2min | 0.538 | 0.107 | 0.465 | 0.669 | 0.787 | config |
| 1@2 | Stratification (CRL) | 11.3s | 36.6min | 0.531 | 0.110 | 0.457 | 0.665 | 0.784 | config |
| 4@2 | Stratification (CRL) | 7.8s | n.r. | 0.438 | 0.114 | 0.356 | 0.604 | 0.761 | config |

RotatE

| Setup | Partitioning Technique | Epoch Time | Time to 0.95 MRR | MRR | MRR unfiltered | Hits@1 | Hits@10 | Hits@100 | config |
|-------|------------------------|------------|------------------|-----|----------------|--------|---------|----------|--------|
| . | Sequential (GPU memory) | 74.1s | 259.3min | 0.451 | 0.104 | 0.343 | 0.637 | 0.773 | config |
| . | Sequential (main memory) | 88.0s | 307.8min | 0.451 | 0.104 | 0.343 | 0.637 | 0.773 | config |
| 2@1 | Stratification (CARL) | 40.8s | 166.6min | 0.438 | 0.115 | 0.350 | 0.607 | 0.764 | config |
| 1@2 | Stratification (CARL) | 43.3s | 175.8min | 0.438 | 0.115 | 0.350 | 0.607 | 0.764 | config |

Wikidata5m

ComplEx

| Setup | Partitioning Technique | Epoch Time | Time to 0.95 MRR | MRR | MRR unfiltered | Hits@1 | Hits@10 | Hits@100 | config |
|-------|------------------------|------------|------------------|-----|----------------|--------|---------|----------|--------|
| . | Sequential (GPU memory) | 438.4s | 219.0min | 0.297 | 0.255 | 0.246 | 0.385 | 0.516 | config |
| . | Sequential (main memory) | 774.3s | 387.0min | 0.297 | 0.255 | 0.246 | 0.386 | 0.516 | config |
| 2@1 | Stratification (CARL) | 232.8s | 77.6min | 0.308 | 0.264 | 0.255 | 0.398 | 0.513 | config |
| 1@2 | Stratification (CARL) | 228.0s | 76.0min | 0.308 | 0.264 | 0.255 | 0.398 | 0.513 | config |

RotatE

| Setup | Partitioning Technique | Epoch Time | Time to 0.95 MRR | MRR | MRR unfiltered | Hits@1 | Hits@10 | Hits@100 | config |
|-------|------------------------|------------|------------------|-----|----------------|--------|---------|----------|--------|
| . | Sequential (GPU memory) | 798.4s | 199.6min | 0.258 | 0.225 | 0.202 | 0.348 | 0.453 | config |
| . | Sequential (main memory) | 985.7s | 246.4min | 0.258 | 0.225 | 0.202 | 0.348 | 0.453 | config |
| 2@1 | Stratification (ARL) | 466.7s | 77.8min | 0.264 | 0.230 | 0.213 | 0.344 | 0.410 | config |
| 1@2 | Stratification (ARL) | 477.7s | 79.6min | 0.264 | 0.230 | 0.213 | 0.344 | 0.410 | config |

Freebase

ComplEx

| Setup | Partitioning Technique | Epoch Time | Data sent per epoch | sMRR | sMRR unfiltered | MRR | MRR unfiltered | Hits@1 | Hits@10 | Hits@100 | config |
|-------|------------------------|------------|---------------------|------|-----------------|-----|----------------|--------|---------|----------|--------|
| . | Sequential (main memory) | 3929.0s | - | 0.811 | 0.776 | 0.364 | 0.311 | 0.298 | 0.487 | 0.618 | config |
| . | Sequential (B) (main memory) | 3925.2s | - | 0.815 | 0.782 | 0.426 | 0.345 | 0.370 | 0.528 | 0.642 | config |
| 2@2 | Random (RLB) | 966.7s | 232.8GB | 0.816 | 0.782 | 0.426 | 0.352 | 0.371 | 0.529 | 0.639 | config |
| 2@2 | Relation (rLB) | 823.8s | 205.9GB | 0.801 | 0.770 | 0.397 | 0.326 | 0.339 | 0.507 | 0.631 | config |
| 2@2 | Stratification (CARLB) | 803.9s | 123.2GB | 0.793 | 0.761 | 0.325 | 0.285 | 0.272 | 0.424 | 0.563 | config |
| 2@2 | Graph-cut (LB) | 1170.6s | 42.5GB | 0.789 | 0.761 | 0.407 | 0.335 | 0.351 | 0.512 | 0.624 | config |
| 4@2 | Random (RLB) | 591.6s | 251.9GB | 0.819 | 0.784 | 0.421 | 0.346 | 0.364 | 0.523 | 0.638 | config |

RotatE

| Setup | Partitioning Technique | Epoch Time | Data sent per epoch | sMRR | sMRR unfiltered | MRR | MRR unfiltered | Hits@1 | Hits@10 | Hits@100 | config |
|-------|------------------------|------------|---------------------|------|-----------------|-----|----------------|--------|---------|----------|--------|
| . | Sequential (main memory) | 6495.7s | - | 0.774 | 0.748 | 0.566 | 0.426 | 0.529 | 0.627 | 0.677 | config |
| . | Sequential (B) (main memory) | 6184.0s | - | 0.812 | 0.774 | 0.560 | 0.422 | 0.521 | 0.623 | 0.674 | config |
| 2@2 | Random (RLB) | 1541.4s | 232.2GB | 0.812 | 0.773 | 0.567 | 0.425 | 0.529 | 0.630 | 0.678 | config |
| 2@2 | Relation (rLB) | 1498.1s | 207.2GB | 0.791 | 0.758 | 0.551 | 0.407 | 0.515 | 0.608 | 0.656 | config |
| 2@2 | Stratification (CARLB) | 1416.1s | 123.3GB | 0.734 | 0.720 | 0.529 | 0.395 | 0.491 | 0.592 | 0.641 | config |
| 2@2 | Graph-cut (LB) | 1867.9s | 44.5GB | 0.775 | 0.749 | 0.560 | 0.410 | 0.526 | 0.616 | 0.654 | config |

Row-Adagrad

Row-wise optimizers keep their statistics per embedding (one value per embedding row) rather than per embedding dimension and therefore reduce storage and communication overhead by about 50%. We observed no negative influence on the resulting embedding quality for any partitioning method except graph-cut partitioning, where the drop was small but noticeable. Overall, we found Row-Adagrad to be a suitable approach to reduce storage and communication costs. We report results for ComplEx in the 1@2 setting.
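
As a minimal sketch of the idea (not the library's implementation): standard Adagrad keeps one accumulated squared-gradient value per embedding dimension, i.e. an [n, d] accumulator for an [n, d] embedding matrix, whereas Row-Adagrad keeps a single value per embedding row, i.e. an [n] vector, which roughly halves the optimizer state that has to be stored and communicated alongside the gradients. The aggregation over dimensions used below (mean of squared gradients) is one common choice and may differ in detail from the library's.

import torch

def adagrad_step(emb, grad, acc, lr=0.1, eps=1e-10):
    # standard Adagrad: one accumulator entry per dimension, acc has shape [n, d]
    acc += grad * grad
    emb -= lr * grad / (acc.sqrt() + eps)
    return emb, acc

def row_adagrad_step(emb, grad, acc, lr=0.1, eps=1e-10):
    # Row-Adagrad: one accumulator entry per embedding row, acc has shape [n]
    acc += (grad * grad).mean(dim=1)
    emb -= lr * grad / (acc.sqrt() + eps).unsqueeze(1)
    return emb, acc

# illustrative shapes: n embeddings of dimension d
n, d = 5, 4
emb, grad = torch.randn(n, d), torch.randn(n, d)
acc_full = torch.zeros(n, d)  # optimizer state kept by Adagrad
acc_row = torch.zeros(n)      # optimizer state kept by Row-Adagrad (d times smaller)
adagrad_step(emb.clone(), grad, acc_full)
row_adagrad_step(emb.clone(), grad, acc_row)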

Yago3-10

| Partition Technique | Data sent (Adagrad) | MRR (Adagrad) | Data sent (Row-Adagrad) | MRR (Row-Adagrad) |
|---------------------|---------------------|---------------|--------------------------|-------------------|
| Sequential | - | 0.542 | - | 0.542 |
| Random (R) | 7.2GB | 0.538 | 5.0GB | 0.534 |
| Relation | 7.1GB | 0.538 | 4.9GB | 0.542 |
| Stratification | 0.4GB | 0.531 | 0.2GB | 0.539 |
| Graph-cut | 0.2GB | 0.211 | 0.1GB | 0.180 |

Wikidata5m

| Partition Technique | Data sent (Adagrad) | MRR (Adagrad) | Data sent (Row-Adagrad) | MRR (Row-Adagrad) |
|---------------------|---------------------|---------------|--------------------------|-------------------|
| Sequential | - | 0.297 | - | 0.291 |
| Random (R) | 125.2GB | 0.296 | 65.7GB | 0.298 |
| Relation | 123.8GB | 0.296 | 63.5GB | 0.300 |
| Stratification | 15.0GB | 0.308 | 7.4GB | 0.306 |
| Graph-cut | 6.1GB | 0.192 | 3.7GB | 0.181 |
