zhuchaoy / rdkg-115

RDKG-115: Assisting drug repurposing and discovery for rare diseases by trimodal knowledge graph embedding

License: Apache License 2.0

Python 100.00%
drug-repurposing knowledge-graph rare-disease

rdkg-115's Introduction

RDKG-115

Citation

Chaoyu Zhu, Xiaoqiong Xia, Nan Li, Fan Zhong, Zhihao Yang, Lei Liu. RDKG-115: Assisting drug repurposing and discovery for rare diseases by trimodal knowledge graph embedding. Computers in Biology and Medicine, 2023, 107262.

Abstract

Rare diseases (RDs) may affect individuals in small numbers, but they have a significant impact on a global scale. Accurate diagnosis of RDs is challenging, and there is a severe lack of drugs available for treatment. Because RD drug development involves high investment, high risk, and a long cycle, pharmaceutical companies have shown a preference for repurposing existing drugs developed for other diseases. Compared to traditional approaches, knowledge graph embedding (KGE) based methods are more efficient and convenient, as they treat drug repurposing as a link prediction task. KGE models allow for the enrichment of existing knowledge by incorporating multimodal information from various sources. In this study, we constructed RDKG-115, a rare disease knowledge graph involving 115 RDs, composed of 35,643 entities, 25 relations, and 5,539,839 refined triplets, based on 372,384 pieces of high-quality literature and 4 biomedical datasets: DRKG, Pathway Commons, PharmKG, and PMapp. Subsequently, we developed a trimodal KGE model containing structure, category, and description embeddings using reverse-hyperplane projection. We used this model to infer 4,199 reliable new triplets from RDKG-115. Finally, we calculated potential drugs and small molecules for each of the 115 RDs, taking multiple sclerosis as a case study. This study provides a paradigm for large-scale screening of drug repurposing and discovery for RDs, which will speed up the drug development process and ultimately benefit patients with RDs.
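
To make the "drug repurposing as link prediction" idea concrete, here is a minimal sketch of the standard TransE and TransH distance functions (two of the baseline models listed under Reference) and of ranking candidate tail entities by distance. It is not the trimodal model or the reverse-hyperplane projection from the paper, and it is not taken from Models.py. All function names and the toy 2-dimensional embeddings are hypothetical and chosen only for illustration.

```python
import math


def transe_distance(h, r, t):
    """TransE: L2 norm of (h + r - t); a smaller value means a more plausible triplet."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def project_to_hyperplane(e, w):
    """Project e onto the hyperplane with unit normal w: e - (w . e) w."""
    dot = sum(wi * ei for wi, ei in zip(w, e))
    return [ei - dot * wi for ei, wi in zip(e, w)]


def transh_distance(h, r, t, w):
    """TransH: project h and t onto the relation hyperplane, then translate by r."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    w = [wi / norm for wi in w]  # the normal vector must have unit length
    return transe_distance(project_to_hyperplane(h, w), r, project_to_hyperplane(t, w))


def rank_tails(h, r, candidates):
    """Sort candidate tail names from most to least plausible under TransE."""
    return sorted(candidates, key=lambda name: transe_distance(h, r, candidates[name]))


# Toy embeddings, made up for illustration only (not from RDKG-115).
drug = [1.0, 0.0]
treats = [0.0, 1.0]
diseases = {"disease_a": [1.0, 1.0], "disease_b": [3.0, -2.0]}

ranking = rank_tails(drug, treats, diseases)
print(ranking)  # ['disease_a', 'disease_b']
```

In this toy setting, `disease_a` ranks first because `drug + treats` lands exactly on its embedding. Predicting new drug-disease links amounts to scoring every candidate tail in this way and keeping the top-ranked ones.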

Files

Annotation/

E_dict.json : JSON file of the standard entity set

Model/

Models.py : Structures of TransE, TransH, ConvKB and RotatE
run_kge.py : Run and configuration script
train_kge.py : Processing class and utility functions for knowledge graph embedding

C/

C_dict.data : Dict of entity category annotation
get_c_dict.py : Run it to get C_dict.data

D/

BERT.py : Structure of BERT
DTModel.py : Structure for training description table
optimization.py : Training optimization of BERT
run_d_Table.py : Run it to get D_table.data
tokenization.py : Tokenization function of BERT
train_d_Table.py : Class of processing and tool functions for D_table

KG/

entity.csv : Entities of RDKG-115
relation.csv : Relations of RDKG-115
train.csv : Train set of RDKG-115 (98%)
dev.csv : Dev set of RDKG-115 (1%)
test.csv : Test set of RDKG-115 (1%)

Pretrained BERT/

RDKG-115/

entity_name_map.xlsx : Entity key-name mapping file
RDKG-115.csv : Raw RDKG-115
RDKG-115 (plus inferred).csv : RDKG-115 merged with the reliable newly inferred knowledge
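
The 98% / 1% / 1% train/dev/test proportions listed under KG/ can be illustrated with a small sketch. This is not the script used to produce train.csv, dev.csv and test.csv. The function name `split_triplets`, the seed, and the toy triplets are assumptions made for illustration, and the column layout of the real CSV files is not assumed here.

```python
import random


def split_triplets(triplets, dev_frac=0.01, test_frac=0.01, seed=0):
    """Shuffle triplets and split them into train/dev/test (default 98/1/1)."""
    items = list(triplets)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    n_dev = round(len(items) * dev_frac)
    n_test = round(len(items) * test_frac)
    dev = items[:n_dev]
    test = items[n_dev:n_dev + n_test]
    train = items[n_dev + n_test:]
    return train, dev, test


# Toy (head, relation, tail) triplets, made up for illustration only.
toy = [(f"e{i}", "r", f"e{i + 1}") for i in range(1000)]
train, dev, test = split_triplets(toy)
print(len(train), len(dev), len(test))  # 980 10 10
```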

Reference

(1) TransE: Translating Embeddings for Modeling Multi-relational Data
(2) TransH: Knowledge Graph Embedding by Translating on Hyperplanes
(3) ConvKB: A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network
(4) RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space
(5) BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Code: https://github.com/google-research/bert)
(6) BioBERT: A Pre-trained Biomedical Language Representation Model for Biomedical Text Mining
(7) PubMedBERT: Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing
(8) SciBERT: A Pretrained Language Model for Scientific Text

Version

(1) python 3.6.2
(2) torch 1.9.1+cu111
(3) numpy 1.19.2

Operating Instructions

(1) Run get_c_dict.py to get C_dict.data in Model/C/ (C_dict.data is already included in the folder, so this step can be skipped)

python get_c_dict.py   

(2) Run run_d_table.py to get D_table.data in Model/D/

python run_d_table.py --model pubmedbert --len_d 150 --l_r 1e-5 --batch_size 8 --epoches 5 --earlystop 1   

(3) Run run_kge.py to train TransE, TransH, ConvKB and RotatE in Model/

Interpretation of the 4 configurations (S = structure embedding, C = category embedding, D = description embedding)

lanta_c == 0 and lanta_d == 0 : S
lanta_c != 0 and lanta_d == 0 : S + C
lanta_c == 0 and lanta_d != 0 : S + D
lanta_c != 0 and lanta_d != 0 : S + C + D
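
The mapping above can be written as a small helper. This is a sketch only: the function name `configuration_name` is hypothetical and is not part of run_kge.py. It just restates the four rules in code.

```python
def configuration_name(lanta_c, lanta_d):
    """Return which embeddings are active for the given lanta_c / lanta_d values."""
    parts = ["S"]  # the structure embedding is always used
    if lanta_c != 0:
        parts.append("C")  # category embedding enabled
    if lanta_d != 0:
        parts.append("D")  # description embedding enabled
    return " + ".join(parts)


print(configuration_name(0.0, 0.0))  # S
print(configuration_name(0.1, 0.5))  # S + C + D
```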

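Recommended parameters for the 4 models and 4 configurations (embedding dimension 128). For each model, the four commands correspond to S, S + C, S + D, and S + C + D, in that order.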

TransE

python run_kge.py --model TransE --margin 1.0 --lanta_c 0.0 --lanta_d 0.0 --l_r 5e-3 --epoches 800
python run_kge.py --model TransE --margin 1.0 --lanta_c 0.5 --lanta_d 0.0 --l_r 5e-4 --epoches 200
python run_kge.py --model TransE --margin 1.0 --lanta_c 0.0 --lanta_d 0.5 --l_r 5e-4 --epoches 200
python run_kge.py --model TransE --margin 1.0 --lanta_c 0.1 --lanta_d 0.5 --l_r 5e-4 --epoches 200

TransH

python run_kge.py --model TransH --margin 1.0 --lanta_c 0.0 --lanta_d 0.0 --l_r 5e-3 --epoches 800
python run_kge.py --model TransH --margin 1.0 --lanta_c 0.3 --lanta_d 0.0 --l_r 5e-4 --epoches 200
python run_kge.py --model TransH --margin 1.0 --lanta_c 0.0 --lanta_d 0.5 --l_r 5e-4 --epoches 200
python run_kge.py --model TransH --margin 1.0 --lanta_c 0.3 --lanta_d 0.5 --l_r 5e-4 --epoches 200

ConvKB

python run_kge.py --model ConvKB --margin 2.0 --lanta_c 0.0 --lanta_d 0.0 --l_r 5e-3 --epoches 800
python run_kge.py --model ConvKB --margin 2.0 --lanta_c 0.3 --lanta_d 0.0 --l_r 5e-4 --epoches 200
python run_kge.py --model ConvKB --margin 2.0 --lanta_c 0.0 --lanta_d 0.1 --l_r 5e-4 --epoches 200
python run_kge.py --model ConvKB --margin 2.0 --lanta_c 0.3 --lanta_d 0.5 --l_r 5e-4 --epoches 200

RotatE

python run_kge.py --model RotatE --margin 3.0 --lanta_c 0.0 --lanta_d 0.0 --l_r 5e-3 --epoches 800
python run_kge.py --model RotatE --margin 3.0 --lanta_c 0.3 --lanta_d 0.0 --l_r 5e-4 --epoches 200
python run_kge.py --model RotatE --margin 3.0 --lanta_c 0.0 --lanta_d 0.3 --l_r 5e-4 --epoches 200
python run_kge.py --model RotatE --margin 3.0 --lanta_c 0.1 --lanta_d 0.5 --l_r 5e-4 --epoches 200
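
The sixteen commands above follow a regular pattern: the margin depends only on the model, the structure-only runs use --l_r 5e-3 with --epoches 800, and all other runs use --l_r 5e-4 with --epoches 200. The sketch below collects the recommended values in one table and rebuilds the command strings from it. The names `MARGIN`, `LANTA` and `build_command` are hypothetical helpers, not part of the repository. Only the flags and values already shown above are used.

```python
# Margin per model, copied from the commands above.
MARGIN = {"TransE": "1.0", "TransH": "1.0", "ConvKB": "2.0", "RotatE": "3.0"}

# (lanta_c, lanta_d) per model and configuration, copied from the commands above.
LANTA = {
    "TransE": {"S": ("0.0", "0.0"), "S+C": ("0.5", "0.0"), "S+D": ("0.0", "0.5"), "S+C+D": ("0.1", "0.5")},
    "TransH": {"S": ("0.0", "0.0"), "S+C": ("0.3", "0.0"), "S+D": ("0.0", "0.5"), "S+C+D": ("0.3", "0.5")},
    "ConvKB": {"S": ("0.0", "0.0"), "S+C": ("0.3", "0.0"), "S+D": ("0.0", "0.1"), "S+C+D": ("0.3", "0.5")},
    "RotatE": {"S": ("0.0", "0.0"), "S+C": ("0.3", "0.0"), "S+D": ("0.0", "0.3"), "S+C+D": ("0.1", "0.5")},
}


def build_command(model, config):
    """Rebuild the recommended run_kge.py command for a model and configuration."""
    lanta_c, lanta_d = LANTA[model][config]
    # Structure-only runs use a larger learning rate and more epochs.
    l_r, epoches = ("5e-3", 800) if config == "S" else ("5e-4", 200)
    return (
        f"python run_kge.py --model {model} --margin {MARGIN[model]} "
        f"--lanta_c {lanta_c} --lanta_d {lanta_d} --l_r {l_r} --epoches {epoches}"
    )


for model in MARGIN:
    for config in ("S", "S+C", "S+D", "S+C+D"):
        print(build_command(model, config))
```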

rdkg-115's Issues

RDKG-115 training hardware

Hi,
I want to know whether it would be feasible to train the model as you describe on a single GPU in a reasonable time (a few hours). I've been reading about other KGE models that require very powerful hardware, but I don't know how this model compares.
thanks
