Drug-Target Indication Prediction by Integrating End-to-End Learning and Fingerprints

Computer-Aided Drug Discovery research has proven to be a promising direction in drug discovery. In recent years, Deep Learning approaches have been applied to problems in the domain, such as Drug-Target Indication (DTI) prediction, and have shown improvements over traditional screening methods.

An open challenge is how to represent compound-target pairs in deep learning models. Several representation methods exist, and the literature reports that these descriptor schemes often complement one another. In this project, we propose a multi-view architecture trained adversarially to leverage this complementary behavior for DTI prediction by integrating both differentiable and predefined molecular descriptors (fingerprints). Our results on empirical data show that this approach generally improves model accuracy.

This repository contains the accompanying code and other ancillary files for the study.

Requirements

Project/Module	Version
PyTorch	>= 1.1.0
NumPy	>= 1.15
DeepChem	>= 2.2.0
Padme	See the PADME project
Pandas	>= 0.25.0
Seaborn	0.9.0
Soek	See the Soek project
torch-scatter	>= 1.3.1
tqdm	>= 4.x

Note: The dcCustom package of the PADME project has been renamed to padme in this project and should not be confused with any other module of the same name. We made this change for clarity, so that the package carries the name its authors gave the project.
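To confirm that an environment meets the minimum versions listed above, a small check along the following lines may help. It is a sketch, not part of this repository: the function names are hypothetical, the distribution names (torch, numpy, and so on) are assumed to be the ones used on PyPI, and PADME, Soek, and the pinned Seaborn version are left out because they are installed from their own projects or pinned exactly.

```python
import re
from importlib import metadata

# Minimum versions copied from the requirements table; keys are assumed PyPI names.
MINIMUMS = {
    "torch": "1.1.0",
    "numpy": "1.15",
    "deepchem": "2.2.0",
    "pandas": "0.25.0",
    "torch-scatter": "1.3.1",
    "tqdm": "4",
}


def version_tuple(version):
    """Turn '1.1.0+cu92' into (1, 1, 0); local build tags after '+' are ignored."""
    public = version.split("+")[0]
    return tuple(int(part) for part in re.findall(r"\d+", public)[:3])


def check_requirements(minimums=MINIMUMS):
    """Return a {package: status} report for every listed package."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        ok = version_tuple(installed) >= version_tuple(minimum)
        report[name] = f"{installed} ({'ok' if ok else 'needs >= ' + minimum})"
    return report


report = check_requirements()
for name, status in report.items():
    print(f"{name}: {status}")
```

The comparison uses only the leading numeric components, so pre-release suffixes are treated as equal to the corresponding release.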

Usage

The bash files found here are used for model training and evaluation of the baseline and the IVPGAN models. The bash files with the padme_ prefix train the baseline models reflected in their name. For instance, padme_cold_drug_gconv_cv_kiba trains our implementation of the GraphConv-PSC model using k-fold Cross-Validation with a cold drug splitting scheme on the KIBA dataset. The IVPGAN models are trained using the bash files with the integrated_ prefix. They also follow the same naming pattern as the padme_ files.
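For readers unfamiliar with the cold drug splitting scheme mentioned above, it is commonly understood as holding out whole compounds, so that no drug in a test fold also appears in the training folds. The sketch below illustrates that idea under this assumption; it is not the splitter used in this repository, and the function and variable names are hypothetical.

```python
import random
from collections import defaultdict


def cold_drug_folds(pairs, k, seed=0):
    """Assign (drug, target) pairs to k folds so each drug lands in exactly one fold."""
    drugs = sorted({drug for drug, _ in pairs})
    rng = random.Random(seed)
    rng.shuffle(drugs)
    # Round-robin over the shuffled drugs keeps the folds roughly balanced.
    fold_of = {drug: i % k for i, drug in enumerate(drugs)}
    folds = defaultdict(list)
    for drug, target in pairs:
        folds[fold_of[drug]].append((drug, target))
    return [folds[i] for i in range(k)]


# Toy data: 4 drugs, each measured against 2 targets.
pairs = [(d, t) for d in ["d1", "d2", "d3", "d4"] for t in ["t1", "t2"]]
folds = cold_drug_folds(pairs, k=2)
for i, fold in enumerate(folds):
    print(i, sorted({drug for drug, _ in fold}))
```

A cold target split would follow the same pattern with the grouping done on targets, while a warm split assigns individual pairs to folds without grouping.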

The bash files with _eval_ in their names are used for evaluating a trained model. We use a resource tree structure to aggregate all training and evaluation statistics, which are then saved as JSON files for later analysis. For more on the resource tree structure, you can examine sim_data.py and its usage in singleview.py and train_joint_gan.py. The performance data saved in the JSON file of each evaluated model is analysed using worker.py. The data behind the reported results can be found here.

Results

Quantitative results

RMSE
Dataset	CV split type	ECFP8	GraphConv	IVPGAN
Davis	Warm	0.2216 ± 0.082	0.3537 ± 0.053	0.2014 ± 0.043
	Cold drug	0.3978 ± 0.105	0.4751 ± 0.123	0.2895 ± 0.163
	Cold target	0.5517 ± 0.088	0.5752 ± 0.101	0.2202 ± 0.139
Metz	Warm	0.3321 ± 0.057	0.5537 ± 0.033	0.5529 ± 0.033
	Cold drug	0.3778 ± 0.097	0.5711 ± 0.057	0.5477 ± 0.064
	Cold target	0.6998 ± 0.065	0.7398 ± 0.047	0.5745 ± 0.054
KIBA	Warm	0.4350 ± 0.086	0.5604 ± 0.120	0.4003 ± 0.082
	Cold drug	0.4502 ± 0.128	0.552 ± 0.156	0.4690 ± 0.132
	Cold target	0.6645 ± 0.137	0.7555 ± 0.153	0.4486 ± 0.106

Concordance Index
Dataset	CV split type	ECFP8	GraphConv	IVPGAN
Davis	Warm	0.9647 ± 0.020	0.9335 ± 0.011	0.9729 ± 0.008
	Cold drug	0.9099 ± 0.049	0.8784 ± 0.052	0.9493 ± 0.044
	Cold target	0.8683 ± 0.033	0.8480 ± 0.038	0.9631 ± 0.036
Metz	Warm	0.8983 ± 0.033	0.7968 ± 0.027	0.7913 ± 0.029
	Cold drug	0.8730 ± 0.044	0.7850 ± 0.040	0.7894 ± 0.042
	Cold target	0.7304 ± 0.039	0.7084 ± 0.041	0.7776 ± 0.038
KIBA	Warm	0.8322 ± 0.024	0.7873 ± 0.029	0.8433 ± 0.023
	Cold drug	0.8132 ± 0.047	0.7736 ± 0.048	0.8070 ± 0.051
	Cold target	0.7185 ± 0.044	0.6661 ± 0.052	0.8234 ± 0.044

R²
Dataset	CV split type	ECFP8	GraphConv	IVPGAN
Davis	Warm	0.9252 ± 0.061	0.8254 ± 0.039	0.9449 ± 0.021
	Cold drug	0.7573 ± 0.171	0.6773 ± 0.159	0.8635 ± 0.151
	Cold target	0.5916 ± 0.120	0.5423 ± 0.121	0.9059 ± 0.121
Metz	Warm	0.8637 ± 0.057	0.6279 ± 0.075	0.6285 ± 0.078
	Cold drug	0.8124 ± 0.117	0.5860 ± 0.120	0.6166 ± 0.120
	Cold target	0.4259 ± 0.121	0.3619 ± 0.112	0.5931 ± 0.106
KIBA	Warm	0.7212 ± 0.072	0.5513 ± 0.097	0.7658 ± 0.065
	Cold drug	0.6677 ± 0.137	0.5026 ± 0.152	0.6475 ± 0.142
	Cold target	0.3648 ± 0.128	0.1910 ± 0.088	0.7056 ± 0.113
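
As a reading aid for the three tables above, the following sketch shows one common way to compute RMSE, Concordance Index, and R² with NumPy. It is not the evaluation code of this repository, and the function names are hypothetical. In particular, R² is computed here as the coefficient of determination, which is an assumption; some DTI code bases report the squared Pearson correlation instead, and the two can differ.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error; lower is better."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order.

    Pairs with equal true values are skipped; tied predictions count as 0.5.
    """
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    concordant, comparable = 0.0, 0
    for i in range(len(y_true)):
        for j in range(i + 1, len(y_true)):
            if y_true[i] == y_true[j]:
                continue
            comparable += 1
            diff_true = y_true[i] - y_true[j]
            diff_pred = y_pred[i] - y_pred[j]
            if diff_pred == 0:
                concordant += 0.5
            elif (diff_true > 0) == (diff_pred > 0):
                concordant += 1.0
    return concordant / comparable


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Toy affinities, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(rmse(y_true, y_pred), concordance_index(y_true, y_pred), r_squared(y_true, y_pred))
```

The pairwise loop in concordance_index is quadratic in the number of samples, which is fine for illustration but slow on datasets the size of KIBA.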

Qualitative results

The first two charts are for ECFP-PSC.
The second two charts are for GraphConv-PSC.
The last two charts are for IVPGAN.

Davis

Warm split

Cold drug split

Cold target split