dppi's Introduction

DPPI

A convolutional neural network to predict protein-protein interactions (PPIs).

Main Command: th main.lua -dataset myTrain -learningRate 0.01 -momentum 0.9 -string first-run -device 1 -top_rand -batchSize 2 -saveModel

==> Input parameters:

    -dataset: Name of the training data (e.g. myTrain)

    -string: A suffix that is added to the result file
    
    -device: GPU number

==> Necessary input files before running the command:

    -Training data: It is in .dat format.
     The name of this file should be the name of your training data followed by '_labels' (e.g. myTrain_labels.dat).

     The .dat file is made using a script called 'convert_csv_to_dat.lua'.

    -Validation data: Same as the training data. The name of this file is the training-data name followed by '_valid'
     (e.g. myTrain_valid_labels.dat).

    As with the training data, you can make the .dat file using convert_csv_to_dat.lua.

    -Cropped profiles of proteins: In .t7 format. This file is made using a script called 'create_crop.lua'.

    -Number of crops per profile: In .t7 format. This file is also made by 'create_crop.lua'.
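Before running the Main Command, it can help to verify that all four files listed above are in place. A minimal pre-flight check (a sketch only; the filenames follow the naming conventions in this README, with the crop size 512 taken from the examples):

```python
import os

def check_inputs(dataset, crop_size=512):
    """Return the list of files main.lua expects but that are missing."""
    required = [
        f"{dataset}_labels.dat",                   # training labels (convert_csv_to_dat.lua)
        f"{dataset}_valid_labels.dat",             # validation labels
        f"{dataset}_profile_crop_{crop_size}.t7",  # cropped profiles (crop script)
        f"{dataset}_number_crop_{crop_size}.t7",   # number of crops per profile
    ]
    return [name for name in required if not os.path.exists(name)]

missing = check_inputs("myTrain")
if missing:
    print("Missing input files:", ", ".join(missing))
```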

====================================================

convert_csv_to_dat.lua: This script converts a CSV file to a .dat file.

Command:

th convert_csv_to_dat.lua -dataset myTrain
th convert_csv_to_dat.lua -dataset myTrain_valid

==> Input parameters:

    -dataset: name of the dataset in csv format without suffix (e.g. myTrain).
    
    This file contains three columns: the first and second columns are two proteins, and the third column is either 1 or 0,
    
    indicating whether the two proteins interact (e.g. myTrain.csv and myTrain_valid.csv). 
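For illustration, a CSV in the expected three-column format could be generated like this (the protein IDs are made-up placeholders; this assumes a plain comma-separated file with no header row):

```python
import csv

# Each row: protein A, protein B, label (1 = interact, 0 = do not interact).
pairs = [
    ("P12345", "Q67890", 1),
    ("P12345", "O11111", 0),
]

with open("myTrain.csv", "w", newline="") as f:
    csv.writer(f).writerows(pairs)
```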

==> Output:

    dataset in dat format (e.g. myTrain_labels.dat or myTrain_valid_labels.dat) 

====================================================

creat_crop.lua: This script makes the cropped profiles.

Command:

th creat_crop.lua -dataset myTrain

==> Input parameters:

    -dataset: name of your training data (e.g. ‘myTrain’).

==> Output:

    1) Cropped profiles: In .t7 format. The name of this file is the input name followed by '_profile_crop_512' (e.g. myTrain_profile_crop_512.t7).

    2) Number of crops per profile: In .t7 format. The name of this file is the input name followed by '_number_crop_512' (e.g. myTrain_number_crop_512.t7)

==> Necessary input files before running the command:

    -You should have one file and one folder, both named after -dataset:
    
    1) A file with the suffix '.node' (e.g. myTrain.node). This file has one column listing the names of all proteins in the training and validation data. 

    2) A profile folder with the same name as -dataset (e.g. myTrain). This folder contains the profiles of all proteins in the training and validation data. 
    
    The names of the profile files inside this folder match the protein names in the '.node' file 
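The layout described above can be sketched as follows (the protein names are placeholders, and the empty profile files stand in for real PSSM profiles):

```python
import os

dataset = "myTrain"
proteins = ["P12345", "Q67890", "O11111"]  # hypothetical protein IDs

# 1) <dataset>.node: one protein name per line.
with open(dataset + ".node", "w") as f:
    f.write("\n".join(proteins) + "\n")

# 2) A folder named <dataset>, with one profile file per protein,
#    named exactly as in the .node file.
os.makedirs(dataset, exist_ok=True)
for name in proteins:
    open(os.path.join(dataset, name), "w").close()  # placeholder profile
```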

====================================================

Please remember: before running the Main Command you need to change the data directory and work directory

in the main.lua file at lines 5 and 6, replacing '$HOME' with your own data directory and work directory.

dppi's People

Contributors

hashemifar


dppi's Issues

Segment fault

Hello, I am running your code: th main.lua -dataset myTrain -learningRate 0.01 -momentum 0.9 -string first-run -device 1 -top_rand -batchSize 2 -saveModel (I modified the paths in main.lua and used your data) and a segmentation fault occurred. I have not been able to solve this problem despite long debugging. The main errors are these three:

[1] [475221.681741] python[35692]: segfault at 20 ip 00007ff51a25a760 sp 00007ffe58687d00 error 4 in python3.7[7ff51a179000+1e1000]
[2] [544549.003939] a.out[24933]: segfault at 6023e0 ip 0000000000401600 sp 00007ffd7fad55a0 error 7 in a.out[400000+3000]
[3] [992757.284087] luajit[50534]: segfault at 18 ip 00000000004687d3 sp 00007ffd37779388 error 4 in luajit[400000+99000]

I want to know whether there is a problem with my environment or with the way I run the code. Thank you!

example file for training set

It is not immediately clear what the README file means by "Training data: It is in dat format. Training data contains three column where first and second columns are two proteins". Similarly, it is unclear what it means by "Cropped profiles of proteins: It is in t7 format. This file is made using a script called 'create_crop.lua'." Is it possible to provide small example files for the training data and the expected output?

PSSM profiles

Hi,
I am trying to recreate your results. I have a query:
are the PSSM profiles provided in myTrain folder real or just toy examples?
Because the dimensions don't match for protein Q08999 for example.
Thanks!
Regards,
Saby

from protein sequence to training data

Hi, We would like to try out your code to predict PPI; however, we are having trouble understanding the input format. Given two lists of protein sequences (positive and negative sets), how do we convert the primary protein sequences to the format you have under the folder myTrain/? Thanks!

Average Pooling to encode protein profile

Hi, in the paper, the protein profiles P are converted to vectors as
o = Pool(ReLU(Batch(Conv(P))))

In the Supplement, it seems you are using average pooling with some window size l_p. In the code, it seems that l_p is 4. Then you have to "flatten" all the average-pool vectors. How can this produce the final vector of size 1×d, where d is the number of filters, as in the Supplement?
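The dimension count in question can be made concrete with a toy NumPy example (shapes chosen only for illustration): non-overlapping average pooling with window l_p over a length-n output of d filters, followed by flattening, yields (n/l_p)*d values rather than d.

```python
import numpy as np

n, d, l_p = 16, 8, 4                 # toy sequence length, number of filters, pool window
conv_out = np.arange(n * d, dtype=float).reshape(n, d)

# Non-overlapping average pooling with window l_p along the sequence axis.
pooled = conv_out.reshape(n // l_p, l_p, d).mean(axis=1)  # shape (n/l_p, d) = (4, 8)

flat = pooled.reshape(-1)            # 32 values after flattening, not d = 8
```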

Code Request

Hello, I am a graduate student studying in a related field, and your model algorithm has given me a great inspiration. If possible, could you take a look at the source code?

Unclear how invariance to input profiles is achieved

From the paper,
o1 = ReLU(Batch([W1 W2] R1))
o2 = ReLU(Batch([W2 W1] R2))

If I reverse(_r) the order of R1 and R2, I get,
o2_r = ReLU(Batch([W1 W2] R2))
o1_r = ReLU(Batch([W2 W1] R1))

The Hadamard product q = o1 . o2 doesn't appear to be the same as q_r = o2_r . o1_r
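The asymmetry described here is easy to probe numerically. A quick sketch with random matrices (dimensions arbitrary; batch normalization omitted for simplicity), following the formulas above:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 4, 3                          # arbitrary small dimensions
W1 = rng.standard_normal((d, k))
W2 = rng.standard_normal((d, k))
R1 = rng.standard_normal(2 * k)      # [W1 W2] is d x 2k, so R is length 2k
R2 = rng.standard_normal(2 * k)

relu = lambda x: np.maximum(x, 0.0)
W12 = np.hstack([W1, W2])            # [W1 W2]
W21 = np.hstack([W2, W1])            # [W2 W1]

# Original order of the pair:
q = relu(W12 @ R1) * relu(W21 @ R2)

# Reversed order of the pair:
q_r = relu(W12 @ R2) * relu(W21 @ R1)

# For random weights, q and q_r generally differ elementwise.
```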

Could you please explain how the network is invariant to the order in which the protein sequences within each pair are input?

Saby

Main Command: th main.lua -dataset myTrain -learningRate 0.01 -momentum 0.9 -string first-run -device 1 -top_rand -batchSize 2 -saveModel

I get an error when using this argument: -top_rand.

This argument is not being accepted by the main.lua program.

Could you please let me know the significance of this argument?
