skearnes / rxnpredict

This project forked from doylelab/rxnpredict

Predicting reaction performance using machine learning

License: MIT License

R 66.57% Python 33.43%

rxnpredict's Introduction

Instructions for Using `rxnpredict`

Download and install the following programs:
- Spartan ’14 V1.1.4
- Python 3 (The Anaconda distribution is recommended because it includes the packages the software requires. Download at https://www.anaconda.com/download/)
- R (Download at https://cran.r-project.org/mirrors.html – choose any mirror link)
- RStudio (Download at https://www.rstudio.com/products/rstudio/download/)
- Sublime Text 3 (Download at https://www.sublimetext.com/3)
Add Anaconda to the PATH environment variable so that Sublime Text 3 can run the scripts with Python.
- Navigate to “This PC” in File Explorer.
- Right click "This PC" → Properties → Advanced system settings → Environment Variables...
- In User variables, select the "Path" variable → Edit → New → type "path\to\Anaconda3" (no quotes – e.g., C:\Users\Derek\Anaconda3)
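To confirm the PATH change took effect, a short check like the one below can be run from a newly opened terminal or from Sublime Text. It is only a sketch: `path_contains` is a helper name made up for this example, and the folder shown is the example location from the step above, not necessarily yours.

```python
import os
import shutil


def path_contains(directory, path_value=None):
    """Return True if `directory` is one of the entries in a PATH-style string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = os.path.normcase(os.path.normpath(directory))
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    return any(os.path.normcase(os.path.normpath(entry)) == target for entry in entries)


if __name__ == "__main__":
    # Example location from the instructions; substitute your own Anaconda3 folder.
    anaconda_dir = r"C:\Users\Derek\Anaconda3"
    print("Anaconda on PATH:", path_contains(anaconda_dir))
    # shutil.which reports which python executable a new process would find.
    print("python resolves to:", shutil.which("python"))
```

If the first line prints `False`, reopen the terminal or editor, since programs that were already running keep the old PATH value.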
Go to https://github.com/doylelab/rxnpredict. On the right, click "Clone or Download" and then "Download Zip". This will download a local copy of the repository (folder) to your computer.
All of the molecules whose properties will be used for modeling must first be drawn in Spartan:
- Use the Spartan GUI to draw the molecules, saving them in the spartan_molecules folder (within the rxnpredict folder).
- Be sure to label any shared atoms within a substrate class (ligand, base, etc.) with a "*". To do so, right click an atom, click "Properties", and change the label text at the bottom of the dialog box (e.g., *C1).
- Save the molecules in both .spardir format (for future editing) and .spinput format (this is what the program uses).
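Because the program reads only the .spinput entries, it can help to check that every molecule you plan to reference has one before moving on. The sketch below is not part of rxnpredict: `missing_spinput` is a hypothetical helper, and the molecule names are placeholders.

```python
from pathlib import Path


def missing_spinput(folder, molecule_names):
    """Return the names in `molecule_names` with no <name>.spinput entry in `folder`."""
    folder = Path(folder)
    if folder.is_dir():
        # Path.stem drops only the final suffix, so "ligand1.spinput" -> "ligand1".
        present = {entry.stem for entry in folder.glob("*.spinput")}
    else:
        present = set()
    return [name for name in molecule_names if name not in present]


if __name__ == "__main__":
    # Placeholder names; use the molecule names you will refer to in main.py.
    names = ["ligand1", "base1", "additive1"]
    print("Missing .spinput entries:", missing_spinput("spartan_molecules", names))
```

An empty list means every named molecule has a matching .spinput entry in the folder.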
Modify the Python scripts:
- In setup.py, change the value of spartan_path (line 16 only) to the path of the Spartan14v114.exe file. Be sure to use \\ between folder names.
- In main.py, describe the 2D layout of the plates you have run (line 10 onwards). Helpful syntax:
  - plate_name = Plate(x,y) where x is the number of rows and y is the number of columns.
  - plate_name.fillRow([list], 'substrate_class', 'molecule_name') where 'molecule_name' corresponds to the name of that molecule's .spinput file. Replace fillRow with fillColumn to populate columns instead. Note: If your plate design does not conform to one molecule per row/column, you can modify the Plate's dimensions accordingly. If necessary, an Nx1 plate can specify the components of each reaction individually.
  - After the plates are filled (all "cells" of the plates must be populated with the same kinds of substrate_class), insert the following lines (where the plate names match the ones you have created):
```
 setup.export_reactions([plate1,plate2,plate3])
 setup.export_for_pca([plate1,plate2,plate3])
```
- Run main.py (Ctrl + B in Sublime Text) to create R\output_table.csv. This table consists of one row per reaction and one column per descriptor per reaction component. For example, a 2000-reaction screen with 20 base descriptors, 50 ligand descriptors, and 30 additive descriptors would generate a table with 2000 rows and 20 + 50 + 30 = 100 columns in R\output_table.csv.
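The plate layout and the resulting table size can be pictured with a small stand-alone model. This is a sketch only and does not use rxnpredict's own code: `SketchPlate`, `fill_rows`, `fill_columns`, `check_consistent`, and `table_shape` are hypothetical names, and treating the list argument as 0-based row or column indices is an assumption, since the instructions above do not spell out what the list contains.

```python
class SketchPlate:
    """Hypothetical stand-in: a rows x cols grid of {substrate_class: molecule_name}."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self.cells = [[{} for _ in range(cols)] for _ in range(rows)]

    def fill_rows(self, row_indices, substrate_class, molecule_name):
        # Assumption: indices are 0-based row numbers.
        for r in row_indices:
            for cell in self.cells[r]:
                cell[substrate_class] = molecule_name

    def fill_columns(self, col_indices, substrate_class, molecule_name):
        # Assumption: indices are 0-based column numbers.
        for c in col_indices:
            for row in self.cells:
                row[c][substrate_class] = molecule_name

    def reactions(self):
        """Flatten the grid into one dict per well, row by row."""
        return [cell for row in self.cells for cell in row]


def check_consistent(plates):
    """Return the shared set of substrate classes, or raise ValueError if cells differ."""
    all_cells = [cell for plate in plates for cell in plate.reactions()]
    if not all_cells:
        raise ValueError("no cells to check")
    expected = set(all_cells[0])
    for cell in all_cells:
        if set(cell) != expected:
            raise ValueError("cells do not all contain the same substrate classes")
    return expected


def table_shape(n_reactions, descriptors_per_class):
    """One row per reaction, one column per descriptor per reaction component."""
    return n_reactions, sum(descriptors_per_class.values())


if __name__ == "__main__":
    plate = SketchPlate(2, 3)
    plate.fill_rows([0], "base", "base1")
    plate.fill_rows([1], "base", "base2")
    plate.fill_columns([0, 1, 2], "ligand", "ligand1")
    print(sorted(check_consistent([plate])), len(plate.reactions()))
    print(table_shape(2000, {"base": 20, "ligand": 50, "additive": 30}))
```

The last call reproduces the worked example above: 2000 reactions with 20, 50, and 30 descriptors per component give a 2000 by 100 table.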
Run the analysis_template.R file:
- First, enter the yield data in a single column in Excel and save it as yields.csv (in .csv format) in the rxnpredict folder. Code reactions that lack yield data, due to analytical or other issues, as NA.
- Open analysis_template.R in RStudio and set the working directory (line 9) to the location of the rxnpredict folder. [Note: Before running the R code on a Mac, change the file locations so that folders are separated by “/” instead of “\\”.] Run the code by pressing Ctrl + A and then Ctrl + Enter, which will perform the following steps:
  - Loading and scaling the output data generated by the Python script.
  - Merging the yield data from yields.csv onto the dataset.
  - Splitting the data into training (70%) and test (30%) sets.
  - Training a random forest model using the training set.
  - Plotting a calibration plot of the model using the test set.
  - Calculating R^2 and RMSE values for the model using the test set.
  - Generating a variable importance plot for the model.
- The test set R^2 and RMSE values should be printed in black text in the RStudio console. The calibration plot and variable importance plot should be located in the R\plots folder.
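For readers who want to see the bookkeeping around the model, the sketch below mirrors three of the steps in Python: reading NA-coded yields, making a 70/30 split, and computing RMSE and R^2. It is not a translation of analysis_template.R and trains no model. All function names are made up for this example, yields.csv is assumed to have no header row, and R^2 is computed as the coefficient of determination (1 - SS_res/SS_tot), which may differ from the definition the R script uses.

```python
import math
import random


def parse_yields(lines):
    """Convert text lines to floats, mapping the code 'NA' to None."""
    yields = []
    for line in lines:
        text = line.strip()
        yields.append(None if text == "NA" else float(text))
    return yields


def split_indices(indices, train_fraction=0.7, seed=0):
    """Shuffle the indices reproducibly and split them into training and test parts."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up yields for illustration only.
    yields = parse_yields(["12.5", "NA", "80", "45.1", "3.0"])
    usable = [i for i, y in enumerate(yields) if y is not None]
    train_idx, test_idx = split_indices(usable)
    print("training rows:", train_idx, "test rows:", test_idx)
```

Rows whose yield is None are dropped before splitting here; whether the R script drops them at the same point is not stated above.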
