global-biofoundries-alliance / dna-scanner

Online tool for comparing prices and feasibility of DNA synthesis

License: MIT License

Python 79.01% JavaScript 0.92% HTML 0.23% Vue 19.41% Shell 0.40% Sass 0.04%

dna-scanner's Introduction

DNA-scanner

Web application for rapidly checking multiple DNA sequences for synthesis feasibility and turnaround time across multiple vendors.

If you find this work useful, please consider citing the corresponding publication:

DNA Scanner: a web application for comparing DNA synthesis feasibility, price and turnaround time across vendors

Gledon Doçi, Lukas Fuchs, Yash Kharbanda, Paul Schickling, Valentin Zulkower, Nathan Hillson, Ernst Oberortner, Neil Swainston, Johannes Kabisch

Synthetic Biology, Volume 5, Issue 1, 2020, ysaa011

https://doi.org/10.1093/synbio/ysaa011

Installation

First steps

Check out the project from the git repository.

First, customize the configuration file config.yml in the directory Backend.

The application runs in a docker container, therefore docker and docker-compose must be installed.

Development installation

In the development environment, a database is provided inside a container. The production environment does not do this, because running the database in a container is not recommended for production use.

Use the shell script deploy.sh to start the environment. The script will run docker-compose with the docker-compose.yml and it will be extended by the docker-compose.override.yml.

    chmod 775 deploy.sh
    sudo ./deploy.sh

By default the ./Backend/config.yml has the database credentials as configured in the docker-compose.override.yml.

By default the volumes of the database are bound to /srv/dnascanner/db/ to make the data persistent. You can make the saved information temporary by removing the volume shown below from the docker-compose.override.yml.

- /srv/dnascanner/db:/var/lib/mysql

Force Rebuild

You can force a rebuild without caching using the script rebuild.sh.

    chmod 775 rebuild.sh
    sudo ./rebuild.sh

Production installation

The production environment requires a separate database, because running a database inside a container is not recommended for production use. You can configure the connection in ./Backend/config.yml.

Certificates for https are required for the production environment. The certificates must be placed in /srv/dnascanner/cert/. In the following code we create the directory and generate self-signed certificates. Alternatively you can put your own certificates there. The certificates are configured in the file nginx-secure.conf.

    sudo mkdir -p /srv/dnascanner/cert
    sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout /srv/dnascanner/cert/cert.key -out /srv/dnascanner/cert/cert.crt

The nginx-secure.conf is used by default. You can also place a specific nginx configuration file in /srv/dnascanner/nginx/.

Use the shell script deploy-prod.sh to start the environment. The script will run docker-compose with the docker-compose.yml and it will be extended by the docker-compose.prod.yml.

    chmod 775 deploy-prod.sh
    sudo ./deploy-prod.sh

Configuration

Name	Description
Port	The default port is 80. You can change it in docker-compose.prod.yml under services > frontend > ports by writing your port before the ":443".
Mapped folders	By default the folders are mapped into /srv/dnascanner/. You can change this in docker-compose.prod.yml under services > frontend > volumes by writing your path before the ":".

Common Errors

Failure	Description	Solution
Failure running the deploy script: "Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock"	The user on the system does not have the rights to start the containers	Run the script with sudo


dna-scanner's Issues

Minor design improvements

@yashk4

please consistently use viridis: also for top-navigation bar (with DNA scanner, currently light blue), buttons, etc...
move boost banner below processing settings
at the price filter I can only go to a max. of 1000; is that for the total order or per sequence? It might be too low.
add respective currency symbols (€/$)

Containerize / dockerize software for easy deployment

Would be nice to make it easy for people to quickly spin up (for development or production) their own instances of the application.

domain to rent

@neilswainston @njhillson @Zulko @eoberortner
should I rent dna-scanner.org or .com or will we rename?

Authentication / authorization and granular permissions and user roles

It will be important to set up authentication and authorization with granular permissions controls and user roles.

We will likely want to keep subsets of the data private, private to specific groups, with some people only able to read, others able to read and write, etc.

Will need to have different types of user roles, like admin, or curator, or viewer, etc.

These permissions should be enforced for all channels of accessing and modifying data, e.g. API end points.
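
One way to picture the role idea above is a plain mapping from roles to permission sets that every access channel (UI or API end point) consults. This is only a minimal sketch: the role names come from the examples in this issue, while `Permission`, `ROLE_PERMISSIONS` and `is_allowed` are hypothetical names and not part of the DNA scanner code base.

```python
from enum import Enum, auto


class Permission(Enum):
    READ = auto()
    WRITE = auto()
    MANAGE_USERS = auto()


# Hypothetical mapping; the role names follow the examples in this issue.
ROLE_PERMISSIONS = {
    "viewer": {Permission.READ},
    "curator": {Permission.READ, Permission.WRITE},
    "admin": {Permission.READ, Permission.WRITE, Permission.MANAGE_USERS},
}


def is_allowed(role: str, permission: Permission) -> bool:
    """Return True if the role grants the permission; unknown roles get nothing."""
    return permission in ROLE_PERMISSIONS.get(role, set())
```

Keeping the check in one function would make it easier to call from every end point, so that no channel bypasses it. Group-level privacy would need an extra layer on top of this.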

Support scanning/ordering from Twist

Implement web user interface

In addition to API end points, it would eventually also be nice to make functionality and data accessible through a javascript (for example) UI.

Demonstrate python/jupyter notebook AI processing of data via API

It would be cool as a demonstration of the application to show a workflow where someone via a jupyter notebook pulls data from the DNA scanner database via its API, and processes this data somehow. A nice stepping stone to doing useful things (potentially within the app itself rather than having to go externally to a 3rd party python/jupyter notebook) with the data.

Progress indicator improvements

@yashk4 Please improve upon the progress indicator

The spinning circle is off center in relation to the text
The text could be more informative (e.g. optimizing with boost, waiting for GeneArt,...)

Account for Twist

To buy something from Twist you need an account, so it would be very helpful to have an account with which we can make API calls against their production server. I have already created an account, but I cannot even log in through their API; I always get the error message "Forbidden". I also wrote an email to Twist support, but unfortunately I have not received a response. @njhillson

Develop common data model

Covering basic concepts like Design, Order, etc.

Use boost tutorial

A small tutorial for the CS students on how to use the BOOST API.

Ideal outcome of DNA-scanner project

It would be great to have a python library that enables accessing the vendor APIs in a uniform way.

I can see the following required steps for an end-user of the DNA-scanner python library.

A dependency on the DNA-scanner must be specified in the "requirements.txt" of a Python project

Import the DNA-scanner in the Python code, e.g.,

    from dna_scanner_lib import DNAScanner

Use the DNAScanner, e.g.,

    DNAScanner.screenComplexity(array_of_sequences, array_of_vendors)

Does this make sense?
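
To make the proposal a bit more concrete, here is a rough sketch of what such a uniform interface could look like. Everything in it is an assumption for illustration: the class layout, the `screen_complexity` method name (a snake_case variant of the call suggested above), the vendor names, and the acceptance rules, which are made-up placeholders standing in for real vendor API calls.

```python
from typing import Callable, Dict, List

# A vendor check takes one sequence and reports whether the vendor would accept it.
VendorCheck = Callable[[str], bool]


def gc_fraction(sequence: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty sequence)."""
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


class DNAScanner:
    """Hypothetical facade; real adapters would call each vendor's API instead."""

    def __init__(self, vendors: Dict[str, VendorCheck]):
        self.vendors = vendors

    def screen_complexity(
        self, sequences: List[str], vendor_names: List[str]
    ) -> Dict[str, List[bool]]:
        """Return, per requested vendor, one accept/reject flag per sequence."""
        return {
            name: [self.vendors[name](seq) for seq in sequences]
            for name in vendor_names
        }


# Placeholder rules (made up for this sketch, not real vendor constraints).
scanner = DNAScanner({
    "vendor_a": lambda seq: 0.25 <= gc_fraction(seq) <= 0.65,
    "vendor_b": lambda seq: len(seq) >= 10,
})
result = scanner.screen_complexity(
    ["ATGCATGCATGC", "GGGGCCCC"], ["vendor_a", "vendor_b"]
)
```

The point of the facade is that calling code only sees sequences in and per-vendor flags out, so adding a vendor would mean registering one more check rather than changing the caller.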

consistent writing

the writing is inconsistent with mixing of capital and non-capital letters

e.g. Does your file consists of only Amino Acid Sequences?

should be amino acid sequences
same for Project Name

Support scanning/ordering from GeneArt

Provide IP-whitelist

@PSchickling @yashk4 @gled0n @Lukas-Fuchs Dear CS-Students please send me a whitelist of IP addresses.

Data model / schema for historical dna synthesis order metrics data

One possibility for the DNA scanner software would be to source historical DNA synthesis order metrics data from the community, e.g. here are the N sequences I ordered on date D from company X, here are the M sequences they attempted to make and the N-M sequences they refused to make (and why), here are the M-F sequences they succeeded in making and the F sequences they failed to make (and why), here is how long it took to get each of the M-F successful sequences delivered, and here is how much each of them cost.

It would be nice given that general idea to develop a data model / database schema to store this kind of information.

Eventually this kind of data could be used to infer which types of sequences are easy / hard as a function of company to make, here is how long it takes to receive them, here is how much it might cost, etc. This would be a complement to what the company actually says / estimates.
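
As a starting point for discussion, the N / M / F bookkeeping described above could be captured with two record types. This is a minimal sketch only; the class and field names (`OrderRecord`, `SequenceOutcome`, `delivery_days`, and so on) are hypothetical, and a real schema would likely need currencies, users, and links to the sequences themselves.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class SequenceOutcome:
    name: str
    accepted: bool                          # vendor agreed to attempt synthesis
    rejection_reason: Optional[str] = None  # why the vendor refused, if it did
    succeeded: Optional[bool] = None        # None while pending or never attempted
    failure_reason: Optional[str] = None
    delivery_days: Optional[int] = None
    cost: Optional[float] = None


@dataclass
class OrderRecord:
    vendor: str
    order_date: date
    outcomes: List[SequenceOutcome] = field(default_factory=list)

    def summary(self) -> dict:
        """Counts for ordered (N), attempted (M), refused (N-M), failed (F), delivered."""
        attempted = [o for o in self.outcomes if o.accepted]
        return {
            "ordered": len(self.outcomes),
            "attempted": len(attempted),
            "refused": len(self.outcomes) - len(attempted),
            "failed": sum(1 for o in attempted if o.succeeded is False),
            "delivered": sum(1 for o in attempted if o.succeeded is True),
        }
```

Aggregating many such records per vendor is what would later allow inferring which kinds of sequences are easy or hard for a given company.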

Test run

@yashk4 @gled0n @PSchickling @Lukas-Fuchs
please inform us when we can do user test runs

Is this running somewhere publicly accessible?

If so, what's the URL?

Test

Develop a web-of-DNA-scanners framework

It would be powerful to implement a network of DNA-scanner instances that could query each other. Each institution, for example, could optionally have its own instance that is under its control, with some data private and others shared completely publicly with the whole world, or even just with specific other instances or even specific users on specific other instances. This can be an effective way of getting institutions and people to use the application that might be otherwise concerned about others having control over or access to their data.

Generate request letter and contact DNA synthesis companies

We should compile a list of synthesis companies and contact them with a pre-composed letter requesting their support/API/input.

CS students cannot assign tickets

@njhillson the students cannot assign issues.

Deploy the project behind a recognizable name

The project could be deployed under e.g. dnascanner.biofoundries.org.

Prepare and publish user documentation

It will be important eventually to have publicly accessible documentation for the APIs as well as the web user interface.

Implement interfaces to dna synthesis vendor APIs

Many DNA synthesis companies now have public APIs. Some require accounts/tokens. As part of the DNA scanner, it would be nice as companies permit, to interface with their APIs to query them for cost/time/feasibility estimates.

The first stage of this would be to survey what is available, get documentation, get permission/inform the companies, and then write the interfaces.

RESTful API endpoints for historical dna synthesis metrics data read/write

Related to ticket #2

It would be nice to have documented and implemented API end points that enable 3rd party / tool data read / write

Parse files in SBOL1-format.

I am currently writing the parser for the three file formats, using Biopython (https://biopython.org/) for FASTA and GenBank and PySbol for SBOL files. My current approach is to extract, for every sequence in the input file, its name and its bases (ATCG).

PySbol seems to work great for the SBOL2 format but not for the SBOL1. The function Document.read() seems to not be able to read the data of the input file. I have attached a Jupyter-Notebook and the two .xml files (one in SBOL1 and the other in SBOL2) so you can see where I seem to be stuck.
Maybe I am doing something wrong? Or maybe you have a better way to do this? Insight would be appreciated.
sbol-parsing.zip
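
For reference, the "name plus bases" output described above can be illustrated with a small standard-library FASTA reader. This is a stand-in sketch, not the Biopython-based parser from this issue and not a fix for the SBOL1 problem; `parse_fasta` is a hypothetical helper. With Biopython, `SeqIO.parse(handle, "fasta")` yields records whose `id` and `seq` attributes carry the same information.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record name (first word after '>') to its concatenated bases."""
    records: Dict[str, str] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = ""
        elif name is not None:
            # Sequence data may span several lines; normalise to upper case.
            records[name] += line.upper()
    return records


example = """>seq1 first test sequence
ATGC
atgc
>seq2
GGCC
""".splitlines()
parsed = parse_fasta(example)
```

Producing the same name-to-bases mapping for all three input formats would let the rest of the application ignore which format the user uploaded.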

Feature Requests for DNA scanner

Dear GBA software group,

please start suggesting features!

Consider deploying accessible web server for app evaluation

Would it be possible to deploy a web server (possibly behind an IP filter or authentication/authorization mechanism) for folks to try out the web app?

Permissions to Push Code

Hello,
can somebody please give the CS-Students the permission to push their code.

CS-Students: @PSchickling @gled0n @Lukas-Fuchs @yashk4

Thank you

Back button to start new query

@yashk4 currently there is no intuitive method to go back to the start page to start a new query

Generation of order sheets

As discussed in our conference call on Friday, Jan 31st, a cool feature of the DNAScanner web app would be to automatically generate the order sheets for one (or more) vendors. We (JGI) have some templates, but I'd suggest also reaching out to the vendors directly to ask whether they could provide templates to us (DNAScanner devs).
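
Until vendor templates are available, order sheet generation could be prototyped as a plain CSV export. The sketch below is an assumption throughout: the column layout (`name`, `sequence`, `length`) and the `build_order_sheet` helper are made up for illustration, and a real vendor template would dictate the actual columns and file format.

```python
import csv
import io
from typing import Dict

# Hypothetical column layout; a real vendor template would define the columns.
COLUMNS = ["name", "sequence", "length"]


def build_order_sheet(sequences: Dict[str, str]) -> str:
    """Render named sequences as CSV text with one row per sequence."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for name, sequence in sequences.items():
        writer.writerow(
            {"name": name, "sequence": sequence, "length": len(sequence)}
        )
    return buffer.getvalue()


sheet = build_order_sheet({"seq1": "ATGCATGC", "seq2": "GGCC"})
```

Supporting another vendor would then mostly mean swapping the column list, or the writer, for that vendor's template.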