METASPACE annotation pipeline on IBM Cloud

Experimental code to integrate the METASPACE engine with PyWren for IBM Cloud.

Instructions for use

Prerequisites:

Python 3.6.x
Python must be one of the 3.6 versions (i.e. not 3.7 or above, not 3.5 or below) to work with the pre-built runtime.
IBM Cloud account
1. Sign up here: https://cloud.ibm.com/
2. Create a Cloud Object Storage bucket
3. Create an IBM Cloud Functions namespace and Cloud Foundry organization, ideally in the same region as the Cloud Object Storage bucket.
Jupyter Notebook or Jupyter Lab
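
The Python version requirement can be checked before installing anything. The following is a minimal sketch; the helper name is_supported_python is hypothetical and is not part of this repository.

```python
import sys


def is_supported_python(version_info=sys.version_info):
    """Return True only for Python 3.6.x, the version the pre-built runtime expects."""
    # Compare major and minor only, so every 3.6 patch release is accepted.
    return tuple(version_info[:2]) == (3, 6)


if not is_supported_python():
    print("Python 3.6.x is required, found %d.%d" % tuple(sys.version_info[:2]))
```

Running this in the environment that will host the notebooks prints a warning when the interpreter is not a 3.6 release and prints nothing otherwise.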

Setup

Clone and install this repository with the following commands:

git clone https://github.com/metaspace2020/pywren-annotation-pipeline.git
cd pywren-annotation-pipeline
pip install -e .

Copy config.json.template to config.json and edit it, filling in your IBM Cloud details. It is fine to use the same bucket in all places.
Run one of the notebooks below.
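
The copy step can also be scripted. This is only a sketch: the function name create_config is hypothetical, and the code makes no assumption about which fields the template contains. It copies the template if no config exists yet and then parses the result, so a syntax error introduced while editing surfaces before a notebook is started.

```python
import json
import shutil
from pathlib import Path


def create_config(template="config.json.template", target="config.json"):
    """Copy the template to the target unless it already exists, then parse it."""
    template, target = Path(template), Path(target)
    if not target.exists():
        # Never overwrite an existing config that may already hold credentials.
        shutil.copyfile(str(template), str(target))
    # json.load raises json.JSONDecodeError if an edit broke the JSON syntax.
    with target.open() as f:
        return json.load(f)
```

Calling create_config() from the repository root creates config.json on the first run and only validates it on later runs. The IBM Cloud details still have to be filled in by hand.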

Example notebooks

The main notebook is pywren-annotation-pipeline-demo.ipynb, which allows you to run through the whole pipeline, and see the results at each step.

There are also 3 notebooks prepared for benchmarking that can be run with Jupyter Notebook:

experiment-1-typical.ipynb - Demonstrates running through the whole Serverless metabolite annotation pipeline with a typical dataset, downloading the results and comparing them against the Serverful implementation of METASPACE.
experiment-2-interactive.ipynb - An example of running the pipeline against a smaller set of molecules, to demonstrate the potential of Serverless to provide low-latency access to computing resources.
experiment-3-large.ipynb - A stress test that runs the Serverless metabolite annotation pipeline with a large dataset and many molecular databases.

Example datasets

Dataset	Author	Config file
Brain02_Bregma1-42_02	Régis Lavigne, University of Rennes 1	`ds_config1.json`
AZ_Rat_Brains	Nicole Strittmatter, AstraZeneca	`ds_config2.json`
CT26_xenograft	Nicole Strittmatter, AstraZeneca	`ds_config3.json`
Mouse brain test434x902 Captured with AP-SMALDI5 and Q Exactive HF Orbitrap	Dhaka Bhandari, Justus-Liebig-University Giessen	`ds_config4.json`
X089-Mousebrain_842x603 Captured with AP-SMALDI5 and Q Exactive HF Orbitrap	Dhaka Bhandari, Justus-Liebig-University Giessen	`ds_config5.json`
Microbial interaction slide	Don Nguyen, European Molecular Biology Laboratory	`ds_config6.json`

Example databases

These molecular databases can be selected in the ds_config.json files. The notebooks automatically convert them to pickle format and upload them to IBM Cloud.

Database	Filename	Description
HMDB	`mol_db1.pickle`	Human Metabolome Database
ChEBI	`mol_db2.pickle`	Chemical Entities of Biological Interest
LIPID MAPS	`mol_db3.pickle`
SwissLipids	`mol_db4.pickle`
Small database	`mol_db5.pickle`	This database is used in Experiment 2 as an example of a small set of user-supplied molecules for running small, interactive annotation jobs.
Peptide databases	`mol_db7.pickle` ... `mol_db12.pickle`	A collection of databases of predicted peptides. These databases were contributed by Benjamin Baluff (M4I, Maastricht University) exclusively for use with METASPACE.
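
The pickle conversion mentioned above is handled by the notebooks, and the internal layout of the mol_db pickle files is not described here. Purely as an illustration of what such a conversion involves, the sketch below assumes a database is a plain list of molecular formula strings; that layout and the function name formulas_to_pickle_bytes are assumptions, not the pipeline's actual code.

```python
import pickle


def formulas_to_pickle_bytes(formulas):
    """Serialize a de-duplicated, sorted list of formula strings with pickle."""
    # Assumed layout: a simple list of strings, not the pipeline's real format.
    unique = sorted(set(formulas))
    return pickle.dumps(unique)


def formulas_from_pickle_bytes(data):
    """Restore the list written by formulas_to_pickle_bytes."""
    return pickle.loads(data)
```

The resulting bytes could then be written to a file or uploaded as an object to the Cloud Object Storage bucket. Pickle data should only be loaded from sources you trust, since unpickling can execute arbitrary code.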

Acknowledgements

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 825184.
