curioustauseef / pywren-annotation-pipeline

This project forked from metaspace2020/lithops-metaspace

Serverless implementation of the METASPACE pipeline using PyWren IBM Cloud

METASPACE annotation pipeline on IBM Cloud

Experimental code to integrate METASPACE engine with PyWren for IBM Cloud.

Instructions for use

Prerequisites:

  • Python 3.6.x

    Python must be one of the 3.6 versions (i.e. not 3.7 or above, not 3.5 or below) to work with the pre-built runtime.

  • IBM Cloud account

    1. Sign up here: https://cloud.ibm.com/
    2. Create a Cloud Object Storage bucket
    3. Create an IBM Cloud Functions namespace and CloudFoundry organization, ideally in the same region as the Cloud Object Storage bucket.
  • Jupyter Notebook or Jupyter Lab
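
The Python version requirement above can be checked before installing anything. The following is a minimal sketch, not part of this repository; the helper name is_supported_python is hypothetical.

```python
import sys


def is_supported_python(version_info=sys.version_info):
    """Return True only for Python 3.6.x, the version the pre-built runtime expects."""
    # Compare major and minor only, so any 3.6 patch release passes.
    return tuple(version_info[:2]) == (3, 6)


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    if is_supported_python():
        print("Python 3.6 detected")
    else:
        print(f"Python {major}.{minor} detected; the pre-built runtime expects 3.6.x")
```

Running this with the interpreter that will host Jupyter shows whether the environment matches the pre-built runtime.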

Setup

  1. Clone and install this repository with the following commands:

    git clone https://github.com/metaspace2020/pywren-annotation-pipeline.git
    cd pywren-annotation-pipeline
    pip install -e .
    
  2. Copy config.json.template to config.json and edit it, filling in your IBM Cloud details. It is fine to use the same bucket in all places.

  3. Run one of the notebooks listed below.
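
Step 2 can also be done from Python. The sketch below copies the template only if config.json does not exist yet, then parses the result to catch syntax errors introduced while editing. It assumes the template is valid JSON, which is not confirmed here; the function name create_config is hypothetical.

```python
import json
import shutil
from pathlib import Path


def create_config(template="config.json.template", target="config.json"):
    """Copy the template to the target unless the target already exists, then parse it."""
    template_path, target_path = Path(template), Path(target)
    if not target_path.exists():
        # Never overwrite an existing config.json that may already hold real credentials.
        shutil.copyfile(template_path, target_path)
    with target_path.open() as f:
        # Assumption: the file is plain JSON; json.JSONDecodeError is raised otherwise.
        return json.load(f)
```

Calling create_config() from the repository root returns the parsed configuration as a dictionary. The IBM Cloud details still have to be filled in by hand.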

Example notebooks

The main notebook is pywren-annotation-pipeline-demo.ipynb, which allows you to run through the whole pipeline, and see the results at each step.

There are also 3 notebooks prepared for benchmarking that can be run with Jupyter Notebook:

  1. experiment-1-typical.ipynb - Demonstrates running through the whole Serverless metabolite annotation pipeline with a typical dataset, downloading the results and comparing them against the Serverful implementation of METASPACE.
  2. experiment-2-interactive.ipynb - An example of running the pipeline against a smaller set of molecules, to demonstrate the potential of Serverless to provide low-latency access to computing resources.
  3. experiment-3-large.ipynb - A stress test that runs the Serverless metabolite annotation pipeline with a large dataset and many molecular databases.

Example datasets

| Dataset | Author | Config file |
|---|---|---|
| Brain02_Bregma1-42_02 | Régis Lavigne, University of Rennes 1 | ds_config1.json |
| AZ_Rat_Brains | Nicole Strittmatter, AstraZeneca | ds_config2.json |
| CT26_xenograft | Nicole Strittmatter, AstraZeneca | ds_config3.json |
| Mouse brain test434x902 (captured with AP-SMALDI5 and Q Exactive HF Orbitrap) | Dhaka Bhandari, Justus-Liebig-University Giessen | ds_config4.json |
| X089-Mousebrain_842x603 (captured with AP-SMALDI5 and Q Exactive HF Orbitrap) | Dhaka Bhandari, Justus-Liebig-University Giessen | ds_config5.json |
| Microbial interaction slide | Don Nguyen, European Molecular Biology Laboratory | ds_config6.json |

Example databases

These molecular databases can be selected in the ds_config.json files. The notebooks automatically convert them to pickle format and upload them to IBM Cloud.

| Database | Filename | Description |
|---|---|---|
| HMDB | mol_db1.pickle | Human Metabolome Database |
| ChEBI | mol_db2.pickle | Chemical Entities of Biological Interest |
| LIPID MAPS | mol_db3.pickle | |
| SwissLipids | mol_db4.pickle | |
| Small database | mol_db5.pickle | This database is used in Experiment 2 as an example of a small set of user-supplied molecules for running small, interactive annotation jobs. |
| Peptide databases | mol_db7.pickle ... mol_db12.pickle | A collection of databases of predicted peptides. These databases were contributed by Benjamin Baluff (M4I, Maastricht University) exclusively for use with METASPACE. |
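
To illustrate what converting a database to pickle format can look like, here is a minimal sketch using Python's standard pickle module. The layout of the mol_db*.pickle files is not described here, so the sketch assumes a plain list of molecular formula strings; the function names are hypothetical, and the upload to IBM Cloud is left out.

```python
import pickle


def formulas_to_pickle(formulas):
    """Serialize a deduplicated, sorted list of formula strings to pickle bytes."""
    # Assumption: a database is just a collection of formula strings.
    return pickle.dumps(sorted(set(formulas)))


def formulas_from_pickle(data):
    """Restore the list of formula strings from pickle bytes."""
    return pickle.loads(data)


if __name__ == "__main__":
    payload = formulas_to_pickle(["C6H12O6", "H2O", "C6H12O6"])
    print(formulas_from_pickle(payload))
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.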

Acknowledgements

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 825184.

Contributors

gilv, intsco, josepsampe, kpavel, lachlanstuart, omerb01
