METASPACE annotation pipeline on IBM Cloud
Experimental code to integrate METASPACE engine with PyWren for IBM Cloud.
Instructions for use
Prerequisites:
- Python 3.6.x
  Python must be one of the 3.6 versions (i.e. not 3.7 or above, not 3.5 or below) to work with the pre-built runtime.
- IBM Cloud account
  - Sign up here: https://cloud.ibm.com/
  - Create a Cloud Object Storage bucket
  - Create an IBM Cloud Functions namespace and Cloud Foundry organization, ideally in the same region as the Cloud Object Storage bucket.
- Jupyter Notebook or Jupyter Lab
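The Python version requirement can be checked before installing anything. The following is a minimal sketch, not part of this repository; the function name `is_supported_python` is hypothetical, and the only fact it relies on is that the pre-built runtime needs a 3.6.x interpreter.

```python
import sys

# Major and minor version required by the pre-built runtime (see Prerequisites).
REQUIRED = (3, 6)


def is_supported_python(version_info=None):
    """Return True if the given (or current) interpreter is a 3.6.x release."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == REQUIRED


if __name__ == "__main__":
    if not is_supported_python():
        # Only warn here; the caller decides whether to abort.
        print("Warning: Python %d.%d detected, but 3.6.x is required."
              % sys.version_info[:2])
```

Running the snippet in the environment you plan to use for the notebooks prints a warning if the interpreter is not a 3.6 release.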
Setup
- Clone and install this repository with the following commands:

      git clone https://github.com/metaspace2020/pywren-annotation-pipeline.git
      cd pywren-annotation-pipeline
      pip install -e .

- Copy config.json.template to config.json and edit it, filling in your IBM Cloud details. It is fine to use the same bucket in all places.
- Run one of the notebooks below.
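The copy step in the setup above could also be scripted. This is a minimal sketch under the assumption that it runs from the repository root; the helper name `create_config` is hypothetical, and it deliberately refuses to overwrite an existing config.json so that details you have already filled in are not lost. The file still has to be edited by hand afterwards.

```python
import shutil
from pathlib import Path


def create_config(template="config.json.template", target="config.json"):
    """Copy the template to the target unless the target already exists.

    Returns True if a new file was written, False if an existing
    config was left untouched.
    """
    template, target = Path(template), Path(target)
    if target.exists():
        # Do not clobber a config that may already contain your IBM Cloud details.
        return False
    shutil.copyfile(str(template), str(target))
    return True
```

Calling `create_config()` with no arguments uses the two file names mentioned in the setup steps.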
Example notebooks
The main notebook is pywren-annotation-pipeline-demo.ipynb, which lets you run through the whole pipeline and see the results at each step.
There are also 3 notebooks prepared for benchmarking that can be run with Jupyter Notebook:
- experiment-1-typical.ipynb: Demonstrates running through the whole Serverless metabolite annotation pipeline with a typical dataset, downloading the results and comparing them against the Serverful implementation of METASPACE.
- experiment-2-interactive.ipynb: An example of running the pipeline against a smaller set of molecules, to demonstrate the potential of Serverless to provide low-latency access to computing resources.
- experiment-3-large.ipynb: A stress test that runs the Serverless metabolite annotation pipeline with a large dataset and many molecular databases.
Example datasets
Dataset | Author | Config file |
---|---|---|
Brain02_Bregma1-42_02 | Régis Lavigne, University of Rennes 1 | ds_config1.json |
AZ_Rat_Brains | Nicole Strittmatter, AstraZeneca | ds_config2.json |
CT26_xenograft | Nicole Strittmatter, AstraZeneca | ds_config3.json |
Mouse brain test434x902 (captured with AP-SMALDI5 and Q Exactive HF Orbitrap) | Dhaka Bhandari, Justus-Liebig-University Giessen | ds_config4.json |
X089-Mousebrain_842x603 (captured with AP-SMALDI5 and Q Exactive HF Orbitrap) | Dhaka Bhandari, Justus-Liebig-University Giessen | ds_config5.json |
Microbial interaction slide | Don Nguyen, European Molecular Biology Laboratory | ds_config6.json |
Example databases
These molecular databases can be selected in the ds_config.json files. The notebooks automatically convert them to pickle format and upload them to IBM Cloud.
Database | Filename | Description |
---|---|---|
HMDB | mol_db1.pickle | Human Metabolome Database |
ChEBI | mol_db2.pickle | Chemical Entities of Biological Interest |
LIPID MAPS | mol_db3.pickle | |
SwissLipids | mol_db4.pickle | |
Small database | mol_db5.pickle | This database is used in Experiment 2 as an example of a small set of user-supplied molecules for running small, interactive annotation jobs. |
Peptide databases | mol_db7.pickle ... mol_db12.pickle | A collection of databases of predicted peptides. These databases were contributed by Benjamin Baluff (M4I, Maastricht University) exclusively for use with METASPACE. |
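To illustrate what "converted to pickle format" means for a small user-supplied database, here is a rough sketch. It assumes a database is simply a collection of molecular formula strings; the structure actually written by the notebooks may differ, and the function names below are hypothetical. The upload to IBM Cloud is not shown.

```python
import pickle


def formulas_to_pickle(formulas, path):
    """Deduplicate and sort formula strings, then pickle them to `path`.

    Assumption: the database is just a list of formula strings.
    """
    unique = sorted({f.strip() for f in formulas if f.strip()})
    with open(path, "wb") as fh:
        pickle.dump(unique, fh)
    return unique


def load_formulas(path):
    """Read back a list of formulas written by formulas_to_pickle."""
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

For example, `formulas_to_pickle(["C6H12O6", "H2O", "C6H12O6"], "my_small_db.pickle")` would write a two-entry list to a local file (the file name is illustrative only).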
Acknowledgements
This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 825184.