lizzyparkerpannell / untargeted_metabolomics_workflow

Collaborative workflow for untargeted metabolomics data processing and analysis using open-source tools. https://doi.org/10.3390/metabo13040463

Home Page: https://untargeted-metabolomics-workflow.netlify.app/

License: GNU Affero General Public License v3.0

R 81.93% CSS 10.19% HTML 7.31% JavaScript 0.57%
lc-ms maldi-tof-ms mass-spectrometry metaboanalyst untargeted-metabolomics xcms maldiquant proteowizard

untargeted_metabolomics_workflow's Introduction

Untargeted Metabolomics Workflow

Code in this repository can help with processing untargeted metabolomics data, e.g. for plant secondary metabolites or E. coli metabolites. The workflow relies on existing open-source tools including XCMS Online (for LC-MS data), an in-house macro from Overy et al. 2005 (for MALDI or DI-MS data), and MassUp or MALDIquant (for MALDI data).

The code can prepare a peak intensity table suitable for undirected (PCA) and directed (OPLS-DA) analysis using MetaboAnalyst, R or SIMCA-P+ (proprietary).
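
As a rough illustration of the kind of table meant here, the sketch below builds a small simulated peak intensity table (samples in rows, m/z bins in columns) and runs an undirected PCA with base R's `prcomp`. The sample names, bin labels, intensities and the Pareto scaling step are all assumptions made for the example; they are not output from this workflow or a statement of how its scripts scale the data.

```r
set.seed(42)

# Hypothetical design: 3 control and 3 treated samples, 20 m/z bins
groups <- rep(c("control", "treated"), each = 3)
n_samples <- length(groups)
n_bins <- 20

# Simulated intensities (made-up data, log-normal noise)
peak_table <- matrix(
  rlnorm(n_samples * n_bins, meanlog = 5, sdlog = 0.3),
  nrow = n_samples,
  dimnames = list(paste0("sample_", seq_len(n_samples)),
                  paste0("mz_bin_", seq_len(n_bins)))
)

# Raise the first five bins in treated samples so the groups differ
peak_table[groups == "treated", 1:5] <- peak_table[groups == "treated", 1:5] * 3

# Pareto scaling (an assumed choice): mean-centre each bin,
# then divide by the square root of its standard deviation
centred <- sweep(peak_table, 2, colMeans(peak_table))
pareto <- sweep(centred, 2, sqrt(apply(peak_table, 2, sd)), FUN = "/")

# PCA on the already centred and scaled table
pca <- prcomp(pareto, center = FALSE, scale. = FALSE)

# Scores for the first two components, ready for plotting by group
scores <- data.frame(group = groups, pca$x[, 1:2])
print(scores)
```

The same samples-by-bins layout is what a table would need before being reshaped for other tools, though each tool has its own import format that should be checked separately.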

Citation and contact

For now, please cite this GitHub repository if you use the code (https://github.com/LizzyParkerPannell/Untargeted_metabolomics_workflow).

A manuscript (currently under review) providing an overview of the LC-MS workflow is available at:

Parker, E.J.; Billane, K.C.; Austen, N.; Cotton, A.; George, R.M.; Hopkins, D.; Lake, J.A.; Pitman, J.K.; Prout, J.N.; Walker, H.J.; Williams, A.; Cameron, D.D. Untangling the Complexities of Processing and Analysis for Untargeted LC-MS Data Using Open-source Tools. Preprints 2023, 2023020056 (doi: 10.20944/preprints202302.0056.v1).

Contribution credits

Elizabeth Parker¹❋, Kathryn Billane²❋, James Pitman³, James Prout³, David Hopkins⁵, Alex Williams⁵, Heather Walker⁴, Rachel George⁴, Duncan Cameron⁶.

❋ EP and KB are grateful to the University of Sheffield for “Unleash your data and software funding” that facilitated documentation of this workflow

  1. EP planned and coordinated the project, wrote the R codes, contributed to writing, collated the documents
  2. KB developed E. coli protocols, contributed to writing and collating documentation
  3. JP and JP contributed to writing documents, tested workflow, contributed to R codes
  4. HW and RG contributed protocols and guides to in-house tools, tested workflow, contributed to project planning
  5. DH, AW tested workflows and R codes, contributed to project planning and aims
  6. DC contributed to project planning and aims, supported the project with resources

Acknowledgements

With thanks to Harry Wright, Rachel George, Sophia van Mourik, Anne Cotton and Erika Hansson for their feedback.

A note on experimental structure

Many of the difficulties in analysis and/or workflows come from the complexities of experimental structure, and many terms are used interchangeably in different contexts. Most tools for untargeted metabolomics are set up for one-factor analysis with two or three levels, e.g.

  • case vs control
  • wild-type vs transgenic line
  • Strain 1 vs strain 2 vs strain 3

However, we quite often have more complex experimental designs when coming from other fields e.g.

  • Two factors with two or more levels in each, such as +/- treatment for two strains
  • A time course for one or two factors, such as +/- treatment for two strains over three time points

Before you start, think about the following questions and make a note of which groups of metabolite fingerprints you expect to be similar and which you expect to differ. This does not need to be a formal hypothesis; the aim is to think logically about what you are asking in your analysis and how your data will be grouped.

  • What are your biological replicates, and are they independent of each other (or have you resampled the same organism/population multiple times)?
  • Do you have technical replicates (was each extract run through the MS multiple times)?
  • Do you want/need any QC samples or analytical standard samples?
  • If you have more than one treatment, or lots of metadata, get organised. What groupings do you need to use to ask the questions you want answers to?
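
One way to get organised for a two-factor design is to write the metadata out explicitly and derive a single grouping column that one-factor tools can use. The sketch below does this in base R for a made-up design (+/- treatment for two strains, three replicates each). The column names and sample IDs are hypothetical and are not the format this workflow requires for treatments.csv.

```r
# Hypothetical metadata: 2 strains x 2 treatments x 3 biological replicates
treatments <- expand.grid(
  replicate = 1:3,
  treatment = c("minus", "plus"),
  strain = c("strain_1", "strain_2"),
  stringsAsFactors = FALSE
)
treatments$sample_id <- sprintf("S%02d", seq_len(nrow(treatments)))

# Collapse the two factors into one grouping column,
# so that a one-factor tool sees four groups
treatments$group <- paste(treatments$strain, treatments$treatment, sep = "_")

# Check how many samples fall in each group before any analysis
group_counts <- table(treatments$group)
print(group_counts)
```

Collapsing factors like this makes every strain-by-treatment combination its own group. It does not model the two factors separately, so whether it answers your question depends on the comparison you actually want to make.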

untargeted_metabolomics_workflow's People

Contributors

alchu86, jkellaway, lizzyparker, lizzyparkerpannell, outandaprout, rmgeorgesheff

untargeted_metabolomics_workflow's Issues

index issues

  • Alter all cases of index.en.md to _index.en.md --> this is why menu wasn't working!!!
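
A bulk rename like this can be scripted rather than done by hand. The sketch below is a minimal base R version that works in a throwaway folder under `tempdir()`; the folder names are placeholders, and for the real site `content_dir` would need to point at the actual content folder.

```r
# Throwaway demo folder so the sketch is safe to run (placeholder layout)
content_dir <- file.path(tempdir(), "content_demo")
unlink(content_dir, recursive = TRUE)
dir.create(file.path(content_dir, "00_overview"), recursive = TRUE)
file.create(file.path(content_dir, "index.en.md"))
file.create(file.path(content_dir, "00_overview", "index.en.md"))

# Find every file named exactly index.en.md, at any depth
old_paths <- list.files(content_dir, pattern = "^index\\.en\\.md$",
                        recursive = TRUE, full.names = TRUE)

# Same folder, new file name with a leading underscore
new_paths <- file.path(dirname(old_paths), "_index.en.md")

# file.rename returns TRUE for each file that was renamed
renamed <- file.rename(old_paths, new_paths)
print(data.frame(old = old_paths, new = new_paths, renamed = renamed))
```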

Add Heather's extraction protocol to the extractions page

Heather has a paper with the in-house preferred plant extraction protocol.

Overy SA, Walker HJ, Malone S, Howard TP, Baxter CJ, Sweetlove LJ, Hill SA, Quick WP. Application of metabolite profiling to the identification of traits in a population of tomato introgression lines. J Exp Bot. 2005 Jan;56(410):287-96. doi: 10.1093/jxb/eri070. Epub 2004 Dec 13. PMID: 15596481.

Either as a paper or as a bullet-point extractions doc

Clean up code

  • all functions into 00_functions_whole_workflow or similar
  • clean functions out of script for each stage
  • add source code for 00_functions_whole_workflow at start of each stage script

Update workflow to incorporate MALDIquantForeign and MALDIquant

Now that we've sorted the file conversion parameters to get MALDIquant to work in R, we can adapt the workflow so that MassUp is optional (it is still good for QC).

  • update R codes for step 04_data-preprocessing (MALDI and DIMS)
  • update guide/ documentation to explain the options
  • link directly to MALDIquantForeign and MALDIquant vignettes as these are very good
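
For orientation, a MALDIquant pre-processing run might look roughly like the sketch below. It uses simulated spectra so that it is self-contained; real data would instead be read with an import function from MALDIquantForeign such as `importMzMl()`. Every parameter value here (window sizes, iterations, SNR, tolerance) is a placeholder and not the setting chosen for this workflow. The package vignettes remain the authoritative guide.

```r
# Runs only if MALDIquant is installed: install.packages("MALDIquant")
if (requireNamespace("MALDIquant", quietly = TRUE)) {
  library(MALDIquant)
  set.seed(1)

  # Simulated spectra: two Gaussian peaks on a flat baseline plus noise
  mz <- seq(100, 1000, by = 0.5)
  simulate_spectrum <- function(scale) {
    signal <- scale * 1000 * (dnorm(mz, mean = 300, sd = 1) +
                              dnorm(mz, mean = 650, sd = 1))
    noise <- abs(rnorm(length(mz), sd = 1))
    createMassSpectrum(mass = mz, intensity = signal + noise + 5)
  }
  spectra <- lapply(c(1, 2, 3), simulate_spectrum)

  # Smooth, then remove the baseline (placeholder parameters)
  spectra <- smoothIntensity(spectra, method = "SavitzkyGolay",
                             halfWindowSize = 5)
  spectra <- removeBaseline(spectra, method = "SNIP", iterations = 50)

  # Pick peaks and bin them so the same peak shares one m/z across spectra
  peaks <- detectPeaks(spectra, method = "MAD", halfWindowSize = 10, SNR = 5)
  peaks <- binPeaks(peaks, tolerance = 0.002)

  # Peak intensity table: one row per spectrum, one column per m/z bin
  peak_table <- intensityMatrix(peaks, spectra)
  print(dim(peak_table))
}
```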

Linking to manuscript

  • When the preprint is published, link to DOI
  • When manuscript is published link to DOI in repository

favicon

Have added flavicon.png to static/images but it isn't working yet.

Search within code to find how to change the RSE favicon to the untargeted-metabolomics-workflow favicon

Demo data

  • find a way to upload .mzML (currently too big for github!) link to external hosting?
  • upload .mzML converted files and treatments.csv and samplelist.csv
  • upload peak table created with MassUp
  • create demo data README.txt including link to paper and thesis and citation info

Content for website

Set up folder structure for a website to document the workflow

  • create content folder
  • create folders for each stage of the workflow within content e.g. 00_overview
  • create folders called faq_glossary_links and setup
  • create files called index.en.md, credits.en.md and snippets.en.md
  • within each subfolder, create index.en.md with title of section
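
The folder set-up in this checklist can be scripted. The sketch below creates the structure in a temporary folder using base R. The section list is shortened to the names given in the checklist, and the front matter written into each file is an assumed minimal form. Note that another issue in this list later changes index.en.md to _index.en.md.

```r
# Build the structure in a throwaway location (placeholder root)
site_root <- file.path(tempdir(), "site_demo")
unlink(site_root, recursive = TRUE)
content <- file.path(site_root, "content")

# Section folders named in the checklist; add 01_, 02_, ... as needed
sections <- c("00_overview", "faq_glossary_links", "setup")

for (section in sections) {
  dir.create(file.path(content, section), recursive = TRUE)
  # Assumed minimal front matter holding the section title
  writeLines(c("---", paste0("title: ", section), "---"),
             file.path(content, section, "index.en.md"))
}

# Top-level files named in the checklist
file.create(file.path(content,
                      c("index.en.md", "credits.en.md", "snippets.en.md")))

created <- list.files(content, recursive = TRUE)
print(created)
```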

Add glossary

We already have a glossary in the drive - bring this across to website and add references

  • add folder called glossary
  • add index.en.md file to that folder
  • populate it with glossary in md
  • check links and weighting on website

Adapting content from thesis/ drive for website

Need to fill in the content text for each stage of the workflow. This will need some adjustment to make it more applicable to those outwith UoS.

Update <index.en.md> in each of these folders

  • 01_
  • 02_
  • 03_
  • 04_
  • 05_
  • 06_
  • 07_

Image/example of file locations.

Hi Lizzy,
I think there is an issue with people understanding the file locations and where to be working from in steps 4 and 5. I keep getting questions about why code isn't working for them, and it is usually because they've got too many subfolders, or the samplelist.csv or treatments.csv is in the wrong place.
It would probably clear things up to have a screen-grabbed example of the layout of the files, showing the root folder, the MassUp output folders and their contents, and where the metadata files should go.
I would do this but have no MassUp-processed files, and in all honesty am not 100% sure myself.
Hope that makes sense
James
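
Alongside a screenshot, a small check at the top of a script could report misplaced metadata files before anything else fails. The sketch below is one possible form in base R. It assumes, purely for illustration, that samplelist.csv and treatments.csv belong directly in the project root; the correct location for this workflow is exactly what the issue asks to have documented. The function name `check_metadata_files` is hypothetical.

```r
# Hypothetical helper: report which required files exist directly in `root`
check_metadata_files <- function(root,
                                 required = c("samplelist.csv",
                                              "treatments.csv")) {
  found <- file.exists(file.path(root, required))
  names(found) <- required
  found
}

# Demo in a throwaway folder containing only one of the two files
root <- file.path(tempdir(), "project_demo")
unlink(root, recursive = TRUE)
dir.create(root)
file.create(file.path(root, "samplelist.csv"))

status <- check_metadata_files(root)
print(status)

# Give a readable message instead of a later, less obvious error
if (!all(status)) {
  message("Missing from ", root, ": ",
          paste(names(status)[!status], collapse = ", "))
}
```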

Improve data acquisition page of website

  • info on different types of MS for untargeted analyses
  • highlight ones we cover in this workflow
  • brief pros/ cons of ones we cover in this workflow
  • useful reviews/ refs for decision making
  • link to workflow diagram and explain that type of (LC)MS affects how you process the data

Formatting 00_Overview content

  • convert stages to custom numbered list (stages of the workflow)
  • remove duplicated title (stages of the workflow)
  • convert list of example file names to secondary indented bullet list (assumptions)
  • remove duplicated title (assumptions)
  • remove duplicated title (experimental structure)
  • remove duplicated title (workflow diagram)

Add archetypes folder for website

  • create an archetypes folder
  • create archetypes/chapter.md file and fill with:
+++
title = "{{ replace .Name "-" " " | title }}"
date = {{ .Date }}
weight = 5
chapter = true
pre = "<b>X. </b>"
+++


# Some Chapter title

Lorem Ipsum.
  • create archetypes/default.md file and fill with:
+++
title = "{{ replace .Name "-" " " | title }}"
date =  {{ .Date }}
weight = 5
+++

Lorem Ipsum.

convert indexes to chapter

Add the following to the index of each chapter, with the title altered

---
chapter: true
date: "2022-11-08T11:19:11+01:00"
pre: <b>00. </b>
title: Overview of workflow
weight: 1
---
  • 01_
  • 02_
  • 03_
  • 04_
  • 05_
  • 06_
  • 07_
  • 08_

Add your issues

  • James: you can add your issues (as separate issues or check boxes in a list within one issue)

Citations code not doing full list

This should list eight citations, but only lists one when I run it in R:
source("https://raw.githubusercontent.com/LizzyParkerPannell/Untargeted_metabolomics_workflow/main/00_workflow_functions.R")
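
One possible cause, not verified against the function in the repository, is that R does not auto-print inside a function or loop, and a function returns only its last value. If citations are generated one by one without being printed or collected, only a single one would appear. The sketch below shows the collect-then-print pattern using three base packages as stand-ins for the real package list.

```r
# Placeholder package list; the workflow's own list would go here
pkgs <- c("stats", "utils", "tools")

# Collect every citation in a named list instead of relying on auto-printing
citations <- lapply(pkgs, citation)
names(citations) <- pkgs

# Print each one explicitly; inside a loop nothing prints without print()
for (pkg in pkgs) {
  cat("Citation for", pkg, "\n")
  print(citations[[pkg]])
}
```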

Recreate workflow diagram

  • Use mermaid to recreate workflow diagram
  • Add links (shortcuts) within workflow diagram
  • embed mermaid diagram on website 00_Overview/05_workflow_diagram.en.md
