waldronlab / curatedmetagenomicdataanalyses

Analyses in R and Python Using curatedMetagenomicData

Home Page: https://waldronlab.io/curatedMetagenomicDataAnalyses/

License: Creative Commons Attribution 4.0 International

Python 91.87% R 2.55% Dockerfile 0.17% Shell 5.41%

r microbiome-analysis microbiome-data bioconductor

curatedmetagenomicdataanalyses's Introduction

curatedMetagenomicDataAnalyses

This repository provides biologically relevant analyses using the curatedMetagenomicData package, both using R/Bioconductor and using Python. You can run both R and Python analyses locally in the provided Docker container, or on the Cloud for free.

Running in the Cloud (free)

A machine with all dependencies, the code from this repository, and running instances of JupyterLab (with R and Python 3) and RStudio is available at http://app.orchestra.cancerdatasci.org/ (search for the Curated Metagenomic Analyses workshop). Each machine can be used for up to 8 hours at a time.

Running locally using Docker

Requirements

You need Docker.

Getting Started

First build the image:

docker build -t "waldronlab/curatedmetagenomicanalyses" .

Then run a container based on the image (the command below does not set a password):

docker run -d -p 80:8888 --name cma \
  waldronlab/curatedmetagenomicanalyses

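Visit http://localhost in your browser. The -p 80:8888 option maps port 8888 inside the container to port 80 on the host, so no port number is needed in the URL.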
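The container may take a few seconds to start answering after docker run returns. The following is a minimal sketch, not part of this repository, of how one could wait for it from Python. The function name wait_for_server, the default URL, and the timing values are assumptions chosen for illustration; only the host port 80 comes from the docker run command above.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url="http://localhost", timeout=60.0, interval=2.0):
    """Poll `url` until it answers over HTTP or `timeout` seconds elapse.

    Hypothetical helper: returns True as soon as any HTTP response arrives,
    False if nothing answered before the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status code.
            return True
        except (urllib.error.URLError, OSError):
            # Nothing is listening yet (or the request timed out): retry.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling wait_for_server() right after docker run and opening the browser only once it returns True avoids a "connection refused" page during startup.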

Running locally without Docker

Start with an installation of the current version of Bioconductor (see https://bioconductor.org/install/). Older versions probably will not work. Installing directly from GitHub requires the remotes package; once it is installed, run:

BiocManager::install("waldronlab/curatedMetagenomicDataAnalyses", dependencies = TRUE)

Analyses

R Vignettes

Python Notebooks

Sex-related differences in the human microbiome using cMD3 and Python3

Supplementary Materials

Installing Python dependencies in Linux (Python notebook)


curatedmetagenomicdataanalyses's Issues

conda install available ?

Hi, I was wondering whether a conda install option is available for your tool, or whether you are planning one. I saw that there is one for curatedMetagenomicData but not for *Analyses. Thank you!

get GHA building

README in docker

@lwaldron I want to clarify the content for the README file that will be in the file explorer on the left side of Jupyter Lab. Are these just the instructions to run the analyses?

We could just add that to the repository's README and then I could just copy that file up one level from the curatedMetagenomicAnalyses repository in the docker. If that's confusing, I can make a different README for the docker.

BiocManager::install("waldronlab/curatedMetagenomicAnalyses") failed

> BiocManager::install("waldronlab/curatedMetagenomicAnalyses")
Bioconductor version 3.12 (BiocManager 1.30.16), R 4.0.5 (2021-03-31)
Installing github package(s) 'waldronlab/curatedMetagenomicAnalyses'
Downloading GitHub repo waldronlab/curatedMetagenomicAnalyses@HEAD
Running `R CMD build`...
* checking for file ‘/tmp/Rtmpifo5W6/remotes2b43787ca6aa/waldronlab-curatedMetagenomicAnalyses-677f1be/DESCRIPTION’ ... OK
* preparing ‘curatedMetagenomicAnalyses’:
* checking DESCRIPTION meta-information ... OK
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
* building ‘curatedMetagenomicAnalyses_0.4.0.tar.gz’
* installing *source* package ‘curatedMetagenomicAnalyses’ ...
** using staged installation
** R
** inst
** byte-compile and prepare package for lazy loading
Error: object ‘returnSamples’ is not exported by 'namespace:curatedMetagenomicData'
Execution halted
ERROR: lazy loading failed for package ‘curatedMetagenomicAnalyses’
* removing ‘/public/home/sample_lib/ckzhu/miniconda3/envs/R_4.0.0/lib/R/library/curatedMetagenomicAnalyses’
Warning message:
In i.p(...) :
  installation of package ‘/tmp/Rtmpifo5W6/file2b4327817dad/curatedMetagenomicAnalyses_0.4.0.tar.gz’ had non-zero exit status

Duplicate row names in dataDump

When I run the example in dataDump.Rd, I get the following error:

Error in `.rowNamesDF<-`(x, value = value) : 
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘MH0001’, ‘MH0002’, ‘MH0003’, ‘MH0004’, ‘MH0005’, ‘MH0006’, ‘MH0007’, ‘MH0008’, ‘MH0009’, ‘MH0010’, ‘MH0011’, ‘MH0012’, ‘MH0013’, ‘MH0014’, ‘MH0015’, ‘MH0016’, ‘MH0017’, ‘MH0018’, ‘MH0019’, ‘MH0020’, ‘MH0021’, ‘MH0022’, ‘MH0023’, ‘MH0024’, ‘MH0025’, ‘MH0026’, ‘MH0027’, ‘MH0028’, ‘MH0030’, ‘MH0031’, ‘MH0032’, ‘MH0033’, ‘MH0034’, ‘MH0035’, ‘MH0036’, ‘MH0037’, ‘MH0038’, ‘MH0039’, ‘MH0040’, ‘MH0041’, ‘MH0042’, ‘MH0043’, ‘MH0044’, ‘MH0045’, ‘MH0046’, ‘MH0047’, ‘MH0048’, ‘MH0049’, ‘MH0050’, ‘MH0051’, ‘MH0052’, ‘MH0053’, ‘MH0054’, ‘MH0055’, ‘MH0056’, ‘MH0057’, ‘MH0058’, ‘MH0059’, ‘MH0060’, ‘MH0061’, ‘MH0062’, ‘MH0063’, ‘MH0064’, ‘MH0065’, ‘MH0066’, ‘MH0067’, ‘MH0068’, ‘MH0069’, ‘M [... truncated]
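The warning above indicates that the same sample IDs (MH0001, MH0002, ...) appear more than once when row names are set. As a language-neutral illustration, here is a small Python sketch of how duplicated IDs can be detected and, if renaming is acceptable, made unique. Both function names are hypothetical, and whether renaming or dropping duplicates is the right fix for dataDump is not established in this issue.

```python
from collections import Counter


def find_duplicates(ids):
    """Return the IDs that occur more than once, in first-seen order."""
    counts = Counter(ids)
    return [i for i in dict.fromkeys(ids) if counts[i] > 1]


def make_unique(ids, sep="."):
    """Suffix repeated IDs so every name is distinct.

    Similar in spirit to R's make.unique(); not guaranteed to match it
    for every input.
    """
    used = set()
    next_suffix = Counter()
    out = []
    for i in ids:
        candidate = i
        # Keep bumping the suffix until the name has not been used yet.
        while candidate in used:
            next_suffix[i] += 1
            candidate = f"{i}{sep}{next_suffix[i]}"
        used.add(candidate)
        out.append(candidate)
    return out
```

For example, make_unique(["MH0001", "MH0002", "MH0001"]) keeps the first MH0001 and renames the second occurrence to MH0001.1.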

Error in access data

Hello to everyone,

I am currently trying to get access to the data contained at the curatedMetagenomicData repository but I get the following error message:

britol = BritoIL_2016.metaphlan_bugs_list.stool()
Error in UseMethod("filter_") : 
  no applicable method for 'filter_' applied to an object of class "c('tbl_SQLiteConnection', 'tbl_dbi', 'tbl_sql', 'tbl_lazy', 'tbl')"

How can I address this issue?
Thank you very much for your support.

retrieve raw data

Hi,
Is there a way to get the raw metagenomics fastq files from the listed studies?

login screen does not display properly

When I open http://app.orchestra.cancerdatasci.org/login in Chrome, I simply get a page that says "Welcome to the Orchestra platform" and nothing more. I had hoped to login and try the server; can this be fixed?

Add link to paper and fix figure

I think we're missing a reference to a paper in @paolinomanghi's notebook:

The flag "-m" will attach the per-sample metadata available in curatedMetagenomicData 3 to their taxonomic 
profiles. We now switch to a python 3 set of instructions that can be used to perform the main analysis of 
Figure 2, panel a, of the paper "***".

And I think there's still some issue with the figure:

Changes to Python analysis notebook

@paolinomanghi I wanted to make the following recommendations to help the notebook run and also to separate it into an analysis notebook and an installation notebook. I could probably write the installation notebook since I needed to install everything for the docker--should I do that?

Remove the installation instructions and the git clone line so that you have

This notebook contains the instructions to run a meta-analysis of sex-related contrasts in the human gut microbiome, using curatedMetagenomicDataCLI and a set of freely-available python programs. See `installation.ipynb` for installation instructions.

As described here, we are now going to: 
1) create a folder called **species_abundances_from_cMD3CLI**
2) go in that directory
3) download all the taxonomic profiles from the **curatedMetagenomicDataCLI** workflow

I thought it might be helpful to add the comment after 3 that "This step will take some time." or anywhere that it may take time to process.

When making species_abundances_from_cMD3CLI, I thought we could put the code outside of the repository. For example

%%bash
mkdir /home/waldronlab/species_abundances_from_cMD3CLI
cd /home/waldronlab/species_abundances_from_cMD3CLI
curatedMetagenomicData -m "*relative_abundance"
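For readers who prefer a plain Python cell to a %%bash cell, the same three steps could be sketched as below. The function name download_profiles and its dry_run flag are hypothetical; the command and its -m "*relative_abundance" argument are taken from the snippet above. Unlike plain mkdir, this sketch does not fail if the folder already exists.

```python
import shutil
import subprocess
from pathlib import Path


def download_profiles(out_dir, pattern="*relative_abundance", dry_run=False):
    """Create `out_dir` and run `curatedMetagenomicData -m <pattern>` inside it.

    Returns the command as a list. With dry_run=True the folder is created
    but the command is not executed.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    # Passing a list (no shell) keeps the pattern literal, as the quotes do in bash.
    cmd = ["curatedMetagenomicData", "-m", pattern]
    if not dry_run:
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} is not on PATH")
        # This step will take some time.
        subprocess.run(cmd, cwd=out_path, check=True)
    return cmd
```

A call such as download_profiles("/home/waldronlab/species_abundances_from_cMD3CLI") would then replace the bash cell.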

Later, when we import your python modules and tools, we append as follows:

sys.path.append("../python_modules/")
sys.path.append("../python_tools/")
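Relative entries such as "../python_modules/" depend on the directory the notebook is started from. One possible safeguard, sketched here with a hypothetical helper add_to_sys_path, is to resolve the path first and append it only if the folder exists:

```python
import sys
from pathlib import Path


def add_to_sys_path(relative_dir, base=None):
    """Append base/relative_dir to sys.path as an absolute path.

    Returns the resolved Path, or None when the folder does not exist.
    `base` defaults to the current working directory.
    """
    base = Path.cwd() if base is None else Path(base)
    target = (base / relative_dir).resolve()
    if not target.is_dir():
        return None
    if str(target) not in sys.path:
        sys.path.append(str(target))
    return target
```

A None return value makes a wrong working directory visible immediately, rather than as an ImportError further down the notebook.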

For all the "help" sections, I suggest replacing the pasted help text with runnable code, which keeps the notebook small and makes it interactive. For meta_analysis_data you can do the following:

%%bash
python ../python_tools/meta_analysis_data.py -h

We should also change the path in the params:

params = {
    'input_folder': "/home/waldronlab/species_abundances_from_cMD3CLI/",
    "output_dataset": "a_dataset_for_the_sex_contrast_in_gut_species.tsv",
    "min": ["age:16"],
    "max": [],
    "cat": ["study_condition:control", "body_site:stool"], 
    "multiple": -1,
    "min_perc": ["gender:25"],
    "cfd":["BMI"], 
    "iqr": [],
    "minmin": "gender:40",
    "study_identifier": "study_name", 
    "verbose": False, 
    "debug": False,
    "binary": [],
    "search": [],
    "exclude": []
}
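Several entries in params appear to use a "field:value" form (for example "age:16" or "body_site:stool"). The sketch below is a quick typo check for such entries before they are handed to the tool. It assumes that form only for the list-valued keys min, max, cat and min_perc; how meta_analysis_data.py actually interprets them is not shown in this issue, and both function names are hypothetical.

```python
def split_spec(spec):
    """Split a 'field:value' string such as 'age:16' into ('age', '16')."""
    field, sep, value = spec.partition(":")
    if not sep or not field or not value:
        raise ValueError(f"expected 'field:value', got {spec!r}")
    return field, value


def check_params(params, keys=("min", "max", "cat", "min_perc")):
    """Return {key: [(field, value), ...]} for the list-valued keys checked.

    Raises ValueError on the first entry that is not of the form 'field:value'.
    """
    return {k: [split_spec(s) for s in params.get(k, [])] for k in keys}
```

Keys such as "cfd" (plain column names) and "minmin" (a single string) are deliberately left out of the default keys, since they do not follow the list-of-"field:value" pattern in the example above.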

Then I believe everything runs. Here is the notebook with some of those edits for reference:
curatedMetagenomicData 3 CLI interface, sex-contrast microbiome meta-analysis.zip

Make a pip requirements file to version Python packages

Make pip requirements file to version Python packages and better control behavior.
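One way to produce such a file is pip freeze > requirements.txt, which pins everything installed in the environment. A narrower alternative, sketched below under the assumption that the list of packages the notebooks need is known, pins only those packages at their currently installed versions. The function name pinned_requirements is hypothetical.

```python
from importlib import metadata


def pinned_requirements(package_names):
    """Return 'name==version' lines for the packages that are installed.

    Packages that are not installed in the current environment are skipped.
    """
    lines = []
    for name in package_names:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            continue
    return lines
```

Writing "\n".join(pinned_requirements([...])) to requirements.txt, with the notebook's actual dependencies in the list, gives a file that pip install -r requirements.txt can consume.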

Retrieve biom table and metadata tsv for entire dataset

Is there a simple script available in python or R to get all of the data as a biom table and tsv with matching sample indices?

Fix docker issues

Docker needs update

Age and Sex metaanalysis vignettes broken in cMD 3

See the following after upgrading GHA to Bioconductor 3.14 https://github.com/waldronlab/curatedMetagenomicAnalyses/runs/5558310146?check_suite_focus=true#step:18:258 :

Quitting from lines 120-122 (Age_metaanalysis_vignette.Rmd) 
Error: processing vignette 'Age_metaanalysis_vignette.Rmd' failed with diagnostics:
'ranks' must contain values from 'taxonomyRanks()'
--- failed re-building ‘Age_metaanalysis_vignette.Rmd’

Create a pkgdown site for the analyses

Use the customization feature of the pkgdown site to link directly to the notebooks, which display directly in github.

Create a table of contents for the analyses in the README

README.md links don't work from pkgdown

In the pkgdown site (https://waldronlab.io/curatedMetagenomicAnalyses/index.html), the following links from the README.md file lead to waldronlab.io/ etc. when they should lead to github.com/waldronlab/curatedMetagenomicAnalyses, and give "no content found" messages as a result:

Analyses

R Vignettes

Python Notebooks

Sex-related differences in the human microbiome using cMD3 and Python3

Supplementary Materials

Installing Python dependencies in Linux (Python notebook)

possible to change repo name

Hi @jwokaty, would it be ok to change the repo name to curatedMetagenomicDataAnalyses? I've been trying to move everything towards consistent naming and it would help. Hope it's not too much to ask!

Make a Dockerhub image

Following the example of https://github.com/seandavi/buildabiocworkshop, but probably using a different base Docker image that:

doesn't need to have all the dependencies of bioconductor_docker (although it could)
has jupyterhub (if practical)
has this package and its Depends/Imports/Suggests installed

whether the study factor needs to be considered?

Thank you and your team for developing the curatedMetagenomicData package. I have a few questions about using it. First, has the relative abundance table already been standardized, so that the study batch does not need to be considered? Is this the final relative abundance table?
Second, I downloaded cancer data from several studies using curatedMetagenomicData. Does the study factor need to be considered when looking for significantly different microbes using MaAsLin? What other analysis methods for finding significantly different microbes are recommended when using curatedMetagenomicData? Or should I follow this tutorial (vignettes/Sex_metaanalysis_vignette.Rmd)? Looking forward to your reply.

gh-pages missing vignette

The vignette I recently added is missing from the gh-pages pkgdown site. Can you figure out what is going on @jwokaty?

skbio not installing

Fix install for skbio in the python notebook and install script.

Error Assay.type

Hello everyone,
When I try to use the function to convert to phyloseq, I get an error message that I do not know how to handle:

 makePhyloseqFromTreeSummarizedExperiment(alcoholStudy, abund_values = "relative_abundance")
Error: 'assay.type' must be a valid name of assays(x)

How can it be solved?
Thank you.

translate cMD1 paper analyses to cMD3

This will replace issue waldronlab/curatedMetagenomicData#70

See https://github.com/waldronlab/curatedMetagenomicData/tree/legacy/vignettes/extras for code to move over.

This is intended as a learning exercise and will be interesting to see how these analyses have changed since the original publication, but there's no need to reproduce something that is too difficult, or to maintain old code if there's an easier way to do it.
