Analysis Notebooks

This repository contains the notebooks used to create the analyses in Vazquez-Baeza et al. 2016. Each notebook tries to summarize a particular step and be as standalone as possible. In this document we briefly describe each of the analyses needed for the paper and when possible we group them together, so as to provide more cohesion.

Important notes for the reader

  • This repository does not provide all the data files used, as their size would exceed GitHub's limits. However, we provide the main data files (metadata and OTU tables), from which the rest of the data can be generated using these commands.

  • In several places, the notebooks contain cells that reference a remote address (e.g. in ssh or scp commands). You don't need to execute these commands; the files being fetched should already be provided in this repository.


Metadata

The metadata processing required two steps, one for cleanup of the data and another one to prepare the data for Qiita.

1-metadata: the metadata is cleaned up and filtered to remove samples that we didn't use in the rest of the analyses. We also calculate the dysbiosis index as defined in Gevers et al. 2014.
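
As a rough illustration of the index calculation, the sketch below follows the general form described by Gevers et al. 2014: the log of the summed abundance of taxa increased in disease over the summed abundance of taxa decreased in disease. The taxon names, the log base, the pseudocount and the function name are all placeholders chosen for this example, not the lists or settings used in the notebook.

```python
import math

# Hypothetical taxon lists; the real index uses the taxa reported by Gevers et al. 2014.
INCREASED = {"taxon_a", "taxon_b"}
DECREASED = {"taxon_c", "taxon_d"}


def dysbiosis_index(abundances, increased=INCREASED, decreased=DECREASED,
                    pseudocount=1e-6):
    """Log-ratio of summed abundances (increased over decreased taxa).

    `abundances` maps taxon name -> relative abundance for one sample.
    The pseudocount and the base-10 log are assumptions of this sketch;
    the pseudocount avoids log(0) and division by zero.
    """
    up = sum(abundances.get(t, 0.0) for t in increased)
    down = sum(abundances.get(t, 0.0) for t in decreased)
    return math.log10((up + pseudocount) / (down + pseudocount))


# Invented relative abundances for a single sample.
sample = {"taxon_a": 0.30, "taxon_b": 0.10, "taxon_c": 0.05, "taxon_d": 0.05}
index = dysbiosis_index(sample)
```

A positive value means the "increased" taxa dominate the sample, and a negative value means the "decreased" taxa dominate.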

1.1-metadata-for-qiita: adds the needed fields and columns to the mapping file, and creates a sample and prep template that were used to upload the data into the Qiita study.

Alpha diversity

2-alpha-diversity: this notebook includes the following alpha diversity comparisons: fat, protein, age, weight and disease state. It also compares the human-trained dysbiosis index with alpha diversity. Of note, we made these comparisons for several metrics, but only used Faith's phylogenetic diversity in the manuscript.
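
For readers unfamiliar with Faith's phylogenetic diversity, the following minimal sketch shows the idea on a toy tree: the metric sums the lengths of the branches that connect a sample's observed tips. The tree, the node names and the choice to include the path up to the root are assumptions made for this example; the notebook itself works with a full phylogeny and established tooling.

```python
# Toy rooted tree: node -> (parent, length of the branch leading to the node).
# Node names and branch lengths are invented for illustration.
TREE = {
    "otu1": ("n1", 0.2),
    "otu2": ("n1", 0.3),
    "otu3": ("root", 0.6),
    "n1": ("root", 0.1),
}


def faith_pd(observed, tree=TREE, root="root"):
    """Sum the lengths of all branches on the paths from observed tips to the root."""
    visited = set()
    total = 0.0
    for tip in observed:
        node = tip
        # Walk up towards the root, counting each branch only once.
        while node != root and node not in visited:
            visited.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total


pd_two = faith_pd(["otu1", "otu2"])          # branches: otu1, n1, otu2
pd_all = faith_pd(["otu1", "otu2", "otu3"])  # adds the otu3 branch
```

Because shared internal branches are counted once, two closely related OTUs add less diversity than two distant ones.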

Beta diversity

3-beta-diversity: this notebook creates the beta diversity plots for the dog dataset only. It also produces biplots and computes statistics to assess clustering significance.

3.1-beta-diversity-antibiotics: compares the differences between samples according to their history of antibiotic usage.

Group significance

4-group-significance: this notebook tests for statistically significant differences between the disease-affected and unaffected dogs, and plots their relative abundances as a heatmap. While none of the plots shown in this notebook were used in the paper, they helped guide our analysis for the next few notebooks.

Feature exploration

5-feature-exploration: this notebook looks at a few different ways to filter the data so as to exclude OTUs that are not well represented throughout the samples.
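
One common way to do this kind of filtering is by prevalence, i.e. keeping only the OTUs observed in a minimum fraction of samples. The sketch below is a generic example of that approach; the function name, the 0.5 threshold and the counts are invented for illustration and are not necessarily the criteria explored in the notebook.

```python
def filter_by_prevalence(table, min_fraction=0.5):
    """Keep OTUs observed (count > 0) in at least `min_fraction` of samples.

    `table` maps OTU id -> list of per-sample counts (same sample order).
    """
    kept = {}
    for otu, counts in table.items():
        prevalence = sum(1 for c in counts if c > 0) / len(counts)
        if prevalence >= min_fraction:
            kept[otu] = counts
    return kept


# Invented counts for four samples.
otu_table = {
    "otu1": [10, 0, 3, 7],
    "otu2": [0, 0, 0, 1],
    "otu3": [5, 2, 0, 0],
}
filtered = filter_by_prevalence(otu_table, min_fraction=0.5)
```

Here `otu2` is dropped because it appears in only one of the four samples.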

New dysbiosis index

After realizing that the human-trained dysbiosis index didn't perform as well in dogs, we decided to use CCREPE to train a new dysbiosis index using the dog data alone. In 6-md-index-ccrepe we calculate the checkerboard scores and associated significance tables. These results are used in 6.1-md-index-ccrepe-visualizations, where we visualize them in a variety of ways, ultimately deciding to use Cytoscape for that purpose. The final section of this notebook shows the plots relating alpha diversity and the index.

Classification accuracy

The ROC curves and feature importance scores are created in 7-classifier and 7.1-classifier-feature-importance, respectively. Here we use R and hack_ml to create the plots and tables.
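
The notebooks rely on R and hack_ml for this step. Purely to illustrate what a ROC curve and its area summarize, here is a small standalone Python sketch; the function names, labels and scores are invented, and this is not the code used to produce the figures in the paper.

```python
def roc_curve_points(labels, scores):
    """Return (fpr, tpr) pairs obtained by lowering the decision threshold.

    `labels` are 1 (affected) or 0 (unaffected); `scores` are classifier
    scores where higher means more likely affected.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Invented labels and scores for six samples.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_curve_points(labels, scores)
area = auc(points)
```

An area of 1.0 would mean every affected sample scores higher than every unaffected one, while 0.5 corresponds to random guessing.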

Human vs Dog comparison

In 8-comparison we explore the combined data and perform a few comparisons that were ultimately not used in the paper. 8.1-comparison is concerned with making the data between humans and dogs as comparable as possible.

PICRUSt

PICRUSt predictions were generated on the Galaxy server. In 9-picrust we compare the combined human and dog samples, and in 9.1-picrust-nsti we use the NSTI (nearest sequenced taxon index) to assess the quality of our predictions.
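
As we understand it, the per-sample NSTI is an abundance-weighted average of each OTU's phylogenetic distance to its nearest sequenced reference genome, so lower values suggest more reliable predictions. The sketch below illustrates that calculation under this assumption; the function name and all values are invented, and PICRUSt itself reports these scores, so nothing here needs to be run to reproduce the analysis.

```python
def weighted_nsti(counts, otu_nsti):
    """Abundance-weighted mean of per-OTU NSTI values for one sample.

    `counts` maps OTU id -> count in the sample; `otu_nsti` maps OTU id ->
    distance to its nearest sequenced reference genome.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no sequences")
    return sum(n * otu_nsti[otu] for otu, n in counts.items()) / total


# Invented values for illustration only.
otu_nsti = {"otu1": 0.02, "otu2": 0.10, "otu3": 0.30}
sample_counts = {"otu1": 50, "otu2": 30, "otu3": 20}
nsti = weighted_nsti(sample_counts, otu_nsti)
```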

Comparison with Minamoto et al. 2015

In the 10-minamoto-md-index notebook we apply the dog-trained dysbiosis index to a different dataset, which was processed mainly on a separate supercomputer.

Read counts

In 11-sequence-counts, we explore the number of sequences per sample that were assigned to an OTU. Specifically, we compare the differences between the closed-reference and open-reference protocols.
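
The comparison boils down to summing each OTU table column-wise and contrasting the per-sample totals. A minimal sketch, assuming the tables are available as plain dictionaries (the table contents, sample ids and function name below are invented for the example):

```python
def sequences_per_sample(table, sample_ids):
    """Sum OTU counts column-wise to get the sequences assigned per sample."""
    totals = dict.fromkeys(sample_ids, 0)
    for counts in table.values():
        for sample_id, count in zip(sample_ids, counts):
            totals[sample_id] += count
    return totals


samples = ["s1", "s2"]
# Invented tables: the open-reference table also holds an OTU without a reference match.
closed_ref = {"otu1": [100, 80], "otu2": [20, 10]}
open_ref = {"otu1": [100, 80], "otu2": [20, 10], "new_otu1": [15, 30]}

closed_totals = sequences_per_sample(closed_ref, samples)
open_totals = sequences_per_sample(open_ref, samples)
# Extra sequences retained per sample by the open-reference protocol.
gained = {s: open_totals[s] - closed_totals[s] for s in samples}
```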


Python dependencies that were not explicitly noted in the notebooks are listed in requirements.txt.
