Coder Social home page Coder Social logo

project1's Introduction

Project 1

The project for the first half of the semester is due on October 24th as a pull request to this repository. The emphasis for this projet is on data analysis using tools of reproducible research including version tracking, knitr, and R markdown. You are free to answer any question you want, but you must provide a quantitative report that utilizes the material we have covered durign the first half of the semester.

Ground rules...

  • Your document should layout your question in the introduction, provide a brief synopsis for why it is important, describe how you obtained the data, your general approach to analyzing the data, a summary of your results, a discussion, and mention of what your future directions would be.
  • You should post your project as a pull request to this repository as *.Rmd, *.md, and .html files. Prior to submitting the files, you should set echo=FALSE for all of the code chunks. This will hide the R code in the final document.
  • The final document should be approximately 5 pages when the code chunks are hidden and you use a normal sized font.
  • I don't care what your question is or what subject domain it covers. The key is that it needs to be data driven. Microbiology, video game statisitics, immunology, whatever. Do something where it will be easy for you to come up with an interesting question and fun analysis.

#Replicating Jhansi Leslie's Analysis for her Anaerobe 2014 Poster Entitled "Pre-Colonization By a Less Virulent Strain of Clostridium difficile Protects From Infection by a Lethal Strain," using R ###By Shawn Whitefield

##Introduction

Clostridium difficile is an anaerobic spore forming bacterium that is respobsible for 14,000 deaths and 1 billion dollars in excess healthcare costs annually in the US.(1) C. diff infection (CDI) is primarily healthcare associated, and infected individuals can experience symptoms ranging from asymptomatic colonization, to toxic megacolon requiring colectomy. Protection from CDI is thought to occur through interactions between bacteria in the gut as well as via the adaptive immune response. The Young lab has a mouse model of CDI which Jhansi used to study the imact of adaptive immunity and pre-colonization by a less virulent strain on infection outcomes (Figure 1 poster below). Briefly, in this model, mice are given cefoperazone for ten days to perturb their indigenous microbiota. They are then treated with either water or a non-lethal C. diff strain. Both groups of mice are then given clindamycin. After six weeks mice in each treatment group are challanged with a lethal C diff strain and outcomes are measured.

Jhansi Leslie, a graduate student in the Young lab, created this poster for a conference. There are some changes that could strengthen the presentation of her data. In this project, I am running statistical tests and creating new (and hopefully improved) data visualizations for parts of her mouse colonization data.

###Jhansi's Anaerobe 2014 Poster:

##Methods & Results Jhansi looked at the effect of the adaptive immune system on C. diff infection. She infected mice with C. diff strain 630, and compared the level of colonization of wildtype vs. RAG knockout mice (which have no mature B and T lymphocytes) over time.

The data was stored in a large multi-sheet Excel document, so the first step was to generate usable dataframes with the pertinant information.

###Example 1: Initial Excel Dataset

##        ID             cage          sex     treatment genotype
##  Min.   : 1.00   Min.   :1.00   female:11   630:18    RAG:15  
##  1st Qu.: 8.75   1st Qu.:1.75   male  :21   c  :14    WT :17  
##  Median :16.50   Median :3.00                                 
##  Mean   :16.50   Mean   :3.12                                 
##  3rd Qu.:24.25   3rd Qu.:5.00                                 
##  Max.   :32.00   Max.   :6.00                                 
##                                                               
##  X630_infection_day      Day1           Day2           Day4     
##  Min.   :16.2       Min.   :14.8   Min.   :13.6   Min.   :13.5  
##  1st Qu.:18.4       1st Qu.:18.4   1st Qu.:18.2   1st Qu.:17.9  
##  Median :19.3       Median :19.3   Median :19.6   Median :19.2  
##  Mean   :19.8       Mean   :19.8   Mean   :19.8   Mean   :19.6  
##  3rd Qu.:21.4       3rd Qu.:21.4   3rd Qu.:21.2   3rd Qu.:21.8  
##  Max.   :24.0       Max.   :24.2   Max.   :24.3   Max.   :23.7  
##                                                                 
##       Day5           Day7           Day9          Day10     
##  Min.   :13.1   Min.   :12.0   Min.   :16.1   Min.   :16.1  
##  1st Qu.:17.8   1st Qu.:18.4   1st Qu.:18.7   1st Qu.:18.6  
##  Median :19.5   Median :19.8   Median :19.9   Median :20.1  
##  Mean   :19.5   Mean   :20.0   Mean   :20.2   Mean   :20.2  
##  3rd Qu.:21.5   3rd Qu.:22.2   3rd Qu.:21.2   3rd Qu.:21.6  
##  Max.   :23.9   Max.   :24.2   Max.   :27.5   Max.   :24.3  
##                                NA's   :1      NA's   :1     
##      Day14          Day16          Day22          Day31     
##  Min.   :16.4   Min.   :16.3   Min.   :17.3   Min.   :18.4  
##  1st Qu.:19.4   1st Qu.:19.3   1st Qu.:20.4   1st Qu.:20.3  
##  Median :20.9   Median :21.0   Median :21.9   Median :22.4  
##  Mean   :21.0   Mean   :20.9   Mean   :21.9   Mean   :22.1  
##  3rd Qu.:22.9   3rd Qu.:22.5   3rd Qu.:23.6   3rd Qu.:23.7  
##  Max.   :26.6   Max.   :25.0   Max.   :26.2   Max.   :26.5  
##  NA's   :1      NA's   :1      NA's   :1      NA's   :1     
##      Day32      IP_clinda_day_41 VPI_infection_day_42     Day43     
##  Min.   :18.5   Min.   :18.7     Min.   :18.3         Min.   :16.0  
##  1st Qu.:20.2   1st Qu.:20.4     1st Qu.:20.2         1st Qu.:19.9  
##  Median :22.5   Median :23.0     Median :22.5         Median :22.6  
##  Mean   :22.3   Mean   :22.4     Mean   :22.3         Mean   :21.9  
##  3rd Qu.:23.9   3rd Qu.:24.5     3rd Qu.:24.0         3rd Qu.:23.8  
##  Max.   :27.0   Max.   :26.8     Max.   :27.0         Max.   :26.7  
##  NA's   :1      NA's   :1        NA's   :1            NA's   :1     
##      Day44        pBLweight        CFUDay1            CFUDay4        
##  Min.   :15.8   Min.   : 81.4   Min.   :       0   Min.   :1.00e+02  
##  1st Qu.:19.6   1st Qu.: 85.4   1st Qu.:     100   1st Qu.:1.00e+02  
##  Median :20.1   Median : 89.6   Median : 2900050   Median :5.00e+05  
##  Mean   :20.4   Mean   : 92.0   Mean   :16418794   Mean   :1.94e+07  
##  3rd Qu.:21.4   3rd Qu.: 98.7   3rd Qu.:23000000   3rd Qu.:3.50e+07  
##  Max.   :24.5   Max.   :105.5   Max.   :85000000   Max.   :1.03e+08  
##  NA's   :1      NA's   :1                                            
##     CFUDay7            CFUDay10           CFUDay16          CFUDay22      
##  Min.   :     100   Min.   :     100   Min.   :    100   Min.   :    100  
##  1st Qu.:     100   1st Qu.:     100   1st Qu.:    100   1st Qu.:    100  
##  Median :     100   Median :     100   Median :    100   Median :    100  
##  Mean   : 6056309   Mean   : 1643169   Mean   : 825558   Mean   : 218784  
##  3rd Qu.: 4425000   3rd Qu.:  220000   3rd Qu.: 295050   3rd Qu.:   5050  
##  Max.   :56000000   Max.   :31000000   Max.   :9000000   Max.   :2800000  
##                     NA's   :3          NA's   :1         NA's   :1        
##     CFUDay31         CFUDay40        infection_status
##  Min.   :   100   Min.   :    100   cleared  : 9     
##  1st Qu.:   100   1st Qu.:    100   colonized: 8     
##  Median :   100   Median :    100   control  :14     
##  Mean   : 17558   Mean   : 165881   NA's     : 1     
##  3rd Qu.:  4050   3rd Qu.: 101550                    
##  Max.   :290000   Max.   :3200000                    
##  NA's   :1        NA's   :1

###Figure 8. Discussion and Data Visualization Modifications The purpose of figure 8 is to demonstrate that adaptive immunity does not play a role in C. diff infection clearance. This figure shows the CFU (mean +/-SEM) for each cage of RAG or WT mice over time after infection with C. diff strain 630 or mock infection. There are 6 total cages, each cage has between 4 and 8 mice, and cages house either WT or RAG mice.

These numbers are small, and I think that displaying the mean+/- SEM colonization by cage takes away from the interpretation of this data. Visually it looks as though colonization is dependent on cage effects. This is likely true, but it is not important for the purpose of this figure. Displaying the data by cage takes the emphasis of the figure away from the main point, which is that there is variation in colonization outcomes that is consistent across RAG vs WT mice.

I think that the data would be expressed better as individual mice since the numbers are small. Additionally separate symbols for RAG vs WT genotype visually demonstrate that these genotypes do not impact colonization. Furthermore, a distinction between mock and actual infection using a line at the limit of detection will allow easier and more clear interpretation of these data.

Also, a chi-square test will provide a stronger argument that clearance of infection does not depend on adaptive immunity.

plot of chunk unnamed-chunk-2

## Warning: Chi-squared approximation may be incorrect
## 
## 	Pearson's Chi-squared test
## 
## data:  mouseinf_geno$infection_status and mouseinf_geno$genotype
## X-squared = 0.1075, df = 2, p-value = 0.9477

###Figure 9. Discussion and Data Visualization Modifications In figure 9, Jhansi grouped WT and RAG mice by their C. diff status at the time of VPI infection (naive, previously 630 colonized and cleared infection, 630 colonized), and plotted this against the percentage of baseline weight (a measure of mouse illness) post VPI infection.

This figure demonstrates that mice that are infected by a less virulent strain at the time of infection are protected from CDI illness once infected by a lethal strain, and that adaptive immunity does not modify this effect.

This figure was created in Prism and there are some issues with how the data is displayed that detract from its impact. Frist, healthy mice should lose the least amount of weight. In this graph, colonized mice have the highest percent of baseline weight (ie their weight has not changed), but it looks as though there is a larger effect in this group because that bar is the highest. Second, this graph is supposed to be a box and whisker plot, however it looks like a bar chart with error bars and many outliers because the first quartile hits the x axis. These issues have inspired me to display this figure differently in a way that is easier to understand.

Also, I am not a fan of asterisk and bars across graphs to display significance because I think that they are distracting and lead the viewer to pay attention to the asterisk rather than the data itself. I have decided to perform pariwise t-tests to compare these groups of mice and then report the statistical test and p-value in a subtitle.

plot of chunk unnamed-chunk-3

## $cleared
##          mean   quantile.0%  quantile.25%  quantile.50%  quantile.75% 
##         91.32         84.33         89.63         91.67         92.31 
## quantile.100%        median            sd 
##         97.99         91.67          3.78 
## 
## $colonized
##          mean   quantile.0%  quantile.25%  quantile.50%  quantile.75% 
##        99.311        81.443        98.974       101.269       104.097 
## quantile.100%        median            sd 
##       105.291       101.269         7.673 
## 
## $control
##          mean   quantile.0%  quantile.25%  quantile.50%  quantile.75% 
##        88.172        82.927        84.515        85.848        87.638 
## quantile.100%        median            sd 
##       105.473        85.848         6.922
## 
## 	Pairwise comparisons using t tests with pooled SD 
## 
## data:  mousecfu$pBLweight and mousecfu$infection_status 
## 
##           cleared colonized
## colonized 0.0318  -        
## control   0.2598  0.0015   
## 
## P value adjustment method: holm

##Discussion In this project, I have replicated analysis of Jhansi's mouse colonization data. I have performed statistical tests and created new graphics that display the data in a way that I think is easier to understand.

Jhansi and I discussed this project as well as the new plots that I created. We both thought it was interesting that I, as someone who had never before seen this data, thought different things should be emphasized than she did as the person who actually performed the experiments and collected the data.

Additionally, this project taught me that it is difficult to analyze someone elses dataset when you have no idea what variables mean and how it is organized. Working on this has given me ideas on how to better anotate and store (such as keeping variables in long format vs wide format) my own data so that it will be easier to analyze and to share with collaborators.

##Sources

  1. CDC.gov C. diff

project1's People

Contributors

swhitefi avatar pschloss avatar

Watchers

James Cloos avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.