Coder Social home page Coder Social logo

rossibarra / msstats-1 Goto Github PK

View Code? Open in Web Editor NEW

This project forked from molpopgen/msstats

0.0 2.0 0.0 278 KB

msstats calculates many common popgen summary statistics from the output of the coalescent simulation "ms".

License: GNU General Public License v2.0

Makefile 14.88% Shell 12.85% M4 2.32% C++ 2.80% Roff 0.53% TeX 66.63%

msstats-1's Introduction

msstats - read data from ms via stdin, calculate common summary statistics

Copyright (C) 2002 Kevin Thornton

msstats is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

Comments are welcome.

- Kevin Thornton <[email protected]>

#Usage

[coalesent simulation] | msstats | gzip > stats.gz

"coalescent simulation is assumed to be any program that prints biallelic marker output in the format of Dick Hudson's ms (http://home.uchicago.edu/~rhudson1)

This format looks like this:

//
segsites: 2
positions: 0.1 0.9
01
10

The above is for 2 "SNPs" in a sample of size n = 2

#Installation

./configure

make

sudo make install

##If dependent libraries are in non-standard locations. For example, "/opt":

CXXFLAGS=-I/opt/include LDFLAGS="$LDFLAGS -L/opt/lib" ./configure

make

sudo make install

##Installing somewhere other than /usr/local/bin

./configure --prefix=/path/to/where/you/want/it

For example,

./configure --prefix=$HOME

make

make install

will result in msstats being in ~/bin.

##Notes on calculations for metapopulations.

If the input data contain ordered samples from multiple demes, you may pass that info to msstats as follows:

-I D n0 n1 ... nD,

where D is the number of demes and n0 is the sample size in the first deme, etc.

When -I is used, summary statistics will be calculated for each deme within each replicate. The columns "rep" and "pop" will tell you which deme in which replicate each statistic corresponds to.

The program will exit with an error if the sum of deme sizes does not equal the input sample size read from STDIN.

When the -I option is used, msstats will report Hudson, Slatkin, and Maddison's Fst statistic. The reference for this statistic is:

Hudson, Slatkin and Maddison (1992) Estimation of levels of gene flow from population data. Genetics 132:583-589

For all pairwise comparisions amongst the D demes, you will see output columns with headers hsmij, which is the Fst statistic calculated between demes i and j. These numbers will be repeated for each deme within each replicate. Thus, to get the actual distribution of Fst for a simulation, you should condition on the line of results for just one population. For example:

ms 30 100 -t 10 -I 3 10 10 10 1 | ./src/msstats -I 3 10 10 10 > output

Then, using R,

x=read.table("output",header=T)
mean(x$hsm01[x$pop==0])
[1] 0.3511204

#The output

##For a single population

There are 21 columns in the output:

  1. S = the number of "segregating sites", aka mutations. (Watterson, 1975)
  2. n1 = the number of singletons in the data. This is the number of mutations at frequency 1 and n-1 in the sample.
  3. next = the number of "external mutations" (sensu Fu) = the number of derived singletons.
  4. theta = Watterson's estimate of theta
  5. pi = "sum of site heterogzygosity" = sum of 2pq over the S sites. (Nei, Tajima, others).
  6. thetaH = Fay and Wu's estimator of theta. Their H statistic is pi - thetaH, hence no column for it.
  7. Hprime = Zeng et al.'s normalized Fay and Wu's H.
  8. tajd = Tajima's D
  9. fulif = Fu and Li's F
  10. fulid = Fu and Li's D
  11. fulifs = Fu and Li's F-star
  12. fulifds = Fu and Li's D-star
  13. rm = Hudson and Kaplan's Rmin
  14. rmmg = Myers and Griffiths simplest lowest bound on Rmin
  15. nhap = Number of distinct haplotypes. Depaulis and Veuille
  16. hdiv = haplotype diversity. Depaulis and Veuille
  17. wallB = Jeff Wall's B
  18. wallQ = Jeff Wall's Q
  19. rosarf = Ramos-Onsins and Rozas Rf statistic
  20. rosarf = Ramos-Onsins and Rozas Ru statistic
  21. zns = Kelly's Zns = average pairwise r-squared.

msstats-1's People

Contributors

molpopgen avatar kvnloo avatar

Watchers

James Cloos avatar Jeffrey Ross-Ibarra avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.