
Comments (5)

bmansfeld commented on July 28, 2024

Hi,
This is an issue with the G statistic that I'm not sure was ever resolved. The equation is taken directly from Magwene et al., 2011, so it is not an error in my code.
However, I think that in general, unless you have amazingly perfect bulking, this rarely happens in real data. In any case, you are smoothing over many SNPs, and they should not all be exactly zero.
I've thought of adding 0.5 or so to the observed counts when they are zero, but I didn't want to change the method.
Let me know your thoughts.
Ben

from qtlseqr.

huangli2924 commented on July 28, 2024

Hi Ben,
Thanks for your fast reply, and thank you for your code. Many of my friends are using it; it's very practical! I just want to discuss this and learn from you.
If I understand correctly, "LowRef" represents the "REF_depth" of a SNP site in the low pool. If any element of obs is equal to zero, such as LowRef = 0, log(0) will occur.
This is my understanding of the formula, but I am not sure. Am I right?
Thanks


bmansfeld commented on July 28, 2024

Thanks for the comments :-) I hope the package is useful to you all!
This is a good discussion, and I've been meaning to raise it with the authors of the original paper (Magwene et al., 2011).
Here are two screenshots from the paper:

  1. With the equation for the G statistic:
    image
  2. With the contingency table for ni, i = 1 to 4:
    image

And my code for G stat:

QTLseqr/R/G_functions.R

Lines 29 to 44 in 5e76137

getG <- function(LowRef, HighRef, LowAlt, HighAlt)
{
    # Expected counts under independence: row total * column total / grand total
    exp <- c(
        (LowRef + HighRef) * (LowRef + LowAlt) / (LowRef + HighRef + LowAlt + HighAlt),
        (LowRef + HighRef) * (HighRef + HighAlt) / (LowRef + HighRef + LowAlt + HighAlt),
        (LowRef + LowAlt) * (LowAlt + HighAlt) / (LowRef + HighRef + LowAlt + HighAlt),
        (LowAlt + HighAlt) * (HighRef + HighAlt) / (LowRef + HighRef + LowAlt + HighAlt)
    )
    # Observed counts, in the same order as the expected counts
    obs <- c(LowRef, HighRef, LowAlt, HighAlt)
    # G = 2 * sum(obs * log(obs / exp)), one value per SNP (row)
    G <-
        2 * (rowSums(obs * log(
            matrix(obs, ncol = 4) / matrix(exp, ncol = 4)
        )))
    return(G)
}

The function takes in the allele depth values for the above table (i.e. ni) and first calculates a vector of the expected values, which go in the denominator (roughly half of the read depth).
It then takes the observed value for each ni and puts it in the numerator.
Thus, if any ni is exactly zero, log(0) will occur, as you say.
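
To make the zero-count case concrete, here is a small sketch that reruns the same formula on made-up allele depths. The depth values are hypothetical and chosen only so that one SNP has LowRef = 0; the function body is a condensed copy of getG so that the snippet runs on its own.

```r
# Condensed copy of getG (same formula), so this snippet is self-contained.
getG <- function(LowRef, HighRef, LowAlt, HighAlt) {
  total <- LowRef + HighRef + LowAlt + HighAlt
  exp <- c(
    (LowRef + HighRef) * (LowRef + LowAlt) / total,
    (LowRef + HighRef) * (HighRef + HighAlt) / total,
    (LowRef + LowAlt) * (LowAlt + HighAlt) / total,
    (LowAlt + HighAlt) * (HighRef + HighAlt) / total
  )
  obs <- c(LowRef, HighRef, LowAlt, HighAlt)
  2 * rowSums(obs * log(matrix(obs, ncol = 4) / matrix(exp, ncol = 4)))
}

# Hypothetical depths for three SNPs; the second SNP has LowRef = 0.
g <- getG(
  LowRef  = c(20, 0, 15),
  HighRef = c(10, 25, 15),
  LowAlt  = c(10, 30, 15),
  HighAlt = c(20, 5, 15)
)

print(g)
# 0 * log(0) evaluates to 0 * -Inf, which R reports as NaN,
# so only the SNP with a zero count comes back as missing.
print(is.na(g))
```

The other two SNPs are unaffected, which is why a single zero count only removes one value from the series.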

Again, as far as I can see, this is an imperfection in how Magwene et al. implement the G statistic,
unless my code or my interpretation of their formula is terribly wrong (in which case please let me know!).

I do, however, stand by my previous comments: due to Poisson noise in sequencing and imperfect bulking, this is unlikely to happen across large stretches of SNPs, so these NaN values eventually get ignored during the smoothing of G.
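
As a rough illustration of why an isolated NaN does not survive smoothing, the sketch below uses a plain moving average that skips missing values. This is only a stand-in: smooth_g is a hypothetical helper, the G values are made up, and the package's actual smoothing is not reproduced here.

```r
# Hypothetical per-SNP G values along a chromosome; one SNP is NaN.
g <- c(2.1, 3.4, NaN, 2.8, 3.0, 2.5)

# Hypothetical helper: centered moving average that drops missing values
# inside each window (na.rm = TRUE removes both NA and NaN).
smooth_g <- function(g, half_window = 1) {
  vapply(seq_along(g), function(i) {
    idx <- max(1, i - half_window):min(length(g), i + half_window)
    mean(g[idx], na.rm = TRUE)
  }, numeric(1))
}

smoothed <- smooth_g(g)
print(smoothed)
```

A window would only stay NaN if every SNP inside it had a zero count, which is the "large stretches of SNPs" case described above.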

There are options for zero-substitution procedures, but after playing with some of them I found that they affect G in ways that depend on read depth etc., so I opted not to implement them.
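
One such procedure can be sketched as follows, purely as an assumption and not as something the package does: add a pseudocount of 0.5 to all four counts before computing G. getG_pseudo and the depth values are hypothetical. The comparison at the end applies it to the same allele proportions at two read depths.

```r
# Condensed copy of getG (same formula), so this snippet is self-contained.
getG <- function(LowRef, HighRef, LowAlt, HighAlt) {
  total <- LowRef + HighRef + LowAlt + HighAlt
  exp <- c(
    (LowRef + HighRef) * (LowRef + LowAlt) / total,
    (LowRef + HighRef) * (HighRef + HighAlt) / total,
    (LowRef + LowAlt) * (LowAlt + HighAlt) / total,
    (LowAlt + HighAlt) * (HighRef + HighAlt) / total
  )
  obs <- c(LowRef, HighRef, LowAlt, HighAlt)
  2 * rowSums(obs * log(matrix(obs, ncol = 4) / matrix(exp, ncol = 4)))
}

# Hypothetical variant: add a pseudocount to every cell, then compute G.
getG_pseudo <- function(LowRef, HighRef, LowAlt, HighAlt, pseudo = 0.5) {
  getG(LowRef + pseudo, HighRef + pseudo, LowAlt + pseudo, HighAlt + pseudo)
}

# A zero count no longer produces NaN.
g_zero <- getG_pseudo(0, 25, 30, 5)
print(g_zero)

# Same allele proportions at two depths (40 and 400 reads in total).
g_plain  <- getG(c(5, 50), c(15, 150), c(15, 150), c(5, 50))
g_pseudo <- getG_pseudo(c(5, 50), c(15, 150), c(15, 150), c(5, 50))

# Ratio of adjusted to unadjusted G at each depth.
ratio <- g_pseudo / g_plain
print(ratio)
```

In this example the pseudocount shrinks G proportionally more at the lower depth than at the higher one, which is the kind of depth-dependent side effect mentioned above.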

-Ben


huangli2924 commented on July 28, 2024

OK, you are such a responsible guy, ha ha
I understand. Thank you very much.


bmansfeld commented on July 28, 2024

Great! No problem.
I will leave the issue open in case anyone wants to contribute.
