daesc's People

Contributors

gqi

Forkers

sachinkavindaa

daesc's Issues

Identifying imprinted genes from reciprocal cross and scRNA-seq data

I am impressed by your tool and would like to leverage its capabilities in my research. Using a reciprocal mouse strain cross, I have generated stem cell lines from two distinct rat strains with characterized SNPs. I have performed scRNA-seq on these cell lines from both reciprocal crosses. I am specifically interested in identifying imprinted genes within this dataset using your DAESC algorithm. Can you please advise on the best approach to integrate my SNP and scRNA-seq data into your tool for this purpose?
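One way to frame this question in DAESC terms is sketched below under stated assumptions; it is not a documented DAESC workflow. Reads at each informative SNP are counted by strain allele, and cross direction is coded as a 0/1 covariate. For an imprinted gene, the strain-A allele fraction flips between the reciprocal crosses because expression follows the parent of origin. For a cis-acting strain effect, it stays on the same side of 0.5 in both crosses. The strain labels A/B, the column names (`cross`, `strainA`, `strainB`, `sample_id`) and the counts are hypothetical.

```r
# Hypothetical per-cell counts for one gene; names and values are illustrative only.
cells <- data.frame(
  cross   = c("AxB", "AxB", "BxA", "BxA"),  # maternal strain written first
  strainA = c(9, 8, 1, 2),                  # reads carrying the strain-A SNP allele
  strainB = c(1, 2, 9, 8)                   # reads carrying the strain-B SNP allele
)
cells$total <- cells$strainA + cells$strainB

# Parent-of-origin recoding: in AxB the mother is strain A, in BxA she is strain B.
cells$maternal <- ifelse(cells$cross == "AxB", cells$strainA, cells$strainB)

# Cross direction as a 0/1 covariate (1 = BxA).
cells$x_cross <- as.numeric(cells$cross == "BxA")

# Strain-A fraction per cell: it flips with cross direction under imprinting,
# but keeps the same direction in both crosses under a cis strain effect.
cells$fracA <- cells$strainA / cells$total

# Possible DAESC call (not run; assumes a per-cell sample_id column exists):
# res <- daesc_bb(y = cells$strainA, n = cells$total,
#                 subj = cells$sample_id, x = cells$x_cross)
```

In this framing, a strong cross-direction effect on the strain-A fraction is the signal of interest. Whether DAESC is best used this way for imprinting is a question for the maintainers.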

Thank you

Specifying the design matrix for differential ASE

Hello DAESC dev team!

First of all, thank you for developing such a flexible framework for testing allelic imbalance.

I am trying to incorporate DAESC into an analysis pipeline that tests for differential ASE across multiple individuals while accounting for disease status (condition), spatial location (cortical layer), and cell type.

To begin with, I applied DAESC to a toy dataset with one gene, two conditions, and two individuals per condition. Here, I fit the baseline model (DAESC-BB), specifying the design matrix x as a binary numeric array denoting the condition.
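The baseline setup described here can be sketched as follows. The toy counts, sample IDs and column names are hypothetical. The `daesc_bb` call uses the argument names shown later in this issue (`y`, `n`, `subj`, `x`) and is left commented out because it requires the DAESC package.

```r
# Hypothetical toy data: one gene, two conditions, two individuals per condition,
# three observations per individual. Counts are made up for illustration.
toy <- data.frame(
  sample_id     = rep(c("S1", "S2", "S3", "S4"), each = 3),
  condition     = factor(rep(c("HC", "PD"), each = 6), levels = c("HC", "PD")),
  allele1_count = c(2, 1, 3, 1, 2, 2, 0, 1, 0, 1, 0, 1),
  total_count   = c(3, 2, 4, 2, 3, 3, 2, 2, 1, 3, 1, 2)
)

# Binary covariate for the baseline DAESC-BB model: 0 = HC, 1 = PD.
x <- as.numeric(toy$condition == "PD")

# Call as written in this issue (not run here):
# res <- daesc_bb(y = toy$allele1_count, n = toy$total_count,
#                 subj = toy$sample_id, x = x)
```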

Now I want to extend this to account for spatial location and the interaction terms. Since both condition and spatial location (cortical layer) are categorical variables, I used the model.matrix function to encode them as a dummy-coded design matrix. Below are the structure of the data frame I am working with and the way I am invoking the daesc_bb function.

str(cur.df)
'data.frame': 3046 obs. of 8 variables:
$ gene : chr "HES6" "HES6" "HES6" "HES6" ...
$ barcode : chr "CTCTCTAACTGCCTAG" "TCGGCGTACTGCACAA" "GGGCAGGATTTCTGTG" "CGGTTCCGGCTTCTTG" ...
$ allele1_count: num 1 1 1 1 1 1 0 3 0 1 ...
$ allele2_count: num 1 0 0 0 1 1 1 0 1 0 ...
$ total_count : num 2 1 1 1 2 2 1 3 1 1 ...
$ condition : Factor w/ 2 levels "HC","PD": 1 1 1 1 1 1 1 1 1 1 ...
$ sample_id : chr "BN0339" "BN0339" "BN0339" "BN0339" ...
$ layer : Factor w/ 7 levels "Layer 1","Layer 2",..: 7 4 7 4 5 3 6 6 5 6 ...

myformula = ~ cur.df$condition + cur.df$layer + cur.df$condition:cur.df$layer + 0
one_hot = model.matrix(myformula)

str(one_hot)
num [1:3046, 1:14] 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:3046] "1" "2" "3" "4" ...
..$ : chr [1:14] "cur.df$conditionHC" "cur.df$conditionPD" "cur.df$layerLayer 2" "cur.df$layerLayer 3" ...
- attr(*, "assign")= int [1:14] 1 1 2 2 2 2 2 2 3 3 ...
- attr(*, "contrasts")=List of 2
..$ cur.df$condition: chr "contr.treatment"
..$ cur.df$layer : chr "contr.treatment"

res = daesc_bb(y=cur.df$allele1_count, n=cur.df$total_count, subj=cur.df$sample_id, x=one_hot)

When I run this as is, I get the following error:

fixed-effect model matrix is rank deficient so dropping 1 column / coefficient
cur.df.conditionPD
NA
Error in aod::betabin(cbind(y, n - y) ~ ., random = ~1, data = data.frame(y, :
Initial values for the fixed effects contain at least one missing value.

I think the problem lies in the design matrix, where the first two columns (conditionHC and conditionPD) carry redundant information. After dropping the first column of one_hot, it runs without error, but I want to double-check whether this is the intended use of x when supplying multiple categorical variables.
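The rank problem can be reproduced on a small hypothetical stand-in for `cur.df`, as in the sketch below. It assumes that `daesc_bb` adds its own intercept. That assumption is inferred from the `cbind(y, n - y) ~ .` formula in the aod::betabin error, since `~ .` includes an intercept by default. With an intercept present, the conditionHC and conditionPD columns together duplicate it. Treatment coding with the intercept column removed gives a full-rank x.

```r
# Hypothetical stand-in for cur.df: two conditions, three layers, two rows per cell.
cur.df <- data.frame(
  condition = factor(rep(c("HC", "PD"), each = 6)),
  layer     = factor(rep(c("Layer 1", "Layer 2", "Layer 3"), times = 4))
)

# The coding used in the issue: no intercept, so condition gets both HC and PD columns.
no_int <- model.matrix(~ 0 + condition * layer, data = cur.df)

# If an intercept is added during fitting (assumed), HC + PD equals it: one column is redundant.
rank_no_int <- qr(cbind(1, no_int))$rank   # one less than the number of columns

# Alternative: treatment coding with an intercept, then drop the intercept column.
full <- model.matrix(~ condition * layer, data = cur.df)
x <- full[, colnames(full) != "(Intercept)", drop = FALSE]
rank_x <- qr(cbind(1, x))$rank             # equals ncol(x) + 1, i.e. full column rank
```

In this coding, removing the conditionHC column from the no-intercept matrix yields exactly the columns of `x`. So the workaround described in the issue matches standard treatment coding, provided the intercept assumption holds.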

Finally, I also want to add cell type as another independent variable. In my case, cell type is not categorical because the data were generated with Visium, where each spot can capture several cells; instead, each cell type is represented by its estimated abundance in each spot. If I were to include (1) condition, (2) layer, and (3) cell type in x, how should I structure it?
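A sketch of one possible structure, under assumptions: continuous abundance columns are appended with `cbind` to the dummy-coded categorical columns. The cell-type names and proportions are hypothetical placeholders for deconvolution output. If the estimates are proportions that sum to 1 in every spot, then together they reproduce the intercept. One cell type is then left out as the reference, just as one factor level is.

```r
set.seed(1)
n_spots <- 12

# Hypothetical stand-in for cur.df.
cur.df <- data.frame(
  condition = factor(rep(c("HC", "PD"), each = 6)),
  layer     = factor(rep(c("Layer 1", "Layer 2", "Layer 3"), times = 4))
)

# Hypothetical per-spot proportions of three cell types (placeholder names); rows sum to 1.
raw <- matrix(runif(n_spots * 3), ncol = 3,
              dimnames = list(NULL, c("Exc", "Inh", "Astro")))
props <- raw / rowSums(raw)

# Categorical part: treatment-coded, intercept column dropped.
cat_x <- model.matrix(~ condition * layer, data = cur.df)[, -1]

# Proportions summing to 1 duplicate the intercept, so leave one ("Exc") out as reference.
x <- cbind(cat_x, props[, -1])

rank_all   <- qr(cbind(1, cat_x, props))$rank  # rank-deficient by one
rank_final <- qr(cbind(1, x))$rank             # full column rank
```

Interactions between cell-type abundance and condition could be added the same way by multiplying columns. Whether that is advisable with DAESC is a modeling choice this sketch does not settle.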

Thank you!

Skewed p-value distribution

Hello DAESC team,

I have a question about interpreting results from the DAESC-MIX model. I tested ~10,000 genes for differential allelic imbalance across three conditions and then plotted a histogram of the p-values.

The p-value distribution is heavily skewed toward zero, so almost all genes are called significant.
Can you share your insights on this?

The dataset used here is 10x Visium.
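For reference, the sketch below shows what a calibrated test produces when every gene is null. The p-value histogram is flat, about 5% of p-values fall below 0.05, and Benjamini-Hochberg adjustment leaves almost nothing significant. The vector `pvals` is simulated here as a stand-in; with real results it would be replaced by the DAESC-MIX p-values. A pile-up near zero can reflect many true effects or an inflated test, and this check alone does not say which.

```r
set.seed(42)
# Hypothetical stand-in for observed p-values: all-null, so approximately uniform.
pvals <- runif(10000)

# Histogram counts in 20 equal-width bins; plot(h) would draw it.
h <- hist(pvals, breaks = seq(0, 1, by = 0.05), plot = FALSE)

frac_below_005 <- mean(pvals < 0.05)     # near 0.05 for a calibrated, mostly null test
padj  <- p.adjust(pvals, method = "BH")  # Benjamini-Hochberg FDR adjustment
n_sig <- sum(padj < 0.05)                # close to 0 under the global null
```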

Thank you!

Pre-processing data from 10x Chromium

Hello,

I am struggling to generate the base quality score recalibration table for the BAM file produced by the Cell Ranger pipeline. Because Cell Ranger mapped the reads to the GRCh38 genome, the VCF file from the GATK resource bundle is not compatible. The VCF files from the 1000 Genomes Project do work, but they are split by chromosome and each file takes up considerable storage. Is there a way around this problem, or a way to merge the base quality score tables generated for each chromosome into one?
