cailab-tamu / sctenifoldknk

R/MATLAB package to perform virtual knockout experiments on single-cell gene regulatory networks.

R 77.89% Python 21.33% MATLAB 0.56% Shell 0.22%
functional-genomics gene-function gene-knockout gene-regulatory-network virtual-knockout-experiments

sctenifoldknk's People

Contributors

dosorio, jamesjcai, qianxu05172019, yanzhong07

sctenifoldknk's Issues

Error in dim(mat) <- new_modes : the dims contain missing values

Hi,
I ran scTenifoldKnk using the count data from a Seurat object as the input matrix.
However, after the progress bar reached 100%, the run did not proceed to the next step and this error occurred.
I confirmed that there are no NA values in the input matrix.
Could you please let me know how I can fix this?

Thank you.

Question: How to run scTenifoldKnk on one cluster

Hello sir,
I would like to ask if you can guide me on how to run your tool on a single cluster. Am I supposed to remove genes from the subsetted cluster? I tried running the tool on the normalized counts (Seurat log-normalize method) for the subsetted cluster, but it does not give me any significant results. I also tried running it on the counts; it takes a long time (~15 hours) and then fails with the following error: Error in dim(mat) <- new_modes : the dims contain missing values.

I appreciate any help and guidance.
Thank you in advance!
Abed.

subscript out of bounds

Hi Daniel,

It is me again. I have tested knocking out several genes from my datasets and most of them work well except for this one as attached. The function seems to be completed successfully, but one error occurs for the output. Is there any way that we can increase the size of the output matrix for this issue?

Sorry to bother you again on this.

thanks

Leon

"|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=06h 10m 43s
|================================================================================================================================| 100%
Error in `[<-`(`*tmp*`, gKO, , value = 0) : subscript out of bounds"
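
A possible reading of this error, offered as an assumption rather than a diagnosis: in R, assigning to a matrix row by a name that is not among the row names raises "subscript out of bounds", which could happen if the gene to knock out was dropped by filtering before the network was built. A minimal Python sketch of a guard that fails with a clearer message; `knock_out` and the toy network are hypothetical illustrations, not part of scTenifoldKnk:

```python
def knock_out(network, gene):
    """Return a copy of `network` with the row for `gene` set to zero.

    `network` maps each gene name to its row of edge weights.
    Raises a descriptive KeyError when `gene` is absent instead of
    failing with an opaque out-of-bounds error.
    """
    if gene not in network:
        raise KeyError(f"{gene!r} is not in the network; it may have been filtered out")
    ko = {name: list(row) for name, row in network.items()}
    ko[gene] = [0.0] * len(ko[gene])
    return ko


# Toy 3-gene network with made-up weights.
wt = {"A": [0.0, 0.4, 0.1], "B": [0.3, 0.0, 0.2], "C": [0.5, 0.6, 0.0]}
ko = knock_out(wt, "B")
```

Checking that the gene is still present in the network's row names before the knockout step would confirm or rule out this explanation.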

distinguish between up- and down- regulated genes

Hi Daniel,

The virtual KO results really make sense for my dataset, and I appreciate the benefits the tool provides.

One more question, about the FC results: is it possible to distinguish between positive and negative FCs of differentially expressed genes? That might give us more information.

thanks

Best,
Leon

Systematic KO code

Hello,
Thank you very much for this vision.
Would you please direct me to the code used for the systematic KO analysis? I generated a gene-by-barcode expression matrix from data downloaded from 10x Genomics and would like to apply your approach to all genes present (~18k detected).

gko

Hi team of scTenifoldKnk,

Thanks for providing this solution of virtual gene knockout.

While following scTenifoldKnk.R, I am unable to run the knockout function because there is no object named "gKO". I am not sure how to create this object. Does it refer to the gene we want to knock out?

Sorry, I am new to this area and any help is highly appreciated.

thanks

Leon

Gene of interest not appearing knocked down.

Thank you very much for this great tool. I ran the Python version of scTenifoldKnk with default settings, but after knocking out my gene of interest the statistical analysis does not show the gene of interest itself as significant, i.e. the adjusted.p.value for the gene of interest is 1. What could explain this?

Code:

from scTenifold.data import get_test_df
from scTenifold import scTenifoldKnk

# df is the poster's own expression DataFrame; how it was loaded is not shown in the post.
sc = scTenifoldKnk(data=df,
                   ko_method="default",
                   ko_genes=["Il17rb"],  # the gene you want to knock out
                   qc_kws={"min_lib_size": 10, "min_percent": 0.001},
                   )
result = sc.build()

The result changes every time; is this normal?

Hi,
First of all, thank you for developing a good package.

I have one question about running the package.
After a virtual perturbation with scTenifoldKnk, is it normal for the set of perturbed genes to change on every run when I list them with
unique(c(gKO, X$diffRegulation$gene[X$diffRegulation$p.adj < 0.05]))?

The first result is like this
"Hnf4a" "Cyp7b1" "Mep1b" "Napsa" "Slc7a13" "Slc22a19" "Akr1c14" "Cyp2a4" "Cd36" "Slc5a8" "Acsm3" "Slc22a7" "Lpl"
"Mep1a" "Kap" "Tmigd1" "Slc6a18" "Atp11a" "Slc22a13" "Cyp2d9" "Acy3" "Nudt19" "Mpv17l" "Rnf24" "Slc27a2" "Slc5a10"
"Gstm1" "Bdh1" "Ces2c" "Cndp2" "Ggt1" "BC035947" "Slco1a1" "Agps" "Ly6a" "Mogat1" "Ace" "Cyp4a10" "Slco3a1"
"Aadat" "Ehhadh" "Ugt3a1" "Chpt1" "Cyp4b1" "Cyp2e1" "B4galt5" "Bhmt2" "Slc22a6" "Slc22a2" "Gclc" "Slc22a12" "Myo5a"

The second result is like this
"Hnf4a" "Cyp7b1" "Mep1b" "Slc22a19" "Akr1c14" "Slc22a7" "Cyp2a4" "Napsa" "Slc7a13" "Slc22a13" "Rnf24" "Cd36" "Acsm3" "Cyp2d9" "Tmigd1" "Slc5a8" "Mep1a" "Lpl" "Bdh1"
"Kap" "Slc6a18" "Gstm1" "Mpv17l" "Cyp4a10" "Atp11a" "BC035947" "Ace" "Agps" "Ces2c" "Mme" "Myo5a" "Slc27a2" "Slco1a1" "Nudt19" "Acy3" "Ghr" "Aadat" "Slc5a10"
"Ugt3a1" "Ggt1" "Pou3f3" "Slco3a1" "Mogat1" "Gclc" "Ehhadh" "Cdo1" "Chpt1" "Hsd11b1" "Reep5" "Dmgdh" "B4galt5"

and the third also came out differently...

thank you!
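
Whether this variability is expected is a question for the maintainers. As a way to quantify how much two runs differ, here is a minimal Python sketch that compares the first and second gene lists quoted above as sets; the variable names are illustrative only:

```python
# Gene lists copied from the first and second results in the post.
first = """
Hnf4a Cyp7b1 Mep1b Napsa Slc7a13 Slc22a19 Akr1c14 Cyp2a4 Cd36 Slc5a8 Acsm3 Slc22a7 Lpl
Mep1a Kap Tmigd1 Slc6a18 Atp11a Slc22a13 Cyp2d9 Acy3 Nudt19 Mpv17l Rnf24 Slc27a2 Slc5a10
Gstm1 Bdh1 Ces2c Cndp2 Ggt1 BC035947 Slco1a1 Agps Ly6a Mogat1 Ace Cyp4a10 Slco3a1
Aadat Ehhadh Ugt3a1 Chpt1 Cyp4b1 Cyp2e1 B4galt5 Bhmt2 Slc22a6 Slc22a2 Gclc Slc22a12 Myo5a
""".split()

second = """
Hnf4a Cyp7b1 Mep1b Slc22a19 Akr1c14 Slc22a7 Cyp2a4 Napsa Slc7a13 Slc22a13 Rnf24 Cd36 Acsm3
Cyp2d9 Tmigd1 Slc5a8 Mep1a Lpl Bdh1 Kap Slc6a18 Gstm1 Mpv17l Cyp4a10 Atp11a BC035947 Ace
Agps Ces2c Mme Myo5a Slc27a2 Slco1a1 Nudt19 Acy3 Ghr Aadat Slc5a10 Ugt3a1 Ggt1 Pou3f3
Slco3a1 Mogat1 Gclc Ehhadh Cdo1 Chpt1 Hsd11b1 Reep5 Dmgdh B4galt5
""".split()

run1, run2 = set(first), set(second)
shared = run1 & run2          # genes reported by both runs
only_first = run1 - run2      # genes reported only by the first run
only_second = run2 - run1     # genes reported only by the second run
jaccard = len(shared) / len(run1 | run2)  # 1.0 would mean identical sets

print(f"shared={len(shared)} only_first={len(only_first)} "
      f"only_second={len(only_second)} jaccard={jaccard:.2f}")
```

A high overlap with a few genes drifting in and out near the p.adj cutoff would point to run-to-run randomness rather than a fundamentally different result.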

The dims contain missing values?

Dear dosorio,

Thank you for providing this convenient tool.

When I used scTenifoldKnk, the following errors appeared in some experiments, but I checked the matrix and found that there were no missing values.
Do you have any solutions?

Warning message:
In asMethod(object) :
sparse->dense coercion: allocating vector of size 1.4 GiB
Error in dim(mat) <- new_modes : the dims contain missing values
Calls: scTenifoldKnk ... -> cpDecomposition -> rs_unfold -> unfold
In addition: There were 21 warnings (use warnings() to see them)
Execution halted

Gratefully,
siteng

Guidance to set nc_nCells and nc_nComp parameters

Hi @dosorio

I have a single cell dataset containing 30K cells and am performing KO of a few genes. I wish to understand how the choice of nc_nCells and nc_nComp would affect my results. Currently, I set nc_nCells=1000 but I don't know if that should be sufficient when you have 30K cells. In addition, should I increase nc_nComp? Computational power is not the limitation in my case, neither is the time.
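
To make the trade-off concrete, here is a minimal Python sketch under stated assumptions: each network is taken to be built from `nc_nCells` cells drawn at random without replacement, and the number of networks (`n_net=10`) is a hypothetical value, not taken from the post. It estimates what fraction of the 30K cells would be seen by at least one network:

```python
import random


def sampled_fraction(total_cells, n_cells, n_net, seed=0):
    """Fraction of cells that appear in at least one of `n_net` random subsamples."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    seen = set()
    for _ in range(n_net):
        # One subsample of n_cells distinct cell indices (an assumption of this sketch).
        seen.update(rng.sample(range(total_cells), n_cells))
    return len(seen) / total_cells


frac = sampled_fraction(total_cells=30_000, n_cells=1_000, n_net=10)
print(f"fraction of cells used by at least one network: {frac:.3f}")
```

Under these assumptions the fraction can never exceed `n_net * n_cells / total_cells`, which is one third here, so raising the number of cells per network is one way to use more of a large dataset. How the number of components interacts with this is not covered by the sketch.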

is it possible to speed up the scTenifoldKnk

Hi the scTenifoldKnk Team,

Thanks for developing this package. I have tested the functions of scTenifoldKnk on the example dataset, but I have trouble running it on my own data matrix.

It is pretty slow for a matrix with about 20,000 genes. I cannot get it to work with 2000 cells, and it still takes about 16 hours for a matrix with fewer than 100 cells.

Do you have any suggestions for speeding it up? By the way, I am working in R.

thanks

Leon

Error in `[.data.frame`(countMatrix, , selectedCells) : undefined columns selected

Hi,

I was trying to run scTenifoldKnk on a single cell RNA seq dataset that we generated in our lab. But when I read in the file, it keeps giving me the following error:
Error in `[.data.frame`(countMatrix, , selectedCells) :
undefined columns selected

This is the code that I used to import the dataset.

scRNAseq_Kuang <- read.csv("Kuang1.csv", row.names = 1)
Kuang <- na.omit(scRNAseq_Kuang)
scTenifoldKnk(countMatrix = Kuang, gKO = 'XXX', qc_minLSize = 0)

I also included the screenshot of the format of the dataset

[screenshot of the dataset format]

I would appreciate any help identifying what went wrong.
Thank you very much in advance.

Best regards
Candice

Stuck at 0% - Calculating for hours

Hi
I am using a dataset imported from Seurat and then converted into a count matrix that meets the specified parameters (). I get no error; the run is just stuck. Has anyone else encountered this issue? Any suggestions from the developers?

Thank you so much!

Regarding Log2FC value

Dear developers

Thank you for developing this wonderful tool. Thanks to you, we were able to obtain interesting results using the tool you developed.
However, I have a question regarding the FC (fold change) value obtained after the virtual KO analysis.

Is it theoretically sound to transform the FC value into log2FC by taking log2 (e.g., log2FC <- log2(FC))?
If so, is it sound to use log2FC to classify a gene as up-regulated or down-regulated?

For example:
log2(FC) > 1: up-regulated genes
log2(FC) < -1: down-regulated genes

Thank you
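
The rule proposed in this question can be written out as a short Python sketch. It only shows the arithmetic; it assumes FC is a positive number and does not establish that the FC column is a directional KO-to-WT ratio, which is exactly what the question asks. The function name `classify` and the default threshold are illustrative:

```python
import math


def classify(fc, threshold=1.0):
    """Label a fold change using the log2 cutoffs proposed in the question."""
    if fc <= 0:
        raise ValueError("log2 is only defined for positive fold changes")
    log2fc = math.log2(fc)
    if log2fc > threshold:
        return "up"
    if log2fc < -threshold:
        return "down"
    return "unchanged"


for fc in (4.0, 0.25, 1.5):
    print(fc, classify(fc))
```

If FC turns out to be a non-directional magnitude, values below 1 would not mean down-regulation, and the "down" label would be meaningless.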

Questions about running scTenifoldKnk

Hello there,
I would like to ask a few questions about scTenifoldKnk: is it mandatory to run the tool on an entire sample? Is it possible to run it on a specific cluster in a single-cell sample after clustering and annotation? Which do you recommend using: normalized or non-normalized counts?

Thank you in advance.

remove unnecessary output network of KO

outputList$tensorNetworks$KO <- Matrix(KO)
Is this necessary? Matrix(KO) and Matrix(WT) differ by only one row, but they take the same amount of disk/memory space.
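
Taking the issue's premise at face value, that the KO network equals the WT network with a single row zeroed, a minimal Python sketch of the alternative is to store only the WT network and the knocked-out gene, and rebuild KO on demand. The helper `ko_from_wt` and the toy matrix are hypothetical:

```python
def ko_from_wt(wt_rows, gene_names, ko_gene):
    """Rebuild the KO network from the WT network and the knocked-out gene name.

    Unchanged rows are shared with `wt_rows` rather than copied, so only
    one new row of zeros is allocated.
    """
    idx = gene_names.index(ko_gene)  # raises ValueError if the gene is absent
    return [[0.0] * len(row) if i == idx else row
            for i, row in enumerate(wt_rows)]


genes = ["g1", "g2", "g3"]
wt = [[0.0, 0.2, 0.1],
      [0.4, 0.0, 0.3],
      [0.5, 0.6, 0.0]]
ko = ko_from_wt(wt, genes, "g2")
```

With this design the output needs only the WT network plus the gene name; the cost is that users must reconstruct KO before inspecting it.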

Using scTenifoldKnk for other organism than Human

Hi Developers!!

I am working with Plasmodium and I wish to knock out a gene and see its effect. But in Plasmodium the mitochondrial gene names start with the "milo" regex.

  1. Could you incorporate an option to provide a custom regex for mitochondrial genes (feature request)? Also, do you suggest using the tool on non-human organisms?
  2. Could you also add a verbose parameter that prints which step is running? It would help us know which calculations or steps are running in the background. All I see right now are two progress bars.
  3. Is there a vignette that highlights the dos and don'ts of using this single function, such as which parameters are essential (e.g., nc_lambda and others) and how to explore the result object to gain insight into the knockdown?
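
The first request could look roughly like the following Python sketch: a mitochondrial-fraction helper whose gene-name pattern is a parameter. The function `mito_fraction`, the default pattern `^MT-` (human-style naming), and the toy gene names are assumptions made for illustration, not the package's API:

```python
import re


def mito_fraction(gene_names, counts, pattern=r"^MT-"):
    """Fraction of one cell's counts that come from genes matching `pattern`."""
    regex = re.compile(pattern, flags=re.IGNORECASE)
    total = sum(counts)
    mito = sum(c for g, c in zip(gene_names, counts) if regex.search(g))
    return mito / total if total else 0.0


# Hypothetical gene names using the "milo" prefix mentioned in the issue.
genes = ["milo_1", "milo_2", "geneA", "geneB"]
counts = [3, 1, 10, 6]

default_frac = mito_fraction(genes, counts)                   # human-style pattern
custom_frac = mito_fraction(genes, counts, pattern=r"^milo")  # organism-specific pattern
```

With the default pattern no gene in this toy cell matches, so a mitochondrial-content filter would silently do nothing; passing the organism-specific pattern restores it.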

For RNA-seq data

Hi,
thank you for developing a good package.

I have a perhaps strange question: can I adapt this package for RNA-seq data, or can it only be used for single-cell data?

Am I doing this correctly?

Hi, first, your virtual KO seems a very interesting idea!

I'm curious about how the scTenifoldKnk works, so I tried it with the test data provided in the package.

scRNAseq <- system.file("single-cell/example.csv",package="scTenifoldKnk")
scRNAseq <- read.csv(scRNAseq, row.names = 1)

I expected it to generate a heatmap like the one below:

[image: expected heatmap]

but I get these for WT and KO (knockout of g100):

[images: WT and KO heatmaps]

Am I doing this correctly?

suggestion on filtering genes of interest

Hi @dosorio

I want to ask if you have any recommended way of filtering the DEGs that we get out of scTenifoldKnk. I observed that there are genes with more than a 10-fold perturbation, such as the following:

gene	distance	Z	FC	p.value	p.adj
PF3D7-0617800	1.73822398699258E-06	6.06037047499314	2917.01596234589	0	0	late trophs
PF3D7-0402000	1.63063171540528E-06	4.28111337895561	2032.22476480867	0	0	early trophs

Besides, do you have any recommendations to shortlist genes of interest based on Z-scores as well?
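
As one way to frame the question, here is a minimal Python sketch that shortlists rows by adjusted p-value, fold change, and optionally Z-score, using the two rows quoted above. The helper `shortlist` and every threshold are hypothetical choices, not recommendations from the developers:

```python
# The two rows quoted in the post (distance and p.value omitted for brevity).
rows = [
    {"gene": "PF3D7-0617800", "Z": 6.06037047499314, "FC": 2917.01596234589, "p.adj": 0.0},
    {"gene": "PF3D7-0402000", "Z": 4.28111337895561, "FC": 2032.22476480867, "p.adj": 0.0},
]


def shortlist(rows, max_padj=0.05, min_fc=10.0, min_z=None):
    """Keep rows passing all thresholds, sorted by Z-score, largest first."""
    kept = [
        r for r in rows
        if r["p.adj"] < max_padj
        and r["FC"] >= min_fc
        and (min_z is None or r["Z"] >= min_z)
    ]
    return sorted(kept, key=lambda r: r["Z"], reverse=True)


by_fc = [r["gene"] for r in shortlist(rows)]
by_fc_and_z = [r["gene"] for r in shortlist(rows, min_z=5.0)]
```

Adding a Z-score cutoff on top of the FC cutoff narrows the list further; with the hypothetical `min_z=5.0` only the first of the two quoted genes remains.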

Data normalized or not?

Hi, thanks for your contribution, and congratulations.
I have two questions.

  1. You give different information in two places, and I would like to know whether one of them is a mistake.
    In the article: The input expression matrix is assumed to have been properly normalized.
    On GitHub: Data is expected to be not normalized.
  2. I also ran a KO and the FDR values came out at 0.999. Is it possible to select genes not by FDR but by fold change, for example greater than 0.5?

Thanks for your time and help!

Applicability for KO of lncRNAs

Hi !
Thank you for the amazing tool. I am curious whether scTenifoldKnk has been tested for KO of lncRNAs and how well it would work. Insights into this, or suggestions of any tool that could be used to study the functions of lncRNAs, would be great.

Regards,
Prakrithi
