dhimmel / lincs
Library of Integrated Cellular Signatures L1000
Home Page: https://think-lab.github.io/d/43/
I can't figure out where to get modzs.gctx, which is needed to construct the signature dataframe sig_expr_df in consensi.ipynb. From here, you say:
"The z-score signature vectors are retrieved from the /xchip/cogs/data/build/a2y13q1/modzs.gctx file on the C3 cloud."
But this was 2 years ago and the link doesn't work anymore. Also, I'm not sure what exactly this file is or how it was generated.
I appreciate your help in advance!
I am trying to plot some genes using data level 4 for my compound (BRD-K55591206) on HepG2 cells.
There are two signatures with HepG2 cells at level 5:
LJP008_HEPG2_24H:J01
POL001_HEPG2_24H:J09
To make sure I was using the same data as these level 5 signatures, I checked the level 4 replicates of each signature. The average of the two LJP008 replicates (distil_ids: LJP008_HEPG2_24H_X2_B20:J01|LJP008_HEPG2_24H_X3_B20:J01) matches the level 5 value for each gene. Perfect.
However, the level 4 data for signature POL001_HEPG2 (distil_ids: POL001_HEPG2_24H_X1.L2_B23:J07|POL001_HEPG2_24H_X2.L2_B23:J07|POL001_HEPG2_24H_X3.L2_B23:J07) does not match level 5.
If we use the NAT2 gene as an example, we have the following level 5 value: 0.004413
On the other hand, the values for the level 4 replicates are:
POL001_HEPG2_24H_X1.L2_B23:J07 = -0.386299998
POL001_HEPG2_24H_X2.L2_B23:J07 = 0.110600002
POL001_HEPG2_24H_X3.L2_B23:J07 = 0.38409999
The average of these three replicates, 0.036133, does not match the level 5 value of 0.004413.
The compound is BRD-K55591206, 10 µM, 24 h.
Why don’t they match?
I am using cmapR to retrieve the data from these files:
https://clue.io/releases/data-dashboard
https://s3.amazonaws.com/macchiato.clue.io/builds/LINCS2020/level5/level5_beta_trt_cp_n720216x12328.gctx
https://s3.amazonaws.com/macchiato.clue.io/builds/LINCS2020/level4/level4_beta_all_n3026460x12328.gctx
https://s3.amazonaws.com/macchiato.clue.io/builds/LINCS2020/siginfo_beta.txt
https://s3.amazonaws.com/macchiato.clue.io/builds/LINCS2020/instinfo_beta.txt
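One possible explanation, offered as an assumption rather than a confirmed answer: level 5 signatures may be a moderated z-score (MODZ), meaning a correlation-weighted average of the level 4 replicates rather than a plain mean. Under that assumption, two replicates always receive equal weights (so the LJP008 average would match), while three replicates generally do not. The sketch below reproduces the simple average from the numbers above and shows how such weights could be derived. The function names, the `min_weight` floor, and the three-replicate correlation matrix are hypothetical and are not taken from the LINCS data or pipeline.

```python
from statistics import mean

# Level 4 z-scores for NAT2, copied from the post above.
level4_nat2 = {
    "POL001_HEPG2_24H_X1.L2_B23:J07": -0.386299998,
    "POL001_HEPG2_24H_X2.L2_B23:J07": 0.110600002,
    "POL001_HEPG2_24H_X3.L2_B23:J07": 0.38409999,
}

simple_average = mean(level4_nat2.values())
print(f"simple average: {simple_average:.6f}")  # 0.036133


def modz_weights(corr, min_weight=0.01):
    """Turn a symmetric replicate-correlation matrix into normalized weights.

    Assumption: each replicate's raw weight is its mean correlation with the
    other replicates, floored at min_weight, then scaled so weights sum to 1.
    """
    n = len(corr)
    raw = []
    for i in range(n):
        others = [corr[i][j] for j in range(n) if j != i]
        raw.append(max(mean(others), min_weight))
    total = sum(raw)
    return [w / total for w in raw]


def weighted_average(values, weights):
    """Weighted sum of values; weights are expected to sum to 1."""
    return sum(v * w for v, w in zip(values, weights))


# Two replicates: each one's only "other" correlation is the same number,
# so the weights are always 0.5 / 0.5 and the result equals the plain mean.
two_rep_weights = modz_weights([[1.0, 0.3], [0.3, 1.0]])
print("two-replicate weights:", two_rep_weights)

# Three replicates with HYPOTHETICAL correlations (not from LINCS data).
hypothetical_corr = [
    [1.0, 0.1, 0.2],
    [0.1, 1.0, 0.6],
    [0.2, 0.6, 1.0],
]
three_rep_weights = modz_weights(hypothetical_corr)
weighted = weighted_average(list(level4_nat2.values()), three_rep_weights)
print("three-replicate weights:", [round(w, 3) for w in three_rep_weights])
print(f"weighted average: {weighted:.6f}")
```

Because the correlations here are made up, the weighted value is not expected to equal 0.004413. Checking this hypothesis would require computing the actual replicate correlations from the level 4 file.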
Hi Daniel,
thank you very much for sharing this work. As a computational biologist, I find this data very interesting for checking hypotheses derived from another dataset against wet-lab data. Great!
I had a look at the datasets you kindly provided in https://github.com/dhimmel/lincs/tree/gh-pages/data/consensi and checked the effect of overexpression/underexpression of a gene as perturbagen on itself:
About a third of the genes showed nominally significant underexpression (z-score <= -1.96) when the gene was itself the repressing perturbagen. For overexpression, about 10 percent of genes showed overexpression when they were themselves the overexpressed perturbagen.
My first question is: While this is clearly an enrichment in the right direction, is such a low efficiency of a gene as a perturbagen on itself expected?
My second question is: Do you suggest filtering for genes that have an effect as a perturbagen on themselves, as a quality control step?
To illustrate this issue, here is a histogram of z-scores showing the effect of perturbagens on themselves vs. their effect on other genes:
Thanks and best, Holger
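For readers who want to repeat this check, here is a minimal sketch of the calculation described above: the fraction of self-targeting perturbations whose z-score passes the nominal cutoff of 1.96 in the expected direction (|z| >= 1.96 corresponds to a two-sided p < 0.05 under a standard normal). The function name and the example z-scores are hypothetical. The real values would come from the consensi tables, and the cutoff used for overexpression is assumed to mirror the knockdown one.

```python
def fraction_beyond(z_scores, threshold=1.96, direction="down"):
    """Fraction of z-scores past the threshold in the expected direction.

    direction="down" counts z <= -threshold (expected for knockdown);
    direction="up" counts z >= +threshold (expected for overexpression).
    """
    if direction not in ("down", "up"):
        raise ValueError("direction must be 'down' or 'up'")
    if not z_scores:
        return 0.0
    if direction == "down":
        hits = sum(z <= -threshold for z in z_scores)
    else:
        hits = sum(z >= threshold for z in z_scores)
    return hits / len(z_scores)


# HYPOTHETICAL self-effect z-scores, one per perturbed gene.
knockdown_self_z = [-3.1, -0.4, -2.2, 0.3, -1.0, -1.97]
overexpression_self_z = [2.5, 0.1, -0.2, 0.9, 1.2, 0.4, 1.5, 0.0, -1.1, 0.7]

knockdown_fraction = fraction_beyond(knockdown_self_z, direction="down")
overexpression_fraction = fraction_beyond(overexpression_self_z, direction="up")
print(knockdown_fraction)       # 0.5
print(overexpression_fraction)  # 0.1
```

Running the same function on the z-scores of non-target genes gives the background rate to compare against.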
Hello Daniel,
I am in the process of creating auto-update scripts for all the nodes with hetio, and in order to do that, I will need a copy of the most up-to-date modzs.gctx file. I know that GSEA has some LINCS datasets in there, and I was curious which files best correspond to the modzs file that you used in this repository. Any input or feedback would be greatly appreciated. Thank you!
Best,
Krish
Hey @dhimmel,
Thank you for such amazing work putting together the scripts to process and analyze the LINCS dataset. If it is not too much trouble, could you add a README to the repo to guide us through the process?
Thank You.
Regards,
Yojana Gadiya
As shown in database.ipynb, there is a large l1000.db file containing the gene expression profiles and metadata of LINCS. Could you publish the code that produces l1000.db, and/or the l1000.db file itself? Thank you.