lieberinstitute / spatialdlpfc Goto Github PK
View Code? Open in Web Editor NEWspatialDLPFC project involving Visium (n = 30), Visium SPG (n = 4) and snRNA-seq (n = 19) samples
Home Page: http://research.libd.org/spatialDLPFC/
spatialDLPFC project involving Visium (n = 30), Visium SPG (n = 4) and snRNA-seq (n = 19) samples
Home Page: http://research.libd.org/spatialDLPFC/
Read in the scale factors at
spatialDLPFC/analysis/1_marker_genes.R
Lines 62 to 64 in a38bef5
spatialDLPFC/analysis/1_marker_genes.R
Lines 46 to 50 in a38bef5
It would be similar to https://github.com/LieberInstitute/HumanPilot/blob/7e0f25188969913bc247b5d6c1c47c3ad94c6f15/Analysis/Layer_Notebook.R#L105
Once #39 is done, we can proceed with this.
Overall file structure ideas https://lcolladotor.github.io/bioc_team_ds/organizing-your-work.html
Live examples: https://github.com/LieberInstitute/Visium_IF_AD and the newer https://github.com/LieberInstitute/DLPFC_snRNAseq
See more on Slack at https://jhu-genomics.slack.com/archives/C01EA7VDJNT/p1630516674026100
Current status:
$ ls -lh /dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC
total 234K
drwxrws--- 2 lcollado lieber_lcolladotor 31 Jul 28 10:47 analysis
drwxrws--- 14 lcollado lieber_lcolladotor 14 Nov 25 2020 FASTQ
drwxrws--- 2 lcollado lieber_lcolladotor 2 Nov 18 2020 html
drwxrws--- 8 lcollado lieber_lcolladotor 9 Jul 14 13:25 Images
drwxrws--- 4 lcollado lieber_lcolladotor 16 Jul 23 16:53 images_raw_align_json
drwxrws--- 4 lcollado lieber_lcolladotor 4 Feb 17 2021 outputs
drwxrws--- 6 lcollado lieber_lcolladotor 35 Aug 3 09:08 plots
drwxrws--- 3 lcollado lieber_lcolladotor 5 Mar 22 12:33 rdata
-rw-rw---- 1 lcollado lieber_lcolladotor 1.8K Nov 18 2020 README.md
drwxrws--- 2 lcollado lieber_lcolladotor 5 Jul 26 12:27 sample_info
drwxrws--- 3 lcollado lieber_lcolladotor 3 Jul 26 11:01 scripts
-rw-rw---- 1 lcollado lieber_lcolladotor 270 Feb 15 2021 spatialDLPFC.Rproj
More specifically:
raw-data
, code
, processed-data
analysis
to code/
FASTQ
to raw-data/
html
for now, can be remade under processed-data
later, like in sub-directories as in https://github.com/LieberInstitute/DLPFC_snRNAseq/tree/main/processed-data/02_cellranger_metricsImages
to raw-data/
images_raw_align_json
to raw-data/
rdata
to processed-data
, maybe even to sub-directories like https://github.com/LieberInstitute/DLPFC_snRNAseq/tree/main/processed-data/02_cellranger_metricssample_info
to raw-data/
scripts
to code/
This involves input from Maddy and others working with Matlab. Once we have the estimated number of cells per spot from Joe's code, we can then read in the data and add it to our sce
object using https://github.com/LieberInstitute/HumanPilot/blob/master/Analysis/sce_scran.R#L237-L242. Then in #8 we can see it in shiny
by including it in the sce_continuous_vars
as in https://github.com/LieberInstitute/spatialLIBD/blob/master/R/run_app.R#L86.
if this doesn't work then we try miniBatch k-means clustering https://bioconductor.org/packages/release/bioc/html/mbkmeans.html
For that step, you'll likely need to use code from https://github.com/LieberInstitute/HumanPilot/blob/master/Analysis/convert_sce.R#L19-L20. Note that the tibbles for the images have to be combined which involves https://github.com/LieberInstitute/HumanPilot/blob/7e0f25188969913bc247b5d6c1c47c3ad94c6f15/Analysis/Layer_Notebook.R#L151 and https://github.com/LieberInstitute/HumanPilot/blob/master/Analysis/convert_sce.R#L20.
#5 could be done before or after the cbind()
step.
You might have to manually create additional columns for the colData(sce)
such as subject
, position
, replicate
, subject_position
although that is study-dependent. For example, sce$position
gets created at https://github.com/LieberInstitute/HumanPilot/blob/master/Analysis/convert_sce.R#L58-L59.
You can overwrite the current SpaceRanger results we have.
After #7, to try out the new sce
object with spatialLIBD::run_app()
we need to do a couple of things.
pryr::object_size(sce)
at JHPCEsce
object as well as the low resolution imagesBiocManager::install(version = "3.12")
+ BiocManager::valid()
)spatialLIBD
(BiocManager::install("spatialLIBD")
)Next, we can try to run the app using spatialLIBD::run_app()
that has the following arguments https://github.com/LieberInstitute/spatialLIBD/blob/master/R/run_app.R#L49-L93.
We'll need to override several of them (aka, not use the defaults). In particular:
sce
image_path
: should be a local directory containing the low resolution images, similar to https://github.com/LieberInstitute/spatialLIBD/tree/master/inst/app/www/data where each directory is the sample name and inside you can find the tissue_lowres_image.png
files like at https://github.com/LieberInstitute/spatialLIBD/tree/master/inst/app/www/data/151507sce_discrete_vars
(we don't really have any of them right now)sce_continuous_vars
(we have most, I don't think that we have cell_count
yet, though maybe Maddy has run the image segmentation Matlab code already)spatial_libd_var
which we might need to create with no data for now (as in sce$leo <- NA
) since we don't have any clusters yetThen, the app should ideally (we'll see if there's code that needs to be changed) work on the "spot-level" tab. The "layer-level" tab will still show the older data.
See the commit message at LieberInstitute/spatial_hpc@1d32b1e for details. See @madhavitippani for more info too.
do this before we run clustering.
then run bayesspace on harmony dimensions instead of PCs
get it from the CS2 computer and transfer it to the cluster so heena can divide it and run vistoseg
Maybe move some of the older scripts from https://github.com/LieberInstitute/spatialDLPFC/tree/c61820105268869abff3cbfa029e97827b670073/code/analysis to old
or something like that. That way the new code from #41 and #42 will be easier to identify.
Similar to #9, lets add to the sce
object the reduced dimensions using the data from all the images in our sce
object prior to #8 & #7. That involves running code Abby is familiar with already from https://github.com/LieberInstitute/HumanPilot/blob/master/Analysis/sce_scran.R#L80-L230.
Note that that code uses block = sce$subject_position
which will be study-specific. If we have samples from the same donors, I would use block
with something like block = subject
. If we have spatial replicates, then we should use the original block = sce$subject_position
(which you'll need to create first). This block
argument is our way of telling scran
our data structure (like the fact that our spots from a given set of images).
This is a more general issue and can be done one script a time.
Example here
usage: https://github.com/LieberInstitute/DLPFC_snRNAseq/blob/main/code/02_cellranger_metrics/01_Tran_et_al_metrics.R
spe$quadrant <- if(coord_x < 0 & coord_y > 0) "topleft" else if(....) spe$sample_quadrant <- paste0(spe$sampleid, "_", spe$quadrant) sample_quad_list <- rafalib::splitit(spe$sample_quadrant) selected_sample_quad <- unlist(lapply(sample_squad_list, sample, n = 250)) spe_subsampled <- spe[, selected_sample_quad]
Instead of the column names at
spatialDLPFC/analysis/1_marker_genes.R
Lines 49 to 50 in a38bef5
SpatialExperiment
) use the older names as in https://github.com/LieberInstitute/HumanPilot/blob/7e0f25188969913bc247b5d6c1c47c3ad94c6f15/Analysis/Layer_Notebook.R#L116 then do the same multiplications (using the scaled factors info from #1) https://github.com/LieberInstitute/HumanPilot/blob/7e0f25188969913bc247b5d6c1c47c3ad94c6f15/Analysis/Layer_Notebook.R#L118-L119.
Eventually, we want this info in the colData(sce)
slot which Abby is doing at
spatialDLPFC/analysis/1_marker_genes.R
Line 83 in a38bef5
spatialDLPFC/analysis/1_marker_genes.R
Lines 71 to 78 in a38bef5
Hi Abby,
Can you make the plots for the marker genes? That will involve updating https://github.com/LieberInstitute/spatialDLPFC/blob/32a14108b4ccc2210f1f8f63857d51f4847f6994/analysis/02_marker_genes.R as well as including the R session info as a comment at the bottom of the script.
It'll probably be good so you can practice using the new spatialLIBD
version with functions such as vis_clus_gene()
. Check also the git/GitHub history for that file to check how I tried to simplify the code. I haven't tested it myself, so there might be a typo here or something missing here or there.
Thanks!
running it on sample_id will remove any variation in region. Find a way to test this and look specifically at whether there is variation in region that we need to worry about.
Once SpaceRanger is done re-running for the 2 problematic samples, to be extra careful, we'll re-run VistoSeg
to count the cells per spot in these 2 samples. They are:
Sample 1 already finished running on SpaceRanger
and you can find the output at /dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC/processed-data/rerun_spaceranger/DLPFC_Br2743_ant_manual_alignment
.
Sample 4 is queued at JHPCE. Once it's done, the output will be at /dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC/processed-data/rerun_spaceranger/DLPFC_Br3942_ant_manual_alignment
.
Thanks @heenadivecha!
Run internal scuttle
code and extract the information that leads to the warning at https://github.com/LieberInstitute/spatialDLPFC/blob/main/analysis/01_build_SPE.R#L339-L356 so we can explore this information at https://github.com/LieberInstitute/spatialDLPFC/blob/main/analysis/01a_check_scran_quick_cluster.R as needed.
This is related to LTLA/scuttle#7 where we want to find out if we are using the correct statistics and if the workaround Aaron introduced in scuttle
1.1.15 is appropriate or not.
Once #40 is done continue here.
Likely start a new script for this.
spatialDLPFC/code/analysis/merge_spe.R
Lines 17 to 51 in c618201
read10xVisiumWrapper()
like at spatialDLPFC/code/analysis/merge_spe.R
Line 82 in c618201
spatialDLPFC/code/analysis/merge_spe.R
Lines 124 to 139 in c618201
scran
to have the QC info.HARMONY
and BayesSpace
.HARMONY
dimensions.BayesSpace
results.Before #8 and likely also before #7, we'll want to add a few things to our sce
object. One of them could be a (or more than one) discrete variable https://github.com/LieberInstitute/spatialLIBD/blob/master/R/run_app.R#L58 that tells us what were spots that scran
's quality control checks (designed for scRNA-seq data, not Visium data) are of low quality and should be discarded. That is, running code like https://github.com/LieberInstitute/HumanPilot/blob/master/Analysis/sce_scran.R#L32-L33 then taking the outputs (which are logical
vectors) and converting them to discrete variables (so factor
or character
).
Let's try this with the 3 columns in https://github.com/LieberInstitute/HumanPilot/blob/master/Analysis/sce_scran.R#L35. So something like:
sce$scran_discard <- factor(edit_me, levels = c("TRUE", "FALSE"))
sce$scran_low_lib_size <- ...
sce$low_n_features <- ...
This might be more relevant for the LC data than the DLPFC data. But it'll be good to check.
From https://github.com/LieberInstitute/spatialLIBD/blob/002838163e6373f4d3899f74fd867c6c6fe91eab/inst/scripts/make-data_spatialLIBD.R add some useful information such as:
chrM
columns https://github.com/LieberInstitute/spatialLIBD/blob/002838163e6373f4d3899f74fd867c6c6fe91eab/inst/scripts/make-data_spatialLIBD.R#L63-L66rowData(sce)$gene_search
which combines the gene symbol and the Ensembl gene ID as in https://github.com/LieberInstitute/spatialLIBD/blob/002838163e6373f4d3899f74fd867c6c6fe91eab/inst/scripts/make-data_spatialLIBD.R#L106-L108At this point, you could use spatialLIBD::check_sce()
and see what else is missing http://research.libd.org/spatialLIBD/reference/check_sce.html.
Use the GTF file we used for aligning the data with SpaceRanger to add more information about the genes. Note that the genes might not be in the same order, but you can rely on the Ensembl Gene ID for ordering the data.
That is, the file at
spatialDLPFC/analysis/1_marker_genes.R
Lines 37 to 40 in a38bef5
map
object made at https://github.com/LieberInstitute/HumanPilot/blob/7e0f25188969913bc247b5d6c1c47c3ad94c6f15/Analysis/Layer_Notebook.R#L78-L80 (with different column names, but they could be made consistent). I don't know if the genes are in the same order across all images, but it would be worth double checking.
In any case, the rest of the gene info can be obtained from the annotation GTF file as in https://github.com/LieberInstitute/HumanPilot/blob/7e0f25188969913bc247b5d6c1c47c3ad94c6f15/Analysis/Layer_Notebook.R#L81-L87.
Read in the low resolution images and create a tibble for them. That involves code like https://github.com/LieberInstitute/HumanPilot/blob/7e0f25188969913bc247b5d6c1c47c3ad94c6f15/Analysis/Layer_Notebook.R#L89-L101. Just use the low resolution file from
spatialDLPFC/analysis/1_marker_genes.R
Line 60 in a38bef5
lo
to low
).After #10, we can run some graph-based clustering algorithms as in https://github.com/LieberInstitute/HumanPilot/blob/master/Analysis/sce_scran.R#L452-L468 + https://github.com/LieberInstitute/HumanPilot/blob/master/Analysis/sce_scran.R#L1185-L1191 and add them to the sce_discrete_vars
in #8 as in https://github.com/LieberInstitute/spatialLIBD/blob/master/R/run_app.R#L63.
Using the published data from 2021, compare the ARI from SpaGCN
and BayesSpace
(joint clustering). This will likely be a supplementary figure in this paper.
We should also add this info to our published data and its companion apps.
after issue #23
fix sample_info table because it's missing some data
Use marker genes from pilot dataset to call layers in this new dataset
find in OSCA the part where they talk about running PCAs. Make plots they suggest to check this. Check with Matt too to see what he did for his data.
https://github.com/LieberInstitute/spatialDLPFC/blob/main/code/analysis/01_build_SPE_all.R#L444
Do this prior to #16 so we version control files we need prior to moving them around
Once #41 is done, we'll want to run SpaGCN on the 30 samples we'll use for this project.
When reading in the data, add metrics such as sum_umi
and sum_gene
to colData(sce)
using code similar to https://github.com/LieberInstitute/HumanPilot/blob/7e0f25188969913bc247b5d6c1c47c3ad94c6f15/Analysis/Layer_Notebook.R#L144-L145.
3 "new" samples
6432_ant_2
8325_mid_2
2720_ant_2
2 resequences
2720post
8667ant
1 missing resequence (emailed Linda)
6423mid
use harmony dimension (instead of PCs) to run bayesspace, do on pilot dlpfc samples https://edward130603.github.io/BayesSpace/articles/joint_clustering.html. harmony seems to the best for batch correction.
try substitute to bayesspace which is called resept. https://www.biorxiv.org/content/10.1101/2021.07.08.451210v2
another alternative to bayesspace is SpaGCN https://www.biorxiv.org/content/10.1101/2020.11.30.405118v1, https://github.com/jianhuupenn/SpaGCN/blob/master/tutorial/tutorial_ez_mode.md#3-read-in-data, https://github.com/jianhuupenn/SpaGCN/blob/master/tutorial/tutorial.md
contingency table and ARI method
SpatialExperiment
won't be backwards compatible from 3.14.Make contingency table
Use ARI method
Move to trash/
any older SPE objects that we likely don't need anymore. The goal is to minimize confusion. You might want to do this before starting with #19.
That likely involves some of the spe
objects listed at
$ ls -lh code/analysis/*rda
-rwxrwx--- 1 lcollado lieber_lcolladotor 28K Feb 10 2021 code/analysis/clusters_more_than_100_umis.rda
-rwxrwx--- 1 lcollado lieber_lcolladotor 30K Feb 10 2021 code/analysis/clusters_more_than_10_umis.rda
-rwxrwx--- 1 lcollado lieber_lcolladotor 30K Feb 10 2021 code/analysis/clusters_nonzero.rda
-rwxrwx--- 1 lcollado lieber_lcolladotor 162M Feb 15 2021 code/analysis/sce_combined.rda
-rwxrwx--- 1 lcollado lieber_lcolladotor 163M Jan 28 2021 code/analysis/sce_list.rda
-rwxrwx--- 1 lcollado lieber_lcolladotor 162M Feb 10 2021 code/analysis/sce_more_than_100_umis.rda
-rwxrwx--- 1 lcollado lieber_lcolladotor 162M Feb 10 2021 code/analysis/sce_more_than_10_umis.rda
-rwxrwx--- 1 lcollado lieber_lcolladotor 162M Feb 10 2021 code/analysis/sce_nonzero.rda
## Below we likely want to keep the metrics files, but not the spe or top.hvg ones
$ ls -lh processed-data/rdata/spe/*Rdata
-rw-rw---- 1 lcollado lieber_lcolladotor 1.7K Feb 17 2021 processed-data/rdata/spe/pilot_metrics.Rdata
-rw-rw---- 1 lcollado lieber_lcolladotor 2.7K Jul 27 11:06 processed-data/rdata/spe/sample_metrics_072721.Rdata
-rw-rw---- 1 lcollado lieber_lcolladotor 5.1K Jul 28 07:16 processed-data/rdata/spe/sample_metrics_all.Rdata
-rw-rw---- 1 lcollado lieber_lcolladotor 2.4K Feb 17 2021 processed-data/rdata/spe/sample_metrics.Rdata
-rw-rw---- 1 lcollado lieber_lcolladotor 4.5K Jul 27 12:15 processed-data/rdata/spe/shared_metrics_072721.Rdata
-rw-rw---- 1 lcollado lieber_lcolladotor 1.6K Feb 17 2021 processed-data/rdata/spe/shared_metrics.Rdata
-rw-rw---- 1 lcollado lieber_lcolladotor 840M Sep 1 12:34 processed-data/rdata/spe/spe_072821.Rdata
-rw-rw---- 1 lcollado lieber_lcolladotor 418M Jul 28 15:05 processed-data/rdata/spe/spe_raw_072821.Rdata
-rw-rw---- 1 lcollado lieber_lcolladotor 168M Feb 17 2021 processed-data/rdata/spe/spe_raw.Rdata
-rw-rw---- 1 lcollado lieber_lcolladotor 343M Feb 17 2021 processed-data/rdata/spe/spe.Rdata
-rw-rw---- 1 aspangle lieber_lcolladotor 138K Sep 1 11:17 processed-data/rdata/spe/top.hvgs_all.Rdata
-rw-rw---- 1 lcollado lieber_lcolladotor 114K Feb 17 2021 processed-data/rdata/spe/top.hvgs.Rdata
among other places.
Here's another way of finding these files based on https://stackoverflow.com/a/5905126/9374370:
## .rda files
$ find . -name "*.rda"
./code/analysis/clusters_more_than_10_umis.rda
./code/analysis/clusters_more_than_100_umis.rda
./code/analysis/sce_more_than_100_umis.rda
./code/analysis/sce_more_than_10_umis.rda
./code/analysis/clusters_nonzero.rda
./code/analysis/sce_combined.rda
./code/analysis/sce_nonzero.rda
./code/analysis/sce_list.rda
## .Rdata files
$ find . -name "*.Rdata"
./processed-data/rdata/inspect_scuttle_issue_7.Rdata
./processed-data/rdata/spe/spe.Rdata
./processed-data/rdata/spe/sample_metrics.Rdata
./processed-data/rdata/spe/spe_072821.Rdata
./processed-data/rdata/spe/spe_raw.Rdata
./processed-data/rdata/spe/shared_metrics_072721.Rdata
./processed-data/rdata/spe/top.hvgs.Rdata
./processed-data/rdata/spe/sample_metrics_all.Rdata
./processed-data/rdata/spe/sample_metrics_072721.Rdata
./processed-data/rdata/spe/shared_metrics.Rdata
./processed-data/rdata/spe/pilot_metrics.Rdata
./processed-data/rdata/spe/top.hvgs_all.Rdata
./processed-data/rdata/spe/spe_raw_072821.Rdata
./processed-data/rdata/g_k50.Rdata
I'm guessing some object gets duplicated internally or something. I'm not sure.
directions in OSTA
This work is already in progress thanks to https://github.com/LieberInstitute/spatialDLPFC/blob/main/analysis/sample_metrics.R.
The idea is to visualize some of the spaceranger
sample metrics we have vs the ones we had in the past from https://github.com/LieberInstitute/HumanPilot/blob/master/Analysis/visium_dlpfc_pilot_sample_metrics.tsv. This will help us determine if we need to generate more data (do more sequencing).
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.