The spatialdlpfc's discuss from lieberinstitute

Run BayesSpace on Pilot data with 14 clusters and compare to manual annotations

Update scaleFactors code

Read in the scale factors at

Lines 62 to 64 in a38bef5

    
           # spatial scale factors 
        
           file_scale <- file.path(dir_spatial, "scalefactors_json.json") 
        
           scalefactors <- fromJSON(file = file_scale)

before you read the tissue position info at

spatialDLPFC/analysis/1_marker_genes.R

Lines 46 to 50 in a38bef5

    
           dir_spatial <- file.path(dir_outputs, sample_name, "outs", "spatial") 
        
           file_tisspos <- file.path(dir_spatial, "tissue_positions_list.csv") 
        
           df_tisspos <- read.csv(file_tisspos, header = FALSE,  
        
                                  col.names=c("barcode_id", "in_tissue", "array_row", "array_col",  
        
                                              "pxl_col_in_fullres", "pxl_row_in_fullres"))

since we need it for doing the multiplications at https://github.com/LieberInstitute/HumanPilot/blob/7e0f25188969913bc247b5d6c1c47c3ad94c6f15/Analysis/Layer_Notebook.R#L118-L119.

Run VistoSeg for 2720 Ant (round 4)

Once #39 is done, we can proceed with this.

Re-organize files to match new project structure

Overall file structure ideas https://lcolladotor.github.io/bioc_team_ds/organizing-your-work.html

Live examples: https://github.com/LieberInstitute/Visium_IF_AD and the newer https://github.com/LieberInstitute/DLPFC_snRNAseq

See more on Slack at https://jhu-genomics.slack.com/archives/C01EA7VDJNT/p1630516674026100

Current status:

 $ ls -lh /dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC
total 234K
drwxrws---  2 lcollado lieber_lcolladotor   31 Jul 28 10:47 analysis
drwxrws--- 14 lcollado lieber_lcolladotor   14 Nov 25  2020 FASTQ
drwxrws---  2 lcollado lieber_lcolladotor    2 Nov 18  2020 html
drwxrws---  8 lcollado lieber_lcolladotor    9 Jul 14 13:25 Images
drwxrws---  4 lcollado lieber_lcolladotor   16 Jul 23 16:53 images_raw_align_json
drwxrws---  4 lcollado lieber_lcolladotor    4 Feb 17  2021 outputs
drwxrws---  6 lcollado lieber_lcolladotor   35 Aug  3 09:08 plots
drwxrws---  3 lcollado lieber_lcolladotor    5 Mar 22 12:33 rdata
-rw-rw----  1 lcollado lieber_lcolladotor 1.8K Nov 18  2020 README.md
drwxrws---  2 lcollado lieber_lcolladotor    5 Jul 26 12:27 sample_info
drwxrws---  3 lcollado lieber_lcolladotor    3 Jul 26 11:01 scripts
-rw-rw----  1 lcollado lieber_lcolladotor  270 Feb 15  2021 spatialDLPFC.Rproj

More specifically:

create raw-data, code, processed-data
move analysis to code/
move FASTQ to raw-data/
delete html for now, can be remade under processed-data later, like in sub-directories as in https://github.com/LieberInstitute/DLPFC_snRNAseq/tree/main/processed-data/02_cellranger_metrics
move Images to raw-data/
move images_raw_align_json to raw-data/
move rdata to processed-data, maybe even to sub-directories like https://github.com/LieberInstitute/DLPFC_snRNAseq/tree/main/processed-data/02_cellranger_metrics
move sample_info to raw-data/
move scripts to code/

Add number of cells per spot

This involves input from Maddy and others working with Matlab. Once we have the estimated number of cells per spot from Joe's code, we can then read in the data and add it to our sce object using https://github.com/LieberInstitute/HumanPilot/blob/master/Analysis/sce_scran.R#L237-L242. Then in #8 we can see it in shiny by including it in the sce_continuous_vars as in https://github.com/LieberInstitute/spatialLIBD/blob/master/R/run_app.R#L86.

add graph-based clustering within samples

if this doesn't work then we try miniBatch k-means clustering https://bioconductor.org/packages/release/bioc/html/mbkmeans.html

Combine sce_list into a single sce object

For that step, you'll likely need to use code from https://github.com/LieberInstitute/HumanPilot/blob/master/Analysis/convert_sce.R#L19-L20. Note that the tibbles for the images have to be combined which involves https://github.com/LieberInstitute/HumanPilot/blob/7e0f25188969913bc247b5d6c1c47c3ad94c6f15/Analysis/Layer_Notebook.R#L151 and https://github.com/LieberInstitute/HumanPilot/blob/master/Analysis/convert_sce.R#L20.

#5 could be done before or after the cbind() step.

You might have to manually create additional columns for the colData(sce) such as subject, position, replicate, subject_position although that is study-dependent. For example, sce$position gets created at https://github.com/LieberInstitute/HumanPilot/blob/master/Analysis/convert_sce.R#L58-L59.

Run SpaceRanger for 2720 Ant (round 4)

You can overwrite the current SpaceRanger results we have.

Try running the shiny app locally (on a laptop)

After #7, to try out the new sce object with spatialLIBD::run_app() we need to do a couple of things.

Check the file size in RAM using pryr::object_size(sce) at JHPCE
Transfer the data to your laptop: the sce object as well as the low resolution images
Install R 4.0.3 with Bioconductor 3.12 (BiocManager::install(version = "3.12") + BiocManager::valid())
Install spatialLIBD (BiocManager::install("spatialLIBD"))

Next, we can try to run the app using spatialLIBD::run_app() that has the following arguments https://github.com/LieberInstitute/spatialLIBD/blob/master/R/run_app.R#L49-L93.

We'll need to override several of them (aka, not use the defaults). In particular:

sce
image_path: should be a local directory containing the low resolution images, similar to https://github.com/LieberInstitute/spatialLIBD/tree/master/inst/app/www/data where each directory is the sample name and inside you can find the tissue_lowres_image.png files like at https://github.com/LieberInstitute/spatialLIBD/tree/master/inst/app/www/data/151507
sce_discrete_vars (we don't really have any of them right now)
sce_continuous_vars (we have most, I don't think that we have cell_count yet, though maybe Maddy has run the image segmentation Matlab code already)
spatial_libd_var which we might need to create with no data for now (as in sce$leo <- NA) since we don't have any clusters yet

Then, the app should ideally (we'll see if there's code that needs to be changed) work on the "spot-level" tab. The "layer-level" tab will still show the older data.

Compute ARI against BayesSpace

Once #31 #41 #42 are all done, compute ARIs against BayesSpace in preparation for Figure 2.

Version control VistoSeg code for this project

See the commit message at LieberInstitute/spatial_hpc@1d32b1e for details. See @madhavitippani for more info too.

Run BayesSpace on non-batch corrected dimensions in pilot data

do batch correction with Harmony

do this before we run clustering.

then run bayesspace on harmony dimensions instead of PCs

give latest histology image to heena

get it from the CS2 computer and transfer it to the cluster so heena can divide it and run vistoseg

Consider re-organizing scripts

Maybe move some of the older scripts from https://github.com/LieberInstitute/spatialDLPFC/tree/c61820105268869abff3cbfa029e97827b670073/code/analysis to old or something like that. That way the new code from #41 and #42 will be easier to identify.

Add reduced dimensions

Similar to #9, lets add to the sce object the reduced dimensions using the data from all the images in our sce object prior to #8 & #7. That involves running code Abby is familiar with already from https://github.com/LieberInstitute/HumanPilot/blob/master/Analysis/sce_scran.R#L80-L230.

Note that that code uses block = sce$subject_position which will be study-specific. If we have samples from the same donors, I would use block with something like block = subject. If we have spatial replicates, then we should use the original block = sce$subject_position (which you'll need to create first). This block argument is our way of telling scran our data structure (like the fact that our spots from a given set of images).

Update code to new JHPCE location & use here::here()

This is a more general issue and can be done one script a time.

Example here usage: https://github.com/LieberInstitute/DLPFC_snRNAseq/blob/main/code/02_cellranger_metrics/01_Tran_et_al_metrics.R

make down-sampled version of SPE object

spe$quadrant <- if(coord_x < 0 & coord_y > 0) "topleft" else if(....) spe$sample_quadrant <- paste0(spe$sampleid, "_", spe$quadrant) sample_quad_list <- rafalib::splitit(spe$sample_quadrant) selected_sample_quad <- unlist(lapply(sample_squad_list, sample, n = 250)) spe_subsampled <- spe[, selected_sample_quad]

Use older columns for the tissue position info

Instead of the column names at

spatialDLPFC/analysis/1_marker_genes.R

Lines 49 to 50 in a38bef5

    
           col.names=c("barcode_id", "in_tissue", "array_row", "array_col",  
        
                       "pxl_col_in_fullres", "pxl_row_in_fullres"))

(which are the ones used in the current release of SpatialExperiment) use the older names as in https://github.com/LieberInstitute/HumanPilot/blob/7e0f25188969913bc247b5d6c1c47c3ad94c6f15/Analysis/Layer_Notebook.R#L116 then do the same multiplications (using the scaled factors info from #1) https://github.com/LieberInstitute/HumanPilot/blob/7e0f25188969913bc247b5d6c1c47c3ad94c6f15/Analysis/Layer_Notebook.R#L118-L119.

Eventually, we want this info in the colData(sce) slot which Abby is doing at

spatialDLPFC/analysis/1_marker_genes.R

Line 83 in a38bef5

col_data <- cbind(df_barcodes, df_tisspos_ord[, -1])

. Note that you still want to check the order as in

spatialDLPFC/analysis/1_marker_genes.R

Lines 71 to 78 in a38bef5

    
           # note: check and/or re-order rows to make sure barcode IDs match in df_barcodes and df_tisspos 
        
           dim(df_barcodes) 
        
           dim(df_tisspos) 
        
           ord <- match(df_barcodes$barcode_id, df_tisspos$barcode_id) 
        
           df_tisspos_ord <- df_tisspos[ord, ] 
        
           dim(df_tisspos_ord) 
        
           stopifnot(nrow(df_barcodes) == nrow(df_tisspos_ord)) 
        
           stopifnot(all(df_barcodes$barcode_id == df_tisspos_ord$barcode_id))

.

Plot marker genes using spatialLIBD

Hi Abby,

Can you make the plots for the marker genes? That will involve updating https://github.com/LieberInstitute/spatialDLPFC/blob/32a14108b4ccc2210f1f8f63857d51f4847f6994/analysis/02_marker_genes.R as well as including the R session info as a comment at the bottom of the script.

It'll probably be good so you can practice using the new spatialLIBD version with functions such as vis_clus_gene(). Check also the git/GitHub history for that file to check how I tried to simplify the code. I haven't tested it myself, so there might be a typo here or something missing here or there.

Thanks!

Decide whether to run Harmony batch correction on sample_id or donor

running it on sample_id will remove any variation in region. Find a way to test this and look specifically at whether there is variation in region that we need to worry about.

Copy the cell counts for all samples except the 2 problematic ones

To complement #48, I'll copy over the count results from all 28 samples (I know that #40 is still in progress too) over to /dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC/processed-data/rerun_spaceranger to their appropriate directories.

Re-run VistoSeg on the 2 problematic samples

Once SpaceRanger is done re-running for the 2 problematic samples, to be extra careful, we'll re-run VistoSeg to count the cells per spot in these 2 samples. They are:

sample 1:

spatialDLPFC/code/spaceranger/spaceranger_parameters.txt

Line 1 in 4c127da

    
           DLPFC_Br2743_ant_manual_alignment	V19B23-075	A1	/dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC/raw-data/Images/round1/Liebert_Institute_OTS-20-7690_rush_anterior_1.tif	/dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC/raw-data/images_raw_align_json/V19B23-075-A1_1_Br2743_ant_manual_alignment.json	/dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC/raw-data/FASTQ/2_DLPFC_Br3942_ant

sample 4

spatialDLPFC/code/spaceranger/spaceranger_parameters.txt

Line 4 in 4c127da

    
           DLPFC_Br3942_ant_manual_alignment	V19B23-075	B1	/dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC/raw-data/Images/round1/Liebert_Institute_OTS-20-7690_rush_anterior_2.tif	/dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC/raw-data/images_raw_align_json/V19B23-075-B1_2_Br3942_ant_manual_alignment.json	/dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC/raw-data/FASTQ/3_DLPFC_Br6423_ant

Sample 1 already finished running on SpaceRanger and you can find the output at /dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC/processed-data/rerun_spaceranger/DLPFC_Br2743_ant_manual_alignment.

Sample 4 is queued at JHPCE. Once it's done, the output will be at /dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC/processed-data/rerun_spaceranger/DLPFC_Br3942_ant_manual_alignment.

Thanks @heenadivecha!

scuttle quick cluster

Run internal scuttle code and extract the information that leads to the warning at https://github.com/LieberInstitute/spatialDLPFC/blob/main/analysis/01_build_SPE.R#L339-L356 so we can explore this information at https://github.com/LieberInstitute/spatialDLPFC/blob/main/analysis/01a_check_scran_quick_cluster.R as needed.

This is related to LTLA/scuttle#7 where we want to find out if we are using the correct statistics and if the workaround Aaron introduced in scuttle 1.1.15 is appropriate or not.

Create a new SPE object without the 3 samples with poor histology

Once #40 is done continue here.

Likely start a new script for this.

Change the order of how you read the samples at

spatialDLPFC/code/analysis/merge_spe.R

Lines 17 to 51 in c618201

    
           sample_id = c( 
        
             "DLPFC_Br2743_ant_manual_alignment", 
        
             "DLPFC_Br2743_mid_manual_alignment_extra_reads", 
        
             "DLPFC_Br2743_post_manual_alignment", 
        
             "DLPFC_Br3942_ant_manual_alignment", 
        
             "DLPFC_Br3942_mid_manual_alignment", 
        
             "DLPFC_Br3942_post_manual_alignment", 
        
             "DLPFC_Br6423_ant_manual_alignment_extra_reads", 
        
             "DLPFC_Br6423_mid_manual_alignment", 
        
             "DLPFC_Br6423_post_extra_reads", 
        
             "DLPFC_Br8492_ant_manual_alignment", 
        
             "DLPFC_Br8492_mid_manual_alignment_extra_reads", 
        
             "DLPFC_Br8492_post_manual_alignment", 
        
             "Round2/DLPFC_Br2720_ant_manual_alignment", 
        
             "Round2/DLPFC_Br2720_mid_manual_alignment", 
        
             "Round2/DLPFC_Br2720_post_extra_reads", 
        
             "Round2/DLPFC_Br6432_ant_manual_alignment", 
        
             "Round2/DLPFC_Br6432_mid_manual_alignment", 
        
             "Round2/DLPFC_Br6432_post_manual_alignment", 
        
             "Round3/DLPFC_Br6471_ant_manual_alignment_all", 
        
             "Round3/DLPFC_Br6471_mid_manual_alignment_all", 
        
             "Round3/DLPFC_Br6471_post_manual_alignment_all", 
        
             "Round3/DLPFC_Br6522_ant_manual_alignment_all", 
        
             "Round3/DLPFC_Br6522_mid_manual_alignment_all", 
        
             "Round3/DLPFC_Br6522_post_manual_alignment_all", 
        
             "Round3/DLPFC_Br8325_ant_manual_alignment_all", 
        
             "Round3/DLPFC_Br8325_mid_manual_alignment_all", 
        
             "Round3/DLPFC_Br8325_post_manual_alignment_all", 
        
             "Round3/DLPFC_Br8667_ant_extra_reads", 
        
             "Round3/DLPFC_Br8667_mid_manual_alignment_all", 
        
             "Round3/DLPFC_Br8667_post_manual_alignment_all", 
        
             "Round4/DLPFC_Br2720_ant_2", 
        
             "Round4/DLPFC_Br6432_ant_2", 
        
             "Round4/DLPFC_Br8325_mid_2" 
        
           ),

thinking about the grid of how you want to plot them later (Check in with @kmaynard12). To avoid code like https://github.com/LieberInstitute/Visium_IF_AD/blob/master/code/04_build_spe/initial_exploration.R#L13-L18 https://github.com/LieberInstitute/Visium_IF_AD/blob/master/code/04_build_spe/initial_exploration.R#L48-L67.

Use read10xVisiumWrapper() like at

spatialDLPFC/code/analysis/merge_spe.R

Line 82 in c618201

spe_wrapper <- read10xVisiumWrapper(

Read in the spot counts using code like from

spatialDLPFC/code/analysis/merge_spe.R

Lines 124 to 139 in c618201

    
           ## Read in cell counts and segmentation results 
        
           segmentations_list <- lapply(sample_info$sample_id, function(sampleid) { 
        
             current<-sample_info$sample_path[sample_info$sample_id==sampleid] 
        
             file <- file.path(current, "spatial", "tissue_spot_counts.csv") 
        
             if(!file.exists(file)) return(NULL) 
        
             x <- read.csv(file) 
        
             x$key <- paste0(x$barcode, "_", sampleid) 
        
             return(x) 
        
           }) 
        
           ## Merge them (once the these files are done, this could be replaced by an rbind) 
        
           segmentations <- Reduce(function(...) merge(..., all = TRUE), segmentations_list[lengths(segmentations_list) > 0]) 
        
           ## Add the information 
        
           segmentation_match <- match(spe_wrapper$key, segmentations$key) 
        
           segmentation_info <- segmentations[segmentation_match, - which(colnames(segmentations) %in% c("barcode", "tissue", "row", "col", "imagerow", "imagecol", "key")), drop=FALSE] 
        
           colData(spe_wrapper) <- cbind(colData(spe_wrapper), segmentation_info)

Run scran to have the QC info.
There's no need to remake the QC plots.
Compute PCs and UMAP.
Run just a single tSNE just to have it there.
Run HARMONY and BayesSpace.
Run graph-based clustering on the HARMONY dimensions.
Make a grid plot of the BayesSpace results.

add graph-based clustering on harmony dimensions

Add variables to explore low quality spots from scran

Before #8 and likely also before #7, we'll want to add a few things to our sce object. One of them could be a (or more than one) discrete variable https://github.com/LieberInstitute/spatialLIBD/blob/master/R/run_app.R#L58 that tells us what were spots that scran's quality control checks (designed for scRNA-seq data, not Visium data) are of low quality and should be discarded. That is, running code like https://github.com/LieberInstitute/HumanPilot/blob/master/Analysis/sce_scran.R#L32-L33 then taking the outputs (which are logical vectors) and converting them to discrete variables (so factor or character).

Let's try this with the 3 columns in https://github.com/LieberInstitute/HumanPilot/blob/master/Analysis/sce_scran.R#L35. So something like:

sce$scran_discard <- factor(edit_me, levels = c("TRUE", "FALSE"))
sce$scran_low_lib_size <- ...
sce$low_n_features <- ...

This might be more relevant for the LC data than the DLPFC data. But it'll be good to check.

Add info useful for the shiny app

From https://github.com/LieberInstitute/spatialLIBD/blob/002838163e6373f4d3899f74fd867c6c6fe91eab/inst/scripts/make-data_spatialLIBD.R add some useful information such as:

the chrM columns https://github.com/LieberInstitute/spatialLIBD/blob/002838163e6373f4d3899f74fd867c6c6fe91eab/inst/scripts/make-data_spatialLIBD.R#L63-L66
rowData(sce)$gene_search which combines the gene symbol and the Ensembl gene ID as in https://github.com/LieberInstitute/spatialLIBD/blob/002838163e6373f4d3899f74fd867c6c6fe91eab/inst/scripts/make-data_spatialLIBD.R#L106-L108

At this point, you could use spatialLIBD::check_sce() and see what else is missing http://research.libd.org/spatialLIBD/reference/check_sce.html.

Add more gene information

Use the GTF file we used for aligning the data with SpaceRanger to add more information about the genes. Note that the genes might not be in the same order, but you can rely on the Ensembl Gene ID for ordering the data.

That is, the file at

spatialDLPFC/analysis/1_marker_genes.R

Lines 37 to 40 in a38bef5

    
           # features 
        
           file_features <- file.path(dir_matrix, "features.tsv.gz") 
        
           df_features <- read.csv(file_features, sep = "\t", header = FALSE,  
        
                                   col.names = c("gene_id", "gene_name", "feature_type"))

is equivalent to the map object made at https://github.com/LieberInstitute/HumanPilot/blob/7e0f25188969913bc247b5d6c1c47c3ad94c6f15/Analysis/Layer_Notebook.R#L78-L80 (with different column names, but they could be made consistent). I don't know if the genes are in the same order across all images, but it would be worth double checking.

In any case, the rest of the gene info can be obtained from the annotation GTF file as in https://github.com/LieberInstitute/HumanPilot/blob/7e0f25188969913bc247b5d6c1c47c3ad94c6f15/Analysis/Layer_Notebook.R#L81-L87.

Add tibble with low res image info

Read in the low resolution images and create a tibble for them. That involves code like https://github.com/LieberInstitute/HumanPilot/blob/7e0f25188969913bc247b5d6c1c47c3ad94c6f15/Analysis/Layer_Notebook.R#L89-L101. Just use the low resolution file from

spatialDLPFC/analysis/1_marker_genes.R

Line 60 in a38bef5

    
           imageFilePath <- file.path(dir_spatial, c("tissue_hires_image.png", "tissue_lores_image.png"))

(fix the file name from lo to low).

Add graph-based clustering discrete variables

After #10, we can run some graph-based clustering algorithms as in https://github.com/LieberInstitute/HumanPilot/blob/master/Analysis/sce_scran.R#L452-L468 + https://github.com/LieberInstitute/HumanPilot/blob/master/Analysis/sce_scran.R#L1185-L1191 and add them to the sce_discrete_vars in #8 as in https://github.com/LieberInstitute/spatialLIBD/blob/master/R/run_app.R#L63.

1 missing resequence (emailed Linda)
6423mid

perform spatially aware clustering method

use harmony dimension (instead of PCs) to run bayesspace, do on pilot dlpfc samples https://edward130603.github.io/BayesSpace/articles/joint_clustering.html. harmony seems to the best for batch correction.
try substitute to bayesspace which is called resept. https://www.biorxiv.org/content/10.1101/2021.07.08.451210v2
another alternative to bayesspace is SpaGCN https://www.biorxiv.org/content/10.1101/2020.11.30.405118v1, https://github.com/jianhuupenn/SpaGCN/blob/master/tutorial/tutorial_ez_mode.md#3-read-in-data, https://github.com/jianhuupenn/SpaGCN/blob/master/tutorial/tutorial.md

Compare batch corrected graph-based clustering to manual annotation in pilot data

contingency table and ARI method

Make a new SPE object with all 30+ samples

Use BioC 3.14 for this since SpatialExperiment won't be backwards compatible from 3.14.
Seems like this was started at https://github.com/LieberInstitute/spatialDLPFC/blob/main/code/analysis/01_build_SPE_all.R. It'll likely involve updating the paths for this script related to #17 .

$ ls -lh code/analysis/*rda
-rwxrwx--- 1 lcollado lieber_lcolladotor  28K Feb 10  2021 code/analysis/clusters_more_than_100_umis.rda
-rwxrwx--- 1 lcollado lieber_lcolladotor  30K Feb 10  2021 code/analysis/clusters_more_than_10_umis.rda
-rwxrwx--- 1 lcollado lieber_lcolladotor  30K Feb 10  2021 code/analysis/clusters_nonzero.rda
-rwxrwx--- 1 lcollado lieber_lcolladotor 162M Feb 15  2021 code/analysis/sce_combined.rda
-rwxrwx--- 1 lcollado lieber_lcolladotor 163M Jan 28  2021 code/analysis/sce_list.rda
-rwxrwx--- 1 lcollado lieber_lcolladotor 162M Feb 10  2021 code/analysis/sce_more_than_100_umis.rda
-rwxrwx--- 1 lcollado lieber_lcolladotor 162M Feb 10  2021 code/analysis/sce_more_than_10_umis.rda
-rwxrwx--- 1 lcollado lieber_lcolladotor 162M Feb 10  2021 code/analysis/sce_nonzero.rda

## Below we likely want to keep the metrics files, but not the spe or top.hvg ones
$ ls -lh processed-data/rdata/spe/*Rdata
-rw-rw---- 1 lcollado lieber_lcolladotor 1.7K Feb 17  2021 processed-data/rdata/spe/pilot_metrics.Rdata
-rw-rw---- 1 lcollado lieber_lcolladotor 2.7K Jul 27 11:06 processed-data/rdata/spe/sample_metrics_072721.Rdata
-rw-rw---- 1 lcollado lieber_lcolladotor 5.1K Jul 28 07:16 processed-data/rdata/spe/sample_metrics_all.Rdata
-rw-rw---- 1 lcollado lieber_lcolladotor 2.4K Feb 17  2021 processed-data/rdata/spe/sample_metrics.Rdata
-rw-rw---- 1 lcollado lieber_lcolladotor 4.5K Jul 27 12:15 processed-data/rdata/spe/shared_metrics_072721.Rdata
-rw-rw---- 1 lcollado lieber_lcolladotor 1.6K Feb 17  2021 processed-data/rdata/spe/shared_metrics.Rdata
-rw-rw---- 1 lcollado lieber_lcolladotor 840M Sep  1 12:34 processed-data/rdata/spe/spe_072821.Rdata
-rw-rw---- 1 lcollado lieber_lcolladotor 418M Jul 28 15:05 processed-data/rdata/spe/spe_raw_072821.Rdata
-rw-rw---- 1 lcollado lieber_lcolladotor 168M Feb 17  2021 processed-data/rdata/spe/spe_raw.Rdata
-rw-rw---- 1 lcollado lieber_lcolladotor 343M Feb 17  2021 processed-data/rdata/spe/spe.Rdata
-rw-rw---- 1 aspangle lieber_lcolladotor 138K Sep  1 11:17 processed-data/rdata/spe/top.hvgs_all.Rdata
-rw-rw---- 1 lcollado lieber_lcolladotor 114K Feb 17  2021 processed-data/rdata/spe/top.hvgs.Rdata

among other places.

Here's another way of finding these files based on https://stackoverflow.com/a/5905126/9374370:

## .rda files
$ find . -name "*.rda"
./code/analysis/clusters_more_than_10_umis.rda
./code/analysis/clusters_more_than_100_umis.rda
./code/analysis/sce_more_than_100_umis.rda
./code/analysis/sce_more_than_10_umis.rda
./code/analysis/clusters_nonzero.rda
./code/analysis/sce_combined.rda
./code/analysis/sce_nonzero.rda
./code/analysis/sce_list.rda

## .Rdata files
$ find . -name "*.Rdata"
./processed-data/rdata/inspect_scuttle_issue_7.Rdata
./processed-data/rdata/spe/spe.Rdata
./processed-data/rdata/spe/sample_metrics.Rdata
./processed-data/rdata/spe/spe_072821.Rdata
./processed-data/rdata/spe/spe_raw.Rdata
./processed-data/rdata/spe/shared_metrics_072721.Rdata
./processed-data/rdata/spe/top.hvgs.Rdata
./processed-data/rdata/spe/sample_metrics_all.Rdata
./processed-data/rdata/spe/sample_metrics_072721.Rdata
./processed-data/rdata/spe/shared_metrics.Rdata
./processed-data/rdata/spe/pilot_metrics.Rdata
./processed-data/rdata/spe/top.hvgs_all.Rdata
./processed-data/rdata/spe/spe_raw_072821.Rdata
./processed-data/rdata/g_k50.Rdata

The idea is to visualize some of the spaceranger sample metrics we have vs the ones we had in the past from https://github.com/LieberInstitute/HumanPilot/blob/master/Analysis/visium_dlpfc_pilot_sample_metrics.tsv. This will help us determine if we need to generate more data (do more sequencing).

	# spatial scale factors
	file_scale <- file.path(dir_spatial, "scalefactors_json.json")
	scalefactors <- fromJSON(file = file_scale)

	dir_spatial <- file.path(dir_outputs, sample_name, "outs", "spatial")
	file_tisspos <- file.path(dir_spatial, "tissue_positions_list.csv")
	df_tisspos <- read.csv(file_tisspos, header = FALSE,
	col.names=c("barcode_id", "in_tissue", "array_row", "array_col",
	"pxl_col_in_fullres", "pxl_row_in_fullres"))

	# note: check and/or re-order rows to make sure barcode IDs match in df_barcodes and df_tisspos
	dim(df_barcodes)
	dim(df_tisspos)
	ord <- match(df_barcodes$barcode_id, df_tisspos$barcode_id)
	df_tisspos_ord <- df_tisspos[ord, ]
	dim(df_tisspos_ord)
	stopifnot(nrow(df_barcodes) == nrow(df_tisspos_ord))
	stopifnot(all(df_barcodes$barcode_id == df_tisspos_ord$barcode_id))

	sample_id = c(
	"DLPFC_Br2743_ant_manual_alignment",
	"DLPFC_Br2743_mid_manual_alignment_extra_reads",
	"DLPFC_Br2743_post_manual_alignment",
	"DLPFC_Br3942_ant_manual_alignment",
	"DLPFC_Br3942_mid_manual_alignment",
	"DLPFC_Br3942_post_manual_alignment",
	"DLPFC_Br6423_ant_manual_alignment_extra_reads",
	"DLPFC_Br6423_mid_manual_alignment",
	"DLPFC_Br6423_post_extra_reads",
	"DLPFC_Br8492_ant_manual_alignment",
	"DLPFC_Br8492_mid_manual_alignment_extra_reads",
	"DLPFC_Br8492_post_manual_alignment",
	"Round2/DLPFC_Br2720_ant_manual_alignment",
	"Round2/DLPFC_Br2720_mid_manual_alignment",
	"Round2/DLPFC_Br2720_post_extra_reads",
	"Round2/DLPFC_Br6432_ant_manual_alignment",
	"Round2/DLPFC_Br6432_mid_manual_alignment",
	"Round2/DLPFC_Br6432_post_manual_alignment",
	"Round3/DLPFC_Br6471_ant_manual_alignment_all",
	"Round3/DLPFC_Br6471_mid_manual_alignment_all",
	"Round3/DLPFC_Br6471_post_manual_alignment_all",
	"Round3/DLPFC_Br6522_ant_manual_alignment_all",
	"Round3/DLPFC_Br6522_mid_manual_alignment_all",
	"Round3/DLPFC_Br6522_post_manual_alignment_all",
	"Round3/DLPFC_Br8325_ant_manual_alignment_all",
	"Round3/DLPFC_Br8325_mid_manual_alignment_all",
	"Round3/DLPFC_Br8325_post_manual_alignment_all",
	"Round3/DLPFC_Br8667_ant_extra_reads",
	"Round3/DLPFC_Br8667_mid_manual_alignment_all",
	"Round3/DLPFC_Br8667_post_manual_alignment_all",
	"Round4/DLPFC_Br2720_ant_2",
	"Round4/DLPFC_Br6432_ant_2",
	"Round4/DLPFC_Br8325_mid_2"
	),

	## Read in cell counts and segmentation results
	segmentations_list <- lapply(sample_info$sample_id, function(sampleid) {
	current<-sample_info$sample_path[sample_info$sample_id==sampleid]
	file <- file.path(current, "spatial", "tissue_spot_counts.csv")
	if(!file.exists(file)) return(NULL)
	x <- read.csv(file)
	x$key <- paste0(x$barcode, "_", sampleid)
	return(x)
	})
	## Merge them (once the these files are done, this could be replaced by an rbind)
	segmentations <- Reduce(function(...) merge(..., all = TRUE), segmentations_list[lengths(segmentations_list) > 0])

	## Add the information
	segmentation_match <- match(spe_wrapper$key, segmentations$key)
	segmentation_info <- segmentations[segmentation_match, - which(colnames(segmentations) %in% c("barcode", "tissue", "row", "col", "imagerow", "imagecol", "key")), drop=FALSE]
	colData(spe_wrapper) <- cbind(colData(spe_wrapper), segmentation_info)

	# features
	file_features <- file.path(dir_matrix, "features.tsv.gz")
	df_features <- read.csv(file_features, sep = "\t", header = FALSE,
	col.names = c("gene_id", "gene_name", "feature_type"))

lieberinstitute / spatialdlpfc Goto Github PK

spatialdlpfc's Issues

Recommend Projects

Recommend Topics

Recommend Org