Coder Social home page Coder Social logo

lieberinstitute / spatialdlpfc Goto Github PK

View Code? Open in Web Editor NEW
14.0 9.0 2.0 15.23 GB

spatialDLPFC project involving Visium (n = 30), Visium SPG (n = 4) and snRNA-seq (n = 19) samples

Home Page: http://research.libd.org/spatialDLPFC/

R 0.31% Shell 0.05% HTML 98.66% MATLAB 0.01% Python 0.09% Jupyter Notebook 0.88%
rstats visium dlpfc transcriptomics bioconductor spatialdlpfc visium-spg spatial-transcriptomics

spatialdlpfc's Issues

Update scaleFactors code

Read in the scale factors at

# spatial scale factors
file_scale <- file.path(dir_spatial, "scalefactors_json.json")
scalefactors <- fromJSON(file = file_scale)
before you read the tissue position info at
dir_spatial <- file.path(dir_outputs, sample_name, "outs", "spatial")
file_tisspos <- file.path(dir_spatial, "tissue_positions_list.csv")
df_tisspos <- read.csv(file_tisspos, header = FALSE,
col.names=c("barcode_id", "in_tissue", "array_row", "array_col",
"pxl_col_in_fullres", "pxl_row_in_fullres"))
since we need it for doing the multiplications at https://github.com/LieberInstitute/HumanPilot/blob/7e0f25188969913bc247b5d6c1c47c3ad94c6f15/Analysis/Layer_Notebook.R#L118-L119.

It would be similar to https://github.com/LieberInstitute/HumanPilot/blob/7e0f25188969913bc247b5d6c1c47c3ad94c6f15/Analysis/Layer_Notebook.R#L105

Re-organize files to match new project structure

Overall file structure ideas https://lcolladotor.github.io/bioc_team_ds/organizing-your-work.html

Live examples: https://github.com/LieberInstitute/Visium_IF_AD and the newer https://github.com/LieberInstitute/DLPFC_snRNAseq

See more on Slack at https://jhu-genomics.slack.com/archives/C01EA7VDJNT/p1630516674026100

Current status:

 $ ls -lh /dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC
total 234K
drwxrws---  2 lcollado lieber_lcolladotor   31 Jul 28 10:47 analysis
drwxrws--- 14 lcollado lieber_lcolladotor   14 Nov 25  2020 FASTQ
drwxrws---  2 lcollado lieber_lcolladotor    2 Nov 18  2020 html
drwxrws---  8 lcollado lieber_lcolladotor    9 Jul 14 13:25 Images
drwxrws---  4 lcollado lieber_lcolladotor   16 Jul 23 16:53 images_raw_align_json
drwxrws---  4 lcollado lieber_lcolladotor    4 Feb 17  2021 outputs
drwxrws---  6 lcollado lieber_lcolladotor   35 Aug  3 09:08 plots
drwxrws---  3 lcollado lieber_lcolladotor    5 Mar 22 12:33 rdata
-rw-rw----  1 lcollado lieber_lcolladotor 1.8K Nov 18  2020 README.md
drwxrws---  2 lcollado lieber_lcolladotor    5 Jul 26 12:27 sample_info
drwxrws---  3 lcollado lieber_lcolladotor    3 Jul 26 11:01 scripts
-rw-rw----  1 lcollado lieber_lcolladotor  270 Feb 15  2021 spatialDLPFC.Rproj

More specifically:

Combine sce_list into a single sce object

For that step, you'll likely need to use code from https://github.com/LieberInstitute/HumanPilot/blob/master/Analysis/convert_sce.R#L19-L20. Note that the tibbles for the images have to be combined which involves https://github.com/LieberInstitute/HumanPilot/blob/7e0f25188969913bc247b5d6c1c47c3ad94c6f15/Analysis/Layer_Notebook.R#L151 and https://github.com/LieberInstitute/HumanPilot/blob/master/Analysis/convert_sce.R#L20.

#5 could be done before or after the cbind() step.

You might have to manually create additional columns for the colData(sce) such as subject, position, replicate, subject_position although that is study-dependent. For example, sce$position gets created at https://github.com/LieberInstitute/HumanPilot/blob/master/Analysis/convert_sce.R#L58-L59.

Try running the shiny app locally (on a laptop)

After #7, to try out the new sce object with spatialLIBD::run_app() we need to do a couple of things.

  1. Check the file size in RAM using pryr::object_size(sce) at JHPCE
  2. Transfer the data to your laptop: the sce object as well as the low resolution images
  3. Install R 4.0.3 with Bioconductor 3.12 (BiocManager::install(version = "3.12") + BiocManager::valid())
  4. Install spatialLIBD (BiocManager::install("spatialLIBD"))

Next, we can try to run the app using spatialLIBD::run_app() that has the following arguments https://github.com/LieberInstitute/spatialLIBD/blob/master/R/run_app.R#L49-L93.

We'll need to override several of them (aka, not use the defaults). In particular:

Then, the app should ideally (we'll see if there's code that needs to be changed) work on the "spot-level" tab. The "layer-level" tab will still show the older data.

Add reduced dimensions

Similar to #9, lets add to the sce object the reduced dimensions using the data from all the images in our sce object prior to #8 & #7. That involves running code Abby is familiar with already from https://github.com/LieberInstitute/HumanPilot/blob/master/Analysis/sce_scran.R#L80-L230.

Note that that code uses block = sce$subject_position which will be study-specific. If we have samples from the same donors, I would use block with something like block = subject. If we have spatial replicates, then we should use the original block = sce$subject_position (which you'll need to create first). This block argument is our way of telling scran our data structure (like the fact that our spots from a given set of images).

make down-sampled version of SPE object

spe$quadrant <- if(coord_x < 0 & coord_y > 0) "topleft" else if(....) spe$sample_quadrant <- paste0(spe$sampleid, "_", spe$quadrant) sample_quad_list <- rafalib::splitit(spe$sample_quadrant) selected_sample_quad <- unlist(lapply(sample_squad_list, sample, n = 250)) spe_subsampled <- spe[, selected_sample_quad]

Use older columns for the tissue position info

Instead of the column names at

col.names=c("barcode_id", "in_tissue", "array_row", "array_col",
"pxl_col_in_fullres", "pxl_row_in_fullres"))
(which are the ones used in the current release of SpatialExperiment) use the older names as in https://github.com/LieberInstitute/HumanPilot/blob/7e0f25188969913bc247b5d6c1c47c3ad94c6f15/Analysis/Layer_Notebook.R#L116 then do the same multiplications (using the scaled factors info from #1) https://github.com/LieberInstitute/HumanPilot/blob/7e0f25188969913bc247b5d6c1c47c3ad94c6f15/Analysis/Layer_Notebook.R#L118-L119.

Eventually, we want this info in the colData(sce) slot which Abby is doing at

col_data <- cbind(df_barcodes, df_tisspos_ord[, -1])
. Note that you still want to check the order as in
# note: check and/or re-order rows to make sure barcode IDs match in df_barcodes and df_tisspos
dim(df_barcodes)
dim(df_tisspos)
ord <- match(df_barcodes$barcode_id, df_tisspos$barcode_id)
df_tisspos_ord <- df_tisspos[ord, ]
dim(df_tisspos_ord)
stopifnot(nrow(df_barcodes) == nrow(df_tisspos_ord))
stopifnot(all(df_barcodes$barcode_id == df_tisspos_ord$barcode_id))
.

Plot marker genes using spatialLIBD

Hi Abby,

Can you make the plots for the marker genes? That will involve updating https://github.com/LieberInstitute/spatialDLPFC/blob/32a14108b4ccc2210f1f8f63857d51f4847f6994/analysis/02_marker_genes.R as well as including the R session info as a comment at the bottom of the script.

It'll probably be good so you can practice using the new spatialLIBD version with functions such as vis_clus_gene(). Check also the git/GitHub history for that file to check how I tried to simplify the code. I haven't tested it myself, so there might be a typo here or something missing here or there.

Thanks!

Re-run VistoSeg on the 2 problematic samples

Once SpaceRanger is done re-running for the 2 problematic samples, to be extra careful, we'll re-run VistoSeg to count the cells per spot in these 2 samples. They are:

  • sample 1:
    DLPFC_Br2743_ant_manual_alignment V19B23-075 A1 /dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC/raw-data/Images/round1/Liebert_Institute_OTS-20-7690_rush_anterior_1.tif /dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC/raw-data/images_raw_align_json/V19B23-075-A1_1_Br2743_ant_manual_alignment.json /dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC/raw-data/FASTQ/2_DLPFC_Br3942_ant
  • sample 4
    DLPFC_Br3942_ant_manual_alignment V19B23-075 B1 /dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC/raw-data/Images/round1/Liebert_Institute_OTS-20-7690_rush_anterior_2.tif /dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC/raw-data/images_raw_align_json/V19B23-075-B1_2_Br3942_ant_manual_alignment.json /dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC/raw-data/FASTQ/3_DLPFC_Br6423_ant

Sample 1 already finished running on SpaceRanger and you can find the output at /dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC/processed-data/rerun_spaceranger/DLPFC_Br2743_ant_manual_alignment.

Sample 4 is queued at JHPCE. Once it's done, the output will be at /dcs04/lieber/lcolladotor/spatialDLPFC_LIBD4035/spatialDLPFC/processed-data/rerun_spaceranger/DLPFC_Br3942_ant_manual_alignment.

Thanks @heenadivecha!

scuttle quick cluster

Run internal scuttle code and extract the information that leads to the warning at https://github.com/LieberInstitute/spatialDLPFC/blob/main/analysis/01_build_SPE.R#L339-L356 so we can explore this information at https://github.com/LieberInstitute/spatialDLPFC/blob/main/analysis/01a_check_scran_quick_cluster.R as needed.

This is related to LTLA/scuttle#7 where we want to find out if we are using the correct statistics and if the workaround Aaron introduced in scuttle 1.1.15 is appropriate or not.

Create a new SPE object without the 3 samples with poor histology

Once #40 is done continue here.

Likely start a new script for this.

  • Change the order of how you read the samples at
    sample_id = c(
    "DLPFC_Br2743_ant_manual_alignment",
    "DLPFC_Br2743_mid_manual_alignment_extra_reads",
    "DLPFC_Br2743_post_manual_alignment",
    "DLPFC_Br3942_ant_manual_alignment",
    "DLPFC_Br3942_mid_manual_alignment",
    "DLPFC_Br3942_post_manual_alignment",
    "DLPFC_Br6423_ant_manual_alignment_extra_reads",
    "DLPFC_Br6423_mid_manual_alignment",
    "DLPFC_Br6423_post_extra_reads",
    "DLPFC_Br8492_ant_manual_alignment",
    "DLPFC_Br8492_mid_manual_alignment_extra_reads",
    "DLPFC_Br8492_post_manual_alignment",
    "Round2/DLPFC_Br2720_ant_manual_alignment",
    "Round2/DLPFC_Br2720_mid_manual_alignment",
    "Round2/DLPFC_Br2720_post_extra_reads",
    "Round2/DLPFC_Br6432_ant_manual_alignment",
    "Round2/DLPFC_Br6432_mid_manual_alignment",
    "Round2/DLPFC_Br6432_post_manual_alignment",
    "Round3/DLPFC_Br6471_ant_manual_alignment_all",
    "Round3/DLPFC_Br6471_mid_manual_alignment_all",
    "Round3/DLPFC_Br6471_post_manual_alignment_all",
    "Round3/DLPFC_Br6522_ant_manual_alignment_all",
    "Round3/DLPFC_Br6522_mid_manual_alignment_all",
    "Round3/DLPFC_Br6522_post_manual_alignment_all",
    "Round3/DLPFC_Br8325_ant_manual_alignment_all",
    "Round3/DLPFC_Br8325_mid_manual_alignment_all",
    "Round3/DLPFC_Br8325_post_manual_alignment_all",
    "Round3/DLPFC_Br8667_ant_extra_reads",
    "Round3/DLPFC_Br8667_mid_manual_alignment_all",
    "Round3/DLPFC_Br8667_post_manual_alignment_all",
    "Round4/DLPFC_Br2720_ant_2",
    "Round4/DLPFC_Br6432_ant_2",
    "Round4/DLPFC_Br8325_mid_2"
    ),
    thinking about the grid of how you want to plot them later (Check in with @kmaynard12). To avoid code like https://github.com/LieberInstitute/Visium_IF_AD/blob/master/code/04_build_spe/initial_exploration.R#L13-L18 https://github.com/LieberInstitute/Visium_IF_AD/blob/master/code/04_build_spe/initial_exploration.R#L48-L67.
  • Use read10xVisiumWrapper() like at
    spe_wrapper <- read10xVisiumWrapper(
  • Read in the spot counts using code like from
    ## Read in cell counts and segmentation results
    segmentations_list <- lapply(sample_info$sample_id, function(sampleid) {
    current<-sample_info$sample_path[sample_info$sample_id==sampleid]
    file <- file.path(current, "spatial", "tissue_spot_counts.csv")
    if(!file.exists(file)) return(NULL)
    x <- read.csv(file)
    x$key <- paste0(x$barcode, "_", sampleid)
    return(x)
    })
    ## Merge them (once the these files are done, this could be replaced by an rbind)
    segmentations <- Reduce(function(...) merge(..., all = TRUE), segmentations_list[lengths(segmentations_list) > 0])
    ## Add the information
    segmentation_match <- match(spe_wrapper$key, segmentations$key)
    segmentation_info <- segmentations[segmentation_match, - which(colnames(segmentations) %in% c("barcode", "tissue", "row", "col", "imagerow", "imagecol", "key")), drop=FALSE]
    colData(spe_wrapper) <- cbind(colData(spe_wrapper), segmentation_info)
  • Run scran to have the QC info.
  • There's no need to remake the QC plots.
  • Compute PCs and UMAP.
  • Run just a single tSNE just to have it there.
  • Run HARMONY and BayesSpace.
  • Run graph-based clustering on the HARMONY dimensions.
  • Make a grid plot of the BayesSpace results.

Add variables to explore low quality spots from scran

Before #8 and likely also before #7, we'll want to add a few things to our sce object. One of them could be a (or more than one) discrete variable https://github.com/LieberInstitute/spatialLIBD/blob/master/R/run_app.R#L58 that tells us what were spots that scran's quality control checks (designed for scRNA-seq data, not Visium data) are of low quality and should be discarded. That is, running code like https://github.com/LieberInstitute/HumanPilot/blob/master/Analysis/sce_scran.R#L32-L33 then taking the outputs (which are logical vectors) and converting them to discrete variables (so factor or character).

Let's try this with the 3 columns in https://github.com/LieberInstitute/HumanPilot/blob/master/Analysis/sce_scran.R#L35. So something like:

sce$scran_discard <- factor(edit_me, levels = c("TRUE", "FALSE"))
sce$scran_low_lib_size <- ...
sce$low_n_features <- ...

This might be more relevant for the LC data than the DLPFC data. But it'll be good to check.

Add info useful for the shiny app

Add more gene information

Use the GTF file we used for aligning the data with SpaceRanger to add more information about the genes. Note that the genes might not be in the same order, but you can rely on the Ensembl Gene ID for ordering the data.

That is, the file at

# features
file_features <- file.path(dir_matrix, "features.tsv.gz")
df_features <- read.csv(file_features, sep = "\t", header = FALSE,
col.names = c("gene_id", "gene_name", "feature_type"))
is equivalent to the map object made at https://github.com/LieberInstitute/HumanPilot/blob/7e0f25188969913bc247b5d6c1c47c3ad94c6f15/Analysis/Layer_Notebook.R#L78-L80 (with different column names, but they could be made consistent). I don't know if the genes are in the same order across all images, but it would be worth double checking.

In any case, the rest of the gene info can be obtained from the annotation GTF file as in https://github.com/LieberInstitute/HumanPilot/blob/7e0f25188969913bc247b5d6c1c47c3ad94c6f15/Analysis/Layer_Notebook.R#L81-L87.

Compare new methods using our published data from 2021

Using the published data from 2021, compare the ARI from SpaGCN and BayesSpace (joint clustering). This will likely be a supplementary figure in this paper.

We should also add this info to our published data and its companion apps.

Run SpaGCN

Once #41 is done, we'll want to run SpaGCN on the 30 samples we'll use for this project.

run space ranger on round4 data

3 "new" samples
6432_ant_2
8325_mid_2
2720_ant_2

2 resequences
2720post
8667ant

1 missing resequence (emailed Linda)
6423mid

perform spatially aware clustering method

use harmony dimension (instead of PCs) to run bayesspace, do on pilot dlpfc samples https://edward130603.github.io/BayesSpace/articles/joint_clustering.html. harmony seems to the best for batch correction.
try substitute to bayesspace which is called resept. https://www.biorxiv.org/content/10.1101/2021.07.08.451210v2
another alternative to bayesspace is SpaGCN https://www.biorxiv.org/content/10.1101/2020.11.30.405118v1, https://github.com/jianhuupenn/SpaGCN/blob/master/tutorial/tutorial_ez_mode.md#3-read-in-data, https://github.com/jianhuupenn/SpaGCN/blob/master/tutorial/tutorial.md

Move to trash any older rda files we don't need anymore

Move to trash/ any older SPE objects that we likely don't need anymore. The goal is to minimize confusion. You might want to do this before starting with #19.

That likely involves some of the spe objects listed at

$ ls -lh code/analysis/*rda
-rwxrwx--- 1 lcollado lieber_lcolladotor  28K Feb 10  2021 code/analysis/clusters_more_than_100_umis.rda
-rwxrwx--- 1 lcollado lieber_lcolladotor  30K Feb 10  2021 code/analysis/clusters_more_than_10_umis.rda
-rwxrwx--- 1 lcollado lieber_lcolladotor  30K Feb 10  2021 code/analysis/clusters_nonzero.rda
-rwxrwx--- 1 lcollado lieber_lcolladotor 162M Feb 15  2021 code/analysis/sce_combined.rda
-rwxrwx--- 1 lcollado lieber_lcolladotor 163M Jan 28  2021 code/analysis/sce_list.rda
-rwxrwx--- 1 lcollado lieber_lcolladotor 162M Feb 10  2021 code/analysis/sce_more_than_100_umis.rda
-rwxrwx--- 1 lcollado lieber_lcolladotor 162M Feb 10  2021 code/analysis/sce_more_than_10_umis.rda
-rwxrwx--- 1 lcollado lieber_lcolladotor 162M Feb 10  2021 code/analysis/sce_nonzero.rda

## Below we likely want to keep the metrics files, but not the spe or top.hvg ones
$ ls -lh processed-data/rdata/spe/*Rdata
-rw-rw---- 1 lcollado lieber_lcolladotor 1.7K Feb 17  2021 processed-data/rdata/spe/pilot_metrics.Rdata
-rw-rw---- 1 lcollado lieber_lcolladotor 2.7K Jul 27 11:06 processed-data/rdata/spe/sample_metrics_072721.Rdata
-rw-rw---- 1 lcollado lieber_lcolladotor 5.1K Jul 28 07:16 processed-data/rdata/spe/sample_metrics_all.Rdata
-rw-rw---- 1 lcollado lieber_lcolladotor 2.4K Feb 17  2021 processed-data/rdata/spe/sample_metrics.Rdata
-rw-rw---- 1 lcollado lieber_lcolladotor 4.5K Jul 27 12:15 processed-data/rdata/spe/shared_metrics_072721.Rdata
-rw-rw---- 1 lcollado lieber_lcolladotor 1.6K Feb 17  2021 processed-data/rdata/spe/shared_metrics.Rdata
-rw-rw---- 1 lcollado lieber_lcolladotor 840M Sep  1 12:34 processed-data/rdata/spe/spe_072821.Rdata
-rw-rw---- 1 lcollado lieber_lcolladotor 418M Jul 28 15:05 processed-data/rdata/spe/spe_raw_072821.Rdata
-rw-rw---- 1 lcollado lieber_lcolladotor 168M Feb 17  2021 processed-data/rdata/spe/spe_raw.Rdata
-rw-rw---- 1 lcollado lieber_lcolladotor 343M Feb 17  2021 processed-data/rdata/spe/spe.Rdata
-rw-rw---- 1 aspangle lieber_lcolladotor 138K Sep  1 11:17 processed-data/rdata/spe/top.hvgs_all.Rdata
-rw-rw---- 1 lcollado lieber_lcolladotor 114K Feb 17  2021 processed-data/rdata/spe/top.hvgs.Rdata

among other places.

Here's another way of finding these files based on https://stackoverflow.com/a/5905126/9374370:

## .rda files
$ find . -name "*.rda"
./code/analysis/clusters_more_than_10_umis.rda
./code/analysis/clusters_more_than_100_umis.rda
./code/analysis/sce_more_than_100_umis.rda
./code/analysis/sce_more_than_10_umis.rda
./code/analysis/clusters_nonzero.rda
./code/analysis/sce_combined.rda
./code/analysis/sce_nonzero.rda
./code/analysis/sce_list.rda

## .Rdata files
$ find . -name "*.Rdata"
./processed-data/rdata/inspect_scuttle_issue_7.Rdata
./processed-data/rdata/spe/spe.Rdata
./processed-data/rdata/spe/sample_metrics.Rdata
./processed-data/rdata/spe/spe_072821.Rdata
./processed-data/rdata/spe/spe_raw.Rdata
./processed-data/rdata/spe/shared_metrics_072721.Rdata
./processed-data/rdata/spe/top.hvgs.Rdata
./processed-data/rdata/spe/sample_metrics_all.Rdata
./processed-data/rdata/spe/sample_metrics_072721.Rdata
./processed-data/rdata/spe/shared_metrics.Rdata
./processed-data/rdata/spe/pilot_metrics.Rdata
./processed-data/rdata/spe/top.hvgs_all.Rdata
./processed-data/rdata/spe/spe_raw_072821.Rdata
./processed-data/rdata/g_k50.Rdata

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.