FlexiDeconv vignette

FlexiDeconv

FlexiDeconv is a flexible cell type deconvolution method for spatial transcriptomics. It infers cell type proportions across pixels by flexibly leveraging a reference, which consists of cell type gene expression profiles typically derived from scRNA-seq data. Users can choose how much the reference is trusted during inference through the prior weight parameter \(w_k\) for every cell type \(k=1,2,\dots\).

This vignette walks through the main functionality of FlexiDeconv: inferring the cell type proportions of pixels and visualizing the results.

We acknowledge that some visualizations are borrowed from STdeconvolve (Miller et al. 2022).

Loading in the package

set.seed(0)
library(FlexiDeconv)

Loading in the data

The dataset used in this vignette is from Moffitt et al. (2018) and covers a total of 30 mice, both female and male. We selected the second mouse because it has the largest number of slices available. To simulate a low-signal scenario, we subsampled only two slices, \(slice = -0.09\) and \(slice = 0.21\). For consistency with STdeconvolve’s simulation setup (Miller et al. 2022), we also filtered the data down to 135 genes.

This dataset was originally imaging-based MERFISH data, with cell segmentation already performed by the authors. We therefore aggregated the cells to simulate spatial transcriptomics data in which each pixel measures \(100\,\mu m \times 100\,\mu m\). We also aggregated all cells within the selected female mouse to generate a gene expression profile, visualized below, which we treat as the ground truth gene expression profile.

data(mouse_hypothalamus)
p <- visualizeReference(mouse_hypothalamus$ground_truth_geneexp)
p
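The aggregation of segmented cells into square pixels described above can be sketched in base R. This is a minimal illustration on made-up data, not the code used to build mouse_hypothalamus; the coordinates, the counts, and the names cells, counts, pixel_size, and pixel_id are all assumptions.

```r
# Toy illustration only: hypothetical cell coordinates (in micrometers) and gene counts.
set.seed(1)
n_cells <- 200
n_genes <- 5
cells <- data.frame(x = runif(n_cells, 0, 300), y = runif(n_cells, 0, 300))
counts <- matrix(rpois(n_cells * n_genes, lambda = 3), nrow = n_cells,
                 dimnames = list(NULL, paste0("gene", seq_len(n_genes))))

# Assign each cell to a 100 x 100 micrometer square pixel on a regular grid.
pixel_size <- 100
pixel_id <- paste(floor(cells$x / pixel_size), floor(cells$y / pixel_size), sep = "_")

# Sum the counts of all cells that fall into the same pixel (result: pixels x genes).
pixel_counts <- rowsum(counts, group = pixel_id)
dim(pixel_counts)
```

Summing rather than averaging keeps the result on the count scale, which is what a pixel-level measurement would report.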

To simulate an imperfect reference, we removed one cell type (OD Mature) from the reference. So that FlexiDeconv can recover the missing cell type, we appended a single placeholder cell type to the new reference, generated from a Dirichlet distribution with uniform mean.

This visualization shows the gene expression profile of the reference. Every row represents a unique cell type.

new_reference <- appendPlaceholder(mouse_hypothalamus$reference, numPlaceholder=1)
p <- visualizeReference(new_reference)
p
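To give a feel for what a placeholder profile drawn from a Dirichlet distribution with uniform mean looks like, here is a minimal base R sketch. It is not the implementation behind appendPlaceholder; the helper rdirichlet_one and the concentration value of 1 per gene are assumptions made for illustration.

```r
# Hypothetical helper, not part of FlexiDeconv: one draw from a Dirichlet distribution,
# obtained by normalizing independent gamma draws.
rdirichlet_one <- function(alpha) {
  g <- rgamma(length(alpha), shape = alpha, rate = 1)
  g / sum(g)
}

set.seed(1)
n_genes <- 135  # number of genes in this vignette

# Equal concentrations give every gene the same mean, 1 / n_genes (assumed value: 1).
placeholder <- rdirichlet_one(rep(1, n_genes))

sum(placeholder)    # a valid profile sums to 1
range(placeholder)  # individual draws still vary around the uniform mean
```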

Since the spatial transcriptomics data is simulated from MERFISH data, the ground truth cell type proportion per pixel is readily available. Before plotting it, we define a color for each cell type, chosen to make the cell types as distinct as possible.

color = c('Astrocyte'='red', 'Endothelial'='orange', 
             'Ependymal'='black', 'Excitatory'='blue', 
             'Inhibitory'='green', 'Microglia'='purple', 
             'OD Immature'='yellow', 'OD Mature'='brown',
             'Pericytes'='cyan')

# retrieving the legend object that will be part of the visualization later
legend_grob <- getLegendGrob(color)

For each cell type within each pixel, the ground truth proportion is defined as: \[ \frac{\text{number of gene counts originating from this cell type}}{\text{total number of gene counts in the pixel}} \] This visualization is borrowed from STdeconvolve, as acknowledged at the beginning of this vignette. Every circle represents a pixel, and its cell type proportions are drawn as a pie chart. The correspondence between cell types and colors is shown on the right.

plotDeconvRes(mouse_hypothalamus$ground_truth_deconv,
              mouse_hypothalamus$spatial_meta,
              color = color,
              numRow = 1,
              numCol = 2,
              legend_grob = legend_grob,
              title = "Ground truth cell type proportion per pixel")
## Plotting scatterpies for 321 pixels with 9 cell-types...this could take a while if the dataset is large.
## Plotting scatterpies for 322 pixels with 9 cell-types...this could take a while if the dataset is large.
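The ground truth proportion defined above amounts to a row-wise normalization of per-pixel counts. The following base R sketch uses made-up numbers and assumes the numerator counts the transcripts contributed by each cell type; the matrix counts_by_type is hypothetical and not part of mouse_hypothalamus.

```r
# Toy counts (hypothetical): transcripts contributed by each cell type in two pixels.
counts_by_type <- matrix(c(30, 10, 60,
                            0, 25, 75),
                         nrow = 2, byrow = TRUE,
                         dimnames = list(c("pixel1", "pixel2"),
                                         c("Astrocyte", "Microglia", "OD Mature")))

# Divide every row by its total so that each pixel's proportions sum to 1.
props <- counts_by_type / rowSums(counts_by_type)
props
```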

Since the goal is to recover OD Mature, we first visualize the locations of OD Mature cells. From the plot below, we can see that OD Mature occupies distinct regions in the two slices.

The same approach can be used to visualize the location of any particular cell type.

cellType = 'OD Mature'
cellType_idx = which(colnames(mouse_hypothalamus$ground_truth_deconv) == cellType)
plotDeconvRes(mouse_hypothalamus$ground_truth_deconv,
              mouse_hypothalamus$spatial_meta,
              color = color,
              numRow = 1,
              numCol = 2,
              topic = cellType_idx,
              r=4)

To run FlexiDeconv, we use the following settings:

1. Prior weight = 5 for known cell types
2. Prior weight = 0.1 for the placeholder

This is because we trust the gene expression profiles of the known cell types to match the ground truth well, while giving the placeholder enough freedom to explore the parameter space.

Note that a prior weight of 5 can be roughly interpreted as follows: relative to the spatial data, the reference contributes 5 times as much to the estimated gene expression profile.
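One way to build intuition for this rough interpretation is a weighted average in which the reference counts w times as much as the data. The sketch below is only an illustration of that reading and is not the update rule used inside FlexiDeconv; the function blend_profile and the two toy profiles are assumptions.

```r
# Illustration only: NOT the FlexiDeconv update rule.
# The reference is weighted w times as much as the data-driven estimate.
blend_profile <- function(reference, data_estimate, w) {
  (w * reference + data_estimate) / (w + 1)
}

# Made-up profiles over three genes; each sums to 1.
reference     <- c(geneA = 0.6, geneB = 0.3, geneC = 0.1)
data_estimate <- c(geneA = 0.2, geneB = 0.3, geneC = 0.5)

high_trust <- blend_profile(reference, data_estimate, w = 5)    # stays close to the reference
low_trust  <- blend_profile(reference, data_estimate, w = 0.1)  # follows the data closely
rbind(high_trust, low_trust)
```

Under this simplified reading, a weight of 5 keeps the estimate near the reference, while a weight of 0.1 lets the data dominate, which is the behavior wanted for a placeholder.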

Feel free to explore other prior weights and different numbers of placeholders.

# Prior weights: 5 for each of the 8 known cell types, 0.1 for the appended placeholder
prior_const = c(rep(5, 8), 0.1)
output = runFlexiDeconv(as.matrix(mouse_hypothalamus$spatial),
                         new_reference, 
                         prior_const)
## Length of vector alpha: 9
## Parameter gamma: 643 pixels x 9 cell types
## Parameter phi: 643 pixels x 135 genes x 9 cell types
## Parameter tau: 9 cell types x 135 genes
## Iteration: 50. Curr ELBO: -11079002.7989347
## Iteration: 100. Curr ELBO: -11077973.9963133
## Iteration: 150. Curr ELBO: -11077772.2159909
## Iteration: 200. Curr ELBO: -11077684.6793728
## Iteration: 250. Curr ELBO: -11077652.957574
## Iteration: 300. Curr ELBO: -11077632.4261045
## Iteration: 350. Curr ELBO: -11077611.8129174
## Iteration: 400. Curr ELBO: -11077600.0044498
## Iteration: 450. Curr ELBO: -11077586.4345879
## Iteration: 500. Curr ELBO: -11077581.9003716
## Iteration: 550. Curr ELBO: -11077581.7968095
## Iteration: 600. Curr ELBO: -11077581.4487384
## Iteration: 650. Curr ELBO: -11077579.3837671
## Iteration: 700. Curr ELBO: -11077579.0944565
## Iteration: 750. Curr ELBO: -11077574.5650738
## Iteration: 800. Curr ELBO: -11077570.707376
## Iteration: 850. Curr ELBO: -11077570.6933272
## Iteration: 900. Curr ELBO: -11077570.6746931
## Iteration: 950. Curr ELBO: -11077570.6453017
## Iteration: 1000. Curr ELBO: -11077570.580634
## Iteration: 1050. Curr ELBO: -11077569.0189833
## Iteration: 1100. Curr ELBO: -11077568.7365644
## Iteration: 1150. Curr ELBO: -11077564.4839048
## Iteration: 1200. Curr ELBO: -11077564.4830457
## Iteration: 1250. Curr ELBO: -11077564.4826654
## Iteration: 1300. Curr ELBO: -11077564.4824076
## Iteration: 1350. Curr ELBO: -11077564.4821183
## Iteration: 1400. Curr ELBO: -11077564.4803819
## Iteration: 1450. Curr ELBO: -11077564.4010301
## Iteration: 1500. Curr ELBO: -11077562.7327071
## Iteration: 1550. Curr ELBO: -11077562.7322096
## Iteration: 1600. Curr ELBO: -11077562.7318017
## Iteration: 1650. Curr ELBO: -11077562.7314267
## Iteration: 1700. Curr ELBO: -11077562.7310698
## Iteration: 1750. Curr ELBO: -11077562.730725
## Iteration: 1800. Curr ELBO: -11077562.7303889
## Iteration: 1850. Curr ELBO: -11077562.7300594
## Iteration: 1900. Curr ELBO: -11077562.7297351
## Iteration: 1950. Curr ELBO: -11077562.7294145
## Iteration: 2000. Curr ELBO: -11077562.7290966
## Iteration: 2050. Curr ELBO: -11077562.7287804
## Iteration: 2100. Curr ELBO: -11077562.7284648
## Iteration: 2150. Curr ELBO: -11077562.7281489
## Iteration: 2200. Curr ELBO: -11077562.7278316
## Iteration: 2250. Curr ELBO: -11077562.7275121
## Iteration: 2300. Curr ELBO: -11077562.7271891
## Iteration: 2350. Curr ELBO: -11077562.7268617
## Iteration: 2400. Curr ELBO: -11077562.7265288
## Iteration: 2450. Curr ELBO: -11077562.726189
## Iteration: 2500. Curr ELBO: -11077562.7258411
## Iteration: 2550. Curr ELBO: -11077562.7254837
## Iteration: 2600. Curr ELBO: -11077562.7251152
## Iteration: 2650. Curr ELBO: -11077562.7247337
## Iteration: 2700. Curr ELBO: -11077562.7243375
## Iteration: 2750. Curr ELBO: -11077562.7239241
## Iteration: 2800. Curr ELBO: -11077562.7234912
## Iteration: 2850. Curr ELBO: -11077562.7230357
## Iteration: 2900. Curr ELBO: -11077562.7225543
## Iteration: 2950. Curr ELBO: -11077562.722043
## Iteration: 3000. Curr ELBO: -11077562.7214971
## Iteration: 3050. Curr ELBO: -11077562.7209109
## Iteration: 3100. Curr ELBO: -11077562.7202776
## Iteration: 3150. Curr ELBO: -11077562.7195891
## Iteration: 3200. Curr ELBO: -11077562.718835
## Iteration: 3250. Curr ELBO: -11077562.7180025
## Iteration: 3300. Curr ELBO: -11077562.7170754
## Iteration: 3350. Curr ELBO: -11077562.7160327
## Iteration: 3400. Curr ELBO: -11077562.7148468
## Iteration: 3450. Curr ELBO: -11077562.7134805
## Iteration: 3500. Curr ELBO: -11077562.7118827
## Iteration: 3550. Curr ELBO: -11077562.7099809
## Iteration: 3600. Curr ELBO: -11077562.7076684
## Iteration: 3650. Curr ELBO: -11077562.7047815
## Iteration: 3700. Curr ELBO: -11077562.7010555
## Iteration: 3750. Curr ELBO: -11077562.6960318
## Iteration: 3800. Curr ELBO: -11077562.6888408
## Iteration: 3850. Curr ELBO: -11077562.6776163
## Iteration: 3900. Curr ELBO: -11077562.6575467
## Iteration: 3950. Curr ELBO: -11077562.611835
## Iteration: 4000. Curr ELBO: -11077562.4263846
## Iteration: 4050. Curr ELBO: -11077560.6484694
## Iteration: 4100. Curr ELBO: -11077560.6483608
## Iteration: 4150. Curr ELBO: -11077560.6483278
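The log above shows the ELBO increasing and then flattening out. A simple way to inspect this is the relative change between consecutive recorded values, sketched below in base R with the last four values copied from the log. The tolerance tol is a hypothetical threshold chosen for illustration; the stopping rule actually used by runFlexiDeconv is not stated in this vignette.

```r
# Last four ELBO values copied from the log above (recorded every 50 iterations).
elbo <- c(-11077562.4263846, -11077560.6484694, -11077560.6483608, -11077560.6483278)

# The recorded values should not decrease from one checkpoint to the next.
all(diff(elbo) > 0)

# Relative change between consecutive recorded values.
rel_change <- abs(diff(elbo)) / abs(elbo[-length(elbo)])
rel_change

# Hypothetical tolerance for illustration only.
tol <- 1e-10
rel_change < tol
```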

The output of FlexiDeconv contains the following:

1. output$gamma - unscaled cell type proportions per pixel, with dimension pixel x cell type; can be normalized by output$gamma/rowSums(output$gamma)
2. output$tau - unscaled gene expression profile per cell type, with dimension cell type x gene; can be normalized by output$tau/rowSums(output$tau)

The output object also contains some additional information, if of interest:

3. ELBO - the ELBO recorded every 50 (default) iterations of variational inference
4. Alpha - estimated alpha parameter
5. phi - estimated phi parameter
6. total_iter - total number of iterations

With the new result, we need to redefine the colors and retrieve a new legend.

color = c('Astrocyte'='red', 'Endothelial'='orange', 
             'Ependymal'='black', 'Excitatory'='blue', 
             'Inhibitory'='green', 'Microglia'='purple', 
             'OD Immature'='yellow', 'Placeholder 1'='brown',
             'Pericytes'='cyan')

legend_grob2 <- getLegendGrob(color)

We now plot the result. The estimated placeholder is colored brown, in the hope that it recovers the ground truth proportions of OD Mature (which was previously colored brown).

plotDeconvRes(output$gamma/rowSums(output$gamma),
              mouse_hypothalamus$spatial_meta,
              color = color,
              numRow = 1,
              numCol = 2,
              legend_grob = legend_grob2)
## Plotting scatterpies for 321 pixels with 9 cell-types...this could take a while if the dataset is large.
## Plotting scatterpies for 322 pixels with 9 cell-types...this could take a while if the dataset is large.
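Beyond visual comparison, the agreement between the placeholder and OD Mature proportions can be summarized with a single number. The sketch below computes a root-mean-square error on made-up matrices; gamma_toy and truth_toy are hypothetical stand-ins for output$gamma and mouse_hypothalamus$ground_truth_deconv, and RMSE is one possible metric rather than one prescribed by the package.

```r
# Toy stand-ins (made-up values) for unscaled estimates and ground truth proportions.
gamma_toy <- matrix(c(2, 6, 2,
                      1, 1, 8,
                      5, 4, 1),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(NULL, c("Astrocyte", "Excitatory", "Placeholder 1")))
truth_toy <- matrix(c(0.25, 0.55, 0.20,
                      0.10, 0.15, 0.75,
                      0.50, 0.45, 0.05),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(NULL, c("Astrocyte", "Excitatory", "OD Mature")))

# Normalize each pixel (row) so that its estimated proportions sum to 1.
est <- gamma_toy / rowSums(gamma_toy)

# Root-mean-square error between the placeholder and the cell type it should recover.
rmse <- sqrt(mean((est[, "Placeholder 1"] - truth_toy[, "OD Mature"])^2))
rmse
```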

By comparing the estimated gene expression profile of Placeholder 1 to the ground truth OD Mature, we confirm that Placeholder 1 is indeed estimating OD Mature.

estimatedPlaceholder = output$tau["Placeholder 1",]/sum(output$tau["Placeholder 1",])
visualizeReference(rbind(estimatedPlaceholder, 
                         mouse_hypothalamus$ground_truth_geneexp["OD Mature",]))
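The visual comparison can be complemented with a numeric similarity score between the two profiles. The sketch below uses made-up vectors in place of estimatedPlaceholder and the ground truth OD Mature profile; Pearson correlation and cosine similarity are illustrative choices, not metrics defined by FlexiDeconv.

```r
# Toy profiles over four genes (made-up values); each sums to 1.
estimated <- c(0.40, 0.30, 0.20, 0.10)
truth     <- c(0.45, 0.25, 0.20, 0.10)

# Pearson correlation: sensitive to how the profiles co-vary around their means.
pearson <- cor(estimated, truth)

# Cosine similarity: the angle between the two profiles, ignoring overall scale.
cosine <- sum(estimated * truth) / sqrt(sum(estimated^2) * sum(truth^2))

c(pearson = pearson, cosine = cosine)
```

Values close to 1 for both scores would support the conclusion that the placeholder is estimating the missing cell type.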

Miller, B. F., F. Huang, L. Atta, et al. 2022. “Reference-Free Cell Type Deconvolution of Multi-Cellular Pixel-Resolution Spatially Resolved Transcriptomics Data.” Nature Communications 13: 2339. https://doi.org/10.1038/s41467-022-30033-z.
Moffitt, J. R., D. Bambah-Mukku, S. W. Eichhorn, et al. 2018. “Data from: Molecular, Spatial and Functional Single-Cell Profiling of the Hypothalamic Preoptic Region.” Dryad. https://doi.org/10.5061/dryad.8t8s248.