The spatialHeatmap Shiny App is the interactive implementation of most functionalities of the spatialHeatmap software, which specializes in visualizing spatial bulk and single-cell assay data in anatomical images. This user manual introduces the most important features and the basic operations of the App.
This tab is designed for selecting pre-configured datasets or uploading custom datasets.
Quick start: Select a pre-configured dataset (Figure 1.6) and click “Spatial Heatmap”, then select genes in the table to see spatial heatmaps.
Figure 1.1-3: Go to Figure 1.1-3 (top red line: currently selected tab) to select or upload a dataset, and click “Spatial Heatmap” to see spatial heatmaps.
Figure 1.4-5: Upload custom datasets. Details of each upload portal are given in its tooltip. To format custom bulk data, please refer to the instructions available here. For formatting both bulk and single-cell data, instructions are provided here. For formatting anatomical images, guidelines are provided here.
Figure 1.6: Instead of uploading custom datasets, select pre-configured datasets.
Figure 1: Page for selecting datasets
The spatial heatmap functionality colors spatial features (e.g. tissues) annotated in SVG images (aSVGs) according to the quantitative abundance levels of biomolecules (e.g. mRNAs), using a color key. The resulting plot is called a spatial heatmap (SHM). This tab includes different output forms of SHMs.
This tab displays SHMs in the form of static images.
Quick start: Select genes in the table (Figure 2.10) and click “Plot” (Figure 2.11).
Figure 2.1-2: Go to the tabs displaying SHMs in the form of static images.
Customize SHMs using the settings in Figure 2.3-7.
Figure 2: Settings for SHMs in form of static images
Figure 2.8-9: The input assay data.
Figure 2.10-12: Select genes (Figure 2.10), click the button (Figure 2.11), then spatial heatmaps will be created (Figure 2.12).
Figure 2.13-15: In the Experiment design (Figure 2.14), reference experiment variables can be uploaded in a one-column table (Figure 2.14a), where multiple variables need to be formatted as comma-separated strings (Figure 2.14b). These references will be included in the “reference” column (Figure 2.14b). Selecting “Yes” in Figure 2.13a computes relative expression levels, and selecting “No” turns this off. For example, after selecting “Yes” (Figure 2.13), relative expression levels in brain under “DBA.2J” will be computed against “C57BL” and “CD1”, respectively. Selecting “Yes” or “No” in Figure 2.13b computes log2-transformed or raw relative expression levels, respectively, such as log2(treatment/control) or treatment/control.
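The relative expression option can be illustrated with a small Python sketch. Python is used only for illustration, since the App itself is built with R/Shiny. The function relative_expression and the expression values are hypothetical. The sketch divides the query variable's value by each reference value and optionally applies log2, as in log2(treatment/control). How the App handles zeros or pseudocounts is not covered here.

```python
import math


def relative_expression(values, query, references, log2=True):
    """Expression of `query` relative to each reference variable.

    values: experiment variable -> expression of one gene in one spatial
    feature. Returns reference -> query/reference, log2-transformed when
    `log2` is True (hypothetical helper, not the App's code).
    """
    out = {}
    for ref in references:
        ratio = values[query] / values[ref]
        out[ref] = math.log2(ratio) if log2 else ratio
    return out


brain = {"DBA.2J": 8.0, "C57BL": 2.0, "CD1": 4.0}  # hypothetical values
print(relative_expression(brain, "DBA.2J", ["C57BL", "CD1"]))
# {'C57BL': 2.0, 'CD1': 1.0}
print(relative_expression(brain, "DBA.2J", ["C57BL", "CD1"], log2=False))
# {'C57BL': 4.0, 'CD1': 2.0}
```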
Figure 3: Settings for SHMs in form of interactive images
Figure 3.1-3: Go to the tabs displaying SHMs in the form of interactive images (top red line: currently selected tab).
Figure 3.4: Click the “Run” button to show the interactive images.
Figure 3.5: Click the “Play” button to show images sequentially.
The spatial enrichment module identifies spatially enriched or depleted genes, i.e. genes significantly up- or down-regulated in one feature (e.g. a tissue) relative to reference features, and visualizes their abundance values as enrichment SHMs. Similarly, genes enriched or depleted in one experimental variable (e.g. a treatment) relative to reference variables can also be detected and visualized.
Quick start: Click “Run” (Figure 5.2.1) to perform spatial enrichment for each selected spatial feature (Figure 5.2.3), select a query feature (Figure 5.3.1) to get the corresponding results (Figure 5.4.2), and click Figure 5.4.1 to create enrichment SHMs.
Figure 5.1: Go to the tab displaying spatial enrichment (top red line: currently selected tab).
Figure 5.2: Perform spatial enrichment according to the settings:
The input data are pre-processed: genes whose expression values exceed a cutoff (Figure 5.2.2A) in at least a proportion (Figure 5.2.2P) of samples, and whose coefficient of variation (CV) lies within a range (Figure 5.2.2CV1, CV2), are retained. The assay data are then normalized.
Spatial features and experimental variables (e.g. treatments) are listed in Figure 5.2.3 and Figure 5.2.4, respectively. Only the chosen ones are considered for spatial enrichment. If the comparison (Figure 5.2.5) is across spatial features, variables under the same spatial feature are treated as replicates, and vice versa.
The stringency of spatial enrichment can be relaxed by allowing a number of outliers (Figure 5.2.6) among the reference features. The methods (Figure 5.2.7) for spatial enrichment include the differential expression analysis tools edgeR (McCarthy et al. 2012), limma (Ritchie et al. 2015), DESeq2 (Love, Huber, and Anders 2014), and distinct (Tiberi et al., n.d.). The top up- or down-regulated genes can be selected by log2-fold change (e.g. \(\geq\) 1) and FDR (Figure 5.2.8, e.g. \(\leq\) 0.05).
By clicking “Run” (Figure 5.2.1), all-against-all comparisons will be performed according to these settings.
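The pre-processing filter and the final gene calls can be sketched in plain Python. This is a hypothetical illustration, not the App's code. It rests on three assumptions:

- The CV is taken as the sample standard deviation divided by the mean.
- FDRs come from the Benjamini-Hochberg procedure, the default multiple-testing adjustment in edgeR, limma, and DESeq2 result tables.
- "Allowing outliers" means tolerating up to that many reference features where the criterion fails.

```python
import statistics


def filter_genes(matrix, cutoff, prop, cv1, cv2):
    """Keep genes above `cutoff` in at least `prop` of samples whose
    coefficient of variation (sample SD / mean) lies in [cv1, cv2]."""
    kept = {}
    for gene, vals in matrix.items():
        frac_above = sum(v > cutoff for v in vals) / len(vals)
        mean = statistics.mean(vals)
        cv = statistics.stdev(vals) / mean if mean else float("nan")
        if frac_above >= prop and cv1 <= cv <= cv2:  # a NaN CV never passes
            kept[gene] = vals
    return kept


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (FDR), in input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def call_gene(log2fc, fdr, lfc_cut=1.0, fdr_cut=0.05, outliers=0):
    """Classify one gene for a query feature (assumed outlier rule).

    log2fc, fdr: one value per query-vs-reference comparison. The gene is
    "up" or "down" if the criterion holds against every reference except
    at most `outliers` of them; otherwise None.
    """
    n = len(log2fc)
    up = sum(l >= lfc_cut and q <= fdr_cut for l, q in zip(log2fc, fdr))
    down = sum(l <= -lfc_cut and q <= fdr_cut for l, q in zip(log2fc, fdr))
    if n - up <= outliers:
        return "up"
    if n - down <= outliers:
        return "down"
    return None


data = {
    "g1": [10.0, 12.0, 30.0, 8.0],  # expressed and variable
    "g2": [0.0, 0.0, 1.0, 0.0],     # mostly below the cutoff
    "g3": [5.0, 5.0, 5.0, 5.0],     # expressed but CV = 0
}
print(sorted(filter_genes(data, cutoff=1, prop=0.5, cv1=0.1, cv2=10)))  # ['g1']
# One gene tested against three reference features (hypothetical values).
print(call_gene([2.1, 1.5, 0.3], [0.001, 0.01, 0.4]))              # None
print(call_gene([2.1, 1.5, 0.3], [0.001, 0.01, 0.4], outliers=1))  # up
```

In practice bh_adjust would be applied across all genes within one comparison. The per-reference FDRs of a gene are then collected and passed to call_gene.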
Figure 5: Spatial enrichment
Figure 5.3: Query the results
Figure 5.4: Results of the query feature
Although SHMs are powerful for visualization, only a few genes can be plotted simultaneously as each requires an individual plot. To overcome this limitation and support analysis routines involving a large number of genes, the Shiny App integrates functionalities for large-scale data mining, including hierarchical clustering, K-means clustering, and network analysis (Figure 6).
Quick start: Click “Run” (Figure 6.3.2) to identify the cluster containing the query gene (Figure 6.2.1) chosen from SHMs.
Figure 6.1: Go to the tab displaying the data mining interface.
Figure 6.2: Step 1: To obtain genes whose expression is similar to that of a query gene chosen from SHMs (Figure 6.2.1), the complete assay data can be subsetted using a similarity measure (Figure 6.2.2) and a cutoff (Figure 6.2.3). The subsetted matrix is passed to Step 2. If no subsetting is applied, the whole matrix is used in Step 2.
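Step 1 can be illustrated with a minimal Python sketch that uses Pearson's correlation as the similarity measure. This choice, the function names, and the toy matrix are assumptions for illustration; the measures actually offered in Figure 6.2.2 may differ.

```python
import math


def pearson(x, y):
    """Pearson correlation; NaN when either vector has zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def subset_by_similarity(matrix, query, cutoff):
    """Keep genes whose correlation with the query gene is >= cutoff."""
    qv = matrix[query]
    return {g: v for g, v in matrix.items() if pearson(qv, v) >= cutoff}


matrix = {
    "q": [1, 2, 3, 4],     # hypothetical query gene
    "a": [2, 4, 6, 8.5],   # r close to 1
    "b": [4, 3, 2, 1],     # r = -1
    "c": [1, 3, 2, 4],     # r = 0.8
}
print(sorted(subset_by_similarity(matrix, "q", cutoff=0.9)))  # ['a', 'q']
```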
Figure 6: Large-scale data mining downstream of spatial heatmaps
Figure 6.3: Step 2: Select a method (Figure 6.3.1) and click “Run” (Figure 6.3.2). A cluster or network module whose expression patterns are highly similar to the query's is then identified in the subsetted matrix from Step 1, and the results are shown in Figure 6.3.3A-C.
Network analysis is performed with the WGCNA algorithm (Langfelder and Horvath 2008; Ravasz et al. 2002). The objective is to identify the network module containing the query, which can be visualized as a network graph. See more details here.
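The following hypothetical Python sketch shows how the cluster containing the query gene could be found in Step 2. It implements average-linkage hierarchical clustering with distance 1 - r (Pearson), stopped at k clusters. It is a simplified stand-in, not the App's implementation; K-means clustering and WGCNA network modules are not shown.

```python
import math


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # assumes no gene is constant


def cluster_of_query(matrix, query, k):
    """Average-linkage agglomerative clustering of genes with distance
    1 - Pearson r, stopped at k clusters; returns the query's cluster."""
    genes = list(matrix)
    dist = {(a, b): 1 - pearson(matrix[a], matrix[b])
            for a in genes for b in genes}
    clusters = [[g] for g in genes]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average distance over all gene pairs between two clusters.
                d = sum(dist[a, b] for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]  # merge the closest pair
        del clusters[j]
    return sorted(next(c for c in clusters if query in c))


matrix = {
    "q": [1, 2, 3, 4],
    "a": [2, 4, 6, 8.5],
    "b": [4, 3, 2, 1],
    "c": [1, 3, 2, 4],
    "d": [8, 6, 4, 2],
}
print(cluster_of_query(matrix, "q", k=2))  # ['a', 'c', 'q']
```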
Figure 6.4: Step 3: Optionally, perform further network analysis on the cluster containing the query (Figure 6.3.3A-B) from Step 2. This tab is disabled until the cluster is shown (Figure 6.3.3A-B).
The co-visualization module provides novel plotting functionalities designed to give insights into the tissue-level organization of single-cell data or, conversely, into the cellular composition of tissues (Figure 7.9.5-9.6). It combines SHMs and embedding plots, where matching tissues and cells are associated by identical colors. The coloring (Figure 7.9.3) of the single cells (dots) and tissue features can be based on quantitative values (heat coloring) or on fixed group-based colors. Cell group labels are required for the cell-tissue matching. Supported sources of these labels include existing cell annotations, marker gene-based methods, manual assignments, and co-clustering of bulk and single-cell data (Figure 8.7).
When using the methods other than co-clustering, the naming conventions of cell group labels and tissue labels often differ. The user interface for cell labels obtained by these methods therefore uses a translation map to bridge the cell and tissue labels (Figure 7.7.2-7.4). By contrast, the co-clustering method groups cells directly using source tissue labels, so the cell groups and tissues already have programmatically identical labels. Because of this inherent alignment, the user interface for the co-clustering method is designed separately (Figure 8).
This user interface (Figure 7.1-7) is designed for cell group labels from existing cell annotations, marker gene-based methods, manual assignments, etc.
Quick start: Select “Annotation (or other) labels” (Figure 7.2), get an overview of the single-cell data (Figure 7.6), match cells and tissues (Figure 7.7), and click “Run” (Figure 7.7.5) to create co-visualization plots.
Figure 7.1: Go to the tab for co-visualization (red line: currently selected tab).
Figure 7.2: Select the source of cell group labels. The options “Annotation (or other) labels” and “Co-clustering” display the interfaces in Figure 7 and Figure 8, respectively.
Figure 7.3: With the “Cell-to-bulk” option, when the “cell-by-group” coloring option is chosen in Figure 7.9.3, the heat colors are derived from the single-cell data. With the “Bulk-to-cell” option, the heat colors are derived from the bulk data instead.
Figure 7.4-5: If needed, go to Figure 7.4 to pre-process the bulk and single-cell assay data; the data are displayed in tables in Figure 7.5.
Figure 7.6: This tab is designed for exploring the single-cell data before going to Figure 7.7. The metadata (colData slot of SingleCellExperiment) are provided in Figure 7.6.4. In the embedding plot, single cells are colored according to the group label chosen in Figure 7.6.1. Selecting rows in Figure 7.6.4 and clicking Figure 7.6.2 highlights the selected cells in the embedding plot.
Figure 7: Co-visualizing bulk and single-cell data using annotation (or other) labels
Figure 7.7: After getting an understanding of the single-cell data in Figure 7.6, click Figure 7.7 to match cells and tissue features. Dragging (Figure 7.7.4) one or multiple spatial features (Figure 7.7.2) to the desired cell labels (Figure 7.7.3) establishes the cell-tissue matching for subsequent co-visualization. Clicking “Run” (Figure 7.7.5) then automatically switches to the Spatial Heatmap page for co-visualization (Figure 7.9).
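The matching created by dragging features onto cell labels can be thought of as a translation map from cell group labels to lists of spatial features. The Python sketch below uses hypothetical function names and labels. It shows how such a map could drive group-based coloring: each matched group and its features share one color, and cells from unmatched groups fall back to grey, as in the embedding plot of Figure 7.9.5.

```python
from itertools import cycle


def group_colors(matching, palette):
    """Assign one color per matched cell group and reuse it for its features.

    matching: cell group label -> list of spatial features dragged onto it.
    Groups with no features are skipped, i.e. treated as unmatched.
    """
    matched = {label: feats for label, feats in matching.items() if feats}
    cell_color, tissue_color = {}, {}
    for (label, feats), color in zip(matched.items(), cycle(palette)):
        cell_color[label] = color
        for feat in feats:
            tissue_color[feat] = color
    return cell_color, tissue_color


def color_cells(cell_labels, cell_color, unmatched="grey"):
    """One color per cell; cells from unmatched groups are grey."""
    return [cell_color.get(label, unmatched) for label in cell_labels]


matching = {"mesophyll": ["leaf"], "vascular": ["phloem", "xylem"], "unknown": []}
cell_c, tissue_c = group_colors(matching, ["red", "blue", "green"])
print(tissue_c)  # {'leaf': 'red', 'phloem': 'blue', 'xylem': 'blue'}
print(color_cells(["vascular", "mesophyll", "unknown", "epidermis"], cell_c))
# ['blue', 'red', 'grey', 'grey']
```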
Figure 7.8: The source of cell group labels (Figure 7.2) and the mapping direction (Figure 7.3) are shown in a box for tracking.
Figure 7.9.1-9.2: Go to the tabs/settings for co-visualization.
Figure 7.9.3: Select coloring options for the co-visualization plots (Figure 7.9.5-9.6).
Figure 7.9.4-9.6: Single-cell and bulk data are visualized in an embedding plot (Figure 7.9.5) and an SHM (Figure 7.9.6), respectively. In Figure 7.9.5, grey dots represent cells not matched with any tissue feature (Figure 7.7.4). All cell group labels matched with tissue features (Figure 7.7.4) are listed in Figure 7.9.4, where options are provided to visualize all groups (default) or a single group in Figure 7.9.5.
This user interface is designed for co-clustering only.
Quick start: Get an overview of the co-clustering results (Figure 8.6), then click “Co-visualizing” (Figure 8.6.3) to create co-visualization plots.
Figure 8.1-3: These are the same as Figure 7.1-3. Select “Co-clustering labels” (Figure 8.2) to display the interface for co-clustering (Figure 8).
Figure 8.4-5: The co-clustering workflow (Figure 8.7, see below) is performed according to settings in Figure 8.4. The bulk and single-cell assay data are displayed in Figure 8.5.
Figure 8.6: Before co-visualization, go to this tab to see the co-clustering results (see below) in the form of an embedding plot and a table (Figure 8.6.6). The bulk labels assigned to cells and the corresponding similarities (Spearman’s correlation coefficients) are shown in Figure 8.6.4 and Figure 8.6.5, respectively, where “none” denotes no assignment. All clusters (default) or a chosen cluster can be selected (Figure 8.6.2) for display in the embedding plot. Selecting rows in Figure 8.6.6 and clicking Figure 8.6.1 highlights the corresponding cells/tissues in the embedding plot. Clicking Figure 8.6.3 automatically switches to “Spatial Heatmap” for co-visualization, as in Figure 7.9.
Figure 8: Co-visualizing bulk and single-cell data using co-clustering labels
Figure 8.7: Co-clustering illustration:
Although the co-clustering method (Figure 8.7) is generally applicable to various data modalities (transcriptome, proteome, metabolome, etc.), it is explained here using RNA-seq data. First, the raw count matrices of bulk and single-cell data are combined column-wise for joint normalization (Figure 8.7A1) using scater and scran (McCarthy et al. 2017; Lun, McCarthy, and Marioni 2016).
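Joint normalization can be sketched in Python. scran estimates size factors with a pooling-based deconvolution method. This hypothetical sketch instead uses simple library-size factors scaled to a mean of 1, then computes log2(count / size factor + 1). It only illustrates why bulk and single-cell columns become comparable after joint scaling.

```python
import math


def joint_log_normalize(bulk, cells):
    """Jointly log-normalize bulk and single-cell count columns.

    bulk, cells: lists of columns; each column holds raw counts for the same
    genes in the same order (assumed layout). Library-size factors are a
    simplification of scran's size factors. Returns (bulk, cells).
    """
    columns = bulk + cells                          # column-wise combination
    libs = [sum(col) for col in columns]            # library sizes (must be > 0)
    mean_lib = sum(libs) / len(libs)
    size_factors = [lib / mean_lib for lib in libs]  # centred at 1
    normed = [[math.log2(c / sf + 1) for c in col]
              for col, sf in zip(columns, size_factors)]
    return normed[:len(bulk)], normed[len(bulk):]   # separate again


bulk = [[100, 300, 0]]          # one bulk sample, three genes
cells = [[1, 3, 0], [0, 2, 2]]  # two cells, same genes
b, c = joint_log_normalize(bulk, cells)
```

After scaling, the bulk sample and the first cell, which have the same relative counts, receive the same normalized values despite a 100-fold difference in depth.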
Figure 8.7A: After the bulk data are separated from the single-cell data, genes in the bulk data are retained if their expression values exceed a cutoff in a certain proportion of bulk samples and their coefficient of variation (CV) falls within a range (CV1, CV2). The single-cell data, in turn, are filtered to retain genes with robust expression (\(\geq\) cutoff) in a certain proportion of cells and cells with robust expression in a certain proportion of genes (Figure 8.7A2). Next, the bulk data are subsetted to the same set of genes as the single-cell data to reduce sparsity in the latter and make the two data types more comparable (Figure 8.7A3).
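The single-cell filtering and the gene subsetting of the bulk data in Figure 8.7A2-A3 can be sketched in Python. The order of the two single-cell filters (genes first, then cells), the function names, and the toy data are assumptions. The bulk-side cutoff/CV filter follows the same idea as the pre-processing for spatial enrichment and is not repeated here.

```python
def filter_single_cell(counts, cutoff, gene_prop, cell_prop):
    """Filter a single-cell matrix (gene -> list of values across cells).

    Genes are kept if >= cutoff in at least gene_prop of cells; then cells
    are kept if >= cutoff in at least cell_prop of the retained genes.
    """
    n_cells = len(next(iter(counts.values())))
    genes = {g: v for g, v in counts.items()
             if sum(x >= cutoff for x in v) / n_cells >= gene_prop}
    if not genes:
        return {}
    keep = [j for j in range(n_cells)
            if sum(v[j] >= cutoff for v in genes.values()) / len(genes) >= cell_prop]
    return {g: [v[j] for j in keep] for g, v in genes.items()}


def subset_bulk(bulk, sc):
    """Restrict the bulk data to the genes retained in the single-cell data."""
    return {g: bulk[g] for g in sc if g in bulk}


sc = {"g1": [2, 3, 0, 5], "g2": [0, 0, 0, 1], "g3": [1, 4, 2, 0]}
bulk = {"g1": [50, 60], "g2": [5, 1], "g3": [30, 20], "g4": [9, 9]}
sc_f = filter_single_cell(sc, cutoff=1, gene_prop=0.5, cell_prop=0.6)
print(sc_f)                     # {'g1': [2, 3], 'g3': [1, 4]}
print(subset_bulk(bulk, sc_f))  # {'g1': [50, 60], 'g3': [30, 20]}
```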
Figure 8.7B: In the subsequent step, the bulk and single-cell data are combined column-wise for joint embedding using a dimensionality reduction technique (PCA or UMAP).
Figure 8.7C: Co-clustering is then performed on the top joint dimensions. Specifically, a graph is built with scran methods (buildKNNGraph or buildSNNGraph), where nodes are cells (or tissues) and edges connect nearest neighbors (Lun, McCarthy, and Marioni 2016). This graph is then partitioned with igraph methods (cluster_walktrap, cluster_fast_greedy, or cluster_leading_eigen) to obtain clusters (Csardi and Nepusz 2006). Three types of clusters are shown:

- (i) Multiple cells are co-clustered and assigned to one bulk tissue sample (Figure 8.7C1).
- (ii) Multiple cells are co-clustered with several bulk tissues and then assigned to a single bulk tissue with a nearest-neighbor approach (Figure 8.7C2), based on Spearman’s correlation coefficient (similarity, Figure 8.6.5).
- (iii) Cells that do not co-cluster with any bulk tissue remain unassigned (Figure 8.7C3).
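Graph construction can be illustrated with a minimal k-nearest-neighbor (KNN) graph in Python. This is a simplified stand-in for scran's buildKNNGraph, assuming Euclidean distance on the top joint dimensions. The shared-nearest-neighbor weighting of buildSNNGraph and the igraph partitioning step are not reproduced, and the coordinates are hypothetical.

```python
import math


def knn_graph(points, k):
    """Undirected k-nearest-neighbor graph.

    points: name -> coordinates in the joint embedding (e.g. top PCs) for
    both cells and bulk tissue samples. Each node is linked to its k closest
    nodes by Euclidean distance; edges are returned as sorted name pairs.
    """
    edges = set()
    for a in points:
        others = sorted((n for n in points if n != a),
                        key=lambda n: math.dist(points[a], points[n]))
        for b in others[:k]:
            edges.add(tuple(sorted((a, b))))
    return edges


points = {"c1": (0, 0), "c2": (0, 1), "leaf": (0.5, 0.5),
          "c3": (10, 10), "c4": (10, 11)}
print(sorted(knn_graph(points, k=1)))
# [('c1', 'leaf'), ('c2', 'leaf'), ('c3', 'c4')]
```

Partitioning this toy graph would put c1, c2, and the bulk sample leaf in one cluster, which is type (i). c3 and c4 would form a cluster without bulk tissue, which is type (iii).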
Figure 8.7D-E: After co-clustering, cells are labeled with bulk tissues or remain unlabeled (“none” in Figure 8.7D). Finally, the obtained labels are used to match cells with tissues in the embedding plot and the SHM, respectively (Figure 8.7E).
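The labeling rules for the three cluster types can be sketched in Python. The sketch assumes that each cell is individually assigned to the co-clustered bulk tissue with the highest Spearman correlation, with tied expression values receiving average ranks. Cells in clusters without bulk tissue are labeled "none". The names clusters, cells, and bulk and the toy data are hypothetical.

```python
import math


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else float("nan")


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            r[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of ranks."""
    return pearson(ranks(x), ranks(y))


def assign_labels(clusters, cells, bulk):
    """Label each cell with a co-clustered bulk tissue, or "none".

    clusters: member name -> cluster id, for cells and bulk samples alike.
    cells, bulk: name -> expression vector over the same genes.
    Returns cell -> (label, Spearman similarity or None).
    """
    result = {}
    for cell, vec in cells.items():
        tissues = [t for t in bulk if clusters.get(t) == clusters[cell]]
        if not tissues:                    # type (iii): no bulk tissue
            result[cell] = ("none", None)
            continue
        # types (i) and (ii): keep the most similar co-clustered tissue
        sims = {t: spearman(vec, bulk[t]) for t in tissues}
        best = max(sims, key=sims.get)
        result[cell] = (best, sims[best])
    return result


bulk = {"leaf": [10, 1, 5, 0], "root": [0, 8, 2, 9]}
cells = {"c1": [3, 0, 2, 0], "c2": [0, 2, 1, 3], "c3": [1, 1, 1, 1]}
clusters = {"leaf": 1, "root": 1, "c1": 1, "c2": 1, "c3": 2}
for cell, (label, sim) in assign_labels(clusters, cells, bulk).items():
    print(cell, label)
# c1 leaf
# c2 root
# c3 none
```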