Manual annotation

This section showcases the different functionalities available from a low-dimensional plot using a UMAP plot of an Expression Matrix (Image expression_matrix_track_16_n_p) generated from data from [MacParland et al., 2018].

The cells in a plot can be colored using different sources of information, such as Cell Clusters, Cell Annotations, features expression and sample of origin. To enable this coloring, the relevant elements are associated with the plot by dragging and dropping them in the corresponding groups of the Side Panel (see figure 17.1).

Image clusters
Figure 17.1: Cells are colored by clusters produced by Cluster Single Cell Data (see Cluster Single Cell Data). The category `Leiden (resolution=0.5)' was chosen in the Side Panel Clusters group.

To visualize the same low-dimensional plot using different sources of information for coloring, the plot can be opened multiple times and the windows can be rearranged by dragging and dropping (see figure 17.2).

Image sidebysideview
Figure 17.2: The same UMAP plot is opened two times, and cells are colored by the automated clusters with `Leiden (resolution=1.0)' (top) and the predicted cell types (bottom) produced by Predict Cell Types (see Predict Cell Type) using the human pre-trained cell classifier (see Reference data management). The label for the selected cell types is added on the plot by choosing `Show labels from: Clusters' in the `Coloring and highlighting' Side Panel group.

On mouse hover, a tooltip shows summaries for the nearby cells (see figure 17.3). The tooltip can be disabled by unchecking `Show tooltip' in the `Coloring and highlighting' group at the bottom of the Side Panel. The same type of summary can be obtained for a group of selected cells by choosing `Show Information for Selected' from the plot right-click menu (see figure 17.4).

Image tool_tip
Figure 17.3: When the mouse hovers over the plot, a tooltip is displayed summarizing the nearby cells, containing information from the elements associated to the plot, and the feature expression for any selected features.

On right-click on the plot, a series of options are available for launching tools or performing various actions on selected cells (see figure 17.4).

Image selection_menu
Figure 17.4: Available options in the plot right-click menu.

Selecting cells

Cells can be selected in multiple ways:

Image lasso
Figure 17.5: Cells can be selected using the lasso tool.

Working with clusters and annotations

Selected cells can be reassigned to existing clusters by choosing `Add to Cluster' or added to a completely new cluster by choosing `Add to New Cluster' (see figure 17.6) from the plot right-click menu (see figure 17.4), either as a free text or a cell type from the QIAGEN Cell Ontology (see The QIAGEN Cell Ontology). When reassigning to existing clusters, the clusters are listed in the same order as in the Side Panel (see figure 17.8).

Image new_cluster
Figure 17.6: Cells can be added to a new cluster defined either by using free text or a cell type from the QIAGEN Cell Ontology by choosing `QIAGEN cell type'. Clicking on the browse button ('...') opens up the ontology browser (see figure 17.7).

Image new_cluster_ontology
Figure 17.7: View of the QIAGEN Cell Ontology, showing details for `hepatic stellate cells'. The `Cell type' text field can be used to quickly identify the desired cell type. Only cell types specific to a certain tissue can be shown by filling in `Tissue'. By using `Free text', all cell types not containing the given text anywhere in their details are removed from the ontology structure shown at the bottom.

Image clusters_order_by
Figure 17.8: Setting the order of clusters in the Side Panel so the clusters with most cells are listed first.

An existing cluster can be renamed by choosing `Rename Cluster' in the cluster edit menu (see figure 17.9). A dialog similar to that in figure 17.6 opens, where the cluster can be renamed either using free text or a cell type from the QIAGEN Cell Ontology. When a cluster represents an ontology cell type, details about it (as those shown in figure 17.7) can be obtained by choosing `Show in QIAGEN Cell Ontology' (see figure 17.9).

Image cluster_options
Figure 17.9: The available options for editing a single cluster from the Side Panel. `Show in QIAGEN Cell Ontology' is grayed out because the corresponding cluster is not part of the ontology.

Any of the changes made using the above actions can be undone using the `Undo' button. When clusters are changed, the plot name is marked with an `*' indicating that it contains an element that needs to be saved. By clicking `Save', a new Cell Clusters element can be created.

The coloring of a cluster can be changed by clicking on the color box next to its name in the Side Panel. The newly chosen color can be saved using the View Settings, see https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Side_Panel_view_settings.html. Note that when the plot is closed and opened again, the default color is used, and to recover the custom color, the previously saved view settings need be re-applied to the plot.

Cells can be also colored using information from cell annotations (see figures 17.10 and 17.11).

Image annotations
Figure 17.10: Cells are colored by the number of reads from the annotation produced by QC for Single Cell (see QC for Single Cell). What information the cells are colored by can be chosen in the `Coloring and highlighting' group at the bottom of the Side Panel.

Image cell_type_probabilities
Figure 17.11: Cells are colored by the probability of having the type `hepatic stellate cells' from the annotation produced by Predict Cell Types (see Predict Cell Type) using the human pre-trained cell classifier (see Reference data management). Cells with a probability of at least 0.3 are highlighted. The highlight interval is inclusive.

Visualizing expressions

Visualizing the expression of marker genes and selecting cells that express a set of marker genes above a specific value can be used to manually annotate the cell types. The plot can show the full expression of a specific gene across all cells using a gradient (see figure 17.12) or using one color for the cells with one or multiple gene expressions in a specific interval (see figure 17.13).

Image gene_expression
Figure 17.12: Cells are colored by the expression of `ACTA2'. The gene is selected by typing its name in the search box under the `Expression data' group in the Side Panel. The relative coloring of the values can be changed by dragging the two knobs on the white slider above. `ACTA2' can be added to `Selected features' (see figure 17.13).

Image gene_co_expression
Figure 17.13: Cells that have an expression of at least 5 for `ACTA2' and 1 for `COL1A1' are colored in the plot. Different options are available for manipulating the cells expressing a particular feature. Selected cells can be required to express both genes, or just one of them, by choosing `Selected in all (intersection)' or `Selected in any (union)' at the bottom of the Side Panel. Note that the expression range is inclusive: setting the minimum expression to 0 will include the cells not expressing the gene. Choosing the interval [0, 0] will highlight only the cells that do not express the gene.

Multiple genes can be selected by:

Image selected_features_options
Figure 17.14: Side Panel options for `Selected features'.

Image load_from_file
Figure 17.15: Dialog for loading feature names from a file.

Selecting features in other views

Features can be selected from various elements showing their expression, and this is done in a synchronized manner, such that all opened elements showcasing feature expression will highlight the corresponding features, if available. This can be done from an Expression Matrix (see figure 17.16), a Dot Plot (see figure 17.17), a Heat Map (see figure 17.18), or a Violin Plot (see figure 17.19).

Image matrix_select_genes
Figure 17.16: Right-click menu for selecting genes from the Expression Statistics Table view of an Expression Matrix to be synchronized with other views.

Image dot_plot_select_genes
Figure 17.17: Right-click menu for selecting genes from a Dot Plot to be synchronized with other views.

Image heat_map_select_genes
Figure 17.18: Right-click menu for selecting genes from a Heat Map to be synchronized with other views.

Image violin_plot_select_genes
Figure 17.19: Right-click menu for selecting genes from a Violin Plot to be synchronized with other views.

The UMAP plot, Dot Plot, Heat Map, and Violin Plot all show that `ACTA2', `COL1A1', `TAGLN', `COL1A2', `COL3A1', `SPARC', `RBP1', `DCN', `MYL9' are highly expressed in cluster 10. These genes were identified as markers for hepatic stellate cells [MacParland et al., 2018] and cluster 10 is confirmed to contain hepatic stellate cells by the Predict Cell Types tool (see figure 17.11). This can be further confirmed by investigating a Cell Abundance Heat Map (see Create Heat Map for Cell Abundance), as shown in figure 17.20.

Image 2d_heatmap
Figure 17.20: Cell Abundance Heat Map comparing the `Leiden (resolution=0.5)' clusters to the cell types produced by Predict Cell Types. Hovering over a rectangle reveals the abundance of the selected combination.

Once the cells co-expressing specific markers are highlighted, a new cluster with the corresponding cell type can be created, as described above.