UMAP for Single Cell

Uniform Manifold Approximation and Projection (UMAP) is a general-purpose algorithm for visualizing high-dimensional data in 2D or 3D [McInnes et al., 2018]. In the CLC Single Cell Analysis Module, it is one of two ways of constructing a Dimensionality Reduction Plot, the other being tSNE. The choice between tSNE and UMAP only affects the visualization; it has no effect on downstream analysis. It is therefore recommended to use whichever tool produces the visualization you prefer.

The UMAP for Single Cell tool can be found in the Toolbox here:

        Toolbox | Single Cell Analysis | Dimensionality Reduction | UMAP for Single Cell

The tool takes an Expression Matrix, a Peak Count Matrix, or both types of matrix as input. Note that when both types of matrix are provided, only cells that are present in both matrices are used.
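
The restriction to shared cells can be illustrated with a minimal Python sketch. This is not the tool's implementation: the function name `common_cells` and the cell identifiers are hypothetical, and it assumes that cells are matched by an exact identifier such as a barcode.

```python
def common_cells(expression_cells, peak_cells):
    """Return the identifiers found in both lists, in expression-matrix order.

    Assumption: cells are matched on identical identifiers. How the tool
    itself matches and orders cells is not described in this section.
    """
    peak_set = set(peak_cells)
    return [cell for cell in expression_cells if cell in peak_set]


# Hypothetical cell identifiers, for illustration only.
expression_cells = ["cell_A", "cell_B", "cell_C", "cell_D"]
peak_cells = ["cell_B", "cell_D", "cell_E"]

shared = common_cells(expression_cells, peak_cells)
print(shared)  # ['cell_B', 'cell_D']
```

In this sketch, cells found in only one of the two matrices (cell_A, cell_C and cell_E) would not appear in the plot.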

UMAP for Single Cell offers options to run dimensionality reduction or feature selection prior to the UMAP algorithm. For details on these options, please see Feature selection and dimensionality reduction. The following additional options are available:

An example output is shown in figure 16.1.

Image umap_output
Figure 16.1: A UMAP visualization of data from [MacParland et al., 2018].

Tuning the visualization

Although reducing either Spread or Minimum distance gives tighter clusters, the two parameters do so in different ways. It can therefore be useful to try changing both. An example of this is given in figures 16.2-16.4.
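
One way to see why the two parameters behave differently is to look at the target similarity curve used by the reference UMAP algorithm [McInnes et al., 2018]: points closer than the minimum distance are treated as fully similar, and beyond that the similarity decays exponentially at a rate set by the spread. The Python sketch below evaluates that curve for the three settings used in figures 16.2-16.4. It assumes that Minimum distance and Spread correspond to the `min_dist` and `spread` parameters of the reference algorithm; the function name `target_similarity` is hypothetical and the code is not taken from the tool.

```python
import math


def target_similarity(distance, min_dist, spread):
    """Target similarity of two embedded points at a given distance.

    Sketch of the piecewise curve that reference UMAP approximates:
    1 up to min_dist, then exponential decay with length scale `spread`.
    Assumption: the tool's parameters map onto min_dist and spread.
    """
    if distance <= min_dist:
        return 1.0
    return math.exp(-(distance - min_dist) / spread)


# The three (Minimum distance, Spread) settings from figures 16.2-16.4.
settings = [(0.3, 1.0), (1.0, 1.0), (0.3, 10.0)]
distances = [0.2, 0.5, 1.0, 2.0, 5.0]

for min_dist, spread in settings:
    values = [round(target_similarity(d, min_dist, spread), 3) for d in distances]
    print(f"min_dist={min_dist}, spread={spread}: {values}")
```

Under these assumptions, increasing Minimum distance widens the flat region in which points count as fully similar, which spaces out neighboring points, whereas increasing Spread slows the decay beyond that region, which stretches the overall scale of the embedding.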

Image umap_dist03_spread1
Figure 16.2: UMAP with Minimum distance = 0.3 and Spread = 1. This is the same plot as in figure 16.1, but with clusters overlaid.

Image umap_dist1_spread1
Figure 16.3: UMAP with Minimum distance = 1 and Spread = 1. The overall structure of the clusters is the same as in figure 16.2, but the points are more separated.

Image umap_dist03_spread10
Figure 16.4: UMAP with Minimum distance = 0.3 and Spread = 10. Both points and clusters are more separated than in figure 16.2. Whether this is desirable is likely to depend on the application. For example, it is easier to see that the dark blue, light blue and orange clusters are different cell types, which may help with cell type annotation. However, their proximity in the other figures may have indicated a shared developmental lineage, which cannot be seen here. Note that some other clusters, such as the red cluster, are now split in two, unlike in the other figures.