Create UMAP Plot

Uniform Manifold Approximation and Projection (UMAP) is a general-purpose algorithm for visualizing high-dimensional data in 2D or 3D [McInnes et al., 2018]. In the CLC Single Cell Analysis Module, it is one of two ways of constructing a Dimensionality Reduction Plot (Image singlecellplot_16_n_p), the other being tSNE. The choice between tSNE and UMAP is purely visual: it has no effect on downstream analysis. It is therefore recommended to use whichever tool produces the visualization you prefer.

The UMAP for Single Cell tool can be found in the Toolbox here:

        Dimensionality Reduction (Image sc_dimensionalityreduction_folder_open_16_n_p) | UMAP for Single Cell (Image create_umap_plot_16_n_p)

The tool takes an Expression Matrix (Image expression_matrix_track_16_n_p) as input, and offers options to run PCA or feature selection prior to the UMAP algorithm. For details on these options, please see PCA and feature selection. The following additional options are available:

An example output is shown in figure 8.1.

Image umap_output
Figure 8.1: A UMAP visualization of data from [MacParland et al., 2018].

Tuning the visualization

Although reducing Spread and reducing Minimum distance both give tighter clusters, they do so in different ways. It can therefore be useful to try changing both parameters. An example of this is given in figures 8.2-8.4.

Image umap_dist03_spread1
Figure 8.2: UMAP with Minimum distance = 0.3 and Spread = 1. This is the same plot as in figure 8.1, but with clusters overlaid.

Image umap_dist1_spread1
Figure 8.3: UMAP with Minimum distance = 1 and Spread = 1. The overall structure of the clusters is the same as in figure 8.2, but the points are more separated.

Image umap_dist03_spread10
Figure 8.4: UMAP with Minimum distance = 0.3 and Spread = 10. Both points and clusters are more separated than in figure 8.2. Whether this is desirable is likely to depend on the application. For example, it is easier to see that the dark blue, light blue and orange clusters are different cell types, which may help with cell type annotation. However, their proximity in the other figures may have indicated a shared developmental lineage, which cannot be seen here. Note that some other clusters, such as the red cluster, are now split in two, unlike in the other figures.
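
The different effects of Minimum distance and Spread can be illustrated with a short Python sketch. It assumes the parameters behave as `min_dist` and `spread` do in the reference umap-learn implementation, where they define a target similarity for two points in the plot: 1 up to `min_dist`, then an exponential decay with length scale `spread`. Whether the CLC tool uses exactly this form is an assumption, and the function name `target_similarity` is made up for this illustration.

```python
import math


def target_similarity(distance, min_dist, spread):
    """Target similarity of two embedded points at the given distance.

    Assumed form (as in the reference umap-learn implementation):
    1 up to min_dist, then exponential decay with length scale spread.
    """
    if distance <= min_dist:
        return 1.0
    return math.exp(-(distance - min_dist) / spread)


# The three (Minimum distance, Spread) settings used in figures 8.2-8.4.
settings = [(0.3, 1.0), (1.0, 1.0), (0.3, 10.0)]
distances = [0.1, 0.5, 1.0, 2.0, 5.0]

for min_dist, spread in settings:
    row = [round(target_similarity(d, min_dist, spread), 3) for d in distances]
    print(f"min_dist={min_dist}, spread={spread}: {row}")
```

Under this assumption, Minimum distance sets the range within which points are treated as equally similar, so a larger value stops neighboring points from being packed closely together. Spread sets how slowly similarity falls off beyond that range, so a larger value lays out the whole plot on a larger scale.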