Cluster Single Cell Data

Cluster Single Cell Data uses graph-based clustering to group cells automatically. Typically the aim is to recover clusters that describe cells of different types or with different behavior.

The tool takes an Expression Matrix as input and produces a Cell Clusters result. It can be found in the Toolbox here:

        Cell Annotation | Cluster Single Cell Data

The tool offers options to run PCA or feature selection prior to clustering. For details on these options, please see PCA and feature selection. The following additional options are available:

The result of clustering is a Cell Clusters element containing clusters at different resolutions. It is easiest to view these in a Dimensionality Reduction Plot.

Generally speaking, a good clustering assigns a distinct cluster to each large clump of cells that appears, by eye, to form a cluster in the Dimensionality Reduction Plot. If this is not the case, the resolution may be too low (compare figure 7.4 with figure 7.5). It is harder to tell when the resolution is too high, but generally one or more of the clusterings produced at the default resolutions will be suitable for downstream analysis.

Figure 7.4: Clustering with too low resolution. Clusters that are distinct by eye are given the same color. Examples include the three dark blue clusters at the top-right corner of the plot, and the two turquoise clusters at x=-20. Data is from [MacParland et al., 2018].

Figure 7.5: A higher resolution clustering of the same data as in figure 7.4. Each cluster that seems distinct by eye is now given its own color. The resolution is no longer too low. It can be difficult to determine whether the resolution is too high.
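
The effect of resolution seen in figures 7.4 and 7.5 can be illustrated with a small sketch. It assumes that clusters are scored with a resolution-weighted modularity, as in Louvain- or Leiden-style methods; this section does not state which algorithm the tool uses, so treat this as an illustration only. The toy graph and the names clique, modularity, coarse and fine are hypothetical, and the code only scores two candidate partitions rather than running a clustering algorithm.

```python
from itertools import combinations


def clique(nodes):
    """Return all edges of a fully connected group of nodes."""
    return list(combinations(nodes, 2))


def modularity(edges, partition, resolution=1.0):
    """Resolution-weighted modularity of a partition of an undirected graph.

    Q = sum over clusters of  e_c / m  -  resolution * (d_c / (2 m))**2
    where e_c is the number of edges inside cluster c, d_c is the summed
    degree of its nodes, and m is the total number of edges.
    """
    m = len(edges)
    label = {node: i for i, cluster in enumerate(partition) for node in cluster}
    inside = [0] * len(partition)
    degree = [0] * len(partition)
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            inside[label[u]] += 1
    return sum(e / m - resolution * (d / (2 * m)) ** 2
               for e, d in zip(inside, degree))


# Hypothetical toy graph: three tight clumps of five "cells" each.
a, b, c = range(0, 5), range(5, 10), range(10, 15)
edges = clique(a) + clique(b) + clique(c)
edges += [(0, 5), (1, 6), (2, 7)]   # clumps a and b are loosely linked
edges += [(9, 10)]                  # clump c is barely linked to b

coarse = [set(a) | set(b), set(c)]  # a and b share one cluster
fine = [set(a), set(b), set(c)]     # every clump has its own cluster

for resolution in (0.2, 1.0):
    q_coarse = modularity(edges, coarse, resolution)
    q_fine = modularity(edges, fine, resolution)
    preferred = "coarse" if q_coarse > q_fine else "fine"
    print(f"resolution={resolution}: coarse={q_coarse:.3f}, "
          f"fine={q_fine:.3f}, preferred={preferred}")
```

On this toy graph, the coarse partition scores higher at resolution 0.2 and the fine partition scores higher at resolution 1.0. A low resolution therefore favors merging clumps that look distinct, which is the situation shown in figure 7.4.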

The aim of clustering is usually to obtain clusters that correspond to different cell types. To help with this, the Dimensionality Reduction Plot makes it possible to redraw the boundaries between clusters, to add new clusters, and to rename clusters. These changes might be based on insights from other sources of information such as:


