tSNE for Single Cell

t-Distributed Stochastic Neighbor Embedding (tSNE) is a general-purpose algorithm for visualizing high-dimensional data in 2D or 3D [Maaten and Hinton, 2008]. In the CLC Single Cell Analysis Module, it is one of two ways of constructing a Dimensionality Reduction Plot, the other being UMAP. The choice between tSNE and UMAP is purely visual: it has no effect on downstream analysis. It is therefore recommended to use whichever tool produces the visualization you prefer.

The tSNE for Single Cell tool can be found in the Toolbox here:

        Toolbox | Single Cell Analysis | Dimensionality Reduction | tSNE for Single Cell

The tool takes an Expression Matrix, a Peak Count Matrix, or both types of matrix as input. When both types of matrix are provided, only cells that are common to both matrices are used.
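
The restriction to shared cells amounts to intersecting the cell identifiers of the two matrices. The following is a minimal sketch only: it assumes each matrix is held as a hypothetical dictionary keyed by cell barcode, which is not the module's internal data structure, and the barcodes and counts are made up for illustration.

```python
def shared_cells(expression, peaks):
    """Return the cell identifiers present in both matrices, sorted for reproducibility."""
    return sorted(expression.keys() & peaks.keys())


def restrict_to_cells(matrix, cells):
    """Keep only the rows (cells) of a matrix whose identifier is in `cells`."""
    return {cell: matrix[cell] for cell in cells}


# Toy data: hypothetical barcodes mapped to made-up feature counts.
expression = {"AAAC": [3, 0, 1], "AAAG": [0, 2, 2], "AACT": [1, 1, 0]}
peaks = {"AAAG": [5, 0], "AACT": [0, 4], "AAGG": [2, 2]}

common = shared_cells(expression, peaks)
expression_common = restrict_to_cells(expression, common)
peaks_common = restrict_to_cells(peaks, common)
print(common)
```

In this toy example, cells found in only one of the two matrices are dropped before any further processing.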

tSNE for Single Cell offers options to run dimensionality reduction or feature selection prior to the tSNE algorithm. For details on these options, please see Feature selection and dimensionality reduction. The following additional options are available:

An example output is shown in figure 16.5. When interpreting tSNE plots, it is important to be aware that the tightness of clusters and distances between them may not reflect the actual intra- and inter-cluster similarities. Some examples of this are provided by [Wattenberg et al., 2016].

Figure 16.5: A tSNE visualization of data from [MacParland et al., 2018].

Implementation details

The tool implements Barnes-Hut tSNE [Van Der Maaten, 2014]. If dimensionality reduction has been selected, the initial layout is seeded using PCA and/or LSI (plus a small amount of random variation); otherwise it is uniformly random in the range 0 to 0.0001. Using dimensionality reduction is recommended, because several authors have reported that PCA initialization improves the conservation of global structure in tSNE visualizations.
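
The two initialization modes can be illustrated with a short sketch. This is not the module's code: the function name `initial_layout`, the use of only the first two reduced components, the Gaussian form of the random variation, and the `jitter` scale are all assumptions made for illustration. The text does not say whether the seeded coordinates are rescaled, so none is applied here.

```python
import random


def initial_layout(n_cells, reduced=None, jitter=1e-4, seed=0):
    """Return one [x, y] starting position per cell.

    reduced: optional per-cell coordinates from PCA/LSI (hypothetical input);
             only the first two components are used in this sketch.
    jitter:  standard deviation of the added random variation (assumed value).
    """
    rng = random.Random(seed)
    if reduced is None:
        # No dimensionality reduction: uniformly random in the range 0 to 0.0001.
        return [[rng.uniform(0.0, 1e-4), rng.uniform(0.0, 1e-4)]
                for _ in range(n_cells)]
    if len(reduced) != n_cells:
        raise ValueError("expected one reduced coordinate row per cell")
    # Seeded layout: leading components plus a small amount of random variation.
    return [[row[0] + rng.gauss(0.0, jitter), row[1] + rng.gauss(0.0, jitter)]
            for row in reduced]


# Made-up reduced coordinates for three cells.
toy_reduced = [[1.5, -0.2, 0.1], [-0.7, 0.9, 0.0], [0.3, 0.4, -0.5]]
random_start = initial_layout(3)
seeded_start = initial_layout(3, reduced=toy_reduced)
```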

tSNE has several hyperparameters, which are set as follows:

Early exaggeration factor: $\alpha = 12$
Learning rate: $\nu = \max(200, n / \alpha)$, where $n$ is the number of cells
Iterations for early exaggeration: 250
Momentum during early exaggeration: 0.5
Momentum for subsequent iterations: 0.8
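
As a rough illustration of how the settings listed above combine, the sketch below computes the learning rate for a given number of cells and returns the exaggeration and momentum for a given iteration. The function names, the 0-based iteration index, and the exaggeration factor of 1 after the early phase (the usual tSNE convention) are assumptions and are not taken from the module.

```python
EARLY_EXAGGERATION = 12.0        # alpha
EARLY_EXAGGERATION_ITERS = 250   # iterations that use early exaggeration
MOMENTUM_EARLY = 0.5             # momentum during early exaggeration
MOMENTUM_LATE = 0.8              # momentum for subsequent iterations


def learning_rate(n_cells):
    """Learning rate nu = max(200, n / alpha)."""
    return max(200.0, n_cells / EARLY_EXAGGERATION)


def schedule(iteration):
    """Return (exaggeration, momentum) for a 0-based iteration index.

    Assumes exaggeration drops to 1.0 once the early phase ends.
    """
    if iteration < EARLY_EXAGGERATION_ITERS:
        return EARLY_EXAGGERATION, MOMENTUM_EARLY
    return 1.0, MOMENTUM_LATE


for n in (1000, 2400, 12000):
    print(n, learning_rate(n))
```

With $\alpha = 12$, the learning rate stays at 200 for data sets of up to 2400 cells and grows linearly with the number of cells beyond that.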