Create tSNE Plot

t-Distributed Stochastic Neighbor Embedding (tSNE) is a general-purpose algorithm for visualizing high-dimensional data in 2D or 3D [Maaten and Hinton, 2008]. In the CLC Single Cell Analysis Module, it is one of two ways of constructing a Dimensionality Reduction Plot (Image singlecellplot_16_n_p), the other being UMAP. The choice between tSNE and UMAP affects only the visualization and has no effect on downstream analysis. It is therefore recommended to use whichever tool produces the visualization you prefer.

The tSNE for Single Cell tool can be found in the Toolbox here:

        Dimensionality Reduction (Image sc_dimensionalityreduction_folder_open_16_n_p) | tSNE for Single Cell (Image create_tsne_plot_16_n_p)

The tool takes an Expression Matrix (Image expression_matrix_track_16_n_p) as input, and offers options to run PCA or feature selection prior to the tSNE algorithm. For details on these options, please see PCA and feature selection. The following additional options are available:

An example output is shown in figure 8.5. When interpreting tSNE plots, it is important to be aware that the tightness of clusters and distances between them may not reflect the actual intra- and inter-cluster similarities. Some examples of this are provided by [Wattenberg et al., 2016].

Image tsne_output
Figure 8.5: A tSNE visualization of data from [MacParland et al., 2018].

Implementation details

The tool implements Barnes-Hut tSNE [Van Der Maaten, 2014]. If PCA has been selected, the initial guess at the optimal layout is seeded using PCA (plus a small amount of random variation); otherwise, the initial coordinates are drawn uniformly at random from the range 0 to 0.0001. The use of PCA is recommended, because several authors have reported improved preservation of global structure in tSNE visualizations when PCA initialization is used.
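The two initialization modes described above can be illustrated with a minimal Python sketch. It is not the tool's actual code: the function name `initial_layout`, the `jitter` scale of the random variation, and the use of the first two principal components as 2D seeds are all assumptions made for illustration, since the manual does not specify them.

```python
import random


def initial_layout(n_cells, pca_coords=None, jitter=1e-4, seed=0):
    """Return an initial 2D layout as a list of [x, y] pairs.

    pca_coords: optional list of (pc1, pc2) tuples, one per cell.
    jitter: ASSUMED scale of the random variation added to the PCA
            seeds; the actual magnitude used by the tool is not stated.
    """
    rng = random.Random(seed)
    if pca_coords is not None:
        # PCA selected: seed each cell from its first two principal
        # components, plus a small amount of random variation.
        return [
            [pc1 + rng.uniform(-jitter, jitter),
             pc2 + rng.uniform(-jitter, jitter)]
            for pc1, pc2 in pca_coords
        ]
    # PCA not selected: uniformly random coordinates in 0 to 0.0001.
    return [
        [rng.uniform(0.0, 1e-4), rng.uniform(0.0, 1e-4)]
        for _ in range(n_cells)
    ]


random_start = initial_layout(5)
pca_start = initial_layout(2, pca_coords=[(1.0, 2.0), (-3.0, 0.5)])
```

With a PCA seed, cells that are far apart along the leading principal components start far apart in the layout, which is one plausible reason for the improved global structure mentioned above.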

tSNE has several hyperparameters, which are set as follows:

Early exaggeration factor: $\alpha = 12$
Learning rate: $\nu = \max(200, n / \alpha)$, where $n$ is the number of cells
Iterations for early exaggeration: 250
Momentum during early exaggeration: 0.5
Momentum for subsequent iterations: 0.8
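To make these settings concrete, the following Python sketch expresses them as functions of the number of cells and the iteration number. The function names are hypothetical, iterations are assumed to be counted from 0, and the exaggeration factor is assumed to drop to 1 after the early phase, as in standard tSNE; the manual does not state these details for the tool itself.

```python
EARLY_EXAGGERATION = 12.0          # alpha
EARLY_EXAGGERATION_ITERS = 250     # length of the early exaggeration phase


def learning_rate(n_cells):
    """Learning rate nu = max(200, n / alpha)."""
    return max(200.0, n_cells / EARLY_EXAGGERATION)


def momentum(iteration):
    """Momentum is 0.5 during early exaggeration and 0.8 afterwards.

    ASSUMPTION: iterations are numbered from 0.
    """
    return 0.5 if iteration < EARLY_EXAGGERATION_ITERS else 0.8


def exaggeration(iteration):
    """Exaggeration factor applied at a given iteration.

    ASSUMPTION: exaggeration is switched off (factor 1) after the
    early phase, as in standard tSNE.
    """
    return EARLY_EXAGGERATION if iteration < EARLY_EXAGGERATION_ITERS else 1.0


small = learning_rate(1000)    # 1000 / 12 is below 200, so the floor applies
large = learning_rate(12000)   # 12000 / 12 = 1000 exceeds the floor
```

Under this formula the learning rate stays at 200 for data sets of up to 2,400 cells and grows linearly with the number of cells beyond that.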