PCA for RNA-Seq

Principal Component Analysis (PCA) projects a high-dimensional dataset (where the number of dimensions equals the number of genes or transcripts) onto two or three dimensions. This helps identify outlying samples for quality control and gives an overview of the principal causes of variation in a dataset. The analysis transforms a large set of variables (in this case, the counts for each individual gene or transcript) into a smaller set of orthogonal principal components. The first principal component specifies the direction with the largest variability in the data, the second component specifies the direction with the second largest variability, and so on.
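The projection described above can be illustrated with a minimal sketch in Python. It assumes NumPy is available and uses made-up toy data; the function name pca_project is hypothetical, and the sketch does no filtering or normalization, so it is not a description of how the tool itself is implemented.

```python
import numpy as np

def pca_project(expression, n_components=2):
    """Project samples (rows) onto the top principal components.

    expression: array of shape (n_samples, n_genes), assumed to be
    already normalized; this sketch does no filtering or normalization.
    """
    X = np.asarray(expression, dtype=float)
    # Center each gene (column) so the components describe variation
    # around the mean expression profile.
    centered = X - X.mean(axis=0)
    # SVD of the centered matrix: the rows of vt are orthogonal principal
    # directions, ordered by decreasing singular value.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Coordinates of each sample along the leading components.
    coords = u[:, :n_components] * s[:n_components]
    # Fraction of the total variance captured by each component.
    explained = (s ** 2) / np.sum(s ** 2)
    return coords, vt[:n_components], explained[:n_components]

# Toy data (made up): 6 samples x 5 genes, two groups differing in genes 0 and 1.
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 0.1, size=(3, 5))
group_b = rng.normal(0.0, 0.1, size=(3, 5))
group_b[:, :2] += 3.0
expression = np.vstack([group_a, group_b])

coords, components, explained = pca_project(expression, n_components=2)
print(coords.round(2))      # one row of (PC1, PC2) coordinates per sample
print(explained.round(3))   # share of variance per component
```

In this toy example the two groups end up on opposite sides of the first principal component, because the difference between the groups is the largest source of variation in the data.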

The PCA for RNA-Seq tool plots samples in 2D or 3D. Known metadata about each sample is added as an overlay. In addition, the tool performs filtering and normalization.

For more detail about these steps, see RNA-seq normalization.

To start the analysis:

        Toolbox | RNA-Seq and Small RNA Analysis | PCA for RNA-Seq

Select a number of expression tracks and click Next. The tool generates a PCA plot that can be viewed in 2D and 3D. The principal component coordinates are available when you export the PCA plot to a tabular format (*.tsv, *.csv, *.xls). The export has one row for each sample (a dot in the PCA plot) and columns for the coordinates of that point on PC1, PC2 and PC3.
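As a rough illustration of how an exported table might be used for quality control, the sketch below reads tab-separated coordinates and reports the sample farthest from the centroid in PC space. The header names ("Name", "PC1", "PC2", "PC3"), the inline data, and the helper functions are all assumptions made for this example; check the actual export for its column names before adapting it.

```python
import csv
import io
import math

# Made-up stand-in for an exported .tsv file. The header names are
# assumptions; adjust them to match the actual export.
EXPORTED_TSV = (
    "Name\tPC1\tPC2\tPC3\n"
    "sample_1\t-2.1\t0.3\t0.1\n"
    "sample_2\t-1.9\t-0.2\t0.0\n"
    "sample_3\t2.0\t0.1\t-0.2\n"
    "sample_4\t2.2\t-0.3\t0.1\n"
    "sample_5\t0.1\t6.5\t0.2\n"
)

def read_pc_table(handle, name_column="Name", pc_columns=("PC1", "PC2", "PC3")):
    """Return {sample name: (PC1, PC2, PC3)} from a tab-separated table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row[name_column]: tuple(float(row[c]) for c in pc_columns)
        for row in reader
    }

def distances_from_centroid(points):
    """Euclidean distance of each sample from the mean point in PC space."""
    n = len(points)
    dims = len(next(iter(points.values())))
    centroid = [sum(p[i] for p in points.values()) / n for i in range(dims)]
    return {name: math.dist(p, centroid) for name, p in points.items()}

points = read_pc_table(io.StringIO(EXPORTED_TSV))
distances = distances_from_centroid(points)
farthest = max(distances, key=distances.get)
print(farthest, round(distances[farthest], 2))
```

To read a real export, replace io.StringIO(EXPORTED_TSV) with an opened file, for example open(path, newline=""). A large distance only marks a sample as worth a closer look; it does not by itself show that the sample is faulty.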