Create K-medoids Clustering for RNA-Seq

In a k-medoids clustering, features are clustered into k separate clusters. The procedure seeks to assign features to clusters such that distances between features of the same cluster are small, while distances between clusters are large.
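The idea behind k-medoids can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm used by the tool. It assumes Euclidean distance between expression profiles and uses the simple alternating ("Voronoi iteration") variant of k-medoids, in which every cluster center (medoid) is an actual feature. The functions `euclidean`, `assign` and `k_medoids` and their parameters are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two expression profiles (assumed distance measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(dist: List[List[float]], medoids: List[int]) -> List[int]:
    """Label each feature with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(profiles: List[Sequence[float]], k: int,
              max_iter: int = 100, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Hypothetical sketch: return (medoid indices, cluster label per feature)."""
    n = len(profiles)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of features")
    # All pairwise distances between features.
    dist = [[euclidean(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    medoids = random.Random(seed).sample(range(n), k)  # k distinct starting features
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # empty cluster: keep its old medoid
                new_medoids.append(medoids[c])
                continue
            # The new medoid is the member with the smallest total distance to its cluster.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:  # no medoid moved: converged
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


# Hypothetical example: two groups of features with similar expression profiles.
profiles = [[1.0, 2.0, 3.0], [1.1, 2.1, 2.9], [0.9, 1.9, 3.1],
            [8.0, 8.0, 1.0], [8.2, 7.9, 1.1], [7.9, 8.1, 0.9]]
medoids, labels = k_medoids(profiles, k=2)
print(medoids, labels)
```

Because each cluster center is a real feature rather than an average, k-medoids needs only the pairwise distances between features and is less sensitive to outliers than k-means.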

The output of the tool is a Clustering Collection. The clusters in the Clustering Collection can be viewed together as a Sankey plot or individually as graphs.

To perform a k-medoids clustering:

        Toolbox | RNA-Seq and Small RNA Analysis | Expression Plots | Create K-medoids Clustering for RNA-Seq

Select at least two expression tracks or miRNA expression tables.

Click Next to display a dialog as shown in figure 33.66.

Figure 33.66: Parameters for k-medoids clustering.

The parameters are:

Genomes usually contain too many features to allow a meaningful visualization of all genes or transcripts, and clustering hundreds of thousands of features is also very time-consuming. We therefore recommend reducing the number of features before clustering and visualization.
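One way to see why the number of features matters is to estimate the memory needed to store a distance for every pair of features. This sketch assumes one 8-byte value per pair, as in classic k-medoids implementations such as PAM that precompute all pairwise distances; whether this tool does so is an assumption. The helper `pairwise_distance_memory_gb` is hypothetical and the feature counts are only illustrative.

```python
def pairwise_distance_memory_gb(n_features: int, bytes_per_value: int = 8) -> float:
    """Gigabytes (1e9 bytes) needed to store one distance per pair of features.

    Assumes one 8-byte floating-point value per unordered pair, i.e. the
    upper triangle of the n x n distance matrix without the diagonal.
    """
    n_pairs = n_features * (n_features - 1) // 2
    return n_pairs * bytes_per_value / 1e9


# Illustrative feature counts: a whole transcriptome versus a filtered subset.
for n in (200_000, 2_000):
    print(f"{n:>7} features: {pairwise_distance_memory_gb(n):.3f} GB")
```

Under these assumptions, 200,000 features need about 160 GB, whereas 2,000 features need about 0.016 GB (16 MB): the number of pairs grows with the square of the number of features.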

There are several different Filter settings to filter features:

We recommend using Keep fixed number of features only for exploratory analysis. Although the chosen features have the most variable expression across all samples, that variation may not be of interest. For example, expression may vary strongly across the time points of a time series, but in the same way in both the treatment and control groups.
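The caveat above can be illustrated with a small sketch of a "keep the most variable features" filter. It assumes that variability is measured as the coefficient of variation (standard deviation divided by mean) across all samples and that expression values are non-negative; the measure actually used by Keep fixed number of features may differ. The function `keep_most_variable`, the gene names and the values are hypothetical.

```python
import statistics
from typing import Dict, List


def keep_most_variable(expr: Dict[str, List[float]], n_keep: int) -> List[str]:
    """Return the names of the n_keep features with the highest coefficient of variation.

    Assumption: variability = stdev / mean across all samples. Features with a
    mean of zero are skipped because their coefficient of variation is undefined.
    """
    scored = []
    for name, values in expr.items():
        mean = statistics.fmean(values)
        if mean == 0:
            continue
        scored.append((statistics.stdev(values) / mean, name))
    scored.sort(reverse=True)  # most variable first
    return [name for _, name in scored[:n_keep]]


# Hypothetical samples: control t1, t2, t3 followed by treatment t1, t2, t3.
expr = {
    "time_gene": [10.0, 50.0, 90.0, 10.0, 50.0, 90.0],       # same time trend in both groups
    "treatment_gene": [20.0, 20.0, 20.0, 30.0, 30.0, 30.0],  # differs between groups
    "flat_gene": [40.0, 41.0, 40.0, 41.0, 40.0, 41.0],       # nearly constant
}
print(keep_most_variable(expr, 1))  # → ['time_gene']
```

With only one feature kept, the filter selects `time_gene`, which changes over time in the same way in both groups, and discards `treatment_gene`, the only feature that actually differs between treatment and control.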