K-means/medoids clustering

In k-means or k-medoids clustering, features are partitioned into k separate clusters. The procedures seek an assignment of features to clusters for which the distances between features within a cluster are small, while the distances between clusters are large.

        Toolbox | Expression Analysis (Image expressionfolder) | Feature Clustering (Image feature_clustering_folder_closed_16_n_p) | K-means/medoids Clustering (Image k-means)

Select at least two samples ((Image array) or (Image rnaseq)) or an experiment (Image experiment).

Note! If your data contains many features, the clustering can take a very long time and may make your computer unresponsive. It is recommended to perform this analysis on a subset of the data, which also makes the clustering easier to interpret. See how to create a sub-experiment in Creating sub-experiment from selection.

Clicking Next will display a dialog as shown in figure 25.44.

Image k-means_step2
Figure 25.44: Parameters for k-means/medoids clustering.

The parameters are:

Clicking Next will display a dialog as shown in figure 25.45.

Image k-means_step3
Figure 25.45: Parameters for k-means/medoids clustering.

At the top, you can choose the Level to use. Choosing 'sample values' means that distances are calculated using the individual values of all samples. When 'group means' is chosen, distances are calculated using the mean value of each group instead.
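The difference between the two levels can be illustrated with a small sketch. It is not the tool's own code: the group names, the expression values, and the use of Euclidean distance are all assumptions made for the example.

```python
from math import dist
from statistics import mean

# Hypothetical expression values for two features across four samples,
# with two samples in each of two (hypothetical) groups.
groups = ["control", "control", "treated", "treated"]
feature_a = [1.0, 3.0, 8.0, 10.0]
feature_b = [3.0, 1.0, 10.0, 8.0]

def group_means(values, groups):
    """Collapse per-sample values into one mean per group (groups in sorted order)."""
    names = sorted(set(groups))
    return [mean(v for v, g in zip(values, groups) if g == name) for name in names]

# 'sample values': the distance uses every individual sample value.
d_samples = dist(feature_a, feature_b)

# 'group means': the distance uses one mean value per group.
d_groups = dist(group_means(feature_a, groups), group_means(feature_b, groups))

print(d_samples)  # 4.0
print(d_groups)   # 0.0
```

In this example the two features differ sample by sample but have identical group means, so they are 4.0 apart at the sample level and indistinguishable at the group level.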

At the bottom, you can select which values to cluster (see Selecting transformed and normalized values for analysis).

Click Finish to start the tool.

The k-means implementation first assigns each feature to a cluster at random. Then, at each iteration, it reassigns each feature to the cluster whose centroid is nearest. During this reassignment, one or more clusters can become empty, which is why the final number of clusters might be smaller than the number specified in "number of partitions". Because the initial assignment of features to clusters is random, results can differ when the algorithm is run again.
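The procedure described above can be sketched in a few lines of Python. This is a minimal illustration and not the tool's actual implementation: the function name kmeans_random_init, the squared Euclidean distance, the iteration cap, and the rule that an emptied cluster is simply dropped are all assumptions.

```python
import random

def kmeans_random_init(points, k, iterations=20, seed=None):
    """Toy k-means: random initial assignment, then nearest-centroid reassignment."""
    rng = random.Random(seed)
    # Assign every feature to one of the k clusters at random.
    labels = [rng.randrange(k) for _ in points]
    for _ in range(iterations):
        # Centroid = coordinate-wise mean of the members.
        # A cluster without members gets no centroid and so stays empty.
        centroids = {}
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        # Reassign each feature to the cluster whose centroid is nearest
        # (squared Euclidean distance, an assumption of this sketch).
        new_labels = [
            min(centroids,
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so stop early
        labels = new_labels
    return labels

# Hypothetical data: four features with two values each.
features = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9)]
labels = kmeans_random_init(features, k=3, seed=1)
print(labels, "non-empty clusters:", len(set(labels)))
```

Running the sketch with different seeds (or with seed=None) can give different labels, and the number of non-empty clusters may be smaller than k, mirroring the behaviour described above.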


