K-medoids clustering is computed using the PAM-algorithm (PAM is short for Partitioning Around Medoids). The PAM-algorithm is based on the search for
data:image/s3,"s3://crabby-images/75add/75addf644866d370a0f750e2bd1486352a9c4f6e" alt="$ k$"
representatives (called medoids) among all elements of the dataset. When having found
data:image/s3,"s3://crabby-images/75add/75addf644866d370a0f750e2bd1486352a9c4f6e" alt="$ k$"
representatives,
data:image/s3,"s3://crabby-images/75add/75addf644866d370a0f750e2bd1486352a9c4f6e" alt="$ k$"
clusters are generated by assigning each element to its nearest medoid. The algorithm first looks for a good initial set of medoids (the BUILD phase). Then it finds a local minimum for the objective function:
where there are
data:image/s3,"s3://crabby-images/75add/75addf644866d370a0f750e2bd1486352a9c4f6e" alt="$ k$"
clusters
data:image/s3,"s3://crabby-images/8b9d4/8b9d43569f609de7098856671d0b38341aa46ad3" alt="$ S_i, i=1,2,\dots, k$"
and
data:image/s3,"s3://crabby-images/6e56a/6e56afd5bf4689e023a5b1761719a8bd0ad8f9ab" alt="$ c_i$"
is the medoid of
data:image/s3,"s3://crabby-images/10009/100099312ff74a8a78a80c1b227d35a31461f582" alt="$ S_i$"
. This solution implies that there is no single switch of an object with a medoid that will decrease the objective (this is called the SWAP phase). The PAM-agorithm is described in [
Kaufman and Rousseeuw, 1990].
Features are z-score normalized prior to clustering: they are rescaled such that the mean expression value over all input samples for the clustering is 0, and the standard deviation is 1.