The Create Methylation Level Heat Map tool hierarchically clusters samples and features at the same time, generating a two-dimensional heat map of methylation levels. Each column corresponds to a sample, and each row corresponds to a feature (a single CpG site or a larger target region including multiple CpG sites, e.g. promoter regions).
Methylation levels are calculated across all CpG sites in a given target. When the coverage of a CpG site is lower than a specified threshold, that site is considered uninformative and is treated as having zero methylation. For targets containing multiple CpG sites, only informative sites are considered, and the methylation level is averaged across those informative sites. For targets containing only a single CpG site, the methylation level of that site is used.
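The per-target calculation described above can be sketched in a few lines of Python. This is an illustration only, not the tool's implementation: the function name, the `(methylated_reads, total_reads)` input format, reading "averaged" as an unweighted mean of per-site levels, and returning `None` for a multi-site target with no informative sites are all assumptions.

```python
from typing import List, Optional, Tuple


def target_methylation_level(
    sites: List[Tuple[int, int]], min_coverage: int = 30
) -> Optional[float]:
    """Methylation level of one target (hypothetical helper).

    sites holds one (methylated_reads, total_reads) pair per CpG site.
    """
    # A site is informative when its coverage reaches the threshold.
    informative = [(m, t) for m, t in sites if t >= min_coverage and t > 0]
    if not informative:
        # A single low-coverage CpG site is reported as 0. What happens to a
        # multi-site target with no informative sites is not stated in the
        # text, so None is an assumption made for this sketch.
        return 0.0 if len(sites) == 1 else None
    # Assumption: unweighted mean of the per-site methylation levels.
    levels = [m / t for m, t in informative]
    return sum(levels) / len(levels)
```

For example, with the default threshold of 30, a target with sites `(30, 60)`, `(10, 40)` and `(5, 10)` uses only the first two sites, giving a level of 0.375 under these assumptions.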
Features are clustered according to the similarity of their methylation level profiles over the set of samples. Samples are clustered according to the similarity of their methylation level patterns over the set of features.
The clustering has a tree structure that is generated by:
- Letting each feature or sample be a cluster.
- Calculating pairwise distances between all clusters.
- Joining the two closest clusters into one new cluster.
- Repeating the previous two steps until a single cluster, containing all the features or samples, remains.
The tree is drawn such that the distances between clusters are reflected by the lengths of the branches in the tree.
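The clustering procedure above can be sketched as follows. This is a minimal illustration, not the tool's code: it assumes Euclidean distance and single linkage, and the function names and the returned list of merges are hypothetical. The distance recorded for each merge corresponds to the branch length that would be drawn in the tree.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Cluster = Tuple[int, ...]  # indices of the features or samples in a cluster


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def agglomerate(items: List[Sequence[float]]) -> List[Tuple[Cluster, Cluster, float]]:
    """Return the merges (cluster_a, cluster_b, distance) in the order made."""
    # Step 1: every feature or sample starts as its own cluster.
    clusters: List[Cluster] = [(i,) for i in range(len(items))]
    merges = []
    while len(clusters) > 1:
        # Step 2: pairwise distances between all clusters
        # (single linkage: distance between the two closest members).
        best = None
        for a, b in combinations(clusters, 2):
            d = min(euclidean(items[i], items[j]) for i in a for j in b)
            if best is None or d < best[2]:
                best = (a, b, d)
        # Step 3: join the two closest clusters into one new cluster.
        a, b, d = best
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append((a, b, d))
    return merges
```

With n items, the loop performs n - 1 merges before a single cluster remains.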
To run the tool, go to:
Toolbox | Epigenomics Analysis | Bisulfite Sequencing | Create Methylation Level Heat Map
The tool takes as input methylation level tracks generated using the Call Methylation Levels tool with the "Report unmethylated cytosines" option selected, as shown in figure 14.4. This option is enabled by default when running the Detect QIAseq Methylation template workflow.
For valid comparisons to be made across samples, the inputs must have been generated using the same reference information, i.e. the same reference genome, target regions, etc.
In the wizard step shown in figure 14.5, select the target region track containing the CpG sites. These may be single CpGs or larger targets (e.g. promoter regions).
At the bottom of this step, specify the minimum CpG site coverage value. CpG sites with coverage below this will be excluded from the analysis. By default, the value is 30. When only single CpG sites are analyzed, the methylation level of low coverage sites is set to 0.
A distance measure and a cluster linkage method for the hierarchical clustering are also specified here. The distance measure specifies how the distance between two features or samples should be calculated. The cluster linkage method specifies how the distance between two clusters, each consisting of a number of features or samples, should be calculated.
There are three kinds of Distance measures:
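- Euclidean distance. The ordinary distance between two points - the length of the segment connecting them. If $u = (u_1, u_2, \ldots, u_n)$ and $v = (v_1, v_2, \ldots, v_n)$, then the Euclidean distance between $u$ and $v$ is
  $$|u - v| = \sqrt{\sum_{i=1}^{n} (u_i - v_i)^2}.$$
- 1 - Pearson correlation. The Pearson correlation coefficient between two elements $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$ is defined as
  $$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right),$$
  where $\bar{x}$ and $\bar{y}$ are the averages of the values in $x$ and $y$, and $s_x$ and $s_y$ are the sample standard deviations of these values. The distance used is $1 - r$.
- Manhattan distance. The Manhattan distance between two points is the distance measured along axes at right angles. If $u = (u_1, u_2, \ldots, u_n)$ and $v = (v_1, v_2, \ldots, v_n)$, then the Manhattan distance between $u$ and $v$ is
  $$|u - v| = \sum_{i=1}^{n} |u_i - v_i|.$$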
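The three distance measures can be written out in plain Python as a sketch. The function names are illustrative and are not part of the Workbench. The Pearson version uses the standard definition of the correlation coefficient, and raising an error for a constant profile (where the correlation is undefined) is a choice made for this example.

```python
import math
from typing import Sequence


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    # Square root of the summed squared coordinate differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan_distance(u: Sequence[float], v: Sequence[float]) -> float:
    # Sum of the absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(u, v))


def pearson_distance(x: Sequence[float], y: Sequence[float]) -> float:
    # 1 - r, where r is the Pearson correlation coefficient.
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Assumption for this sketch: reject constant profiles explicitly.
        raise ValueError("Pearson correlation is undefined for a constant profile")
    # Any 1/(n-1) normalization cancels between numerator and denominator.
    return 1.0 - sxy / math.sqrt(sxx * syy)
```

Note that 1 - Pearson correlation ranges from 0 (perfectly correlated profiles) to 2 (perfectly anti-correlated profiles), whereas the other two measures are unbounded above.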
The possible cluster linkages are:
- Single linkage. The distance between two clusters is computed as the distance between the two closest elements in the two clusters.
- Average linkage. The distance between two clusters is computed as the average distance between objects from the first cluster and objects from the second cluster. The averaging is performed over all pairs $(x, y)$, where $x$ is an object from the first cluster and $y$ is an object from the second cluster.
- Complete linkage. The distance between two clusters is computed as the maximal object-to-object distance $d(x, y)$, where $x$ comes from the first cluster, and $y$ comes from the second cluster. In other words, the distance between two clusters is computed as the distance between the two farthest objects in the two clusters.
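As an illustration, the three linkage methods differ only in how they summarize the object-to-object distances between two clusters. The sketch below assumes each cluster is a list of value vectors and that `dist` is any distance function over two vectors; all names are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Distance = Callable[[Vector, Vector], float]


def manhattan(u: Vector, v: Vector) -> float:
    # Example distance function used in the usage note below.
    return sum(abs(a - b) for a, b in zip(u, v))


def single_linkage(a: List[Vector], b: List[Vector], dist: Distance) -> float:
    # Distance between the two closest objects.
    return min(dist(x, y) for x in a for y in b)


def complete_linkage(a: List[Vector], b: List[Vector], dist: Distance) -> float:
    # Distance between the two farthest objects.
    return max(dist(x, y) for x in a for y in b)


def average_linkage(a: List[Vector], b: List[Vector], dist: Distance) -> float:
    # Mean over all pairs (x, y) with x from a and y from b.
    return sum(dist(x, y) for x in a for y in b) / (len(a) * len(b))
```

For clusters `[[0.0], [1.0]]` and `[[3.0], [5.0]]` with `manhattan` as the distance, the pairwise distances are 3, 5, 2 and 4, so single linkage gives 2, complete linkage gives 5, and average linkage gives 3.5.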
Filtering options are specified in the next step, as shown in figure 14.6.
The Filter settings options are described below. Some require additional information to be provided in the sections underneath.
- No filtering All features are reported in the outputs.
- Filter by statistics Only features meeting specified p-value and fold change thresholds in a differential methylation track you supply are reported in the outputs. Differential methylation tracks can be generated by running Detect Differentially Methylated Regions from the Analyze QIAseq Panel tool, as described in Finding differentially methylated regions.
- Fixed number of features Only the specified number of features with the highest index of dispersion (the ratio of the variance to the mean) are reported in the outputs.
- Specify features Only the features listed in the "Keep these features" field are included in the outputs. Enter the list of feature names, separated by white-space characters, commas or semi-colons. Note: This option can only be used if names have been defined for the target regions.
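Two of the filtering options above lend themselves to a short sketch: ranking features by index of dispersion, and splitting a "Keep these features" entry into names. The function names and feature names below are hypothetical. Using the sample variance and assigning a dispersion of 0 to features with a mean of 0 are assumptions, since the text does not specify either.

```python
import re
from statistics import mean, variance
from typing import Dict, List


def index_of_dispersion(levels: List[float]) -> float:
    # Ratio of the variance to the mean (needs at least two samples).
    m = mean(levels)
    # Assumption: a feature with mean 0 gets dispersion 0 instead of an error.
    return variance(levels) / m if m > 0 else 0.0


def top_features(levels_by_feature: Dict[str, List[float]], n: int) -> List[str]:
    # Keep the n features with the highest index of dispersion.
    ranked = sorted(
        levels_by_feature,
        key=lambda name: index_of_dispersion(levels_by_feature[name]),
        reverse=True,
    )
    return ranked[:n]


def parse_feature_names(text: str) -> List[str]:
    # Names may be separated by white-space characters, commas or semi-colons.
    return [name for name in re.split(r"[\s,;]+", text) if name]
```

For example, `parse_feature_names("geneA, geneB;geneC  geneD")` yields the four names in order, regardless of which separators were mixed in the input.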
Create Methylation Level Heat Map generates two outputs: a heat map and a methylation expression track.
Each row in the heat map corresponds to a feature. Each column corresponds to a sample. The color in the $i$'th row and $j$'th column reflects the methylation level of feature $i$ in sample $j$. The color scale can be set in the side panel settings. Heat map settings are described further at: http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=_heat_map_view.html.
The methylation expression track includes information from all the samples provided as input. It can be viewed as a graphical track or as a table, as shown in figure 14.8.
The following information is available for each feature:
- Chromosome The chromosome number
- Region The position of the target region on the chromosome
- Expression value Not relevant for this analysis type. All values in this column are reported as NaN.
The following four columns are provided for each sample, with the relevant sample name appended to the column name.
- Total methylated coverage Coverage of all informative methylated CpG sites
- Total context coverage Coverage of all informative CpG sites
- Total methylation level The coverage of informative methylated CpG sites divided by the coverage of all informative CpG sites
- Valid CpG sites The number of strand-specific CpG sites included in the target region
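The relationship between the three coverage-based columns can be sketched as below, following the column descriptions literally. The function name, the `(methylated_reads, total_reads)` input format, and reporting a level of 0 when no site is informative are assumptions made for illustration. The Valid CpG sites column is left out because its exact counting rule is not spelled out here.

```python
from typing import Dict, List, Tuple


def sample_columns(
    sites: List[Tuple[int, int]], min_coverage: int = 30
) -> Dict[str, float]:
    """Per-sample values for one target region (hypothetical helper).

    sites holds one (methylated_reads, total_reads) pair per CpG site.
    """
    # Only informative sites (coverage at or above the threshold) contribute.
    informative = [(m, t) for m, t in sites if t >= min_coverage]
    methylated = sum(m for m, _ in informative)
    context = sum(t for _, t in informative)
    return {
        "Total methylated coverage": methylated,
        "Total context coverage": context,
        # Assumption: report 0 when there is no informative coverage.
        "Total methylation level": methylated / context if context else 0.0,
    }
```

With sites `(30, 60)`, `(10, 40)` and `(5, 10)` and a threshold of 30, this gives a methylated coverage of 40, a context coverage of 100, and a methylation level of 0.4.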
The heat map and methylation expression track created by Create Methylation Level Heat Map are linked. Selected elements in one of these outputs can be highlighted in the other. Open both outputs, preferably in a split view, and then:
- After selecting rows of interest in a heat map, right-click and choose Select Names in Other Views from the menu, or
- After selecting rows of interest in the methylation expression track table, click on the Select Names in Other Views button.
The selections made in one of the outputs will now be selected in the other.
Methylation expression tracks can be included in a track list with other relevant tracks, such as read mapping and annotation tracks, as shown in figure 14.9.
Further details about working with track lists can be found at: https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Track_lists.html.