QIAGEN Bioinformatics Manuals

Create Methylation Level Heat Map

The Create Methylation Level Heat Map tool performs hierarchical clustering of samples and features simultaneously, generating a two-dimensional heat map of methylation levels. Each column corresponds to a sample, and each row corresponds to a feature (a single CpG site or a larger target region including multiple CpG sites, e.g. promoter regions).

Methylation levels are calculated across all CpG sites in a given target. A CpG site with coverage below the specified threshold is considered uninformative and is treated as having zero methylation. For targets containing multiple CpG sites, only informative sites are considered, and the methylation level is the average across those informative sites. For targets containing a single CpG site, the methylation level is that of the single site.
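The calculation described above can be sketched in Python as follows. This is only an illustration, not the tool's implementation: the function name, the input layout (one pair of methylated read count and total coverage per CpG site) and the value returned when a target has no informative sites are assumptions made for the example. The threshold of 30 matches the tool's default minimum coverage.

```python
from statistics import mean

def target_methylation_level(sites, min_coverage=30):
    """Average methylation level over the informative CpG sites of one target.

    sites: list of (methylated_reads, total_coverage) pairs, one per CpG site
           (hypothetical input layout chosen for this sketch).
    """
    # A site is informative when its coverage reaches the threshold.
    informative = [(m, c) for m, c in sites if c >= min_coverage and c > 0]
    if not informative:
        # Assumption: a target without informative sites gets level 0,
        # mirroring the documented behavior for single low-coverage sites.
        return 0.0
    # Unweighted average of the per-site methylation levels.
    return mean(m / c for m, c in informative)

# Two informative sites (0.5 and 0.25); the third is below the threshold.
print(target_methylation_level([(30, 60), (10, 40), (5, 10)]))
```

With these example values, the low coverage site is ignored and the two remaining levels are averaged.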

Clustering of features and samples

Features are clustered according to the similarity of their methylation level profiles over the set of samples. Samples are clustered according to the similarity of their methylation level patterns over the set of features.

The clustering has a tree structure that is generated by:

1. Letting each feature or sample be a cluster.
2. Calculating pairwise distances between all clusters.
3. Joining the two closest clusters into one new cluster.
4. Repeating steps 2 and 3 until a single cluster, containing all the features or samples, remains.
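The four steps above can be sketched as a short Python function. This is a minimal, unoptimized illustration and not the tool's own code: the function name `agglomerate`, the representation of clusters as tuples of item indices, and the choice to pass the distance and linkage in as functions are all assumptions made for the example.

```python
from itertools import combinations

def agglomerate(items, distance, linkage=min):
    """Return the merge history as (cluster_a, cluster_b, distance) tuples.

    distance: function giving the distance between two items.
    linkage:  function reducing all item-to-item distances between two
              clusters to one value (min = single, max = complete linkage).
    """
    def cluster_distance(a, b):
        return linkage(distance(items[i], items[j]) for i in a for j in b)

    # Step 1: every item starts as its own cluster (a tuple of indices).
    clusters = [(i,) for i in range(len(items))]
    merges = []
    while len(clusters) > 1:
        # Step 2: compare all pairs of clusters and pick the closest pair.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(*pair))
        merges.append((a, b, cluster_distance(a, b)))
        # Step 3: replace the two clusters by their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        # Step 4: loop until a single cluster remains.
    return merges

merges = agglomerate([0, 1, 10], distance=lambda p, q: abs(p - q))
print(merges)  # [((0,), (1,), 1), ((2,), (0, 1), 9)]
```

The recorded merge distances are what a tree drawing would use as branch lengths.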

The tree is drawn such that the distances between clusters are reflected by the lengths of the branches in the tree.

Running the tool

Go to:

Toolbox | Epigenomics Analysis | Bisulfite Sequencing | Create Methylation Level Heat Map

The tool takes as input methylation level tracks generated using the Call Methylation Levels tool with the "Report unmethylated cytosines" option selected, as shown in figure 13.4. This option is enabled by default when running the Detect QIAseq Methylation ready-to-use workflow.

For valid comparisons to be made across samples, the inputs must have been generated using the same reference information, i.e. the same reference genome, target regions, etc.

Image methylation_call_settings
Figure 13.4: The "Report unmethylated cytosines" option in Call Methylation Levels should be enabled when generating methylation level tracks for use with Create Methylation Level Heat Map.

In the wizard step shown in figure 13.5, select the target region track containing the CpG sites. These may be single CpGs or larger targets (e.g. promoter regions).

At the bottom of this step, specify the minimum CpG site coverage value. CpG sites with coverage below this will be excluded from the analysis. By default, the value is 30. When only single CpG sites are analyzed, the methylation level of low coverage sites is set to 0.

A distance measure and a cluster linkage method for the hierarchical clustering are also specified here. The distance measure specifies how the distance between two features or samples should be calculated. The cluster linkage method specifies how the distance between two clusters, each consisting of a number of features or samples, should be calculated.

Image methyl_heatmap_parameter
Figure 13.5: The core options for Create Methylation Level Heat Map.

There are three kinds of Distance measures:

Euclidean distance. The ordinary distance between two points, i.e. the length of the segment connecting them. If $u=(u_1,u_2,\dots, u_n)$ and $v=(v_1,v_2,\dots, v_n)$, then the Euclidean distance between $u$ and $v$ is

$\displaystyle \vert u-v\vert = \sqrt{\sum_{i=1}^n (u_i-v_i)^2}.$
1 - Pearson correlation. The Pearson correlation coefficient between two elements $x=(x_1,x_2,\dots, x_n)$ and $y=(y_1,y_2,\dots, y_n)$ is defined as

$\displaystyle r = \frac{1}{n-1}\sum_{i=1}^n \left(\frac{x_i-\overline{x}}{s_x}\right)\cdot\left(\frac{y_i-\overline{y}}{s_y}\right)$
where $\overline{x}$/$\overline{y}$ is the average of values in $x$/$y$ and $s_x$/$s_y$ is the sample standard deviation of these values. It takes a value $\in [-1,1]$. Highly correlated elements have a high absolute value of the Pearson correlation, and elements whose values are uninformative about each other have Pearson correlation 0. Using $1-\vert \text{Pearson correlation}\vert$ as distance measure means that elements that are highly correlated will have a short distance between them, and elements that have low correlation will be more distant from each other.
Manhattan distance. The Manhattan distance between two points is the distance measured along axes at right angles. If $u=(u_1,u_2,\dots, u_n)$ and $v=(v_1,v_2,\dots, v_n)$, then the Manhattan distance between $u$ and $v$ is

$\displaystyle \vert u-v\vert = \sum_{i=1}^n \vert u_i-v_i\vert.$
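For readers who find code easier to follow than formulas, the three distance measures can be written out in plain Python as below. The function names are chosen for this example only and do not correspond to anything exposed by the tool. The Pearson-based distance follows the $1-\vert r\vert$ form given in the text.

```python
from math import sqrt
from statistics import mean, stdev

def euclidean(u, v):
    """Length of the straight segment connecting u and v."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan(u, v):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))

def pearson(x, y):
    """Pearson correlation using sample standard deviations.

    Undefined (division by zero) when x or y is constant.
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    sx, sy = stdev(x), stdev(y)
    return sum(((a - mx) / sx) * ((b - my) / sy)
               for a, b in zip(x, y)) / (n - 1)

def pearson_distance(x, y):
    """1 - |r|: strongly correlated profiles end up close together."""
    return 1 - abs(pearson(x, y))

print(euclidean((0, 0), (3, 4)))   # 5.0
print(manhattan((0, 0), (3, 4)))   # 7
```

Note that with the absolute value, perfectly anti-correlated profiles have the same distance (approximately 0) as perfectly correlated ones.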

The possible cluster linkages are:

Single linkage. The distance between two clusters is computed as the distance between the two closest elements in the two clusters.
Average linkage. The distance between two clusters is computed as the average distance between objects from the first cluster and objects from the second cluster. The averaging is performed over all pairs $(x, y)$, where $x$ is an object from the first cluster and $y$ is an object from the second cluster.
Complete linkage. The distance between two clusters is computed as the maximal object-to-object distance $d(x_i, y_j)$, where $x_i$ comes from the first cluster, and $y_j$ comes from the second cluster. In other words, the distance between two clusters is computed as the distance between the two farthest objects in the two clusters.
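The three linkages differ only in how the object-to-object distances between two clusters are summarized. The sketch below illustrates this with hypothetical helper functions, where `d` is any distance function on two objects; it is not taken from the tool.

```python
def pair_distances(cluster_a, cluster_b, d):
    """All object-to-object distances between two clusters."""
    return [d(x, y) for x in cluster_a for y in cluster_b]

def single_linkage(cluster_a, cluster_b, d):
    """Distance between the two closest objects."""
    return min(pair_distances(cluster_a, cluster_b, d))

def average_linkage(cluster_a, cluster_b, d):
    """Mean distance over all pairs (x, y)."""
    distances = pair_distances(cluster_a, cluster_b, d)
    return sum(distances) / len(distances)

def complete_linkage(cluster_a, cluster_b, d):
    """Distance between the two farthest objects."""
    return max(pair_distances(cluster_a, cluster_b, d))

def absolute_difference(x, y):
    return abs(x - y)

a, b = [0, 1], [4, 6]  # pairwise distances: 4, 6, 3, 5
print(single_linkage(a, b, absolute_difference))    # 3
print(average_linkage(a, b, absolute_difference))   # 4.5
print(complete_linkage(a, b, absolute_difference))  # 6
```

Single linkage tends to chain clusters together through their nearest members, while complete linkage favors compact clusters.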

Filtering options are specified in the next step, as shown in figure 13.6.

Image methylation_heatmap_filtering
Figure 13.6: The features to include in results can be customized using filtering options.

The Filter settings options are described below. Some require additional information be provided in the sections underneath.

No filtering. All features are reported in the outputs.
Filter by statistics. Only features meeting specified p-value and fold change thresholds in a differential methylation track you supply are reported in the outputs. Differential methylation tracks can be generated by running Detect Differentially Methylated Regions from the Analyze QIAseq Panel tool, as described in Finding differentially methylated regions.
Fixed number of features. Only the specified number of features with the highest index of dispersion (the ratio of the variance to the mean) are reported in the outputs.
Specify features. Only the features listed in the "Keep these features" field are included in the outputs. Enter the list of feature names, separated by white-space characters, commas or semi-colons. Note: This option can only be used if names have been defined for the target regions.
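Two of the options above lend themselves to a short illustration: ranking features by index of dispersion, and splitting a list of feature names on white-space, commas or semi-colons. The Python sketch below is hypothetical. The function names, the use of the sample variance, and the handling of features whose mean level is zero are assumptions made for the example, and the manual does not state which variance estimator the tool uses.

```python
import re
from statistics import mean, variance

def index_of_dispersion(values):
    """Variance-to-mean ratio of one feature's levels across samples.

    Assumptions: sample variance (needs at least two samples), and a
    ratio of 0 when the mean is 0 to avoid dividing by zero.
    """
    m = mean(values)
    return variance(values) / m if m > 0 else 0.0

def top_dispersed(levels, n):
    """Names of the n features with the highest index of dispersion.

    levels: dict mapping feature name -> list of methylation levels.
    """
    ranked = sorted(levels, key=lambda name: index_of_dispersion(levels[name]),
                    reverse=True)
    return ranked[:n]

def parse_feature_names(text):
    """Split on any run of white-space, commas or semi-colons."""
    return [name for name in re.split(r"[\s,;]+", text) if name]

levels = {"a": [0.5, 0.5, 0.5], "b": [0.1, 0.9, 0.5], "c": [0.4, 0.6, 0.5]}
print(top_dispersed(levels, 2))                          # ['b', 'c']
print(parse_feature_names("geneA, geneB;geneC\tgeneD"))
```

Feature "a" is constant across samples, so it has zero dispersion and is the first to be dropped.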

Create Methylation Level Heat Map generates two outputs: a heat map and a methylation expression track.

The methylation level heat map

Each row in the heat map corresponds to a feature. Each column corresponds to a sample. The color in the $i$'th row and $j$'th column reflects the methylation level of feature $i$ in sample $j$. The color scale can be set in the side panel settings. Heat map settings are described further at: http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=_heat_map_view.html.

Image methyl_heatmap_output
Figure 13.7: A methylation level heat map

The methylation expression track

The methylation expression track includes information from all the samples provided as input. It can be viewed as a graphical track or as a table, as shown in figure 13.8.

The following information is available for each feature:

Chromosome. The chromosome number.
Region. The position of the target region on the chromosome.
Expression value. Not relevant for this analysis type. All values in this column are reported as NaN.

The following four columns are provided for each sample, with the relevant sample name appended to the column name.

Total methylated coverage. Coverage of all informative methylated CpG sites.
Total context coverage. Coverage of all informative CpG sites.
Total methylation level. The coverage of informative methylated CpG sites divided by the coverage of all informative CpG sites.
Valid CpG sites. The number of strand specific CpG sites included in the target region.
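As an illustration of how the three coverage-based columns relate to each other, the sketch below derives them for one target in one sample. It is not the tool's code. The function name and input layout are hypothetical, "methylated coverage" is interpreted here as the number of methylated reads at informative sites, and returning NaN when no site is informative is an assumption. The "Valid CpG sites" column is left out because its exact counting rule is not spelled out above.

```python
def summarize_target(sites, min_coverage=30):
    """Coverage-based columns for one target region in one sample.

    sites: list of (methylated_reads, total_coverage) pairs, one per CpG
           site (hypothetical layout chosen for this sketch).
    """
    # Keep only informative sites, i.e. those reaching the threshold.
    informative = [(m, c) for m, c in sites if c >= min_coverage]
    methylated = sum(m for m, _ in informative)
    context = sum(c for _, c in informative)
    return {
        "Total methylated coverage": methylated,
        "Total context coverage": context,
        # Assumption: NaN when there is no informative coverage at all.
        "Total methylation level": (methylated / context
                                    if context else float("nan")),
    }

# Third site (coverage 10) is below the threshold and is ignored.
print(summarize_target([(30, 60), (10, 40), (5, 10)]))
# {'Total methylated coverage': 40, 'Total context coverage': 100,
#  'Total methylation level': 0.4}
```

Because this is a ratio of summed coverages, sites with higher coverage contribute more to the result, so it generally differs from an unweighted mean of per-site levels.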

Viewing selected features

The heat map and methylation expression track created by Create Methylation Level Heat Map are linked. Selected elements in one of these outputs can be highlighted in the other. Open both outputs, preferably in a split view, and then:

After selecting rows of interest in a heat map, right click and choose Select Names in Other Views from the menu, or
After selecting rows of interest in the methylation expression track table, click on the Select Names in Other Views button.

The selections made in one of the outputs will now be selected in the other.

Image methyl_table_options
Figure 13.8: Methylation level results shown in a table view. After selecting rows in the table, the buttons highlighted can be used to work with the selection in various ways.

Viewing results in context using a track list

Methylation expression tracks can be included in a track list with other relevant tracks, such as read mapping and annotation tracks, as shown in figure 13.9.

Further details about working with track lists can be found at: https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Track_lists.html.

Image methyl_heatmap_tracklist
Figure 13.9: Methylation results in the context of a track list, with the table view of the methylation expression track open in the bottom of the split view.