Create Methylation Database

The Create Methylation Database tool creates databases for two or three conditions for use by the Predict Methylation Profile tool. It is primarily designed for QIAseq Targeted Methyl panels, e.g., the "T Cell Infiltration Panel (MHS-202Z)", where Fibroblast, Epithelial and Immune cells can be distinguished. The tool is also useful for creating Tumor/Normal databases for other QIAseq Targeted Methyl Cancer Panels. In addition, it can in principle be used to construct databases for any pure sample conditions where sufficient methylation differences exist.

To start the tool, go to:

        Toolbox | Epigenomics Analysis | Bisulfite Sequencing | Create Methylation Database

In the first dialog, choose the pure sample methylation level tracks produced by the Call Methylation Levels tool for two or three types. Multiple tracks can be used for each condition. It is recommended that the Call Methylation Levels tool has been run with the option Report unmethylated cytosines enabled. Specify a name for each condition that matches the tracks selected (see figure 14.10).

Image methyl_db_wizard
Figure 14.10: Wizard step showing selection of pure tracks, naming and setting filters.

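Two filter options are available to specify how stringent the selection should be: one assessing coverage and one specifying the required difference in methylation between the types (the "minimum relative methylation difference" shown in figure 14.11).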

Image ternary_plot
Figure 14.11: Ternary plots created in the report when selecting three types of pure samples (Fib, Epi and IC), shown for two different values of "minimum relative methylation difference". A) Minimum relative methylation difference = 0.5, so all sites are selected. Note that the middle of the plot is populated; these sites do not differ in methylation level between types and hence represent non-informative sites. B) A low value of the relative methylation difference, illustrating a high difference between the types.

We strongly recommend experimenting with the parameters to identify suitable settings, as the best values differ between experiments.

Click Next. The generated report will be valuable when assessing the constructed database.

The Create Methylation Database algorithm

The tool takes either two or three types of pure cells or conditions, each represented by any number of samples (one sample per track). It is recommended that the Call Methylation Levels tool has been run with the option Report unmethylated cytosines when producing the tracks. The algorithm is built around two parameters: one for assessing coverage and one for specifying differences in methylation between the samples. A filtering cascade is used internally by the tool.
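The exact filtering steps are not reproduced here. As an illustration only, the following Python sketch shows how a coverage filter followed by a methylation-difference filter could be combined. The function name select_sites, the input layout, and the thresholds min_coverage and min_difference are all hypothetical; in particular, min_difference is a plain absolute difference and does not correspond to the tool's "minimum relative methylation difference" setting. This is not the tool's implementation.

```python
from statistics import mean


def select_sites(samples_by_type, min_coverage, min_difference):
    """Illustrative two-step site filter (hypothetical, not the tool's cascade).

    samples_by_type maps a type name to a list of samples; each sample maps
    a site identifier to a (methylation_level, coverage) tuple, with the
    methylation level given as a fraction between 0 and 1.
    """
    all_samples = [s for samples in samples_by_type.values() for s in samples]

    # Step 1: keep sites seen in every sample with sufficient coverage.
    shared = set(all_samples[0])
    for sample in all_samples[1:]:
        shared &= set(sample)
    covered = [
        site for site in sorted(shared)
        if all(sample[site][1] >= min_coverage for sample in all_samples)
    ]

    # Step 2: keep sites whose per-type mean levels differ enough.
    selected = {}
    for site in covered:
        means = {
            name: mean(sample[site][0] for sample in samples)
            for name, samples in samples_by_type.items()
        }
        if max(means.values()) - min(means.values()) >= min_difference:
            selected[site] = means
    return selected
```

In this sketch a site is dropped as soon as any single sample lacks coverage, which mirrors the idea that each pure type should be reliably measured before its methylation level is compared with the others.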

The Create Methylation Database report

The tool provides a report with a summary of selected sites. If no sites fulfill the criteria, only the summary is available.

In the next section, the average methylation levels are given per pure type category and for the individual tracks. The table is useful for assessing whether the categories are evenly matched: the average methylation level per sample should not differ too much.

In section 3 of the report, the range of methylation values is shown for each of the pure input types. It is important that the hypo- and hypermethylation levels are about the same within a category, so that they contribute evenly to the selected sites. This will provide the best estimates.

Finally, in section 4, either a histogram or a ternary plot, depending on the number of input types, illustrates the relative methylation levels across the CpG sites selected for the database. The ternary plot is illustrated in figure 14.11.
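To make the ternary plot easier to read, the sketch below shows one common way to place a site in such a plot: the three per-type methylation levels are scaled so that they sum to 1. Whether the report uses exactly this normalization is an assumption, and the function name relative_levels is hypothetical.

```python
def relative_levels(levels):
    """Scale per-type methylation levels so that they sum to 1.

    levels maps a type name (e.g. "Fib", "Epi", "IC") to a methylation
    level between 0 and 1. Returns None when all levels are zero, since
    such a site has no defined position in a ternary plot.
    """
    total = sum(levels.values())
    if total == 0:
        return None
    return {name: value / total for name, value in levels.items()}
```

Under this assumption, a site with equal levels in all three types lands in the middle of the plot (one third each), matching the non-informative sites described for figure 14.11, while a site methylated in only one type lands near that type's corner.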

The Create Methylation Database database track

The output database track contains information on:

The produced track can be used as the input database for the Predict Methylation Profile tool. Validation can be done by creating mixtures with known amounts of each type.
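As a rough sketch of such a validation, the Python below computes the methylation level one would expect at a site in a mixture, and compares known mixture fractions with predicted ones. It assumes that each type contributes to the mixture in proportion to its fraction, which is a simplification. Both function names are hypothetical, and the predicted fractions would in practice come from the Predict Methylation Profile tool.

```python
def expected_mixture_level(pure_levels, fractions):
    """Expected methylation level at one site in a mixture of pure types.

    pure_levels maps a type name to its methylation level at the site;
    fractions maps the same names to mixture fractions summing to 1.
    Assumes each type contributes in proportion to its fraction.
    """
    return sum(fractions[name] * pure_levels[name] for name in fractions)


def mean_absolute_error(known, predicted):
    """Average absolute gap between known and predicted fractions."""
    return sum(abs(known[name] - predicted[name]) for name in known) / len(known)
```

A small mean absolute error across several mixtures with different known fractions would suggest that the database separates the types well.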