How the variant detection tools work

Each variant detection tool operates in a similar fashion, following successive and iterative steps and using common filters to call variants. Before you start the tool, the wizard takes you through the different filters you can set to define which of the detected polymorphisms should be called as variants. The following sections describe the individual characteristics and the specific assumptions of the three variant detection tools. The filtering and output options common to the tools are described in detail in Filters and Variant Detectors - the outputs.

The steps of the Variant Detection tools are as follows:

  1. The tool identifies all possible variants from either the total input dataset or a subset of it, depending on how the following filters have been set:

    • Reference masking settings select the areas of the mapping that should be inspected for variants.
    • Read filter settings select for the reads that should be considered in the assessment.
    • Count and coverage filters select the sites that meet the coverage, frequency and absolute count requirements set for the analysis. Note that only half of the cut-off values for these parameters are used in this first step; the full cut-off values for count and coverage are applied in a later phase. This ensures that larger variants existing at particular positions can be considered. If the full filter values were applied when considering the single position variants in step 1, some SNVs might not be called, which would in turn prevent the detection of the corresponding MNVs. Postponing filtering with the full cut-off values minimizes the risk of missing multiple position variants.
    • Noise filters determine whether a read is included, based on the quality and neighborhood composition of the area surrounding a potential variant.

  2. In the case of the Fixed Ploidy and Low Frequency Variant Detection tools only, the tool iteratively estimates an error model while detecting potential variants.

    For these variant detection tools, site-specific information is used in estimating error models, which are then used to distinguish true variants from likely sequencing errors. A potential single nucleotide variant will only be kept if the model containing the variant is significantly better than the model without the variant. Full details for the Fixed Ploidy Variant Detection tool are given in section Fixed Ploidy Variant Detection and for the Low Frequency Variant Detection tool in section Low Frequency Variant Detection.

  3. The tool checks each position for other features such as read direction, base qualities and so on using the cut-off values specified in the Noise filters (Filters).

  4. The tool checks for complex variants by taking the single position variants identified in the steps above and checking if neighboring variants are present in the same read. If so, the tool 'joins' these SNVs into MNVs, longer insertions or deletions, or into replacements. Note that SNVs are joined only when they are present in the same read, because this provides evidence that they occur together in the sample.

  5. Finally, the tool applies the full cut-off values supplied for the Count and coverage filters to the single and multiple position variants obtained during step 4.
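
The interplay between steps 1, 4 and 5 can be illustrated with a small Python sketch. This is a toy model, not the tool's implementation: the function name detect_variants, its parameters and the data layout (each read as a dictionary of mismatching positions) are assumptions made for illustration only. Reference masking, read filters, noise filters and the error model of step 2 are left out, and only substitutions at directly adjacent positions are joined.

```python
from collections import Counter


def detect_variants(reads, coverage, min_coverage, min_count, min_frequency):
    """Toy two-phase variant filter (hypothetical, for illustration only).

    reads:    list of dicts {position: alternative base}, one dict per read,
              listing the positions where that read differs from the reference.
    coverage: dict {position: number of reads covering that position}.
    """
    # Step 1: collect single position candidates using HALF of each cut-off.
    snv_counts = Counter(
        (pos, base) for read in reads for pos, base in read.items()
    )
    candidates = {
        (pos, base)
        for (pos, base), n in snv_counts.items()
        if coverage[pos] >= min_coverage / 2
        and n >= min_count / 2
        and n / coverage[pos] >= min_frequency / 2
    }

    # Step 4: within each read, join candidates at adjacent positions
    # into one multi-position variant; isolated candidates stay single.
    variant_counts = Counter()
    for read in reads:
        kept = sorted(p for p, b in read.items() if (p, b) in candidates)
        run = []
        for pos in kept:
            if run and pos != run[-1] + 1:
                variant_counts[tuple((p, read[p]) for p in run)] += 1
                run = []
            run.append(pos)
        if run:
            variant_counts[tuple((p, read[p]) for p in run)] += 1

    # Step 5: apply the FULL cut-offs to single and multi-position variants.
    called = {}
    for variant, n in variant_counts.items():
        cov = min(coverage[p] for p, _ in variant)
        if cov >= min_coverage and n >= min_count and n / cov >= min_frequency:
            called[variant] = n
    return called


# Ten reads cover positions 10, 11 and 20.
reads = (
    [{10: "T", 11: "G"}] * 4   # four reads carry both substitutions
    + [{10: "T"}]              # one read carries only the first
    + [{20: "A"}]              # one read carries an isolated mismatch
    + [{}] * 4                 # four reads match the reference
)
coverage = {10: 10, 11: 10, 20: 10}

called = detect_variants(
    reads, coverage, min_coverage=10, min_count=4, min_frequency=0.3
)
print(called)
```

In this example the mismatch at position 20 is seen in a single read and is already discarded at the half cut-off stage. The substitutions at positions 10 and 11 both survive step 1, are joined into one two-position variant in the four reads that carry both, and that joined variant passes the full cut-offs. The single read carrying only the substitution at position 10 is counted as a separate single position variant and fails the full count cut-off in the final step.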