How the variant detection tools work
Each variant detection tool operates in a similar fashion, following successive, iterative steps and using common filters to call variants. Before you start the tool, the wizard takes you through the different filters you can set to define which of the detected single polymorphisms should be called as variants. The following sections describe the individual characteristics and specific assumptions of the three variant detection tools. The filtering and output options common to the tools are described in detail in Filters and Variant Detectors - the outputs.
The steps of the Variant Detection tools are as follows:
- The tool identifies all possible variants from either the whole input dataset or a subset of it, depending on how the following filters have been set:
  - Reference masking settings select the areas of the mapping that should be inspected for variants. Variants extending up to 50 nt beyond a target region are reported in full. Variants extending more than 50 nt beyond a target region are trimmed so that they include only the first 50 nt beyond the region.
  - Read filter settings select the reads that should be considered in the assessment.
  - Count and coverage filters select sites that meet the coverage, frequency and absolute count requirements set for the analysis. During the first stage of variant detection, when single position variants are initially considered, only half the value of each parameter is used. This ensures that multiple position variants, which are built up from the single position variants, are not missed because of overly stringent filtering early on. The full cut-off values are applied later in the variant detection process.
  - Noise filters specify requirements for a read to be included, based on the quality and composition of the area surrounding a potential variant.
- At this stage, for the Fixed Ploidy and Low Frequency Variant Detection tools only, site-specific information is used to iteratively estimate error models. These error models are then used to distinguish true variants from likely sequencing errors. A potential single nucleotide variant is kept only if the model containing the variant is significantly better than the model without it. Full details for the two tools are given in the sections Fixed Ploidy Variant Detection and Low Frequency Variant Detection.
- The tool checks each position for other features, such as read direction and base quality, using the cut-off values specified in the Noise filters (see Filters).
- The tool checks for complex variants by taking the single position variants identified in the steps above and checking whether neighboring variants are present in the same read. If so, the tool 'joins' these SNVs into MNVs, longer insertions or deletions, or replacements. SNVs are joined only when they are present in the same read, because this provides evidence that the variants occur contiguously in the sample.
- Finally, the tool applies the full cut-off values supplied for the Count and coverage filters to the single and multiple position variants obtained in the previous step.
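The 50 nt rule used with reference masking can be illustrated with a small coordinate calculation. The sketch below is a hypothetical helper, not part of the tools: it assumes 1-based, inclusive coordinates, treats both ends of a target region the same way, and only adjusts the reported coordinates rather than editing allele sequences. The name `clip_to_target` and the `max_overhang` parameter are illustrative.

```python
from typing import Optional, Tuple

MAX_OVERHANG = 50  # nt a variant may extend beyond a target region before trimming

def clip_to_target(var_start: int, var_end: int,
                   target_start: int, target_end: int,
                   max_overhang: int = MAX_OVERHANG) -> Optional[Tuple[int, int]]:
    """Return the reported (start, end) of a variant (1-based, inclusive).

    Variants overhanging the target by at most max_overhang nt are returned
    unchanged; longer overhangs are cut back to max_overhang nt beyond the
    region. Returns None if the variant does not overlap the target.
    """
    if var_end < target_start or var_start > target_end:
        return None  # outside the masked-in area, so not inspected
    start = max(var_start, target_start - max_overhang)
    end = min(var_end, target_end + max_overhang)
    return start, end

print(clip_to_target(1990, 2030, 1000, 2000))  # (1990, 2030)
print(clip_to_target(1990, 2100, 1000, 2000))  # (1990, 2050)
print(clip_to_target(900, 1010, 1000, 2000))   # (950, 1010)
```

For a target region spanning positions 1000 to 2000, a variant at 1990-2030 overhangs by 30 nt and is reported unchanged, while one at 1990-2100 is trimmed to end 50 nt past the region.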
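The two-stage use of the Count and coverage cut-offs (half values while screening single position candidates, full values for the final single and multiple position variants) can be sketched as follows. The `Cutoffs` class, its field names and the (coverage, supporting reads) site representation are assumptions made for illustration; the joining of positions 101 and 102 into an MNV is taken as given rather than computed.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class Cutoffs:
    """Hypothetical count and coverage cut-offs."""
    min_coverage: float
    min_count: float
    min_frequency: float  # fraction of covering reads, 0..1

    def halved(self) -> "Cutoffs":
        # First-stage screening uses half of each user-supplied value.
        return Cutoffs(self.min_coverage / 2, self.min_count / 2,
                       self.min_frequency / 2)

    def passes(self, coverage: int, count: int) -> bool:
        if coverage == 0:
            return False
        return (coverage >= self.min_coverage
                and count >= self.min_count
                and count / coverage >= self.min_frequency)

# (coverage, number of reads supporting the variant)
Site = Tuple[int, int]

def first_stage(sites: Dict[int, Site], full: Cutoffs) -> List[int]:
    """Keep single position candidates that pass the halved cut-offs."""
    lenient = full.halved()
    return [pos for pos, (cov, cnt) in sorted(sites.items())
            if lenient.passes(cov, cnt)]

def final_stage(variants: Dict[str, Site], full: Cutoffs) -> List[str]:
    """Apply the full cut-offs to single and multiple position variants."""
    return [name for name, (cov, cnt) in variants.items()
            if full.passes(cov, cnt)]

cutoffs = Cutoffs(min_coverage=10, min_count=4, min_frequency=0.35)
sites = {101: (10, 4), 102: (10, 4), 250: (10, 1), 300: (10, 3)}
candidates = first_stage(sites, cutoffs)   # [101, 102, 300]
joined = {"MNV 101-102": (10, 4), "SNV 300": (10, 3)}
called = final_stage(joined, cutoffs)      # ["MNV 101-102"]
```

Position 300 survives the lenient first pass, so it stays available for joining, but it is rejected once the full count cut-off of 4 is applied.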
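The idea of keeping a candidate only when a model with the variant is significantly better than one without it can be illustrated, in much simplified form, with a likelihood-ratio test at a single site. This is a swapped-in textbook technique, not the error models of the Fixed Ploidy or Low Frequency tools: it assumes a fixed per-read error rate (the tools instead estimate error models iteratively from site-specific information), a binomial read model, and a chi-square cut-off of 3.841 (1 degree of freedom, alpha = 0.05). All function names are hypothetical.

```python
import math

CHI2_1DF_05 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05

def _xlogy(x: float, y: float) -> float:
    """Return x * log(y), using the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)

def binom_loglik(k: int, n: int, p: float) -> float:
    """Binomial log-likelihood of k alt reads in n, dropping the C(n, k) constant."""
    return _xlogy(k, p) + _xlogy(n - k, 1.0 - p)

def lr_statistic(k: int, n: int, error_rate: float) -> float:
    """2 * (log L(variant model) - log L(error-only model))."""
    ll_error_only = binom_loglik(k, n, error_rate)  # alt reads are all errors
    ll_variant = binom_loglik(k, n, k / n)          # alt allele at frequency k/n
    return 2.0 * (ll_variant - ll_error_only)

def keep_variant(k: int, n: int, error_rate: float,
                 critical: float = CHI2_1DF_05) -> bool:
    """Keep the candidate only if the variant model is significantly better."""
    if n == 0 or k / n <= error_rate:
        return False  # no more alt reads than the error model already explains
    return lr_statistic(k, n, error_rate) > critical

print(keep_variant(10, 100, 0.01))  # True
print(keep_variant(2, 100, 0.01))   # False
```

With an assumed error rate of 1%, 10 supporting reads out of 100 give a statistic of about 28.9 and the candidate is kept, whereas 2 out of 100 give about 0.78 and it is treated as a likely sequencing error.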
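The joining step, where neighboring SNVs found in the same read are combined, can be sketched by grouping candidate SNVs at consecutive positions read by read. This is a simplified assumption: it handles substitutions only (SNVs and MNVs), not the insertions, deletions and replacements the tools also build, and it represents each read as a hypothetical mapping from candidate position to the non-reference base the read carries there.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical read representation: candidate SNV position (1-based)
# -> non-reference base observed in this read at that position.
Read = Dict[int, str]

def join_snvs(reads: List[Read]) -> Counter:
    """Join candidate SNVs that are adjacent within the same read.

    Returns a Counter keyed by (start_position, allele_string). A run of one
    position stays an SNV; longer runs of consecutive positions become MNVs.
    """
    observed: Counter = Counter()
    for read in reads:
        run: List[int] = []
        for pos in sorted(read):
            # A gap between positions ends the current run.
            if run and pos != run[-1] + 1:
                observed[(run[0], "".join(read[p] for p in run))] += 1
                run = []
            run.append(pos)
        if run:
            observed[(run[0], "".join(read[p] for p in run))] += 1
    return observed

reads = [
    {101: "T", 102: "G"},
    {101: "T", 102: "G"},
    {101: "T"},
    {102: "G"},
]
result = join_snvs(reads)
```

Here two reads carry both T at 101 and G at 102, so they provide evidence for the MNV TG starting at 101; the other two reads support only one of the SNVs, so those observations stay separate.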