If the mapping is based on local alignment of the reads, there will be some reads with un-aligned ends (these ends are faded when you look at the mapping). These unaligned ends are not included in the scanning for variants but they are included in the quality filtering (elaborated below).
In figure 26.6, you can see an example with a neighborhood radius of 5. The current position is high-lighted, and the horizontal high-lighting marks the nucleotides considered for a read with the radius set to 5.
For each read and within the given radius,26.1 the following two parameters are used to assess the quality:
- Minimum neighborhood quality. The average quality score of the nucleotides in a read within the specified radius has to exceed this threshold for the base to be included in the calculation for this position (learn more about importing quality scores from different sequencing platforms in Import high-throughput sequencing data).
- Maximum gap and mismatch count. The number of gaps and mismatches allowed within the window length of the read. Note that this is excluding the "mismatch" or gap that is considered a potential variant. If there are more gaps or mismatches than this threshold within the radius, this read will not be included in the variant calculation at this position. Unaligned regions (the faded parts of a read) also count as mismatches, even if some of the bases match.
Figure 26.6 shows an example of a read with a mismatch, marked in dark blue. The mismatch is inside the radius of 5 nucleotides.
Besides looking horizontally within a window for each read, the quality of the central base is also examined: Minimum quality of central base. This is the quality score for the central base, i.e. the bases in the column high-lighted in figure 26.8. Bases with a quality score below this value are not considered in the variant calculation at this position.
In addition to low-quality reads, reads can also be filtered further:
- Ignore non-specific matches
- This will ignore all reads that are marked as non-specific matches (see Non-specific matches). This is generally recommended, since there is no way of knowing whether the reads and thereby the variant are mapped to the correct position.
- Ignore broken pairs
- This will ignore all reads that come from broken pairs (see Mapping paired reads). We recommend to switch on the 'Ignore broken reads' filter in case data included paired-reads. As paired-reads have a larger overall alignment with the reference genome, the alignment is more trustworthy than an alignment with a single read, because the probability that the pair could map somewhere else is lower. However, variants in regions with larger deletions, insertions or rearrangements will be ignored, as broken pairs are often indicators for these kinds of events. Note that if you have mapped a combination of single and paired reads, the reads that were marked as single when running the mapping will still be part of the variant detection, even if you have chosen to ignore broken pairs.
Please note that all the filtering described here means that sometime there is a difference between the coverage of the mapping and the actual counts reported for a variant. The difference would be the number of reads that have been filtered before variant calling.
- ... radius,26.1
- The radius is defined as the number of positions in the local alignment between that particular read and the reference sequence (for de novo assembly it would be the consensus sequence).)