The Annotate RNA Variants tool annotates variants likely to arise from alternative splicing rather than DNA changes. The annotations are useful for downstream filtering.
Three classes of annotations are added "known introns", "splice variants", and "closest exon". These are described briefly below.
6.24. The Annotate RNA Variants tool slides each of these deletions independently to the left and right to see if they can be aligned with an exon boundary without changing the deleted sequence. If this is possible, the deletion is annotated with "Matches known intron = Yes".
To detect such variants, the tool lists all variants within 30nt of each intron. All combinations of these variants are then generated such that for input variants v1, v2, v3, the output combinations would be: [v1], [v2], [v3], [v1,v2], [v1,v3], [v2,v3], [v1,v2,v3]. At most 31 combinations of variants are generated in this way as the number of possible combinations of variants quickly becomes unmanageable. Combinations of variants that are incompatible are not generated. For example if v1 and v2 are overlapping deletions, then combinations [v1,v2] or [v1,v2,v3] are not generated as these are not consistent with a single RNA molecule.
For each combination of variants, the DNA (extracted from the genome) is aligned to the variants + RNA (extracted from mRNA annotations). If the alignment can be made without mismatches, and with only a single gap, then the variants can be explained by a new intron with the coordinates of the gap. All such variants are then annotated with "Possible splice variant = Yes". Additional annotations are added based on the splice signature of the new intron:
- "Conserved splice signatures" with values "Yes" or "No" according to whether the splice signature matches that of the original intron.
- "Canonical splice signature" and zero or more comma-separated values from the list "GT-AG", "GC-AG", "AT-AC"
- "Possible splice signatures" and one or more comma-separated values from the list "AA-AA", "AA-AC" ... "TT-TT"
A variant may be investigated multiple times if it is close to multiple introns. If this happens, the annotations are updated such that "Conserved splice signature" may change from "No" to "Yes", and splice signatures may be added to the "Canonical splice signature" and "Possible splice signatures" lists.
An example is shown in figure 6.26. Here the coverage is 210 at the position where the SNV is called, but >2500 in the adjoining exon. The "real" variant frequency is closer to 16/2500 < 1% than 16/210 > 5%.
The tool adds an annotation "Distance to nearest exon" to all non-reference variants within 100 nt of an exon. The nearest exon is that which returns the smallest positive distance among the following four options: (exonStart - variantStart, exonStart - variantEnd, variantStart-exonEnd, variantEnd-exonEnd).
If a read mapping is provided, the tool also annotates variants with the "Frequency compared to closest exon". This frequency is 100 * count / coverage at nearest exon. The coverage at the nearest exon is actually the coverage at the boundary of the exon. Coverage is calculated per match rather than per read i.e. broken pairs count as 2, pairs count as 1, and single reads count as 1. Coverage also includes ambiguously mapped reads.
In all cases where count > coverage at nearest exon, the tool reports "Frequency compared to closest exon = 100%" which in practice means that this annotation cannot be used to filter the variant. This can happen for a variety of reasons. For example:
- The nearest exon has no coverage - perhaps an exon with a boundary slightly further away from the variant has all the coverage
- Primers positioned in the intron generate more reads than primers in the adjacent exon
The Annotate RNA Variants tool can be found in the Toolbox at:
Tools | QIAseq Panel Expert Tools () | QIAseq RNAscan Panel Expert Tools () | Annotate RNA Variants ()
The tool takes a Variant Track as input (figure 6.27)
In the next dialog figure there are two parameters to set (figure 6.28)
- Reference sequence or read mapping - variants will only be annotated with the coverage at the nearest exon if a read mapping is provided.
- mRNA track
The tool outputs an annotated Variant Track.