Annotate with Repeat and Homopolymer Information
The Annotate with Repeat and Homopolymer Information tool annotates variants with repeat and homopolymer information, based on the variant itself and the genome sequence flanking it.
Homopolymers: A variant is considered to be in a homopolymer region if there are at least 4 consecutive copies of the variant's base type at that location on the reference or, for deletions, next to where the deletion occurred.
Repeats: A variant is considered to be in a repeat region if:
- For a 2 bp variant, there are at least 4 full copies of that variant at that location on the reference or, for deletions, next to where the deletion occurred.
- For a variant of 3 bp or longer, there are at least 3 full copies of that variant at that location on the reference or, for deletions, next to where the deletion occurred.
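The copy-count thresholds above can be illustrated with a minimal Python sketch. This is not the tool's implementation: it counts only exact copies, whereas the tool uses a model that can tolerate some mismatches on the reference. The function names, the use of separate left and right flank strings, and the choice to add up copies on both sides of the variant site are all assumptions made for illustration.

```python
def min_copies(variant_length: int) -> int:
    """Number of reference copies required, per the thresholds listed above."""
    if variant_length < 1:
        raise ValueError("variant must be at least 1 bp long")
    # 1 bp (homopolymer) and 2 bp variants need 4 copies; 3 bp or longer need 3.
    return 4 if variant_length <= 2 else 3


def count_flanking_copies(left_flank: str, right_flank: str, unit: str) -> int:
    """Count exact, consecutive copies of `unit` touching the variant site.

    `left_flank` ends immediately before the site and `right_flank` starts
    immediately after it (hypothetical convention; for a deletion the deleted
    bases themselves are excluded). Summing both sides is an assumption.
    """
    if not unit:
        raise ValueError("unit must not be empty")
    n = len(unit)
    copies = 0
    # Walk rightwards from the start of the right flank.
    i = 0
    while right_flank[i:i + n] == unit:
        copies += 1
        i += n
    # Walk leftwards from the end of the left flank.
    j = len(left_flank)
    while j - n >= 0 and left_flank[j - n:j] == unit:
        copies += 1
        j -= n
    return copies


def meets_threshold(left_flank: str, right_flank: str, variant_seq: str) -> bool:
    """True if the exact-copy count reaches the threshold for this variant length."""
    unit = variant_seq.upper()
    required = min_copies(len(unit))
    found = count_flanking_copies(left_flank.upper(), right_flank.upper(), unit)
    return found >= required


# A 1 bp insertion of "A" between "GTAA" and "AAC": 2 + 2 = 4 copies of "A".
print(meets_threshold("GTAA", "AAC", "A"))
# A 2 bp insertion of "CA" next to only 3 copies of "CA": below the threshold of 4.
print(meets_threshold("TT", "CACACAGG", "CA"))
# A 3 bp insertion of "CAG" next to 3 copies of "CAG": meets the threshold of 3.
print(meets_threshold("TT", "CAGCAGCAGT", "CAG"))
```

Because the sketch requires exact matches, a single mismatching base in the flanking copies ends the count, so it can report fewer copies than a mismatch-tolerant approach would.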
To determine if there is a homopolymer or repeat in a given reference region, a hidden Markov model (HMM) is used. The HMM will allow for some degree of mismatch between repeated elements on the reference if it determines that the sequence is still most likely to be a homopolymer or repeat. However, even where mismatches between repeated elements on the reference have been allowed, an insertion/replacement will not be marked as being part of a homopolymer or repeat region if there are any mismatches between it and the repeats next to it.
Note: This tool is designed for detecting shorter repeats and potential sequencing errors. Variants longer than 200 bp are therefore not evaluated and will always be marked as not being part of a homopolymer or repeat region.
To run the Annotate with Repeat and Homopolymer Information tool, go to:
Toolbox | Resequencing Analysis | Variant Annotation | Annotate with Repeat and Homopolymer Information
The tool takes variant tracks as input.
In the next dialog, select the reference sequence that the variant track is based on.
This tool outputs a report, containing a summary of the results, and a variant track with the following annotations added:
- Homopolymer region: The value is "Yes" if the variant is an insertion or deletion in a homopolymer region, or "No" if it is not.
- Repeat region: The value is "Yes" if the variant is an insertion or deletion in a repeat region, or "No" if it is not.
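As a rough illustration of how the two annotation values relate to the rules described in this section, the following Python sketch maps a variant's properties to "Yes" or "No". It is only a sketch under stated assumptions: the function name, the variant type labels "insertion" and "deletion", and the boolean inputs are hypothetical, and the tool's internal logic is not documented here. The 200 bp limit comes from the note earlier in this section.

```python
# Variants longer than this are not evaluated (limit taken from the note above).
MAX_EVALUATED_LENGTH = 200  # bp


def annotation_values(variant_type: str, length: int,
                      in_homopolymer: bool, in_repeat: bool) -> dict:
    """Return the two annotations as "Yes"/"No" strings.

    `variant_type` uses hypothetical labels; only "insertion" and "deletion"
    can receive "Yes" in this sketch, following the annotation descriptions.
    """
    evaluated = (
        variant_type in {"insertion", "deletion"}
        and length <= MAX_EVALUATED_LENGTH
    )
    return {
        "Homopolymer region": "Yes" if evaluated and in_homopolymer else "No",
        "Repeat region": "Yes" if evaluated and in_repeat else "No",
    }


# A 1 bp deletion found in a homopolymer region.
print(annotation_values("deletion", 1, in_homopolymer=True, in_repeat=False))
# A 250 bp insertion is over the length limit, so both values are "No".
print(annotation_values("insertion", 250, in_homopolymer=False, in_repeat=True))
```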