QIAGEN Bioinformatics Manuals

Amino Acid Changes

This tool annotates variants with amino acid changes and creates a track for visual inspection of the amino acid changes. It takes a variant track as input and also requires a track with coding regions and a reference sequence.

To add information about amino acid changes to a variant track:

Tools | Resequencing Analysis | Functional Consequences | Amino Acid Changes

If you are connected to a server, the first wizard step will ask you where you would like to run the analysis. Next, you must provide the variant track to be annotated with amino acid changes (see figure 33.15).

Figure 33.15: The Amino Acid Changes annotation tool takes variant tracks as input.

Click Next to go to the next wizard step (figure 33.16).

Figure 33.16: Select CDS, mRNA, and sequence track and choose whether or not you would like to filter away synonymous variants.

Select CDS track. The CDS track is used to determine the reading frame and exon location to be used for translation. If you do not already have CDS, mRNA, and sequence tracks in the Workbench, you can download them with the Reference Data Manager found in the upper right corner of the Workbench.
Select mRNA track (optional). The mRNA track is used to determine whether the variant is inside or outside the region covered by the transcript. Without an mRNA track, variants found outside the CDS region will not be annotated. When an mRNA track is specified, the tool also annotates variants that are located within the mRNA but outside the coding sequence.
Use transcript priorities: Check this option if you have provided an mRNA track that includes a "Priority" column, i.e. an integer value where "1" is higher priority than "2". When adding c. and p. annotations:
1. Transcripts with changes in exons are preferred, then transcripts with changes in gene flanking regions, and finally transcripts with changes in introns. This means that, for example, a priority "2" transcript with exon changes is preferred over a priority "1" transcript with intron changes.
2. If there are several transcripts with exon changes, for example, then only the annotation from the highest priority transcript intersecting with the variant will be added.
3. In cases where two or more genes overlap a variant, the highest priority transcript(s) will be reported from each gene.
4. Transcripts without any priority are ignored.
Note that a track with prioritized transcripts can be generated by modifying a gtf/gff file to add a "Priority" column.
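The selection rules above can be sketched in Python as follows. This is a minimal illustration, not the tool's implementation: the `Hit` record, its field names and the region labels are hypothetical, rule 1 is assumed to be applied within each gene, and transcripts that tie on both region and priority are assumed to be reported together.

```python
from dataclasses import dataclass
from typing import List, Optional

# Rule 1: exon changes are preferred over gene flanking regions, which are
# preferred over introns (lower rank = preferred). Labels are hypothetical.
REGION_RANK = {"exon": 0, "flank": 1, "intron": 2}


@dataclass
class Hit:
    gene: str
    transcript: str
    region: str              # "exon", "flank" or "intron"
    priority: Optional[int]  # value of the "Priority" column; None if absent


def select_transcripts(hits: List[Hit]) -> List[Hit]:
    """Return the transcript(s) whose c./p. annotation would be reported."""
    chosen: List[Hit] = []
    for gene in sorted({h.gene for h in hits}):
        # Rule 4: transcripts without any priority are ignored.
        candidates = [h for h in hits if h.gene == gene and h.priority is not None]
        if not candidates:
            continue
        # Rules 1 and 2: best region class first, then the highest priority
        # (the lowest integer, since "1" outranks "2").
        best = min((REGION_RANK[h.region], h.priority) for h in candidates)
        # Rule 3: report the best transcript(s) from each overlapping gene.
        chosen.extend(h for h in candidates
                      if (REGION_RANK[h.region], h.priority) == best)
    return chosen


hits = [
    Hit("GENE1", "tx1", "intron", 1),
    Hit("GENE1", "tx2", "exon", 2),     # exon change beats the priority 1 intron change
    Hit("GENE1", "tx3", "exon", None),  # no priority: ignored
    Hit("GENE2", "tx4", "flank", 1),
    Hit("GENE2", "tx5", "flank", 3),
]
print([h.transcript for h in select_transcripts(hits)])  # ['tx2', 'tx4']
```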
Select sequence track.
Variant location. In the VCF standard, variants with ambiguous positions are left-aligned, while the HGVS standard places ambiguous variants at the most 3' position relative to the transcript annotation. Checking the option Move variants from VCF location to HGVS location will output a track where ambiguous variants are located following the HGVS standard, even when this moves the variant across intron/exon boundaries and flanking regions. This option is recommended when comparing variants with databases following the HGVS standard.
This option does not affect the amino acid annotations added by the tool, as they always comply with the HGVS standard. Therefore, when "Move variants from VCF location to HGVS location" is unticked, variants with ambiguous positions will have the VCF standard position as the variant position, but the HGVS standard position in the annotation.
Also note that enabling this option may double some variants, for example in cases where a variant is overlapped by two genes - one on each strand - or overlapped by one gene and the flanking region of another on the other strand. Duplicating the variant ensures that the output contains a correctly positioned variant for each gene.
There are two columns in the variant track that can be used to identify variants that have been duplicated:
- The column shift-original-id is added when the option Move variants from VCF location to HGVS location is checked. The column contains an id for each original variant. If a variant has been duplicated, each of the resulting variants will have the same id.
- The column Start by 3' rule is always added to variant tracks. The column lists 3' variant positions for all variants. If two values are present, the variant will be duplicated if moved to HGVS location.
Note that when variants are duplicated, the variant track may no longer describe the sample genome accurately, and this option is therefore not recommended if the track will be exported to VCF or uploaded to QCI Interpret.
- When duplicated variants that do not overlap are exported to VCF, both will be included.
- When duplicated variants that overlap are exported to VCF:
  - Both will be included when the option "Enforce ploidy" is not enabled.
  - When "Enforce ploidy" is enabled, only the number of specified alleles will be included in the VCF. If the reference allele is more frequent than the variant, and maximum ploidy is two, only reference alleles will be included in the VCF.
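The difference between the two conventions, and the reason a variant can be duplicated, can be illustrated with a single-base deletion in a run of T's. The sketch below is an illustration under simplifying assumptions, not the tool's code: positions are 0-based indices of the deleted bases in a plain genomic string, the transcript strand is given as "+" or "-", and intron/exon boundaries are ignored. Because the 3' end of a "-" strand transcript lies to the left on the genome, a deletion overlapped by genes on both strands ends up with two different HGVS positions.

```python
def left_align(seq: str, start: int, length: int) -> int:
    """Shift the deletion seq[start:start+length] as far left as possible (VCF)."""
    # Moving one step left gives the same resulting sequence only if the base
    # entering the deletion equals the base leaving it.
    while start > 0 and seq[start - 1] == seq[start + length - 1]:
        start -= 1
    return start


def right_align(seq: str, start: int, length: int) -> int:
    """Shift the same deletion as far right as possible."""
    while start + length < len(seq) and seq[start] == seq[start + length]:
        start += 1
    return start


def hgvs_start(seq: str, start: int, length: int, strand: str) -> int:
    """3' rule relative to the transcript: rightwards on '+', leftwards on '-'."""
    if strand == "+":
        return right_align(seq, start, length)
    return left_align(seq, start, length)


seq = "GCATTTTAC"                 # hypothetical reference with a run of four T's
vcf_pos = left_align(seq, 5, 1)  # 3: first T of the run
plus_pos = hgvs_start(seq, vcf_pos, 1, "+")   # 6: last T of the run
minus_pos = hgvs_start(seq, vcf_pos, 1, "-")  # 3: unchanged
# All placements describe the same resulting sequence.
assert seq[:vcf_pos] + seq[vcf_pos + 1:] == seq[:plus_pos] + seq[plus_pos + 1:]
print(vcf_pos, plus_pos, minus_pos)
```

With one gene on each strand, the moved track would need a copy of this deletion at both position 3 and position 6, which is the duplication described above.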
Flanking. It is possible to add c. annotations (HGVS DNA-level) to upstream and downstream flanking positions if they are within a certain distance from the transcript boundaries. The distance can be configured but the default distances are set to 5 kb upstream and 3 kb downstream.
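Whether a position qualifies as upstream or downstream flanking depends on the transcript strand. A minimal sketch, assuming 1-based inclusive genomic transcript boundaries, 5 kb and 3 kb taken as 5,000 and 3,000 bases, and an inclusive distance limit; the function name and return format are hypothetical.

```python
from typing import Optional, Tuple


def flank_of(pos: int, tx_start: int, tx_end: int, strand: str,
             upstream: int = 5000, downstream: int = 3000) -> Optional[Tuple[str, int]]:
    """Classify a genomic position relative to a transcript [tx_start, tx_end].

    Returns ("upstream" | "downstream", distance) when the position lies within
    the configured flanking distance, otherwise None.
    """
    if tx_start <= pos <= tx_end:
        return None  # inside the transcript, not a flanking position
    left_of_transcript = pos < tx_start
    distance = tx_start - pos if left_of_transcript else pos - tx_end
    # On the "+" strand, left of the transcript is upstream; on "-" it is downstream.
    side = "upstream" if left_of_transcript == (strand == "+") else "downstream"
    limit = upstream if side == "upstream" else downstream
    return (side, distance) if distance <= limit else None


print(flank_of(900, 1000, 2000, "+"))   # ('upstream', 100)
print(flank_of(2100, 1000, 2000, "-"))  # ('upstream', 100)
print(flank_of(5001, 1000, 2000, "+"))  # None: beyond the 3,000-base downstream limit
```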
Filtering and annotation.
- Filter away synonymous variants removes variants that do not cause any change to the encoded amino acids from the variant track output.
- Filter away CDS regions with no variants removes CDS regions that have no variants from the amino acid track output.
- Use one letter codon code gives one letter amino acid codes in the columns 'Amino acid change' and 'Amino acid change in the longest transcript' in the variant track output. When this option is not checked, three letter amino acid codes are used.
- Genetic code is the code that is used for amino acid translation (see https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi). The default option is "1 standard", the vertebrate standard code. If relevant, you can use the drop-down list to change to the genetic code that applies to your organism.
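The effect of the genetic code, the one-letter option and the synonymous filter on a single SNV can be sketched as follows. This is an illustration, not the tool's code: it handles SNVs in a CDS string only, the function names are hypothetical, and writing a synonymous change as p.Gly2= follows HGVS style rather than a documented output format of the tool. The two tables are NCBI codes 1 (Standard) and 2 (Vertebrate Mitochondrial); a variant that is synonymous in every transcript would be removed by Filter away synonymous variants.

```python
from typing import Tuple

BASES = "TCAG"
# NCBI translation tables as 64-letter strings. A codon's index is
# 16*i1 + 4*i2 + i3, where i1..i3 are the positions of its bases in "TCAG".
GENETIC_CODES = {
    1: "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",  # Standard
    2: "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG",  # Vertebrate Mitochondrial
}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln",
    "E": "Glu", "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys",
    "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val", "*": "Ter",
}


def translate_codon(codon: str, code: int = 1) -> str:
    """Translate one codon with the chosen NCBI genetic code ("*" = stop)."""
    index = 0
    for base in codon.upper():
        index = index * 4 + BASES.index(base)
    return GENETIC_CODES[code][index]


def snv_protein_change(cds: str, pos: int, alt: str, code: int = 1,
                       one_letter: bool = False) -> Tuple[str, bool]:
    """Return (p. notation, is_non_synonymous) for an SNV at 1-based CDS position pos."""
    codon_number = (pos - 1) // 3 + 1
    codon_start = (codon_number - 1) * 3
    offset = (pos - 1) % 3
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa = translate_codon(ref_codon, code)
    alt_aa = translate_codon(alt_codon, code)

    def name(aa: str) -> str:
        return aa if one_letter else THREE_LETTER[aa]

    if ref_aa == alt_aa:
        return f"p.{name(ref_aa)}{codon_number}=", False
    return f"p.{name(ref_aa)}{codon_number}{name(alt_aa)}", True


cds = "ATGGGCTAA"  # hypothetical toy CDS: Met-Gly-Stop
print(snv_protein_change(cds, 4, "T"))                   # ('p.Gly2Cys', True)
print(snv_protein_change(cds, 4, "T", one_letter=True))  # ('p.G2C', True)
print(snv_protein_change(cds, 6, "T"))                   # ('p.Gly2=', False)
print(translate_codon("TGA", 1), translate_codon("TGA", 2))  # * W
```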

Click Next, choose whether you would like to Open or Save the results and click on the button labeled Finish.

Two types of outputs are generated:

A variant track that has been annotated with the amino acid changes. The added information can be accessed via the tooltips in the variant track or in the extra columns that have been added to the variant table. The extra columns provide information about the amino acid changes (see https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi). The variant track opens in track view and the table view can be accessed by clicking on the table icon found in the lower left corner of the View Area.
An amino acid track that displays a graphical presentation of the amino acid changes. The track is based on the CDS track and in addition to the amino acid sequence of the coding sequence, all amino acids that have been affected by variants are shown as individual amino acids below the amino acid track. Changes causing a frameshift are symbolized with two arrow heads, and variants causing premature stop are marked with an asterisk. An example is shown in figure 33.17. The information on the individual amino acids is displayed when the box is wide enough (three bases). This is typically the case; if not, the side panel settings can be adjusted accordingly (e.g. by decreasing the "hide insertions below (%)" value). The information is always displayed in the tooltip on the box.

Figure 33.17: The variant track and the amino acid track are presented here together with the reference sequence and the mRNA and CDS tracks. An insertion (purple arrow) has caused a frameshift (red arrow) that has changed an alanine to a stop codon (blue arrow).

For each variant in the input track, the following information is added:

Coding region change. This describes the relative position on the coding DNA level, using the nomenclature proposed at https://hgvs-nomenclature.org/stable/. Variants outside exons and in the untranslated regions of the transcript will also be annotated with the distance to the nearest exon. E.g. "c.-4A>C" describes a SNV four bases upstream of the start codon, while "c.*4A>C" describes a SNV four bases downstream of the stop codon.
Amino acid change. This describes the change on the protein level. For example, single amino acid changes caused by SNVs are listed as p.Gly261Cys, denoting that in the protein sequence (hence the "p.") the glycine at position 261 is changed into cysteine. Frameshifts caused by nucleotide insertions and deletions are listed with the extension fs, for example p.Pro244fs, denoting a frameshift at position 244 coding for proline. For further details about HGVS nomenclature as it relates to proteins, please refer to https://hgvs-nomenclature.org/stable/.
Coding region change in longest transcript. When there are many transcript variants for a gene, the coding region changes for all transcripts are listed in the "Coding region change" column. For quick reference, the longest transcript is often used, so there is a separate column listing only the coding region change for the longest transcript.
Amino acid change in longest transcript. This is similar to the above, just on the protein level.
Other variants within codon. If there are other variants within the same codon, this column will have a "Yes". In this case, it should be manually investigated whether the two variants are linked by reads.
Non-synonymous. Will have a "Yes" if the variant is non-synonymous at the protein level for any transcript. If the option "Filter away synonymous variants" was applied, this column will only contain entries labeled "Yes". A hyphen ("-") indicates that the variant is located outside a coding region.
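The c. numbering described under Coding region change can be sketched for a plus-strand transcript as follows. This is a minimal illustration under stated assumptions, not the tool's implementation: exons are sorted 1-based inclusive genomic intervals, the CDS start and end are given in spliced-transcript coordinates (the end being the last base of the stop codon), the function names are hypothetical, and intronic positions use the HGVS convention of offsets from the nearest exon boundary, with the middle base of an odd-length intron written with "+".

```python
from typing import List, Tuple


def c_coordinate(tx_pos: int, cds_start: int, cds_end: int) -> str:
    """Convert a 1-based spliced-transcript position into an HGVS c. coordinate."""
    if tx_pos < cds_start:
        return f"-{cds_start - tx_pos}"  # 5' UTR: counted back from the A of ATG
    if tx_pos > cds_end:
        return f"*{tx_pos - cds_end}"    # 3' UTR: counted on from the stop codon
    return str(tx_pos - cds_start + 1)


def genomic_to_c(g: int, exons: List[Tuple[int, int]],
                 cds_start: int, cds_end: int) -> str:
    """Plus-strand only: map a genomic position inside the transcript to c. numbering."""
    tx_offset = 0  # number of transcript bases in the exons before the current one
    for i, (start, end) in enumerate(exons):
        if start <= g <= end:
            return c_coordinate(tx_offset + g - start + 1, cds_start, cds_end)
        if i + 1 < len(exons) and end < g < exons[i + 1][0]:
            after_donor = g - end
            before_acceptor = exons[i + 1][0] - g
            if after_donor <= before_acceptor:  # ties go to the "+" side
                donor = c_coordinate(tx_offset + end - start + 1, cds_start, cds_end)
                return f"{donor}+{after_donor}"
            acceptor = c_coordinate(tx_offset + end - start + 2, cds_start, cds_end)
            return f"{acceptor}-{before_acceptor}"
        tx_offset += end - start + 1
    raise ValueError("position lies outside the transcript")


# Hypothetical transcript: two exons (50 + 60 bases), CDS at transcript bases 11..100.
exons = [(101, 150), (201, 260)]
for g in (101, 111, 152, 199, 260):
    print(g, "c." + genomic_to_c(g, exons, 11, 100))
# 101 c.-10, 111 c.1, 152 c.40+2, 199 c.41-2, 260 c.*10
```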

Note that variants located at an exon border, i.e. between the last intronic and first exonic base or between the last exonic and first intronic base, are considered intronic. Amino acid changes are therefore not determined for these variants.
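Since only an insertion can sit between two bases, the border rule can be sketched as a check on the two bases flanking an insertion. A minimal illustration, assuming 1-based inclusive genomic exon intervals and hypothetical function names; it covers insertions only and does not attempt to reproduce how the tool treats other variant types at a border.

```python
from typing import List, Tuple


def in_exon(pos: int, exons: List[Tuple[int, int]]) -> bool:
    """True if the 1-based genomic position lies inside any exon."""
    return any(start <= pos <= end for start, end in exons)


def insertion_is_exonic(left: int, exons: List[Tuple[int, int]]) -> bool:
    """An insertion between positions left and left + 1 counts as exonic only if
    both neighboring bases are exonic; at an exon/intron border it counts as
    intronic, so no amino acid change is determined for it."""
    return in_exon(left, exons) and in_exon(left + 1, exons)


exons = [(101, 150), (201, 260)]
print(insertion_is_exonic(120, exons))  # True: inside the first exon
print(insertion_is_exonic(150, exons))  # False: last exonic / first intronic base
print(insertion_is_exonic(200, exons))  # False: last intronic / first exonic base
```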

An example of the output is given in figure 33.18.

Figure 33.18: The resulting amino acid changes in track and table views. When the variant table has been opened by double-clicking on the text found in the left side of the View Area, the variant table and the variant track are linked. When clicking on an entry in the table, this position will be brought into focus in the variant track.

The top track view displays a track list with the reference sequence, mRNA, CDS, variant, and amino acid tracks. The lower table view is the variant table that has been opened from the track list by double-clicking on the variant track name found in the left-hand side of the View Area. When opening the variant table in split view from the track list, the table and the variant track are linked.

An example illustrating a situation where different variants affect the same codon is shown in figure 33.19.

Figure 33.19: Amino acids encoded by codons that could potentially have been affected by more than one variant are marked with a hash symbol (#), as the graphically presented amino acid changes always include only a single variant (an SNV, MNV, insertion, or deletion). Shown here are three different variants, each considered present one at a time, and the consequences of the three individual deletions. Where the deletion lies in a codon whose encoded amino acid is changed, the arrow also includes the deletion (situation 1). In the two other scenarios, the codon containing the deletion is changed to a codon that encodes the same amino acid, so the effect is not seen until the subsequent amino acid.

In this example, three single-nucleotide deletions are shown along with the resulting amino acid changes, based on scenarios where only one deletion is present at a time. The first affected amino acid is shown for each of the three deletions. As the first deletion affects the encoded amino acid, this amino acid change is shown with a four-nucleotide-long arrow (that includes the deletion). The other two deletions do not affect the encoded amino acid, as the frameshift was "synonymous" at the codon where the deletion was introduced. The effect is first seen at the next amino acid position (763 and 764, respectively), which does not contain a deletion and is therefore illustrated with a three-nucleotide-long arrow.

The hash symbol (#) on the changed amino acids indicates that more than one variant may be present in the region encoding this specific amino acid. The Amino Acid Changes tool does not predict the simultaneous presence of multiple variants within the same codon. Manual inspection of the reads is required to detect multiple variants within one codon.
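The "synonymous frameshift" situation can be reproduced on a toy sequence by translating the reference and the deleted sequence and finding the first amino acid that differs. This is an illustration only, not the tool's algorithm: it uses the standard genetic code, a hypothetical 15-base CDS, and hypothetical function names. Deleting the third base of codon 2 (GCA, Ala) turns that codon into GCG, which still encodes alanine, so the first change appears at codon 3.

```python
from typing import Optional

STANDARD_CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
BASES = "TCAG"


def translate(seq: str) -> str:
    """Translate complete codons with the standard genetic code ("*" = stop)."""
    protein = []
    for i in range(0, len(seq) - 2, 3):  # ignores a trailing partial codon
        b1, b2, b3 = (BASES.index(b) for b in seq[i:i + 3])
        protein.append(STANDARD_CODE[16 * b1 + 4 * b2 + b3])
    return "".join(protein)


def first_changed_residue(cds: str, del_pos: int) -> Optional[int]:
    """1-based number of the first amino acid altered by deleting the base at
    1-based CDS position del_pos, or None if no compared residue differs."""
    ref = translate(cds)
    alt = translate(cds[:del_pos - 1] + cds[del_pos:])
    for number, (a, b) in enumerate(zip(ref, alt), start=1):
        if a != b:
            return number
    return None


cds = "ATGGCAGCTAAATAA"                 # Met-Ala-Ala-Lys-Stop
print(translate(cds))                   # MAAK*
print(first_changed_residue(cds, 4))    # 2: GCA -> CAG (Gln) changes codon 2 itself
print(first_changed_residue(cds, 6))    # 3: GCA -> GCG is still Ala; change seen at codon 3
```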

Known limitations

When two genes overlap and an insertion in the form of a duplication occurs, this duplication will be labeled as an insertion.
The Amino Acid Changes tool will not perform flanking checks for exons/CDS that wrap around the chromosome in a circular chromosome.
For some transcripts, there is an error in the underlying reference sequence, leading to a difference between the reference sequence and the transcript sequence as provided in, for example, RefSeq. The Amino Acid Changes tool does not take reference errors into account when calculating amino acid changes. The results can therefore differ from software that uses the transcript sequence for calculating amino acid changes, such as QCI Interpret.

The amino acid track

The colors of the amino acids in the amino acid track can be changed in the Side Panel under Track layout and "Amino acids track" (see figure 33.20).

Figure 33.20: The colors of the amino acids can be changed in the Side Panel under "Amino acids track".

Four different color schemes are available under "Amino acid colors":

Gray. All amino acids are shown in gray.
Group. Colors the amino acids in groups by the following properties:
- Purple: neutral, polar
- Turquoise: neutral, nonpolar
- Orange: acidic, polar
- Blue: basic, polar
- Bright green: other (functional properties)
Polarity. Colors the amino acids according to the following categories:
- Green: neutral, polar
- Black: neutral, nonpolar
- Red: acidic, polar
- Blue: basic, polar
Rasmol. Colors the amino acids according to the Rasmol color scheme.
