Annotate with GFF/GTF/GVF file

Use Annotate with GFF/GTF/GVF file to add annotations from a GFF3, GTF or GVF file to a sequence or to the sequences in a sequence list. The names in the first column of the file must match the names of the sequences to be annotated. If they do not, either the names in the annotation file or the names of the sequences must be updated.
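
Before running the tool, it can be useful to check which names in the annotation file have no matching sequence. The following is a minimal sketch in Python, not part of the Workbench: the function names and the example data are hypothetical, and it assumes a tab-separated file where lines starting with # are comments or directives and a ##FASTA directive ends the annotation section, as in GFF3.

```python
def annotation_file_names(lines):
    """Collect the distinct values of column 1 (the sequence name)."""
    names = set()
    for line in lines:
        line = line.rstrip("\n")
        # In GFF3, everything after the ##FASTA directive is sequence data.
        if line.startswith("##FASTA"):
            break
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def unmatched_names(annotation_lines, sequence_names):
    """Return annotation-file names with no identically named sequence."""
    return sorted(annotation_file_names(annotation_lines) - set(sequence_names))


# Hypothetical example data: two annotation lines, two sequences.
example = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t100\t900\t.\t+\t.\tID=gene1;Name=abc",
    "chrM\tsrc\tgene\t5\t60\t.\t-\t.\tID=gene2",
]
print(unmatched_names(example, ["chr1", "chr2"]))  # ['chrM']
```

Any name this check reports would need to be renamed, either in the annotation file or on the sequence, before its annotations can be placed.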

Tools are available for renaming sequences or sequences in sequence lists.

See http://gmod.org/wiki/GFF3 for information about the GFF3 format and http://mblab.wustl.edu/GTF22.html for information on the GTF format.

How annotations are applied

Annotations from each line in the annotation file are placed on the sequence with the name given in the first column. Special treatment is given to annotations of the types CDS, exon, mRNA, transcript and gene. For these, the following applies:

Note that genes and transcripts are linked by name only (not by position, ID etc).

Running the tool

To run the Annotate with GFF/GTF/GVF file tool, go to:

        Toolbox | General Sequence Analysis | Annotate with GFF/GTF/GVF file

After selecting the sequence to annotate, the next step will look like that shown in figure 16.1.

Figure 16.1: Select a GFF, GTF or GVF file by clicking on the Browse button.

Click on Browse to select a GFF, GTF or GVF file. After you work through the handling options described below, your sequences will be annotated with the information from that file.

Name handling

Annotations are named in the following, prioritized way:

  1. If one of the following qualifiers is present, it will be used for naming (prioritized):
    1. Name
    2. Gene_name
    3. Gene_ID
    4. Locus_tag
    5. ID
  2. If none of these are found, the annotation type will be used as name.
You can overrule this naming convention by choosing Replace all annotation names with this qualifier and specifying another qualifier (see figure 16.2).

If you provide a qualifier, it must be written identically to the corresponding qualifier name in the annotation file.

Transcript annotations are handled separately, since they inherit the name from the gene annotation.
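
The naming priority described above can be sketched in Python as follows. This is an illustration only, not the tool's implementation: the function name and the dictionary representation of qualifiers are hypothetical, the inheritance of names by transcript annotations is not modelled, and falling back to the priority list when the replacement qualifier is missing from an annotation is an assumption.

```python
# Qualifiers checked for a name, highest priority first (from the list above).
NAME_PRIORITY = ["Name", "Gene_name", "Gene_ID", "Locus_tag", "ID"]


def annotation_name(annotation_type, qualifiers, replacement_qualifier=None):
    """Pick a name for one annotation from its type and qualifier dict."""
    # A user-specified qualifier takes precedence; the lookup is an exact,
    # case-sensitive match, so it must be spelled as in the annotation file.
    if replacement_qualifier is not None and replacement_qualifier in qualifiers:
        return qualifiers[replacement_qualifier]
    # ASSUMPTION: if the replacement qualifier is absent, use the default order.
    for key in NAME_PRIORITY:
        if key in qualifiers:
            return qualifiers[key]
    # No naming qualifier found: fall back to the annotation type.
    return annotation_type


print(annotation_name("gene", {"ID": "g1", "Locus_tag": "LT_01"}))  # LT_01
print(annotation_name("exon", {}))                                  # exon
print(annotation_name("gene", {"Name": "abc", "note": "x"}, "note"))  # x
```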

Figure 16.2: You can choose Replace all annotation names with the specified qualifier.

Type handling

You can overrule feature types in the annotation file by choosing Replace all annotation types with and specifying a type to use.

Ignore duplicate annotation

When the Ignore duplicate annotation option is checked, only one instance of duplicate annotations will be added to the sequence.
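
As a rough illustration of the effect of this option, the Python sketch below keeps only the first instance of each duplicated record. The function name and example records are hypothetical, and the definition of a duplicate used here (all columns of the record identical) is an assumption; the tool's own criteria may differ.

```python
def drop_duplicate_records(records):
    """Keep the first occurrence of each record, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record)  # ASSUMPTION: duplicates match in every column
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical records: (sequence name, type, start, end, strand, qualifiers).
records = [
    ("chr1", "gene", 100, 900, "+", "ID=gene1"),
    ("chr1", "gene", 100, 900, "+", "ID=gene1"),
    ("chr1", "exon", 100, 300, "+", "ID=exon1"),
]
print(len(drop_duplicate_records(records)))  # 2
```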

Create log

In the Result handling section of the wizard, check the Create log box to create a log that includes information such as the number of annotations found and whether any could not be placed on the sequence. This information can help with troubleshooting when annotations that were expected are not added to a sequence.