Annotate with GFF/GTF/GVF file

Use Annotate with GFF/GTF/GVF file to add annotations from a GFF3, GTF or GVF file to a sequence, or to the sequences in a sequence list. The names in the first column of the file must match the names of the sequences to be annotated. If they do not, either the names in the annotation file or the names of the sequences must be updated.
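Before running the tool, it can help to check outside the Workbench whether the names in the first column of the annotation file match your sequence names. The following is a minimal Python sketch of such a check, not part of the Workbench. The function names `gff_seqids` and `unmatched_seqids` are hypothetical, and the sketch assumes a tab-separated file in which comment and directive lines start with `#`.

```python
import io


def gff_seqids(handle):
    """Collect the distinct values of column 1 from GFF3/GTF/GVF lines."""
    seqids = set()
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # GFF3 files may embed sequences after this directive
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        seqids.add(line.split("\t", 1)[0])
    return seqids


def unmatched_seqids(handle, sequence_names):
    """Return column-1 names that match none of the given sequence names."""
    return sorted(gff_seqids(handle) - set(sequence_names))


# Illustrative data only: the second line uses "1" rather than "chr1".
example = io.StringIO(
    "##gff-version 3\n"
    "chr1\tsrc\tgene\t100\t900\t.\t+\t.\tID=gene1;Name=abc\n"
    "1\tsrc\tgene\t50\t400\t.\t-\t.\tID=gene2\n"
)
print(unmatched_seqids(example, ["chr1", "chr2"]))
```

Any name reported by this check would need to be changed, either in the annotation file or on the sequence, before the corresponding annotations can be placed.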

Tools are available for renaming sequences or sequences in sequence lists:

See for information about the GFF3 format and for information on the GTF format.

Importing standard reference data

Before proceeding to use Annotate with GFF/GTF/GVF file, please refer to References management. This covers how to download standard reference data, including annotations, provided by QIAGEN and other public sources using the Reference Data Manager, which is part of the Workbench. If the reference data you are interested in is available through the Reference Data Manager, it is usually easier to use that rather than taking the steps outlined in this section.

If the reference data you need is not available through the Reference Data Manager, and you are working with track-based data, please refer to Import tracks, rather than using Annotate with GFF/GTF/GVF file.

How annotations are applied

Annotations from each line in the annotation file are placed on the sequence with the name given in the first column. Special treatment is given to annotations of the types CDS, exon, mRNA, transcript and gene. For these, the following applies:

Note that genes and transcripts are linked by name only (not by position, ID etc).

Running the tool

To run the Annotate with GFF/GTF/GVF file tool, go to:

        Toolbox | Classical Sequence Analysis (Image gene_and_protein_analysis) | General Sequence Analysis (Image generalsequenceanalyses) | Annotate with GFF/GTF/GVF file (Image add_annotation_button)

After selecting the sequence to annotate, the next step will look like that shown in figure 18.1.

Image annotatewithgff
Figure 18.1: Select a GFF, GTF or GVF file by clicking on the Browse button.

Click on Browse to select a GFF, GTF or GVF file. After you work through the handling options described below, your sequences will be annotated with the information from that file.

Name handling

Annotations are named in the following, prioritized way:

  1. If one of the following qualifiers is present, it will be used for naming (in order of priority):
    1. Name
    2. Gene_name
    3. Gene_ID
    4. Locus_tag
    5. ID
  2. If none of these are found, the annotation type will be used as name.
You can overrule this naming convention by choosing Replace all annotation names with this qualifier and specifying another qualifier (see figure 18.2).

If you provide a qualifier, it must be written identically to the corresponding qualifier name in the annotation file.

Transcript annotations are handled separately, since they inherit the name from the gene annotation.
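The prioritized naming described above can be illustrated with a short Python sketch. This is not the Workbench's implementation. The function `annotation_name` and its `override` parameter are hypothetical, qualifier names are matched exactly as written, and falling back to the default priority when the override qualifier is absent is an assumption. The separate handling of transcript annotations is not modeled.

```python
# Qualifiers checked in order of priority, as listed in this section.
NAME_PRIORITY = ("Name", "Gene_name", "Gene_ID", "Locus_tag", "ID")


def annotation_name(qualifiers, annotation_type, override=None):
    """Pick a name for one annotation from its qualifiers (a dict)."""
    # A user-specified qualifier replaces the default convention.
    # Assumption: if it is missing, the default priority still applies.
    if override is not None and override in qualifiers:
        return qualifiers[override]
    for key in NAME_PRIORITY:
        if key in qualifiers:
            return qualifiers[key]
    # No naming qualifier found: use the annotation type as the name.
    return annotation_type


print(annotation_name({"ID": "gene1", "Name": "abc"}, "gene"))
print(annotation_name({"note": "hypothetical"}, "CDS"))
print(annotation_name({"ID": "cds1", "product": "kinase"}, "CDS", override="product"))
```

In this sketch the first call uses Name because it outranks ID, the second falls back to the annotation type, and the third uses the specified qualifier instead of the default convention.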

Image annotatewithgff
Figure 18.2: You can choose Replace all annotation names with the specified qualifier.

Type handling

You can overrule feature types in the annotation file by choosing Replace all annotation types with and specifying a type to use.

Ignore duplicate annotation

When the Ignore duplicate annotation option is checked, only one instance of duplicate annotations will be added to the sequence.
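As an illustration of the idea only, the following Python sketch keeps the first instance of each repeated annotation. This section does not define what counts as a duplicate, so the sketch assumes that two annotations are duplicates when all of their fields are identical. The function `drop_duplicates` and the tuple layout are hypothetical.

```python
def drop_duplicates(annotations):
    """Keep the first instance of each annotation, preserving order.

    Assumption: annotations are tuples, and two annotations are
    duplicates only when every field is identical.
    """
    seen = set()
    unique = []
    for annotation in annotations:
        if annotation not in seen:
            seen.add(annotation)
            unique.append(annotation)
    return unique


# Illustrative records: (sequence name, type, start, end, strand, name)
records = [
    ("chr1", "gene", 100, 900, "+", "abc"),
    ("chr1", "gene", 100, 900, "+", "abc"),  # exact repeat of the first
    ("chr1", "gene", 100, 900, "-", "abc"),  # differs in strand, so kept
]
print(len(drop_duplicates(records)))
```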

Create log

In the Result handling section of the wizard, check the Create log box to create a log that includes information such as the number of annotations found and whether any could not be placed on the sequence. This information can help with troubleshooting when expected annotations are not added to a sequence.