Annotate with GFF/GTF/GVF file
Use Annotate with GFF/GTF/GVF file to add annotations from a GFF3, GTF or GVF file onto a sequence, or sequences in a sequence list. The names in the first column in the file must match the names of the sequences to be annotated. If this is not the case, either the names in the annotation file, or the names of the sequences, must be updated.
Tools are available for renaming sequences or sequences in sequence lists:
See http://gmod.org/wiki/GFF3 for information about the GFF3 format and http://mblab.wustl.edu/GTF22.html for information on the GTF format.
How annotations are applied
Annotations from each line in the annotation file are placed on the sequence with the name given in the first column. Special treatment is given to annotations of the types CDS, exon, mRNA, transcript and gene. For these, the following applies:
- A gene annotation is generated for each gene_id. The region annotated extends from the leftmost to the rightmost positions of all annotations that have the gene_id (gtf-style).
- CDS annotations that have the same transcriptID are joined to one CDS annotation (gtf-style). Similarly, CDS annotations that have the same parent are joined to one CDS annotation (gff-style).
- If there is more than one exon annotation with the same transcriptID these are joined to one mRNA annotation. If there is only one exon annotation with a particular transcriptID, and no CDS with this transcriptID, a transcript annotation is added instead of the exon annotation (gtf-style).
- Exon annotations that have the same mRNA as parent are joined to one mRNA annotation. Similarly, exon annotations that have the same transcript as parent, are joined to one transcript annotation (gff-style).
Note that genes and transcripts are linked by name only (not by position, ID etc).
Running the tool
To run the Annotate with GFF/GTF/GVF file tool, go to:
Toolbox| General Sequence Analysis ()| Annotate with GFF/GTF/GVF file ()
After selecting the sequence to annotate, the next step will look like that shown in figure 18.1.
Figure 18.1: Select a GFF, GTF or GVR file by clicking on the Browse button.
Click on Browse to select a GFF, GTF or GVF file. After working through handling options, described below, your sequences will be annotated by the information from that file.
Name handling
Annotations are named in the following, prioritized way:
- If one of the following qualifiers are present, it will be used for naming (prioritized):
- Name
- Gene_name
- Gene_ID
- Locus_tag
- ID
- If none of these are found, the annotation type will be used as name.
If you provide a qualiifer, it must be written identically to the corresponding qualifier name in the annotation file.
Transcript annotations are handled separately, since they inherit the name from the gene annotation.
Figure 18.2: You can choose Replace all annotation names with the specified qualifier.
Type handling
You can overrule feature types in the annotation file by choosing Replace all annotation types with and specifying a type to use.
Ignore duplicate annotation
When the Ignore duplicate annotation option is checked, only one instance of duplicate annotations will be added to the sequence.
Create log
In the Result handling section of the wizard, check the Create log box results to create a log that includes information like the number of annotations found and if there are any that are could not be placed on the sequence. This information can help with troubleshooting when annotations are not added to a sequence when they were expected to be.