Introduction

Annotate with GFF File makes it very easy to annotate a sequence with annotations from a GFF (Generic Feature Format) or GTF (Gene Transfer Format) file. A GFF/GTF file does not contain any sequence information; it only contains a list of annotations. You can read more about the formats at http://www.sanger.ac.uk/resources/software/gff/spec.html and http://mblab.wustl.edu/GTF22.html.

There are many different versions of GFF and GTF. We support a large part of the GFF3 definition (see http://www.sequenceontology.org/gff3.shtml), and we also support the GTF format as defined at http://mblab.wustl.edu/GTF22.html. In other words, most GFF3 files can be used to annotate sequences using this tool.
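To make the file layout concrete, the following is a minimal Python sketch of how a single GFF3 line can be split into its nine tab-separated columns (sequence ID, source, type, start, end, score, strand, phase, attributes). It is an illustration only and is not the plugin's own parser; the names Feature and parse_gff3_line, and the example line itself, are hypothetical.

```python
from typing import Dict, NamedTuple
from urllib.parse import unquote


class Feature(NamedTuple):
    """One annotation line from a GFF3 file (hypothetical container)."""
    seqid: str
    source: str
    type: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    score: str
    strand: str
    phase: str
    attributes: Dict[str, str]


def parse_gff3_line(line: str) -> Feature:
    """Split one GFF3 data line into its nine columns."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")

    # Column 9 holds semicolon-separated key=value pairs in GFF3.
    attributes: Dict[str, str] = {}
    for field in cols[8].split(";"):
        field = field.strip()
        if not field:
            continue
        key, _, value = field.partition("=")
        attributes[key] = unquote(value)  # values may be percent-encoded

    return Feature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                   cols[5], cols[6], cols[7], attributes)


# Made-up example line, not taken from any real annotation file.
example = "chr1\texample\tgene\t1000\t9000\t.\t+\t.\tID=gene0001;Name=EDEN"
feature = parse_gff3_line(example)
print(feature.type, feature.start, feature.end, feature.attributes["Name"])
```

Note that the line carries only coordinates and descriptive fields, which is why the file must be combined with a separate sequence before the annotations can be placed.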

The GFF and GTF files can contain various types of annotations. In general, the Annotate with GFF File action adds the annotation in each of the lines in the file to the chosen sequence, at the position or region in which the file specifies that it should go, and with the annotation type, name, description etc. as given in the file. However, special treatment is given to annotations of the types CDS, exon, mRNA, transcript and gene. For these, the following applies:

Note that genes and transcripts are linked by name only (not by position, ID, etc.). For a comprehensive source of genomic annotation of genes and transcripts, see the Ensembl website at http://www.ensembl.org/info/data/ftp/index.html. On this page, you can download GTF files that can be used to annotate genomes for use in other analyses in the CLC Genomics Workbench.
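As a rough illustration of what linking by a shared text key means, the sketch below reads GTF lines and groups transcript identifiers under the gene identifier they share. It assumes the GTF attribute syntax of key "value"; pairs and uses the gene_id and transcript_id attributes purely as stand-ins; which attribute the plugin actually treats as the name is not stated here. The function names and the example lines are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# GTF attributes look like: gene_id "geneA"; transcript_id "geneA.1";
_ATTRIBUTE = re.compile(r'(\S+)\s+"([^"]*)"\s*;')


def parse_gtf_attributes(text: str) -> Dict[str, str]:
    """Turn the ninth GTF column into a dictionary of key/value pairs."""
    return dict(_ATTRIBUTE.findall(text))


def transcripts_by_gene(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group transcript keys under the gene key they share.

    Only the text keys are compared; positions are ignored, which
    mirrors the idea of linking by name rather than by location.
    """
    groups = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip malformed lines in this sketch
        attrs = parse_gtf_attributes(cols[8])
        gene = attrs.get("gene_id")              # assumed linking key
        transcript = attrs.get("transcript_id")  # assumed linking key
        if gene and transcript:
            groups[gene].add(transcript)
    return {gene: sorted(names) for gene, names in groups.items()}


# Made-up example lines, not taken from any real annotation file.
gtf_lines = [
    'chr1\texample\texon\t100\t200\t.\t+\t.\tgene_id "geneA"; transcript_id "geneA.1";',
    'chr1\texample\texon\t300\t400\t.\t+\t.\tgene_id "geneA"; transcript_id "geneA.2";',
    'chr1\texample\tCDS\t150\t200\t.\t+\t0\tgene_id "geneA"; transcript_id "geneA.1";',
]
linked = transcripts_by_gene(gtf_lines)
print(linked)
```

One practical consequence of name-based linking is that inconsistent naming between lines of a file would break the association even when the coordinates overlap, so it can be worth checking a file for such inconsistencies before use.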

This manual will show two examples of how to use the plugin to annotate a genome for the purposes of RNA-Seq analysis in the CLC Genomics Workbench version 6.5.x and earlier.

If you are using the CLC Genomics Workbench and are interested in standard reference genomic data, please also take a look at the Download Genomes tool as described in the CLC Genomics Workbench manual at: http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Download_reference_genome_data.html.