Annotation and variant formats
Please note that all of the annotation and variant formats can be imported as tracks (see Import tracks). GFF, GVF and GTF formats can also be imported as annotations on a standard (i.e., non-track) sequence or sequence list using functionality provided by the Annotate with GFF plugin (https://digitalinsights.qiagen.com/plugins/annotate-with-gff-file/).
File type | Suffix | Import | Export | Description |
---|---|---|---|---|
Annotation CSV export | .csv | | X | Annotations in CSV format |
Annotation Excel 2010 | .xlsx | | X | Annotations in Excel format |
Annotation Excel 97 - 2007 | .xls | | X | Annotations in Excel format |
BED | .bed | X | X | See Import tracks and BED export |
COSMIC variation database | .tsv | X | | Special format for COSMIC data |
CLC | .clc | X | X | Rich format including all information |
GFF | .gff | X | | To import as an annotation track, see Import tracks. |
GFF3 | .gff3 | X | X | See GFF3 format and GFF3 export |
GVF | .gvf | X | X | Special version of GFF for variant data, see Import tracks. |
GTF | .gtf | X | X | Special version of GFF for gene annotation data, see Import tracks. |
UCSC variant database table dump | .txt | X | | See Import tracks |
VCF | .vcf | X | X | See VCF import and Export in VCF |
Wiggle | .wig | X | X | See Import tracks |
Special notes on chromosome name synonyms used during import
When importing annotations as tracks, we try to make things simple for the user by recognizing a set of chromosome names as synonyms. Chromosome names are compared by looking through the chromosomes in the order in which they are registered in the genome. The information is added to the first chromosome for which any of its synonym names matches the name used in the imported file.
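The ordered first-match lookup described above can be sketched in Python as follows. This is a minimal illustration, not the Workbench's implementation: `resolve_chromosome` and the toy rule `simple_synonyms` (which only equates `N` and `chrN`) are hypothetical names introduced here for the example.

```python
from typing import Callable, Iterable, Optional, Set


def simple_synonyms(chrom: str) -> Set[str]:
    """Hypothetical toy rule: treat 'N' and 'chrN' as the same chromosome."""
    base = chrom[3:] if chrom.startswith("chr") else chrom
    return {base, "chr" + base}


def resolve_chromosome(
    name: str,
    registered: Iterable[str],
    synonyms_of: Callable[[str], Set[str]],
) -> Optional[str]:
    """Return the first registered chromosome that has `name` among its synonyms."""
    for chrom in registered:  # scan in genome registration order
        if name in synonyms_of(chrom):
            return chrom  # first match wins; later chromosomes are never checked
    return None  # no registered chromosome matches this name


print(resolve_chromosome("chr2", ["1", "2", "X"], simple_synonyms))  # 2
print(resolve_chromosome("chr1", ["1", "chr1"], simple_synonyms))    # 1
```

Because the genome is scanned in registration order, `chr1` resolves to `1` when both `1` and `chr1` are registered and `1` comes first.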
The synonyms applied are:
For any number N from 1 to 22 (inclusive):
N, chrN, chromosome_N, and NC_0000NN (where NN is N written with two digits) are seen as meaning the same thing. As concrete examples:
1 == chr1 == chromosome_1 == NC_000001
22 == chr22 == chromosome_22 == NC_000022
For any number N larger than 23:
N, chrN, chromosome_N are seen as meaning the same thing. As a concrete example:
26 == chr26 == chromosome_26
For chromosome names with letters, not numbers:
X, chrX, chromosome_X and NC_000023 are synonyms.
Y, chrY, chromosome_Y and NC_000024 are synonyms.
M, MT, chrM, chrMT, chromosome_M, chromosome_MT and NC_001807 are synonyms.
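The synonym rules above can be expressed as a function that maps any recognized name to its full synonym set. The Python sketch below is illustrative only: `chromosome_synonyms` and `LETTER_GROUPS` are hypothetical names, matching is assumed to be case-sensitive, numbers are assumed to be written without leading zeros, and names the listed rules do not cover (including 23) are assumed to match only themselves.

```python
import re
from typing import Set

# Synonym groups for letter-named chromosomes, copied from the list above.
LETTER_GROUPS = [
    {"X", "chrX", "chromosome_X", "NC_000023"},
    {"Y", "chrY", "chromosome_Y", "NC_000024"},
    {"M", "MT", "chrM", "chrMT", "chromosome_M", "chromosome_MT", "NC_001807"},
]


def chromosome_synonyms(name: str) -> Set[str]:
    """Return every name treated as equivalent to `name` under the rules above."""
    for group in LETTER_GROUPS:
        if name in group:
            return set(group)
    # Accession form NC_0000NN for chromosomes 1-22 (six digits after 'NC_').
    acc = re.fullmatch(r"NC_0000(\d\d)", name)
    if acc and 1 <= int(acc.group(1)) <= 22:
        name = str(int(acc.group(1)))  # normalize to the plain number
    # Numeric forms N, chrN, chromosome_N (assumption: no leading zeros).
    m = re.fullmatch(r"(?:chr|chromosome_)?([1-9]\d*)", name)
    if m:
        n = int(m.group(1))
        if 1 <= n <= 22:
            return {str(n), f"chr{n}", f"chromosome_{n}", f"NC_{n:06d}"}
        if n > 23:
            return {str(n), f"chr{n}", f"chromosome_{n}"}
    # Assumption: names not covered by the rules (including 23) match only themselves.
    return {name}
```

For example, `chromosome_synonyms("NC_000022")` and `chromosome_synonyms("chr22")` return the same four-element set, while `chromosome_synonyms("chr26")` has no accession entry.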
The accession numbers in the lists above (NC_XXXXXX) allow NCBI hg19 human reference names to be matched against the names used by UCSC and, crucially, the names used by Ensembl. Thus, if your human reference has the correct number of chromosomes (i.e. 25 sequences, including the hg19 mitochondrial genome), that set of tracks can be used as the basis for downloading or importing annotations, for example via Download Genomes.
Note: These rules apply only when importing annotations as tracks, whether directly or via Download Genomes. Synonyms are not applied when importing BAM files or when using the Annotate with GFF plugin. In those cases, the reference names in the CLC Genomics Workbench must exactly match the reference names used in your BAM file or in your GFF/GTF/GVF file, respectively.