Sequence data formats

Note that high-throughput sequencing data from Illumina, Ion Torrent and 454 platforms, as well as high-throughput FASTA and trace files, are imported using a special import, as described in Import high-throughput sequencing data. These data can also be exported in FASTQ format (using NCBI/Sanger Phred quality scores).
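In the NCBI/Sanger FASTQ convention, each Phred quality score Q is stored as the ASCII character with code Q + 33. The minimal Python sketch below writes one read as a four-line FASTQ record under that convention. It only illustrates the encoding and is not the Workbench's exporter; the function names and the example read are hypothetical.

```python
def phred_to_sanger(qualities):
    """Encode integer Phred scores as a Sanger/NCBI (Phred+33) quality string."""
    for q in qualities:
        # Q + 33 must stay within printable ASCII ('!' = 33 up to '~' = 126).
        if not 0 <= q <= 93:
            raise ValueError(f"Phred score out of range: {q}")
    return "".join(chr(q + 33) for q in qualities)


def to_fastq_record(name, sequence, qualities):
    """Build a FASTQ record: '@name', sequence, '+', encoded qualities."""
    if len(sequence) != len(qualities):
        raise ValueError("sequence and quality lengths differ")
    return f"@{name}\n{sequence}\n+\n{phred_to_sanger(qualities)}\n"


# Hypothetical read with one Phred score per base.
print(to_fastq_record("read1", "ACGT", [40, 30, 20, 10]), end="")
# prints:
# @read1
# ACGT
# +
# I?5+
```

Phred 40 becomes 'I' and Phred 10 becomes '+', so higher-quality bases map to characters later in the ASCII table.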

File type                        Suffix              Import  Export  Description
AB1                              .ab1                X       -       Including chromatograms
ABI                              .abi                X       -       Including chromatograms
CLC                              .clc                X       X       Rich format including all information
Clone Manager                    .cm5                X       -       Clone Manager sequence format
DNAstrider                       .str/.strider       X       X
DS Gene                          .bsml               X       -
EMBL                             .emb/.embl          X       X       Rich information including annotations (nucleotides only)
FASTA                            .fsa/.fasta         X       X       Simple format, name & description
FASTQ                            .fastq              X       X       Simple format, name, description & quality scores
GenBank                          .gbk/.gb/.gp/.gbff  X       X       Rich information including annotations
Gene Construction Kit            .gcc                X       -
Lasergene                        .pro/.seq           X       -
Nexus                            .nxs/.nexus         X       X
Phred                            .phd                X       -       Including chromatograms
PIR (NBRF)                       .pir                X       X       Simple format, name & description
Raw sequence                     any                 X       -       Only sequence (no name)
SCF2                             .scf                X       -       Including chromatograms
SCF3                             .scf                X       X       Including chromatograms
Sequence comma-separated values  .csv                X       X       Simple format. One sequence per line: name, description (optional), sequence
Staden                           .sdn                X       -
Swiss-Prot                       .swp                X       X       Rich information including annotations (peptides only)
Tab-delimited text               .txt                -       X       Annotations in tab-delimited text format
Vector NTI archives*             .ma4/.pa4/.oa4      X       -       Archives in rich format
Vector NTI Database*             -                   X       -       Special import of full database

*Vector NTI import functionality is included as standard in the CLC Main Workbench and can be installed as a plugin via the Plugins Manager of the CLC Genomics Workbench (read more in Installing plugins).
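The Sequence comma-separated values format in the table above stores one sequence per line as name, an optional description, and the sequence. The minimal Python sketch below reads that layout and writes it out as FASTA. It assumes there is no header row, that a two-field line means name and sequence while a three-field line means name, description and sequence, and that fields follow standard CSV quoting; these are assumptions, not the Workbench's documented parsing rules, and the function names are hypothetical.

```python
import csv
import io


def read_sequence_csv(text):
    """Parse 'name, description (optional), sequence' rows into tuples.

    Assumption: no header row; two fields = name + sequence,
    three fields = name + description + sequence.
    """
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        fields = [f.strip() for f in row]
        if len(fields) == 2:
            name, sequence = fields
            description = ""
        elif len(fields) == 3:
            name, description, sequence = fields
        else:
            raise ValueError(f"expected 2 or 3 fields, got {len(fields)}: {row}")
        records.append((name, description, sequence))
    return records


def to_fasta(records, width=60):
    """Write (name, description, sequence) tuples as FASTA, wrapping at `width`."""
    lines = []
    for name, description, sequence in records:
        lines.append(f">{name} {description}" if description else f">{name}")
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


example = "seq1,first test sequence,ACGTACGT\nseq2,GGCC\n"
print(to_fasta(read_sequence_csv(example)), end="")
```

Using the `csv` module rather than splitting on commas keeps quoted descriptions that themselves contain commas intact.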

When exporting in FASTA format, you can remove the sequence ends covered by annotations of type "Trim" (read more in Trimming).
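The Workbench performs this trimming itself during export; the minimal Python sketch below only illustrates the idea. It assumes Trim annotations are given as hypothetical 0-based, end-exclusive (start, end) pairs, merges overlapping or touching regions, and cuts only the regions that reach the first or last base, leaving any interior Trim region in place. The function names are hypothetical.

```python
def merge_regions(regions):
    """Merge overlapping or touching (start, end) regions into sorted [start, end] lists."""
    merged = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def trim_ends(sequence, trim_regions):
    """Remove sequence ends covered by Trim regions (0-based, end-exclusive)."""
    left, right = 0, len(sequence)
    merged = merge_regions(trim_regions)
    if merged and merged[0][0] <= 0:
        left = merged[0][1]  # 5' end is covered: cut up to the end of that region
    if merged and merged[-1][1] >= len(sequence):
        right = merged[-1][0]  # 3' end is covered: cut from the start of that region
    if left >= right:
        return ""  # Trim regions cover the whole sequence
    return sequence[left:right]


print(trim_ends("NNACGTACGTNN", [(0, 2), (10, 12)]))  # prints ACGTACGT
```

Merging first means that two adjacent Trim annotations at the same end, such as (0, 2) and (2, 3), are removed together rather than only the first one.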