Sequence data formats

File type                        Suffix              Import  Export  Description
AB1                              .ab1                X               Including chromatograms
ABI                              .abi                X               Including chromatograms
CLC                              .clc                X       X       Rich format including all information
Clone manager                    .cm5                X               Clone manager sequence format
DNAstrider                       .str/.strider       X       X
DS Gene                          .bsml               X
EMBL                             .emb/.embl          X       X       Rich information incl. annotations (nucleotides only)
FASTA                            .fsa/.fasta         X       X       Simple format, name & description
GCG sequence                     .gcg                X       X       Rich information incl. annotations
GenBank                          .gbk/.gb/.gp/.gbff  X       X       Rich information incl. annotations
Gene Construction Kit            .gck                X
Lasergene                        .pro/.seq           X
Nexus                            .nxs/.nexus         X       X
Phred                            .phd                X               Including chromatograms
PIR (NBRF)                       .pir                X       X       Simple format, name & description
Raw sequence                     any                 X               Only sequence (no name)
SCF2                             .scf                X               Including chromatograms
SCF3                             .scf                X       X       Including chromatograms
Sequence Comma separated values  .csv                X       X       Simple format. One sequence per line: name, description (optional), sequence
Staden                           .sdn                X
Swiss-Prot                       .swp                X       X       Rich information incl. annotations (only peptides)
Tab delimited text               .txt                        X       Annotations in tab delimited text format
Vector NTI archives*             .ma4/.pa4/.oa4      X               Archives in rich format
Vector NTI Database*                                 X               Special import of a full database

*Vector NTI import functionality is included as standard in the CLC Main Workbench. In the CLC Genomics Workbench it can be installed as a plugin via the Plugins Manager (read more in Installing plugins).
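
The table describes the comma separated values format as one sequence per line with the fields name, description (optional) and sequence. The following is a minimal Python sketch of reading such a file outside the Workbench and converting it to FASTA. It is an illustration only: the function names parse_sequence_csv and to_fasta are hypothetical, and the handling of a two-field line as "name, sequence" is an assumption about how the optional description is left out, not a statement of how the Workbench itself parses the file.

```python
import csv
import io


def parse_sequence_csv(text):
    """Parse lines of 'name, description (optional), sequence' into dicts.

    Assumption: a line with two fields is 'name, sequence' (no description).
    """
    records = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        fields = [field.strip() for field in row]
        if not any(fields):
            continue  # skip blank lines
        if len(fields) == 2:
            name, sequence = fields
            description = ""
        elif len(fields) == 3:
            name, description, sequence = fields
        else:
            raise ValueError(f"expected 2 or 3 fields, got {len(fields)}: {row!r}")
        records.append(
            {"name": name, "description": description, "sequence": sequence}
        )
    return records


def to_fasta(records):
    """Format records as FASTA: '>name description' followed by the sequence."""
    lines = []
    for record in records:
        header = record["name"]
        if record["description"]:
            header += " " + record["description"]
        lines.append(">" + header)
        lines.append(record["sequence"])
    return "\n".join(lines) + "\n"


example = "seq1, first test sequence, ACGTACGT\nseq2, TTGACA\n"
records = parse_sequence_csv(example)
print(to_fasta(records), end="")
```

Using the csv module rather than splitting on commas means that a quoted description containing a comma is still read as a single field.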

When exporting in FASTA format, it is possible to remove sequence ends covered by annotations of type "Trim" (read more in Trimming).
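
To illustrate what removing the ends covered by "Trim" annotations amounts to, here is a small Python sketch. It is not the Workbench's implementation. It assumes that each Trim annotation can be represented as a (start, end) pair with 0-based positions and an exclusive end, and that only regions reaching the first or last base are removed. The function name trim_ends is hypothetical.

```python
def trim_ends(sequence, trim_regions):
    """Return the sequence without leading/trailing stretches covered by trim regions.

    trim_regions: list of (start, end) pairs, 0-based, end exclusive (assumed
    representation). Regions that do not reach either end are left untouched.
    """
    left, right = 0, len(sequence)

    # Move the left cut forward while some region covers the current first base.
    moved = True
    while moved and left < right:
        moved = False
        for start, end in trim_regions:
            if start <= left < end:
                left = min(end, right)
                moved = True

    # Move the right cut backward while some region covers the current last base.
    moved = True
    while moved and left < right:
        moved = False
        for start, end in trim_regions:
            if start < right <= end:
                right = max(start, left)
                moved = True

    return sequence[left:right]


raw = "NNNACGTACGTNN"
trimmed = trim_ends(raw, [(0, 3), (11, 13)])
print(">example\n" + trimmed)  # sequence line is ACGTACGT
```

An annotation lying entirely inside the sequence, such as (5, 7) in this example, would leave the result unchanged under these assumptions, because it covers neither end.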