Input Files

The formats in the following table are recognized as valid input formats by one or more of the CLC Assembly Cell tools. Note that not every listed format is valid for sequence reads, and not every listed format is valid for reference sequences in read mappings.

The software detects input file formats automatically by inspecting the file contents. The filename has no bearing on the detected format.
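To illustrate what content-based detection can look like, here is a minimal Python sketch that guesses a format from the first bytes of a file. It is not the detection logic used by the CLC Assembly Cell tools, which this section does not describe. The function name `sniff_format` is hypothetical, and the checks are deliberately simplified: the sketch does not distinguish csfasta from Fasta and does not recognize Scarf.

```python
def sniff_format(head: bytes) -> str:
    """Guess a sequence file format from its leading bytes.

    Illustrative only: a simplified heuristic, not the CLC implementation.
    """
    # SFF is a binary format whose header starts with the magic bytes ".sff".
    if head.startswith(b".sff"):
        return "sff"

    # The text formats are identified by their first non-whitespace characters.
    text = head.lstrip()
    if text.startswith(b">"):
        return "fasta"      # csfasta records also start with ">"
    if text.startswith(b"@"):
        return "fastq"
    if text.startswith(b"LOCUS"):
        return "genbank"    # GenBank flat files open with a LOCUS line
    return "unknown"


def sniff_file(path: str, nbytes: int = 64) -> str:
    """Read the first few bytes of a file and guess its format."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(nbytes))
```

Because only the leading bytes are examined, a Fasta file named `reads.txt` and one named `reads.fa` would be classified the same way, which is the behavior described above.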

Format     Reads   References
Fasta      +       +
Fastq      +       -
Scarf      +       -
csfasta    +       -
Sff *      +       -
GenBank    -       +

* See footnote 3.1.

Read data compressed using gzip is supported as input by the CLC Assembly Cell programs except for clc_remove_duplicates.

Reference data cannot be in a compressed form.
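Because compressed reference files are not accepted, it can help to check a file before passing it to a tool. The following Python sketch tests for the two-byte gzip signature (`1f 8b`) at the start of a file. The helper names `is_gzip` and `is_gzip_file` are hypothetical and are not part of the CLC Assembly Cell distribution.

```python
GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def is_gzip(head: bytes) -> bool:
    """Return True if the given leading bytes carry the gzip signature."""
    return head[:2] == GZIP_MAGIC


def is_gzip_file(path: str) -> bool:
    """Return True if the file at `path` starts with the gzip signature."""
    with open(path, "rb") as handle:
        return is_gzip(handle.read(2))
```

A reference file for which `is_gzip_file` returns `True` would need to be decompressed first, for example with `gunzip`.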



Footnotes

3.1 Please note that paired 454 data needs to be pre-processed using the clc_split_reads program.