
Read mapping formats

File type          Suffix    Import  Export  Description
ACE                .ace      X       X       No chromatogram or quality score
AGP                .agp/.fa          X       Exports scaffolded contigs (see below)
BAM                .bam      X       X       Binary Alignment/Map. A compressed representation of SAM format.
CLC                .clc      X       X       Rich format including all information
CLC Assembly File  .cas      X               Output from the CLC Assembly Cell
CRAM               .cram     X       X       Compressed Reference-oriented Alignment/Map. A compressed, reference-relative, space-saving representation of SAM format.
SAM                .sam      X       X       Sequence Alignment/Map. A tab-delimited text format. See notes below.
Mapping coverage   .tsv              X       Detailed per-base info on coverage (see below)
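The SAM entry above describes a tab-delimited text format. As a rough illustration of what that means in practice, the Python sketch below splits one alignment line into the 11 mandatory fields defined by the SAM specification, plus any optional tag fields. It is not part of the Workbench; the names `SamRecord`, `parse_sam_line` and `iter_alignments` are hypothetical, and the sketch does no validation beyond counting fields.

```python
from typing import Iterable, Iterator, NamedTuple


class SamRecord(NamedTuple):
    """The 11 mandatory SAM columns, in specification order, plus optional tags."""
    qname: str
    flag: int
    rname: str
    pos: int        # 1-based leftmost mapping position (0 if unmapped)
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: list[str]


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line (header lines start with '@' and are not handled here)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError(f"expected at least 11 tab-separated fields, got {len(fields)}")
    return SamRecord(
        qname=fields[0],
        flag=int(fields[1]),
        rname=fields[2],
        pos=int(fields[3]),
        mapq=int(fields[4]),
        cigar=fields[5],
        rnext=fields[6],
        pnext=int(fields[7]),
        tlen=int(fields[8]),
        seq=fields[9],
        qual=fields[10],
        tags=fields[11:],
    )


def iter_alignments(lines: Iterable[str]) -> Iterator[SamRecord]:
    """Skip header and blank lines, yield parsed alignment records."""
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue
        yield parse_sam_line(line)
```

For example, `parse_sam_line("read1\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII\tNM:i:0")` yields a record with `pos == 100`, `cigar == "5M"` and one optional tag.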

Import and export of SAM/BAM/CRAM format files

Import of SAM, BAM and CRAM format files is described in Importing SAM, BAM and CRAM mapping files.

The format specification when exporting to SAM, BAM or CRAM format is described in SAM/BAM/CRAM export format specification. Index files can be created as part of BAM and CRAM exports.

AGP export

Sequence lists and read mappings generated by de novo assembly can be exported using the AGP exporter. On export, contigs are split up based on annotations of type Scaffold. These annotations are added when the "Perform scaffolding" option is enabled during de novo assembly of paired reads. Contig sequences are exported to a single FASTA format file, accompanied by an AGP format file that describes how the contigs relate to one another.

AGP export is described further in AGP export.
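To make the relationship between the FASTA file and the AGP file more concrete, the sketch below splits one scaffold sequence into contigs at given gap intervals and writes AGP v2.0-style lines for the contigs (component type `W`) and the gaps (component type `N`). This is a hypothetical illustration, not the exporter's implementation: the `gaps` list stands in for the positions marked by Scaffold annotations, contig names such as `scf1_contig1` are invented, and the gap columns (`scaffold`, `yes`, `paired-ends`) are plausible AGP v2.0 values that the Workbench may or may not write.

```python
def scaffold_to_agp(scaffold_id: str, sequence: str, gaps: list[tuple[int, int]]):
    """Split a scaffold at 1-based, inclusive gap intervals (hypothetical input).

    Returns (contigs, agp_lines): contigs is a list of (contig_id, sequence)
    pairs for FASTA output; agp_lines are tab-separated AGP rows.
    """
    contigs: list[tuple[str, str]] = []
    agp: list[str] = []
    part = 0      # AGP column 4: running part number within the scaffold
    cursor = 1    # next scaffold position not yet assigned to a contig or gap

    def add_contig(beg: int, end: int) -> None:
        nonlocal part
        if beg > end:
            return  # adjacent gaps leave no contig in between
        part += 1
        contig_id = f"{scaffold_id}_contig{len(contigs) + 1}"
        contigs.append((contig_id, sequence[beg - 1:end]))
        # W line: object, object_beg, object_end, part, type, id, comp_beg, comp_end, orientation
        agp.append("\t".join(map(str, [scaffold_id, beg, end, part, "W",
                                       contig_id, 1, end - beg + 1, "+"])))

    for gap_beg, gap_end in sorted(gaps):
        add_contig(cursor, gap_beg - 1)
        part += 1
        # N line: gap length, gap type, linkage, linkage evidence (assumed values)
        agp.append("\t".join(map(str, [scaffold_id, gap_beg, gap_end, part, "N",
                                       gap_end - gap_beg + 1, "scaffold", "yes", "paired-ends"])))
        cursor = gap_end + 1
    add_contig(cursor, len(sequence))
    return contigs, agp


def to_fasta(records: list[tuple[str, str]]) -> str:
    """Write (name, sequence) pairs as unwrapped FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

With `scaffold_to_agp("scf1", "ACGTNNNNGGCC", [(5, 8)])`, the FASTA output holds the two contigs `ACGT` and `GGCC`, and the AGP output has three rows: contig, gap, contig. Only the contigs go into the FASTA file; the gap exists only as an `N` row in the AGP file.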

Export of coverage information from read mappings

Coverage information from read mappings can be exported in a tabular format using Mapping Coverage export. The output reports the number of nucleotides aligned to each position of the reference sequences. Insertions are also reported, as described below, while deletions are reported as reference regions without read coverage. Both stand-alone read mappings and reads tracks can be used as input.
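As a conceptual sketch of what per-position nucleotide counts mean, the Python code below walks the CIGAR string of each aligned read and tallies bases per reference position, recording inserted bases under a sub-position attached to the preceding reference base. A deleted reference position simply receives no base from that read, in line with the description above. This is an illustration under stated assumptions, not the Workbench's algorithm: reads are supplied as hypothetical `(pos, cigar, seq)` tuples with 1-based SAM positions, sub-position `0` stands for an ordinary reference position, and the Gaps column is not modeled.

```python
import re
from collections import Counter, defaultdict
from typing import Iterable

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def pileup_counts(reads: Iterable[tuple[int, str, str]]) -> dict[tuple[int, int], Counter]:
    """Count read bases per (reference position, sub-position).

    sub-position 0: an ordinary reference position
    sub-position k: the k-th inserted base after that reference position
    """
    counts: dict[tuple[int, int], Counter] = defaultdict(Counter)
    for pos, cigar, seq in reads:
        ref = pos  # current 1-based reference position
        q = 0      # current index into the read sequence
        for length_str, op in CIGAR_OP.findall(cigar):
            n = int(length_str)
            if op in "M=X":
                # Aligned bases consume both reference and read.
                for i in range(n):
                    counts[(ref + i, 0)][seq[q + i]] += 1
                ref += n
                q += n
            elif op == "I":
                # Inserted bases follow the last consumed reference position (ref - 1).
                for i in range(n):
                    counts[(ref - 1, i + 1)][seq[q + i]] += 1
                q += n
            elif op in "DN":
                # Reference advances with no read base: this read adds no coverage here.
                ref += n
            elif op == "S":
                q += n  # soft-clipped bases are in SEQ but not aligned
            # H and P consume neither reference nor read.
    return counts
```

For example, a read `(1, "1M1D2M", "AGT")` contributes `A` at position 1, nothing at position 2, and `G` and `T` at positions 3 and 4.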

The exported file contains the following columns by default:

Column Description
1 Reference name
2 Reference position
3 Reference sub-position (insertion)
4 Reference symbol
5 Number of As
6 Number of Cs
7 Number of Gs
8 Number of Ts
9 Number of Ns
10 Number of Gaps
11 Total number of reads covering the position

The Reference sub-position column is empty (indicated by a - symbol) at positions where the reference has a base. For an insertion, this column instead contains an index into the insertion (a number between 1 and the length of the insertion), the Reference symbol column is empty, and the Reference position column contains the position of the last reference base before the insertion.
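A minimal Python sketch for reading the exported file back into typed rows, assuming tab-separated columns in the default order listed above, an optional header line, and that an empty cell is written either as `-` or left blank (the text only confirms `-` for the sub-position column). `CoverageRow`, `parse_coverage_line` and `read_coverage` are hypothetical names, not part of the Workbench.

```python
from typing import Iterable, Iterator, NamedTuple, Optional

# Assumption: an empty cell is "-" (confirmed for the sub-position column)
# or blank (not confirmed for the Reference symbol column).
EMPTY_MARKERS = {"", "-"}


class CoverageRow(NamedTuple):
    reference: str
    position: int                 # for insertion rows: last reference base before the insertion
    sub_position: Optional[int]   # None at reference positions, 1..length inside an insertion
    symbol: Optional[str]         # reference base; None inside an insertion
    a: int
    c: int
    g: int
    t: int
    n: int
    gaps: int
    total: int                    # total number of reads covering the position

    @property
    def is_insertion(self) -> bool:
        return self.sub_position is not None


def _optional(cell: str) -> Optional[str]:
    cell = cell.strip()
    return None if cell in EMPTY_MARKERS else cell


def parse_coverage_line(line: str) -> CoverageRow:
    """Parse one data row in the default 11-column layout."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 11:
        raise ValueError(f"expected 11 tab-separated columns, got {len(f)}")
    sub = _optional(f[2])
    a, c, g, t, n, gaps, total = (int(x) for x in f[4:11])
    return CoverageRow(f[0], int(f[1]), int(sub) if sub is not None else None,
                       _optional(f[3]), a, c, g, t, n, gaps, total)


def read_coverage(lines: Iterable[str]) -> Iterator[CoverageRow]:
    """Yield data rows, skipping blank lines and any header line.

    Assumption: a header is any line whose second column is not an integer.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[1].strip().isdigit():
            continue
        yield parse_coverage_line(line)
```

Dropping rows where `is_insertion` is true leaves one row per reference position, which is convenient when plotting coverage along the reference; the insertion rows can then be inspected separately, grouped by `position`.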

See Export of tables for detailed information about exporting tabular data from the CLC Genomics Workbench.