SAM, BAM and CRAM mapping files

The CLC Genomics Workbench supports import of files in the SAM, BAM and CRAM alignment map formats, which are used for storing large nucleotide sequence alignments.

See https://samtools.github.io/hts-specs/ for specifications for these formats.

Alignments from a SAM/BAM/CRAM file can be imported as a reads track or as stand-alone read mappings. To import the reads in a SAM/BAM file as a sequence list, disregarding any alignment information, please use the Standard Import instead (see Standard import).

Note that importing SAM/BAM/CRAM files can consume a lot of memory for large data sets, particularly those where many reads are marked as members of broken pairs in the mapping.

To launch the SAM/BAM/CRAM importer, go to:

        Import | SAM/BAM/CRAM Mapping Files

This opens a dialog where files can be selected and import options specified (figure 7.22).

Figure 7.22: Choosing SAM/BAM/CRAM file and reference(s).

Providing references for the SAM/BAM/CRAM file

The reference sequence(s) that are referred to within the SAM/BAM/CRAM file must be specified in the 'Set parameters' wizard step (figure 7.22):

The table under 'References in files' contains the references that are referred to within the SAM/BAM/CRAM file, with their name, length, and a status. The status indicates whether a given reference referred to within the SAM/BAM/CRAM file is present in the input references. The status can be:

A reference is 'matched' when the status is either OK or Will download. Only reads mapping to a matched reference are imported from SAM and BAM files. Import of CRAM files fails when there are unmatched references.
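The idea of checking the references named in a file against the supplied references can be illustrated with a minimal Python sketch. It assumes that a reference matches when both its name and its length agree with the @SQ header line (SN and LN tags) of a SAM file. The Workbench's actual matching rules may differ, and the function names and the 'unmatched' label below are hypothetical, introduced only for illustration.

```python
def parse_sq_lines(header_text):
    """Return {reference name: length} from the @SQ lines of a SAM header."""
    refs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each tag after the record type has the form TAG:VALUE.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        refs[fields["SN"]] = int(fields["LN"])
    return refs


def match_references(file_refs, supplied_refs):
    """Assign a status to every reference named in the file.

    Assumption: a reference is matched when name and length both agree.
    'unmatched' is a hypothetical label, not a status shown by the Workbench.
    """
    statuses = {}
    for name, length in file_refs.items():
        if supplied_refs.get(name) == length:
            statuses[name] = "OK"
        else:
            statuses[name] = "unmatched"
    return statuses


# Made-up header and reference lengths, for illustration only.
header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "@SQ\tSN:chr2\tLN:500\n"
)
supplied = {"chr1": 1000, "chr2": 450}

statuses = match_references(parse_sq_lines(header), supplied)
print(statuses)
```

In this made-up example, chr1 is reported as OK, while chr2 is reported as unmatched because the supplied length differs from the one in the header.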

For references located on a CLC Genomics Server, the table is empty. The importer can still be launched whether or not the correct references are selected, but the import fails with an error if they are not.

Output options

In the 'Result handling' wizard step, the output options for the importer can be configured (figure 7.23):

Figure 7.23: Result handling.

For files containing multiple alignment records for a single read, only the primary alignment (see https://samtools.github.io/hts-specs/SAMv1.pdf) is imported.
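As a rough illustration of what "primary alignment" means in the SAM specification, the following Python sketch keeps only records whose FLAG field has neither the secondary (0x100) nor the supplementary (0x800) bit set, which is how the specification defines the primary line of a read. This is only a sketch of the flag logic and not the importer's implementation. The function names and example records are hypothetical.

```python
SECONDARY = 0x100      # FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800  # FLAG bit: supplementary alignment


def is_primary(flag):
    """True when neither the secondary nor the supplementary bit is set."""
    return (flag & (SECONDARY | SUPPLEMENTARY)) == 0


def primary_records(sam_lines):
    """Yield (read name, flag) for the primary records in SAM text lines."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # FLAG is the second mandatory SAM field
        if is_primary(flag):
            yield fields[0], flag


# Made-up records: r1 has a secondary hit, r2 has a supplementary hit.
sam_lines = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t10\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r1\t256\tchr1\t200\t0\t5M\t*\t0\t0\t*\t*",
    "r2\t2064\tchr1\t300\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r2\t16\tchr1\t400\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
]

kept = list(primary_records(sam_lines))
print(kept)
```

Here one record per read survives the filter: the flag 0 record for r1 and the flag 16 record for r2.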