SAM/BAM/CRAM export format specification
Specifications
The CLC Genomics Workbench aims to import and export SAM and BAM files according to the v1.4-r962 version of the SAM specification (see http://samtools.github.io/hts-specs/SAMv1.pdf), and CRAM files according to the v3.1 version of the CRAM specification (see http://samtools.github.io/hts-specs/CRAMv3.pdf). This appendix describes how the CLC Genomics Workbench exports SAM, BAM and CRAM files, along with known limitations.
General notes about the exporters
The exporters write unsorted SAM/BAM/CRAM files.
Reference names are updated to match the SAM specification:
- All leading and trailing whitespace is removed.
- Characters that the specification disallows (whitespace \ , " ` ' @ ( ) [ ] < >) are replaced by _ (underscore). The characters = and * are disallowed only at the beginning of a reference name, and are replaced there as well. For example, the reference name *my=reference@sequence is exported as _my=reference_sequence.
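The renaming rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described in this appendix, not the Workbench's own implementation. The function name `sanitize_reference_name` is hypothetical, and the sketch assumes that "beginning of the reference name" means the first character only.

```python
import re

# Characters listed in this appendix as disallowed anywhere in a reference name:
# whitespace \ , " ` ' @ ( ) [ ] < >
_DISALLOWED = re.compile(r"""[\s\\,"`'@()\[\]<>]""")


def sanitize_reference_name(name: str) -> str:
    """Return a reference name rewritten according to the rules described above.

    Hypothetical helper for illustration only.
    """
    # Rule 1: remove leading and trailing whitespace.
    name = name.strip()
    # Rule 2: replace every disallowed character with an underscore.
    name = _DISALLOWED.sub("_", name)
    # Rule 3 (assumption: first character only): '=' and '*' may not start a name.
    if name[:1] in ("=", "*"):
        name = "_" + name[1:]
    return name


print(sanitize_reference_name("*my=reference@sequence"))
```

Applied to the example from the text, the leading * and the @ are replaced, while the = in the middle of the name is kept.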
The following read group tags are supported: ID, SM, PI and PL. All other read group tags are ignored.
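As a rough illustration of what this means for a header line, the sketch below drops every tag other than ID, SM, PI and PL from an @RG line. The helper name `filter_read_group_line` and the example header are hypothetical. The sketch says nothing about the order in which the exporter writes tags, and is not the Workbench's own code.

```python
# Read group tags that this appendix lists as supported on export.
SUPPORTED_RG_TAGS = ("ID", "SM", "PI", "PL")


def filter_read_group_line(rg_line: str) -> str:
    """Keep only the supported tags of a tab-separated @RG header line.

    Hypothetical helper for illustration only.
    """
    fields = rg_line.rstrip("\n").split("\t")
    # fields[0] is the record type ("@RG"); the rest are TAG:VALUE pairs.
    kept = [f for f in fields[1:] if f.split(":", 1)[0] in SUPPORTED_RG_TAGS]
    return "\t".join([fields[0]] + kept)


# Made-up header line: the LB tag is not in the supported list and is dropped.
print(filter_read_group_line("@RG\tID:rg1\tSM:s1\tLB:lib1\tPL:ILLUMINA"))
```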
The exporters can also output additional annotations added by tools provided by plugins. Where that is the case, further details are provided in the plugin manual.
Alignment Section
Here are a few remarks on the alignment sections of the exported files:
- Unmapped reads are not exported.
- If the two reads of a pair are not mapped to the same contig, the mates are exported as single reads.
- Multi segment mappings will be imported as a paired data set.
- If a read name contains spaces, the spaces are replaced by an underscore '_'.
- The exported CIGAR string uses 'M' to indicate match or mismatch and does not use '=' (equals sign) or 'X'.
- The CLC Genomics Workbench does not support or record mapping quality for read mappings. To fulfill the requirement in the format specifications that a read mapping quality is recorded for all mapped reads, the values 0 and 60 are used when mappings are exported. The value 60 is given to reads that mapped uniquely. The value 0 is given to reads that could map equally well to other locations besides the one being reported in the file.
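Two of the remarks above lend themselves to a short sketch: the M-only CIGAR convention and the two-valued mapping quality. The Python below is an illustration under stated assumptions, not the exporter's code. The function names `collapse_match_ops` and `export_mapq` are hypothetical, and the sketch assumes that a CIGAR string using '=' and 'X' would be rewritten by merging adjacent match and mismatch runs into a single 'M' run.

```python
import re

# One CIGAR element: a length followed by an operation code from the SAM specification.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def collapse_match_ops(cigar: str) -> str:
    """Rewrite '=' and 'X' as 'M' and merge adjacent runs of the same operation."""
    merged = []
    for length, op in _CIGAR_OP.findall(cigar):
        op = "M" if op in "=X" else op
        if merged and merged[-1][1] == op:
            # Same operation as the previous element: extend that run.
            merged[-1][0] += int(length)
        else:
            merged.append([int(length), op])
    return "".join(f"{n}{op}" for n, op in merged)


def export_mapq(maps_uniquely: bool) -> int:
    """Mapping quality as described above: 60 for unique mappings, 0 otherwise."""
    return 60 if maps_uniquely else 0


print(collapse_match_ops("5=1X4=2I3="))  # 5 + 1 + 4 matched or mismatched bases, then 2I, then 3M
print(export_mapq(True), export_mapq(False))
```

Because only the values 0 and 60 are written, downstream tools that filter on mapping quality effectively see a binary unique or non-unique flag rather than a graded score.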
Optional fields in the alignment section
The following is true for the export of optional fields:
- The NH tag is exported.
- The NM tag is not exported.
- For bisulfite mapped reads, an XR tag is exported with value "CT" or "GA". It describes the read conversion.
- For bisulfite mapped reads, an XG tag is exported with value "CT" or "GA". It describes the reference conversion.
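To check which optional fields are present in an exported SAM file, the tab-separated columns after the eleven mandatory ones can be parsed directly. The sketch below uses plain Python. The helper name `optional_fields` and the example record are made up for illustration, and the `Z` (string) type code for XR and XG is an assumption based on the values being text.

```python
def optional_fields(sam_line: str) -> dict:
    """Return the optional TAG:TYPE:VALUE fields of one SAM alignment line as a dict.

    Hypothetical helper for illustration only.
    """
    columns = sam_line.rstrip("\n").split("\t")
    tags = {}
    # The first 11 columns are mandatory; optional fields start at column 12.
    for field in columns[11:]:
        tag, type_code, value = field.split(":", 2)
        tags[tag] = int(value) if type_code == "i" else value
    return tags


# Made-up record for a bisulfite mapped read.
record = "\t".join([
    "read1", "0", "chr1", "100", "60", "10M", "*", "0", "0",
    "ACGTACGTAC", "IIIIIIIIII", "NH:i:1", "XR:Z:CT", "XG:Z:GA",
])
tags = optional_fields(record)
print(tags)
print("NM" in tags)  # NM is not exported, so it is absent from this record
```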