SAM/BAM export format specification
SAM Specification
The workbench aims to import and export SAM and BAM files according to version v1.4-r962 of the SAM specification (see http://samtools.sourceforge.net/SAM1.pdf). This appendix describes how the workbench exports SAM and BAM files and lists known limitations.
SAM and BAM Export - General notes
The SAM exporter writes unsorted SAM and BAM files.
Reference names are updated to match the SAM specification:
- All leading and trailing whitespace is removed.
- Every character that the specification disallows (whitespace \ , " ` ' @ = * () [] <>) is replaced by _ (underscore).
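The two renaming rules above can be illustrated with a short sketch. It is not the workbench's own code: the function name `sanitize_reference_name` is hypothetical, and it assumes the comma in the list above is itself one of the disallowed characters. Leading and trailing whitespace is stripped first, so only whitespace inside the name becomes an underscore.

```python
# Characters listed in this appendix as disallowed in reference names
# (assumption: the comma is part of the list, not a separator).
# Whitespace is checked separately with str.isspace().
DISALLOWED = set('\\,"`\'@=*()[]<>')


def sanitize_reference_name(name: str) -> str:
    """Hypothetical helper: strip leading/trailing whitespace, then replace
    each disallowed character (including internal whitespace) with '_'."""
    stripped = name.strip()
    return "".join(
        "_" if ch.isspace() or ch in DISALLOWED else ch
        for ch in stripped
    )


print(sanitize_reference_name("  chr 1(a)  "))  # chr_1_a_
```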
The SAM importer and exporter support only the ID, SM, PI and PL read group tags; all other read group tags are ignored.
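As an illustration of the read group behavior, the sketch below reduces a tab-separated `@RG` header line to the four supported tags. The function name `filter_read_group_line` is hypothetical, and the sketch only mirrors the documented outcome, not how the workbench implements it.

```python
SUPPORTED_RG_TAGS = ("ID", "SM", "PI", "PL")


def filter_read_group_line(line: str) -> str:
    """Hypothetical helper: keep only the ID, SM, PI and PL tags of an
    @RG header line and drop every other TAG:VALUE field."""
    fields = line.rstrip("\n").split("\t")
    if fields[0] != "@RG":
        raise ValueError("not an @RG header line")
    kept = [f for f in fields[1:] if f[:2] in SUPPORTED_RG_TAGS and f[2:3] == ":"]
    return "\t".join(["@RG"] + kept)


line = "@RG\tID:grp1\tSM:sample1\tLB:lib1\tPL:ILLUMINA\tCN:center"
print(filter_read_group_line(line).replace("\t", " "))
# @RG ID:grp1 SM:sample1 PL:ILLUMINA
```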
The BAM exporter can also output additional annotations added by tools that plugins provide. Where this applies, details are given in the manual for the relevant plugin.
SAM Alignment Section
A few remarks on the exported alignment section:
- Unmapped reads are not exported.
- If the two mates of a pair map to different contigs, the mates are exported as single reads.
- Multi-segment mappings are imported as a paired data set.
- If a read name contains spaces, the spaces are replaced by an underscore '_'.
- The exported CIGAR string uses 'M' to indicate match or mismatch and does not use '=' (equals sign) or 'X'.
- CLC software does not support or record mapping quality for read mappings. Because the BAM format specification requires a mapping quality (MAPQ) to be recorded for every mapped read, the Workbench writes one of two values on export: 60 for reads that mapped uniquely, and 0 for reads that could map equally well to locations other than the one reported in the BAM file.
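Three of the rules in the list above are mechanical enough to sketch: spaces in read names become underscores, '=' and 'X' CIGAR operations are reported as 'M', and MAPQ is 60 for unique mappings and 0 otherwise. The helper names below are hypothetical, and the sketch assumes that adjacent 'M' runs are merged after rewriting, which the list does not state explicitly.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def export_read_name(name: str) -> str:
    """Hypothetical helper: replace each space in a read name with '_'."""
    return name.replace(" ", "_")


def collapse_cigar(cigar: str) -> str:
    """Hypothetical helper: rewrite '=' and 'X' as 'M' and merge adjacent
    runs of the same operation (the merging is an assumption)."""
    ops = []
    for length, op in CIGAR_OP.findall(cigar):
        op = "M" if op in ("=", "X") else op
        if ops and ops[-1][1] == op:
            ops[-1] = (ops[-1][0] + int(length), op)
        else:
            ops.append((int(length), op))
    return "".join(f"{n}{op}" for n, op in ops)


def mapping_quality(is_unique: bool) -> int:
    """MAPQ as described above: 60 for unique mappings, 0 otherwise."""
    return 60 if is_unique else 0


print(export_read_name("read 1 a"), collapse_cigar("5=1X4=2I3="), mapping_quality(False))
# read_1_a 10M2I3M 0
```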
Optional fields in the alignment section
The following is true for the export of optional fields:
- The NH tag is exported.
- The NM tag is not exported.
- For bisulfite mapped reads, an XR tag is exported with value "CT" or "GA". It describes the read conversion.
- For bisulfite mapped reads, an XG tag is exported with value "CT" or "GA". It describes the reference conversion.
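The optional-field rules above can be sketched as a function that builds the tag strings for one read. NH uses the `i` (integer) type defined by the SAM specification; writing XR and XG with the `Z` (string) type is an assumption, since this appendix does not state their type. The function name `optional_fields` is hypothetical, and NM is deliberately never produced.

```python
from typing import List, Optional

VALID_CONVERSIONS = {"CT", "GA"}


def optional_fields(hit_count: int,
                    read_conversion: Optional[str] = None,
                    reference_conversion: Optional[str] = None) -> List[str]:
    """Hypothetical helper: NH is always written, XR/XG only for bisulfite
    mapped reads (type Z assumed), and NM is never written."""
    fields = [f"NH:i:{hit_count}"]
    for tag, value in (("XR", read_conversion), ("XG", reference_conversion)):
        if value is None:
            continue
        if value not in VALID_CONVERSIONS:
            raise ValueError(f"{tag} must be 'CT' or 'GA', got {value!r}")
        fields.append(f"{tag}:Z:{value}")
    return fields


print("\t".join(optional_fields(2, "CT", "GA")).replace("\t", " "))
# NH:i:2 XR:Z:CT XG:Z:GA
```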