The clc_unmapped_reads Program
This program extracts the unmapped read sequences from a mapping and writes them in FASTA format. By default, only reads that do not match the reference at all are output. Options are available to also output the unaligned ends of reads and to set a minimum length for the unmapped sequences that are output.
This program is useful for investigating sequences that are not represented in the reference sequences used in a previous mapping. Performing de novo assembly on these unmapped reads can sometimes help determine their source, which could be, for example, mitochondrial DNA or vector sequence contamination. See Options for All Programs for further details.
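Because the output is plain FASTA, it can be inspected with a short script before deciding whether de novo assembly is worthwhile. The following is a minimal sketch in Python, not part of the program itself: the function names (read_fasta, filter_by_min_length, length_summary) are hypothetical helpers introduced here for illustration, and the length filter only mimics the idea of a minimum length after the fact; it does not call or reproduce any option of clc_unmapped_reads.

```python
import io
from typing import Dict, Iterable, Iterator, List, TextIO, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(handle: TextIO) -> Iterator[Record]:
    """Yield (name, sequence) pairs from a FASTA-formatted text stream."""
    name = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name = line[1:]
            chunks = []
        elif name is not None:
            chunks.append(line)  # sequence may span several lines
    if name is not None:
        yield name, "".join(chunks)


def filter_by_min_length(records: Iterable[Record], min_length: int) -> List[Record]:
    """Keep only records whose sequence has at least min_length bases."""
    return [(name, seq) for name, seq in records if len(seq) >= min_length]


def length_summary(records: Iterable[Record]) -> Dict[str, int]:
    """Return count, shortest, longest and total length of the sequences."""
    lengths = [len(seq) for _, seq in records]
    if not lengths:
        return {"count": 0, "min": 0, "max": 0, "total": 0}
    return {
        "count": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "total": sum(lengths),
    }


if __name__ == "__main__":
    # In-memory stand-in for a FASTA file of unmapped reads (made-up data).
    example = io.StringIO(">read1\nACGTACGT\n>read2\nACG\n")
    reads = list(read_fasta(example))
    print(length_summary(reads))
    print(filter_by_min_length(reads, 5))
```

To use the sketch on a real output file, replace the in-memory example with an opened file handle, for instance open("unmapped.fasta"), where the file name is an assumption.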