Read Mapping

There are two programs within the CLC Assembly Cell for mapping reads to one or more reference sequences: clc_mapper, for mapping in base space, and clc_mapper_legacy, for mapping in color space.

The aim of both programs is the same: to map each read to the region of a reference sequence from which it most likely originated. In both cases, the alignment quality threshold is specified as a minimum fraction of the read length that must be aligned, together with a minimum fraction of identity within the aligned part. For example, the threshold may require at least 90% identity over at least 50% of the read length. A gapped alignment is always performed.
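
The threshold can be illustrated with a small check. The sketch below is a hypothetical helper, not part of the CLC Assembly Cell: the name alignment_passes, its parameters, and the way it counts aligned bases and identity (identity as matching columns divided by all alignment columns, gap columns included) are assumptions made for illustration.

```python
def alignment_passes(read_aln: str, ref_aln: str, read_length: int,
                     length_fraction: float = 0.5,
                     similarity_fraction: float = 0.9) -> bool:
    """Check a hypothetical length-fraction / identity-fraction threshold.

    read_aln and ref_aln hold the aligned region of the read and the reference,
    column by column, with '-' marking a gap. read_length is the full read length,
    including any bases left out of the alignment.
    """
    if len(read_aln) != len(ref_aln):
        raise ValueError("aligned strings must have the same length")
    columns = len(read_aln)
    if columns == 0:
        return False
    # Read bases that take part in the alignment (gap columns in the read excluded).
    aligned_read_bases = sum(1 for base in read_aln if base != "-")
    # Columns where read and reference carry the same base.
    matches = sum(1 for a, b in zip(read_aln, ref_aln) if a == b and a != "-")
    covers_enough = aligned_read_bases >= length_fraction * read_length
    identical_enough = matches / columns >= similarity_fraction
    return covers_enough and identical_enough


# 60 aligned read bases with 2 mismatches: 58/60 identity.
read_part = "ACGT" * 15
ref_part = "TT" + read_part[2:]
print(alignment_passes(read_part, ref_part, read_length=100))  # True: 60% of the read aligned
print(alignment_passes(read_part, ref_part, read_length=200))  # False: only 30% aligned
```

The same 60-base alignment passes for a 100-base read but fails for a 200-base read, because identity (about 97%) is high in both cases while the aligned fraction of the read drops from 60% to 30%.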

By default, reads are mapped to a set of reference sequences using local alignment. The advantage of local alignment over global alignment is that read ends containing sufficiently many sequencing errors are automatically excluded from the alignment. This is also beneficial if the read ends contain vector contamination or adapter sequences.
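
The effect of local alignment on noisy read ends can be seen with the textbook Smith-Waterman algorithm. This is a minimal sketch, not the algorithm used by clc_mapper; the function name local_align and the scoring values (match +1, mismatch -2, linear gap penalty -3) are assumptions chosen for illustration.

```python
def local_align(read: str, ref: str, match: int = 1, mismatch: int = -2, gap: int = -3):
    """Smith-Waterman local alignment with a linear gap penalty.

    Returns (score, read_start, read_end, ref_start, ref_end) with 0-based,
    end-exclusive coordinates of the best-scoring local alignment.
    """
    rows, cols = len(read) + 1, len(ref) + 1
    H = [[0] * cols for _ in range(rows)]  # scores are floored at 0 (local alignment)
    best, best_i, best_j = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if read[i - 1] == ref[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j
    # Trace back from the best cell until a cell with score 0 (the alignment start).
    i, j = best_i, best_j
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if read[i - 1] == ref[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, i, best_i, j, best_j


core = "ACGTTGCATG"
# The read carries two erroneous bases at each end.
print(local_align("GG" + core + "GG", "CCCC" + core + "CCCC"))  # (10, 2, 12, 4, 14)
```

Only the 10-base core of the read (positions 2 to 12) is aligned; the mismatching bases at both ends are simply left out of the alignment instead of lowering its score.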

Global alignment can be used instead of local alignment if desired.
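
For contrast with local alignment, the sketch below scores an end-to-end alignment in which every read base must be aligned, while unaligned reference bases before and after the read cost nothing. Treating global read mapping this way is an assumption about how the option behaves, and the function name end_to_end_score and its scoring values (match +1, mismatch -2, linear gap penalty -3) are hypothetical.

```python
def end_to_end_score(read: str, ref: str, match: int = 1, mismatch: int = -2,
                     gap: int = -3) -> int:
    """Best score when every read base must be aligned (global over the read).

    Reference bases before and after the aligned region are free, so the read
    can be placed anywhere on the reference.
    """
    cols = len(ref) + 1
    prev = [0] * cols  # row 0: skipping a reference prefix is free
    for i in range(1, len(read) + 1):
        # Column 0: read bases placed before the reference starts are gaps.
        cur = [prev[0] + gap] + [0] * (cols - 1)
        for j in range(1, cols):
            s = match if read[i - 1] == ref[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return max(prev)  # skipping a reference suffix is free


core = "ACGTTGCATG"
print(end_to_end_score("GG" + core + "GG", "CCCC" + core + "CCCC"))  # 2
```

With this scoring, the read made of a 10-base core flanked by two erroneous bases on each side scores only 2, because each of the four end bases now costs 2 points; a local alignment of the same pair would keep just the 10 matching core bases and score 10.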

In cases where memory consumption is an issue, clc_mapper_legacy can be used for base space mapping, as its memory consumption is scalable. However, we recommend using clc_mapper for base space mapping whenever possible, as it performs better in terms of both alignment quality and speed.


