Read mapping

Indexing

When provided with a reference genome, LightSpeed first generates a Burrows-Wheeler-based index of all its sequences. The index is cached after the first run and reused on later runs.
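For readers unfamiliar with Burrows-Wheeler indexing, the following is a minimal textbook sketch of the idea, not LightSpeed's actual implementation. It builds the transform naively from sorted rotations and counts exact pattern occurrences by backward search. The function names (bwt, build_index, count_occurrences) and the "$" sentinel are assumptions made for illustration; production indexes use suffix arrays and sampled occurrence tables instead.

```python
from collections import Counter


def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def build_index(text: str):
    """Build the two tables needed for backward search (hypothetical helper)."""
    b = bwt(text)
    counts = Counter(b)
    # first[c] = number of characters in the transformed text smaller than c
    first, total = {}, 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]
    # occ[c][i] = number of occurrences of c in b[:i]
    occ = {c: [0] * (len(b) + 1) for c in counts}
    for i, ch in enumerate(b):
        for c in counts:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return first, occ, len(b)


def count_occurrences(pattern: str, index) -> int:
    """Count exact matches of pattern by scanning it right to left."""
    first, occ, n = index
    lo, hi = 0, n
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


index = build_index("banana")
print(bwt("banana"))                    # annb$aa
print(count_occurrences("ana", index))  # 2
```

Because the index depends only on the reference, it can be built once and stored, which is what makes caching it between runs worthwhile.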

Read mapping and read pairs

LightSpeed maps reads to the indexed reference sequence.

Single reads that are part of a paired read are mapped individually in the following steps:

Read pairs that did not map well or were not paired go through a second round of more thorough seeding:

If there are multiple paired extensions with the highest score, one of the pairs is selected at random and the read pair is reported as non-specific.
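The tie-breaking rule can be sketched as follows. This is an illustration only: the PairedExtension type, its fields, and the select_pair function are hypothetical names, and the scoring itself is taken as given.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class PairedExtension:
    """Hypothetical record for one candidate placement of a read pair."""
    position: int
    score: int


def select_pair(
    extensions: List[PairedExtension],
    rng: Optional[random.Random] = None,
) -> Tuple[Optional[PairedExtension], bool]:
    """Return (chosen extension, non_specific flag).

    If several extensions share the highest score, one is picked at
    random and the pair is flagged as non-specific.
    """
    if not extensions:
        return None, False
    rng = rng or random.Random()
    best = max(e.score for e in extensions)
    top = [e for e in extensions if e.score == best]
    return rng.choice(top), len(top) > 1


candidates = [
    PairedExtension(position=100, score=58),
    PairedExtension(position=7300, score=58),
    PairedExtension(position=9100, score=41),
]
chosen, non_specific = select_pair(candidates, random.Random(0))
print(chosen.position in (100, 7300), non_specific)  # True True
```

Passing a seeded random.Random makes the choice reproducible, which is convenient when testing.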

The distance at which reads can be considered as pairs is estimated from a subset of the reads. If there is not enough data to estimate the distance, a default insert size of 1-1000 base pairs is used. Read pairs that map within the expected distance of each other are considered pairs; read pairs that map further away from each other are considered broken pairs.
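A minimal sketch of this logic is given below. The text does not say how the distance is estimated or how many reads count as enough, so the trimmed-range estimate, the min_samples threshold, and the trim_per_thousand parameter are all assumptions chosen for illustration. Only the 1-1000 default and the paired/broken distinction come from the description above.

```python
from typing import Sequence, Tuple

DEFAULT_RANGE = (1, 1000)  # default insert size from the text, in base pairs


def estimate_distance_range(
    observed: Sequence[int],
    min_samples: int = 1000,     # assumed threshold for "enough data"
    trim_per_thousand: int = 5,  # assumed: drop the most extreme 0.5% per tail
) -> Tuple[int, int]:
    """Estimate the accepted pair distance range from sampled distances."""
    if len(observed) < min_samples:
        return DEFAULT_RANGE
    ordered = sorted(observed)
    cut = len(ordered) * trim_per_thousand // 1000
    return ordered[cut], ordered[len(ordered) - 1 - cut]


def classify_pair(distance: int, distance_range: Tuple[int, int]) -> str:
    """Label a mapped pair as 'paired' or 'broken'.

    Treating distances below the lower bound as broken is an assumption;
    the text only mentions pairs that map further apart than expected.
    """
    low, high = distance_range
    return "paired" if low <= distance <= high else "broken"


too_few = [350] * 10
print(estimate_distance_range(too_few))   # (1, 1000)
print(classify_pair(420, DEFAULT_RANGE))  # paired
print(classify_pair(5200, DEFAULT_RANGE)) # broken
```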

During read mapping, alignment of the unaligned ends of specifically mapped read pairs is reattempted, allowing for one mismatch. The mismatch is not accepted, however, at the last position of the read. Note that this process is not carried out within the first and last 10 bases of a chromosome.
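The rule for rescuing unaligned ends can be sketched as below, under several simplifying assumptions: only the trailing end of a read is handled, the comparison is ungapped, and the restriction near chromosome ends is read as "skip the attempt if the tail would overlap the first or last 10 reference bases". The function realign_tail and its parameters are hypothetical names.

```python
EDGE = 10  # bases at each chromosome end where no re-alignment is attempted


def realign_tail(read: str, ref: str, read_start: int, ref_start: int,
                 edge: int = EDGE) -> bool:
    """Check whether the unaligned tail read[read_start:] can be aligned.

    The tail is compared, without gaps, to the reference starting at
    ref_start. It is accepted with at most one mismatch, provided the
    mismatch is not at the last position of the read.
    """
    tail = read[read_start:]
    if not tail:
        return False
    ref_end = ref_start + len(tail)
    # Assumed interpretation: stay clear of both chromosome ends.
    if ref_start < edge or ref_end > len(ref) - edge:
        return False
    window = ref[ref_start:ref_end]
    mismatches = [i for i, (a, b) in enumerate(zip(tail, window)) if a != b]
    if len(mismatches) > 1:
        return False
    if mismatches and mismatches[0] == len(tail) - 1:
        return False
    return True


reference = "A" * 10 + "ACGTACGTAC" + "T" * 10
print(realign_tail("ACGTACCTAC", reference, 4, 14))  # True, one internal mismatch
print(realign_tail("ACGTACGTAG", reference, 4, 14))  # False, mismatch at last base
```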

The algorithm has been optimized for the typical read length and error profile of Illumina 150 bp paired-end reads.