Read mapping

Indexing

When provided with a reference genome, LightSpeed first builds a Burrows-Wheeler-based index of all its sequences.
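To illustrate what a Burrows-Wheeler-based index makes possible, here is a minimal sketch of the general technique, not LightSpeed's actual implementation. It builds the transform from a naively sorted suffix array and finds exact matches with backward search. The function names (`build_bwt`, `build_fm_tables`, `find_occurrences`) are hypothetical, and a production index would use a far more memory-efficient construction.

```python
def build_bwt(text):
    """Return the Burrows-Wheeler transform and suffix array of text + '$'."""
    text = text + "$"  # sentinel; sorts before A, C, G and T
    # Naive suffix array construction: fine for a toy example only.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)
    return bwt, sa


def build_fm_tables(bwt):
    """Build the two lookup tables used by backward search."""
    counts = {}
    for ch in bwt:
        counts[ch] = counts.get(ch, 0) + 1
    # first[c]: number of characters in the text that sort before c.
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return first, occ


def find_occurrences(pattern, bwt, sa, first, occ):
    """Return sorted start positions of exact matches of pattern."""
    lo, hi = 0, len(bwt)  # half-open range of suffix array rows
    for ch in reversed(pattern):
        if ch not in first:
            return []
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


reference = "ACGTACGT"  # toy reference sequence
bwt, sa = build_bwt(reference)
first, occ = build_fm_tables(bwt)
hits = find_occurrences("CGT", bwt, sa, first, occ)
```

The search cost depends on the length of the pattern rather than the length of the reference, which is why this family of indexes is commonly used for locating seeds in large genomes.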

Read mapping and read pairs

LightSpeed maps reads to the indexed reference sequence. The quality scores are not stored.

Each read of a paired read is mapped individually. For each read, only the most likely seeds are extended, using a Needleman-Wunsch-based method. When estimating how likely a seed is, LightSpeed takes the relative positions of the two reads in a pair into account and prefers seeds for which the distance between the reads falls within the expected distance for a paired read.
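For readers unfamiliar with Needleman-Wunsch, the following is a minimal sketch of the classic global alignment score computation on which such extension methods are based. It is the textbook algorithm, not LightSpeed's variant, and the scoring values (`match`, `mismatch`, `gap`) are illustrative assumptions rather than the tool's actual parameters.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences a and b.

    The scoring values are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per base.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diag, up, left)
    return score[-1][-1]


identical = needleman_wunsch("ACGT", "ACGT")
one_mismatch = needleman_wunsch("ACGT", "ACCT")
one_deletion = needleman_wunsch("ACGT", "AGT")
```

Because the table has one cell per pair of positions, the cost grows with the product of the two sequence lengths, which is one reason to extend only the most likely seeds.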

Read pairs that do not map well go through a second round of more thorough seeding.

The distance at which reads can be considered a pair is estimated from a subset of the reads. If there is not enough data to estimate the distance, a default insert size of 1-1000 base pairs is used. Read pairs that map within the expected distance of each other are considered pairs; read pairs that map further apart are considered broken pairs.
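The logic described above can be sketched as follows. This is an illustration under stated assumptions, not LightSpeed's actual procedure: the text does not say how the distance is estimated, so the percentile-based range, the `min_observations` threshold, and the function names are all hypothetical. Only the 1-1000 base pair default comes from the description.

```python
DEFAULT_RANGE = (1, 1000)  # default insert size in base pairs, from the text


def estimate_distance_range(distances, min_observations=100,
                            lower_pct=0.5, upper_pct=99.5):
    """Estimate the accepted pair distance range from a sample of distances.

    Assumption: the range is taken from sample percentiles, and the
    default is used when fewer than min_observations values are available.
    """
    if len(distances) < min_observations:
        return DEFAULT_RANGE
    ordered = sorted(distances)

    def percentile(p):
        # Nearest-rank style lookup into the sorted sample.
        return ordered[round(p / 100 * (len(ordered) - 1))]

    return percentile(lower_pct), percentile(upper_pct)


def classify_pair(distance, distance_range):
    """Label a mapped read pair as 'pair' or 'broken pair'."""
    low, high = distance_range
    return "pair" if low <= distance <= high else "broken pair"


too_few = estimate_distance_range([300, 310])           # falls back to default
estimated = estimate_distance_range(list(range(200, 401)))
label_ok = classify_pair(350, estimated)
label_far = classify_pair(5000, DEFAULT_RANGE)
```

Using percentiles rather than a mean and standard deviation is a design choice made here for robustness against a few wildly distant mappings in the sample; other estimators would fit the description equally well.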

The algorithm has been optimized for the typical read length and error profile of Illumina 150 bp paired-end reads.