UMI grouping

All of the LightSpeed tools can group reads based on Unique Molecular Identifiers (UMIs). Two kinds of protocol are supported: those where the UMI is present on only one read in a pair, and those where UMIs are present on both reads in a pair (duplex UMI).

The UMI sequence is either recorded and removed from the reads before trimming and mapping, or read from the FASTQ read header. After the reads have been mapped, reads with similar UMI sequences and mapping positions are merged into a consensus UMI read.
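The grouping step described above can be illustrated with a minimal Python sketch. It is not the LightSpeed implementation: the function names, the read tuple layout, the requirement of an identical mapping position, and the tolerance of one mismatch between UMIs are all assumptions made for illustration only.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatching positions; UMIs of unequal length never match."""
    if len(a) != len(b):
        return max(len(a), len(b))
    return sum(x != y for x, y in zip(a, b))


def group_reads(reads, max_mismatches=1):
    """Group reads that share a mapping position and have similar UMIs.

    reads: iterable of (name, chrom, pos, umi) tuples (hypothetical layout).
    max_mismatches: assumed UMI similarity threshold, not a documented value.
    Returns a list of groups, each a list of read names.
    """
    # Bucket reads by mapping position first (exact match, as a simplification).
    by_position = defaultdict(list)
    for name, chrom, pos, umi in reads:
        by_position[(chrom, pos)].append((name, umi))

    groups = []
    for key in sorted(by_position):
        clusters = []  # list of (representative_umi, [read names])
        for name, umi in by_position[key]:
            # Greedily attach the read to the first cluster with a similar UMI.
            for representative, members in clusters:
                if hamming(representative, umi) <= max_mismatches:
                    members.append(name)
                    break
            else:
                clusters.append((umi, [name]))
        groups.extend(members for _, members in clusters)
    return groups
```

Each returned group would then be collapsed into one consensus UMI read. The sketch stops before that step.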

For duplex UMIs, UMI grouping is a two-step process: reads are first grouped into simplex reads, and the simplex reads are then grouped into duplex reads.
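The two-step structure can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's algorithm: it assumes each read carries an ordered UMI pair, that reads from the two strands of one molecule carry the same two UMIs in swapped order, and that UMIs and positions must match exactly. The function name and tuple layout are hypothetical.

```python
from collections import defaultdict


def group_duplex(reads):
    """Two-step grouping: reads -> simplex groups -> duplex groups.

    reads: iterable of (name, pos, umi_a, umi_b) tuples (hypothetical layout).
    Returns {duplex_key: {simplex_key: [read names]}}.
    """
    # Step 1: simplex groups share position and the ordered UMI pair.
    simplex = defaultdict(list)
    for name, pos, umi_a, umi_b in reads:
        simplex[(pos, umi_a, umi_b)].append(name)

    # Step 2: duplex groups ignore UMI order, so (A, B) and (B, A) at the
    # same position are treated as the two strands of one molecule (assumption).
    duplex = defaultdict(dict)
    for (pos, umi_a, umi_b), names in simplex.items():
        duplex_key = (pos, tuple(sorted((umi_a, umi_b))))
        duplex[duplex_key][(umi_a, umi_b)] = names
    return dict(duplex)
```

A duplex group containing two simplex entries has support from both strands; a group with a single entry has support from one strand only.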

The consensus is calculated following these rules:

Q-scores are assigned to the bases in the UMI read as follows:

Examples of the resulting UMI read Q-scores are given in figure 2.1.

Figure 2.1: Assigned Q-scores exemplified for various UMI group sizes, base quality scores and base ambiguity among contributing reads.

For limitations in UMI grouping, see Limitations.

Variant annotations

A set of annotations is added to variants that are called from the generated UMI read mapping.

Definitions