UMI grouping
All of the LightSpeed tools can group reads based on Unique Molecular Identifiers (UMIs). Two kinds of protocol are supported: those where the UMI is present on only one read in a pair, and those where it is present on both reads in a pair (duplex UMI).
The UMI sequence is either recorded and removed from the reads before trimming and mapping, or read from the FASTQ read header. After the reads have been mapped, reads with similar UMI sequence and mapping position are merged into a consensus UMI read.
For duplex UMIs, UMI grouping is a two-step process, where reads are first grouped into simplex reads and then into duplex reads.
The consensus is calculated following these rules:
- At conflicting positions, the most common base is included in the consensus read.
- If the conflicting bases are equally represented, the consensus can be generated in two ways:
  - When one of the bases at the conflicting position is identical to the reference symbol, the reference symbol is included in the consensus read.
  - When none of the bases at the conflicting position is identical to the reference symbol, an N is inserted in the consensus read.
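The consensus rules above can be sketched in a few lines of Python. This is an illustrative sketch only, not the LightSpeed implementation: the function names `consensus_base` and `consensus_read` are hypothetical, and the sketch assumes that all reads in a group are gapless, of equal length, and aligned to the same reference start. It also assumes that the reference rule applies only when the reference symbol is among the tied most common bases.

```python
from collections import Counter


def consensus_base(bases, ref_base):
    """Return the consensus symbol for one alignment column.

    bases: the base observed in each grouped read at this position.
    ref_base: the reference symbol at this position.
    """
    counts = Counter(bases)
    top = max(counts.values())
    tied = [base for base, n in counts.items() if n == top]
    if len(tied) == 1:
        # A single most common base wins.
        return tied[0]
    if ref_base in tied:
        # Tie that includes the reference symbol: use the reference symbol.
        return ref_base
    # Tie where no tied base matches the reference: emit N.
    return "N"


def consensus_read(reads, reference):
    """Build a consensus from equal-length, gapless reads (an assumption)."""
    columns = zip(*reads)
    return "".join(consensus_base(col, ref) for col, ref in zip(columns, reference))


if __name__ == "__main__":
    group = ["ACGT", "ACGA", "ACTA", "ACTT"]
    print(consensus_read(group, "ACGC"))
```

In this example the third column is a G/T tie resolved to the reference G, and the fourth column is a T/A tie with reference C, so it becomes N.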
For limitations, see Limitations.
Variant annotations
A set of annotations is added to variants that are called from the generated UMI read mapping.
- Count (singleton UMI) The number of singleton UMI read pairs supporting the allele.
- Count (big UMI) The number of big UMI read pairs supporting the allele.
- Proportion (singleton UMIs) The fraction of singleton UMI read pairs relative to all UMI read pairs supporting the allele.
- Average size (UMIs) Average number of read pairs per UMI.
- Average size (simplex UMIs) Average number of read pairs per UMI for simplex UMI read pairs. The annotation is only added for duplex UMI protocols.
- Count (duplex UMIs) The number of duplex UMI read pairs supporting the allele. The annotation is only added for duplex UMI protocols.
- Average size (duplex UMIs) Average number of read pairs per UMI for duplex UMI read pairs. The annotation is only added for duplex UMI protocols.
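As a rough illustration of how the singleton count, singleton proportion, and average size relate to each other, the following sketch derives them from a list of group sizes. The function name `umi_annotations` and the input representation (one integer per UMI read pair supporting the allele, giving the number of input read pairs behind it) are assumptions made for this example, not part of the tools.

```python
def umi_annotations(group_sizes):
    """Summarize the UMI read pairs supporting one allele.

    group_sizes: for each supporting UMI read pair, the number of input
    read pairs that were merged into it (hypothetical representation).
    """
    total = len(group_sizes)
    # A singleton UMI read pair is based on exactly one input read pair.
    singletons = sum(1 for size in group_sizes if size == 1)
    return {
        "Count (singleton UMI)": singletons,
        "Proportion (singleton UMIs)": singletons / total if total else 0.0,
        "Average size (UMIs)": sum(group_sizes) / total if total else 0.0,
    }


if __name__ == "__main__":
    # Four UMI read pairs support the allele; two of them are singletons.
    print(umi_annotations([1, 1, 3, 5]))
```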
Definitions
- Duplex UMI A protocol where both read 1 and read 2 in a pair contain a UMI. Reads originating from both strands of a DNA fragment can be grouped.
- Singleton UMI read pairs A UMI read pair that is based on only one input read pair.
- Simplex UMI read pairs UMI read pairs where the input reads all originate from the same strand. Singleton UMI read pairs are a subset of the simplex UMI read pairs. The number of simplex UMI read pairs is provided for duplex protocols.
- Duplex UMI read pairs UMI read pairs that are based on input reads from both strands.
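The relationship between these definitions can be expressed as a small classification function. This is a sketch under stated assumptions: the name `classify_umi_read_pair` and the idea of counting input read pairs per strand are hypothetical and chosen only to illustrate the definitions above.

```python
def classify_umi_read_pair(forward_pairs, reverse_pairs):
    """Return the set of labels that apply to one UMI read pair.

    forward_pairs / reverse_pairs: number of input read pairs from each
    strand that were merged into this UMI read pair (hypothetical inputs).
    """
    total = forward_pairs + reverse_pairs
    if forward_pairs < 0 or reverse_pairs < 0 or total == 0:
        raise ValueError("a UMI read pair needs at least one input read pair")
    if forward_pairs > 0 and reverse_pairs > 0:
        # Input reads from both strands.
        return {"duplex"}
    labels = {"simplex"}  # all input reads come from the same strand
    if total == 1:
        # Singletons are a subset of the simplex UMI read pairs.
        labels.add("singleton")
    return labels


if __name__ == "__main__":
    print(classify_umi_read_pair(1, 0))
    print(classify_umi_read_pair(4, 0))
    print(classify_umi_read_pair(2, 3))
```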