General notes on UMIs

Some NGS library preparation protocols use Unique Molecular Indexes (UMIs) to improve performance by, for example

UMIs are usually located on the reads. The UMIs on the imported reads can be processed by tools delivered by the Biomedical Genomics Analysis plugin.

Various platforms offer the option to remove the UMIs from the reads and instead add this information to the read headers in the FASTQ file. UMIs are extracted from read headers during import if the header of the first read in the file contains UMI information in one of the following two formats:

The read header must contain exactly one space, between the <UMI> and <read number>. The imported sequences are annotated with the <UMI>. The allowed characters in the <UMI> are A, C, G, T and N. For paired reads, the <UMI> may contain one + (plus sign), separating the UMIs for each read in the pair, in which case the reads are annotated with the concatenated UMIs, i.e. the <UMI> without the +.
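
The header rules above can be illustrated with a short Python sketch. This is not the importer's actual implementation. The function name `extract_umi` is hypothetical, and the sketch assumes an Illumina-style header in which the UMI is the last colon-separated field before the single space. That placement is an assumption made for illustration and may not cover both supported formats. The sketch checks a single header only, whereas the importer decides based on the first read in the file.

```python
import re
from typing import Optional

# Allowed UMI characters per the text: A, C, G, T and N, with at most one '+'
# separating the UMIs of the two reads in a pair.
_UMI_PATTERN = re.compile(r"[ACGTN]+(?:\+[ACGTN]+)?")


def extract_umi(header: str) -> Optional[str]:
    """Return the UMI annotation for one FASTQ header line, or None.

    Hypothetical helper for illustration only.
    """
    line = header.rstrip("\r\n")
    if line.startswith("@"):
        line = line[1:]

    # The header must contain exactly one space (between UMI and read number).
    if line.count(" ") != 1:
        return None
    before_space, _read_info = line.split(" ")

    # ASSUMPTION: the UMI is the last colon-separated field before the space.
    if ":" not in before_space:
        return None
    umi = before_space.rsplit(":", 1)[-1]

    # Reject anything that is not a valid UMI (wrong characters, several '+').
    if not _UMI_PATTERN.fullmatch(umi):
        return None

    # Paired UMIs are concatenated, i.e. the '+' is dropped.
    return umi.replace("+", "")


if __name__ == "__main__":
    # Made-up example header, not taken from real data.
    print(extract_umi("@SIM:1:FCX:1:15:6329:1045:GATTACT+GTCTTAAC 1:N:0:ATCCGA"))
```

Returning `None` for headers that break any rule mirrors the described behavior that UMIs are only extracted when the header matches a supported format. How the importer handles malformed headers beyond that is not specified here.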