During library preparation of the samples, single or duplex UMI (Unique Molecular Index) sequences can be added to the reads. Because reads that originate from the same original fragment carry the same UMI, they can be grouped together, which helps correct for sequencing errors and improves the performance of downstream analyses. The UMI is often added together with a common sequence prefix, which is also added before amplification and can be very helpful for locating the exact UMI sequence. The UMI is essential for identifying reads that originate from the same fragment. However, leaving it in the sequenced reads would reduce the efficiency and accuracy of subsequent read mapping, because these bases do not match the reference. The Remove and Annotate with Unique Molecular Index tool therefore removes the UMI and the common sequence prefix from the reads. It also annotates each read with its UMI, so that the fragment identity is retained as an annotation.
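The trimming and annotation step can be illustrated with a minimal Python sketch. This is not the tool's actual implementation. It assumes a hypothetical read layout in which the UMI sits at the 5' end and is directly followed by the common sequence. The `Read` class, the function name, the `max_shift` tolerance, and the example common sequence `GGCCAATT` are all illustrative assumptions. The common sequence is searched for only near its expected position, and the bases in front of it are taken as the UMI.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Read:
    name: str
    sequence: str
    annotations: dict = field(default_factory=dict)


def remove_and_annotate_umi(
    read: Read, umi_len: int, common: str, max_shift: int = 1
) -> Optional[Read]:
    """Strip a 5' UMI plus the common sequence and keep the UMI as an annotation.

    Hypothetical layout assumed here: <UMI><common sequence><insert>.
    Returns None when the common sequence is not found near its expected position.
    """
    # Look for the common sequence only where it should start, tolerating a UMI
    # that is up to max_shift bases shorter or longer than umi_len.
    start = max(0, umi_len - max_shift)
    end = umi_len + max_shift + len(common)
    pos = read.sequence.find(common, start, end)
    if pos == -1:
        return None  # read without a detectable UMI
    umi = read.sequence[:pos]                   # bases before the common sequence
    trimmed = read.sequence[pos + len(common):]  # insert only, ready for mapping
    return Read(read.name, trimmed, {**read.annotations, "UMI": umi})


read = Read("r1", "ACGTACGTAACT" + "GGCCAATT" + "TTTTCCCCGGGG")
print(remove_and_annotate_umi(read, umi_len=12, common="GGCCAATT"))
```

Restricting the search to a small window around the expected position, rather than scanning the whole read, avoids mistaking an occurrence of the common sequence inside the insert for the UMI boundary.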

In the first dialog, select the sequence list(s) containing the reads.

In the Settings dialog (figure 5.36), the following options are available:

A report can be generated that contains the number of reads processed, and the number and fraction of reads in which a UMI was found. It also includes a plot of the nucleotide distribution at each position of the UMI barcode.
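The report statistics can be sketched in Python as follows. This is an illustrative assumption about how such numbers could be computed, not the tool's own code. The input is one entry per processed read: the detected UMI string, or `None` when no UMI was found. The function name `umi_report` and the output keys are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def umi_report(umis: Iterable[Optional[str]]) -> dict:
    """Summarize UMI detection; each item is a read's UMI, or None if none was found."""
    total = 0
    found = []
    for umi in umis:
        total += 1
        if umi is not None:
            found.append(umi)
    # Count nucleotides at each UMI position across all reads that have a UMI
    max_len = max((len(u) for u in found), default=0)
    per_position = [Counter(u[i] for u in found if i < len(u)) for i in range(max_len)]
    return {
        "reads_processed": total,
        "reads_with_umi": len(found),
        "fraction_with_umi": len(found) / total if total else 0.0,
        "nucleotide_distribution": per_position,
    }


print(umi_report(["ACG", "ATG", None, "ACT"]))
```

For this input, three of four reads carry a UMI (fraction 0.75), and position 1 of the UMI contains two C and one T.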