Create UMI Reads

The Create UMI Reads tool generates a single consensus read, called a UMI read, from reads that share the same UMI, and places the UMI read in a read mapping at the location of the original reads. The output of the tool is therefore a read mapping of the generated UMI reads.

The tool can be found in the Toolbox here:

        Tools | QIAseq Panel Expert Tools | QIAseq DNA Panel Expert Tools | Create UMI Reads

In the first dialog (figure 3.36), select a read mapping of the original reads with UMI annotations, i.e., a read mapping that has previously been processed with the Calculate Unique Molecular Index Groups tool.

Figure 3.36: Select a read mapping of the original reads with UMI annotations.

The second dialog of the wizard (figure 3.37) offers the following options:

Figure 3.37: Settings for the Create UMI Reads tool.

Click Next to choose whether to Open or Save the resulting read mapping of UMI reads, i.e., a read mapping of the merged UMI groups. It is also possible to generate a report that indicates how many reads were ignored and why they were not included in a UMI read. This information can help you verify the variants found and examine why expected variants were not found.

Consensus nucleotide calculation is performed following the method described in [Hiatt et al., 2013]. The consensus base is the base with the highest posterior probability given the observed read bases.

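The aim is to find the base that maximizes the posterior probability given the observed bases, i.e.,

\[
P(C \mid O_1, \ldots, O_n) = \frac{P(O_1, \ldots, O_n \mid C)\, P(C)}{\sum_{B} P(O_1, \ldots, O_n \mid B)\, P(B)}
\]

where O_i is the base observed in read i at a given position, n is the number of reads covering that position, C is the candidate base, and the sum in the denominator runs over all possible bases, i.e., B = A, T, C, G.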

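Assuming that the prior probability is the same for every base, i.e., P(A)=P(T)=P(C)=P(G), the priors cancel and the posterior probability is:

\[
P(C \mid O_1, \ldots, O_n) = \frac{P(O_1, \ldots, O_n \mid C)}{\sum_{B} P(O_1, \ldots, O_n \mid B)}
\]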

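Assuming further that the read base observations are independent of each other,

\[
P(C \mid O_1, \ldots, O_n) = \frac{\prod_{i=1}^{n} P(O_i \mid C)}{\sum_{B} \prod_{i=1}^{n} P(O_i \mid B)}
\]

The denominator is the same for every candidate base, so to obtain the consensus base we only need to maximize the numerator.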
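The maximization of the numerator can be illustrated with a minimal Python sketch. It is not the tool's implementation. The function names are hypothetical, and the per-read error model is an assumption that this section does not state: the error probability is taken from the Phred quality of the observed base, and a miscalled base is treated as equally likely to be any of the other three bases.

```python
import math

BASES = "ACGT"


def phred_to_error(qual):
    """Convert a Phred quality score to an error probability.

    The value is clamped to 0.75, the point at which all four bases
    are equally likely, so that very low qualities never give a
    probability of zero for the observed base.
    """
    return min(10 ** (-qual / 10), 0.75)


def log_likelihood(candidate, observations):
    """Return log(prod_i P(O_i | candidate)) for one alignment column.

    `observations` is a list of (base, phred_quality) pairs, one per read.
    """
    total = 0.0
    for base, qual in observations:
        e = phred_to_error(qual)
        # Assumed error model: a wrong call is spread evenly
        # over the three other bases.
        p = 1.0 - e if base == candidate else e / 3.0
        total += math.log(p)
    return total


def consensus_base(observations):
    """Pick the base that maximizes the numerator (computed in log space)."""
    return max(BASES, key=lambda b: log_likelihood(b, observations))


# Hypothetical column from one UMI group: three reads support A, one low-quality read shows G.
column = [("A", 30), ("A", 25), ("G", 10), ("A", 35)]
print(consensus_base(column))
```

Working with sums of logarithms rather than the product itself avoids numerical underflow when a UMI group contains many reads.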

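The Q-score is derived from the probability of making a wrong call, i.e.,

\[
P(\text{error}) = 1 - P(C \mid O_1, \ldots, O_n)
\]

which means that the Q-score is

\[
Q = -10 \log_{10}\bigl(1 - P(C \mid O_1, \ldots, O_n)\bigr)
\]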

Q-scores are capped at 60.
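The Q-score calculation, including the cap, can be sketched in Python as follows. This is an illustration under assumptions, not the tool's implementation. The function names are hypothetical, the per-read error model (Phred-derived error probability, spread evenly over the three other bases) is assumed, and the section does not say whether the tool rounds the score, so the sketch returns an unrounded value.

```python
import math

BASES = "ACGT"
MAX_Q = 60  # Q-scores are capped at 60


def log_likelihood(candidate, observations):
    """Return log(prod_i P(O_i | candidate)) under the assumed error model."""
    total = 0.0
    for base, qual in observations:
        e = min(10 ** (-qual / 10), 0.75)  # clamp so the log is always defined
        total += math.log(1.0 - e if base == candidate else e / 3.0)
    return total


def error_probability(candidate, observations):
    """Return 1 - P(candidate | observations), assuming equal priors.

    The likelihoods are rescaled by the largest one before exponentiation,
    and the error is summed directly over the other bases, which keeps
    precision when the posterior is very close to 1.
    """
    logs = {b: log_likelihood(b, observations) for b in BASES}
    largest = max(logs.values())
    weights = {b: math.exp(v - largest) for b, v in logs.items()}
    wrong = sum(w for b, w in weights.items() if b != candidate)
    return wrong / sum(weights.values())


def consensus_q_score(candidate, observations):
    """Return -10 * log10(P(error)), capped at MAX_Q."""
    p_error = error_probability(candidate, observations)
    if p_error <= 0.0:
        return float(MAX_Q)
    return min(float(MAX_Q), -10 * math.log10(p_error))


# A single read with quality 20 gives a consensus Q-score of about 20.
print(consensus_q_score("A", [("A", 20)]))
# Five agreeing high-quality reads exceed the cap, so 60 is reported.
print(consensus_q_score("A", [("A", 40)] * 5))
```

Under these assumptions, agreement between several reads in a UMI group quickly pushes the consensus Q-score above that of any single read, which is why a cap is needed.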