Create UMI Reads from Grouped Reads

The tool Create UMI Reads from Grouped Reads generates a single consensus read, called a UMI read, from reads which have the same UMI, and places the UMI read in a read mapping at the location of the original reads. Therefore, the output of the tool is a read mapping of generated UMI reads.

The tool can be found in the Toolbox here:

        Tools | QIAseq Panel Expert Tools | QIAseq DNA Panel Expert Tools | Create UMI Reads from Grouped Reads

In the first dialog (figure 5.39), select a read mapping of the original reads with UMI annotations, i.e., a read mapping that has already been processed with the Calculate Unique Molecular Index Groups tool.

Image createsupereads
Figure 5.39: Select a read mapping of the original reads with UMI annotations.

The second dialog of the wizard (figure 5.40) offers the following options:

Image createsupereads2
Figure 5.40: Settings for the Create UMI Reads from Grouped Reads tool.

Click Next to Open or Save the resulting read mapping of UMI reads, i.e., a read mapping of the merged UMI groups. It is also possible to generate a report that indicates how many reads were ignored and why they were not included in a UMI read. This information can help you verify the variants that were found and examine why expected variants were not found.

Consensus nucleotide calculation is performed following the method described in [Hiatt et al., 2013]. The consensus base is the base with the highest posterior probability given the observed read bases.

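The posterior probability of calling a base C, given the observed bases, follows from Bayes' theorem:

\[ P(C \mid O_1, \ldots, O_n) = \frac{P(O_1, \ldots, O_n \mid C)\, P(C)}{\sum_{B} P(O_1, \ldots, O_n \mid B)\, P(B)} \]

where O_i is the i-th observed base at a given position, C is the base in question, and the denominator sums over all possible bases B, i.e., B = A, T, C, G.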

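Assuming that the prior for observing any base is equal, i.e., P(A)=P(T)=P(C)=P(G), the priors cancel and the posterior probability is:

\[ P(C \mid O_1, \ldots, O_n) = \frac{P(O_1, \ldots, O_n \mid C)}{\sum_{B} P(O_1, \ldots, O_n \mid B)} \]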

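Assuming further that each read base observation is independent, the likelihoods factorize:

\[ P(C \mid O_1, \ldots, O_n) = \frac{\prod_{i} P(O_i \mid C)}{\sum_{B} \prod_{i} P(O_i \mid B)} \]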

Because the denominator is the same for every candidate base C, the consensus base is obtained by maximizing the numerator alone.
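
The maximization can be illustrated with a minimal Python sketch. It is not the tool's implementation: the function names are hypothetical, and the per-read likelihood P(O_i | C) is an assumed error model in which a read base with Phred quality q is wrong with probability 10^(-q/10) and a miscall is equally likely to be any of the three other bases. Sums of logarithms are used instead of products to avoid numerical underflow.

```python
import math

BASES = "ACGT"

def log_likelihood(candidate, observed, quals):
    """Return the sum over reads of log P(O_i | candidate) at one position."""
    total = 0.0
    for base, q in zip(observed, quals):
        # Phred quality -> error probability. Clamped at 0.75 so that a
        # quality of 0 is uninformative instead of producing log(0).
        err = min(10 ** (-q / 10), 0.75)
        # Assumed error model: a miscall is uniform over the 3 other bases.
        p = 1.0 - err if base == candidate else err / 3.0
        total += math.log(p)
    return total

def consensus_base(observed, quals):
    """Pick the candidate base that maximizes the numerator (in log space)."""
    return max(BASES, key=lambda c: log_likelihood(c, observed, quals))

# Three high-quality A observations outweigh a single T.
print(consensus_base("AAAT", [30, 30, 30, 30]))
# One high-quality T outweighs one low-quality A.
print(consensus_base("AT", [10, 40]))
```

Under this assumed model, base qualities act as weights: a single confident observation can outvote a low-quality one, which a simple majority vote would not capture.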

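The Q-score is derived from the probability of making a wrong call, which is the complement of the posterior probability of the consensus base, i.e.,

\[ P(\text{error}) = 1 - P(C \mid O_1, \ldots, O_n) \]

which means that the Q-score is

\[ Q = -10 \log_{10} P(\text{error}) \]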

Q-scores are capped at 60.
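
As a rough illustration of the last two steps, the Python sketch below normalizes per-base log-likelihoods into posterior probabilities and converts a posterior into a capped Q-score. The function names are hypothetical, the input log-likelihoods are assumed to come from whatever error model is in use, and whether the tool rounds the score is not stated here, so the sketch returns an unrounded value.

```python
import math

def posterior_from_log_likelihoods(loglik):
    """Normalize {base: log-likelihood} into {base: posterior}, assuming equal priors."""
    m = max(loglik.values())                       # subtract the max for numerical stability
    weights = {b: math.exp(v - m) for b, v in loglik.items()}
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}

def q_score(posterior, cap=60.0):
    """Phred-scaled probability of a wrong call, capped at `cap`."""
    p_error = 1.0 - posterior
    if p_error <= 0.0:                             # posterior rounded to exactly 1.0
        return cap
    return min(cap, -10.0 * math.log10(p_error))

# A posterior of 0.999 corresponds to an error probability of 0.001, i.e. roughly Q30.
print(q_score(0.999))
# Near-certain calls hit the cap.
print(q_score(1.0 - 1e-9))
```

The cap keeps the score finite when the posterior is numerically indistinguishable from 1, which would otherwise give an unbounded Q-score.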