Create UMI Reads
The tool Create UMI Reads generates a single consensus read, called a UMI read, from reads which have the same UMI, and places the UMI read in a read mapping at the location of the original reads. Therefore, the output of the tool is a read mapping of generated UMI reads.
The tool can be found in the Toolbox here:
Tools | QIAseq Panel Expert Tools | QIAseq DNA Panel Expert Tools | Create UMI Reads
In the first dialog (figure 3.16), select a read mapping of the original reads with UMI annotations that was previously processed with the Calculate Unique Molecular Index Groups tool.
Figure 3.16: Select a read mapping of the original reads with UMI annotations.
The second dialog of the wizard (figure 3.17) offers the following options.
Figure 3.17: Settings for the Create UMI Reads tool.
- UMI read creation
- Minimum group size: The tool will only create a UMI read if the number of reads in the UMI group is at least "Minimum group size".
- Non-consensus bases
- Minimum supporting consensus fraction: Set to 0.6 by default. At each position in the UMI read, the consensus nucleotide is the nucleotide with the highest probability of being correct (see the Consensus nucleotide calculation section below). If this probability is higher than "Minimum supporting consensus fraction", a Q score for the consensus nucleotide is calculated. A position without a consensus nucleotide becomes part of an unaligned end if it is near an end of the read, or a base with Q score 0 if it is in the middle of the read.
- There is a choice between 3 methods of handling non-consensus bases: Remove removes the bases, Keep as unaligned (set by default) keeps the bases as unaligned ends, and Keep as aligned keeps the bases as aligned bases (but with a Q-score of 0).
- The last option enables you to Ignore end gaps for the calculation of quality scores. Gaps are introduced at the ends of raw reads so that all reads have equal length when building a UMI read. This option is disabled by default, meaning that the quality scores at the ends of the UMI read will be rather low due to the presence of the gaps. When the option is enabled, the quality scores of the consensus bases at the ends of a UMI read are close to the quality scores of the raw reads.
- UMI read filtering
- Minimum UMI read length: UMI reads shorter than this value will be discarded.
- Minimum average quality score: UMI reads will be discarded if their average Q-score is lower than "Minimum average quality score".
- Maximum percentage of mismatches in UMI read: UMI reads will be discarded if the percentage of mismatched bases exceeds this value, e.g., if more than 50% of the bases are mismatches when the value is 50.
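The thresholds above can be read as a sequence of simple checks. The following Python sketch is only an illustration of that logic, not the tool's implementation: the `UmiRead` class, the `passes_filters` function, and all default values are hypothetical placeholders, and "Minimum group size" is included here even though the tool applies it at UMI read creation rather than at filtering.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UmiRead:
    """Hypothetical container for one UMI read (not the tool's data model)."""
    group_size: int        # number of original reads in the UMI group
    qualities: List[int]   # per-base Q-scores of the UMI read
    mismatches: int        # number of bases that are mismatches


def passes_filters(read: UmiRead,
                   min_group_size: int = 2,        # placeholder value
                   min_length: int = 50,           # placeholder value
                   min_avg_quality: float = 30.0,  # placeholder value
                   max_mismatch_pct: float = 50.0) -> bool:
    """Return True if the UMI read survives all four thresholds."""
    length = len(read.qualities)
    # Too few original reads: no UMI read would be created at all.
    if read.group_size < min_group_size:
        return False
    # Too short (this also guards the divisions below against length 0).
    if length == 0 or length < min_length:
        return False
    # Average Q-score below the minimum.
    if sum(read.qualities) / length < min_avg_quality:
        return False
    # More mismatches than the allowed percentage.
    if 100.0 * read.mismatches / length > max_mismatch_pct:
        return False
    return True
```

With these placeholder thresholds, a 60-base UMI read built from three reads, with Q-score 40 at every position and 30 mismatches, sits exactly at 50% mismatches and is kept; one more mismatch would discard it.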
Click Next to Open or Save the resulting read mapping of UMI reads, i.e., a read mapping of the merged UMI groups. It is also possible to generate a report that will indicate how many reads were ignored and the reason why they were not included in a UMI read. This data will let you verify the found variants, and examine why expected variants were not found.
Consensus nucleotide calculation is performed following the method described in Hiatt et al. (2013). The consensus base is the base with the highest posterior probability given the observed read bases.
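The posterior probability of calling a base is

$$P(C \mid O_1, \ldots, O_n) = \frac{P(O_1, \ldots, O_n \mid C)\, P(C)}{\sum_{B} P(O_1, \ldots, O_n \mid B)\, P(B)}$$

where $O_i$ is the $i$-th observed base at a given position, $C$ is the base in question, and the sum in the denominator runs over all possible bases, e.g. $B = A, T, C, G$.

Assuming that the prior for observing any base is equal, i.e., $P(A) = P(T) = P(C) = P(G)$, the priors cancel and the posterior probability is:

$$P(C \mid O_1, \ldots, O_n) = \frac{P(O_1, \ldots, O_n \mid C)}{\sum_{B} P(O_1, \ldots, O_n \mid B)}$$

And by assuming each read base observation is independent,

$$P(C \mid O_1, \ldots, O_n) = \frac{\prod_{i=1}^{n} P(O_i \mid C)}{\sum_{B} \prod_{i=1}^{n} P(O_i \mid B)}$$

The denominator is the same for every candidate base, so to obtain the consensus base we only need to maximize the numerator.

The error probability is now simply the probability of making a wrong call, i.e.

$$P(\text{error}) = 1 - P(C \mid O_1, \ldots, O_n)$$

which means that the Q-score is

$$Q = -10 \log_{10}\bigl(1 - P(C \mid O_1, \ldots, O_n)\bigr)$$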
Q-scores are capped at 60.
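As a rough illustration of the calculation for a single position, the Python sketch below picks the base with the highest posterior probability and converts the probability of a wrong call into a capped Q-score. Several details are assumptions not stated in this section: the per-observation error model (an observed base with Phred quality q is correct with probability 1 - e, where e = 10^(-q/10), and is each of the other three bases with probability e/3), the rounding of the Q-score to the nearest integer, and the function name `consensus_base`. It should not be read as the tool's actual implementation.

```python
import math
from typing import List, Optional, Tuple

BASES = "ACGT"
MAX_Q = 60  # Q-scores are capped at 60, as stated in the text


def consensus_base(observations: List[Tuple[str, int]],
                   min_fraction: float = 0.6) -> Tuple[Optional[str], int]:
    """Return (consensus base or None, Q-score) for one position.

    observations holds one (base, Phred quality) pair per read in the group.
    """
    # Log-likelihood of the observations under each candidate base,
    # assuming independent observations (product becomes a sum of logs).
    log_lik = {}
    for candidate in BASES:
        total = 0.0
        for base, q in observations:
            # Assumed error model; e is clamped so that Q0 is uninformative.
            e = min(10 ** (-q / 10), 0.75)
            total += math.log(1 - e if base == candidate else e / 3)
        log_lik[candidate] = total

    # Normalize over all bases (equal priors cancel out).
    peak = max(log_lik.values())
    weights = {b: math.exp(v - peak) for b, v in log_lik.items()}
    norm = sum(weights.values())
    best = max(BASES, key=lambda b: weights[b])
    posterior = weights[best] / norm

    # No consensus unless the posterior exceeds the minimum fraction.
    if posterior <= min_fraction:
        return None, 0

    # Probability of a wrong call, summed directly for numerical stability.
    p_error = sum(w for b, w in weights.items() if b != best) / norm
    if p_error <= 0.0:
        return best, MAX_Q
    return best, min(MAX_Q, int(round(-10 * math.log10(p_error))))
```

Under this assumed model, three agreeing "A" observations at Q30 give `("A", 60)` because the uncapped score exceeds the cap, while one "A" and one "C" at equal quality give `(None, 0)` because neither base reaches the 0.6 threshold.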