Calculation of the prior and error probabilities
The prior probabilities are estimated using only the mapped reads through four rounds of Expectation Maximization and are calculated for each potential combination of alleles (site types). Thus, the prior probabilities reflect the likelihood of observing each combination of alleles in the genome studied. The reference sequence is not taken into account during this first part of the analysis. More about Maximum Likelihood estimation (MLE) can be found at http://en.wikipedia.org/wiki/Maximum_likelihood.
For a diploid organism, the initial parameters for the priors, which are then updated, are shown in Table 26.1. The sum of the probabilities for all site types is always 1.
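The estimation described above can be illustrated with a minimal sketch. It is not the implementation used by the software: the uniform starting priors, the fixed `error_rate`, the toy `sites` data, and all function names are assumptions made for illustration, and the actual initial values are those given in Table 26.1. The sketch enumerates the ten unordered diploid site types, runs four rounds of Expectation Maximization over the reads at each site, and returns priors that sum to 1.

```python
from itertools import combinations_with_replacement

ALLELES = "ACGT"
# The 10 unordered diploid site types: A/A, A/C, ..., T/T.
SITE_TYPES = list(combinations_with_replacement(ALLELES, 2))


def read_likelihood(base, site_type, error_rate):
    """P(observed base | site type) under an assumed uniform error model."""
    # Each read is assumed to sample one of the two alleles with equal chance.
    p = 0.0
    for allele in site_type:
        p += 0.5 * ((1.0 - error_rate) if base == allele else error_rate / 3.0)
    return p


def site_likelihood(bases, site_type, error_rate):
    """Likelihood of all reads at one site, treating reads as independent."""
    result = 1.0
    for base in bases:
        result *= read_likelihood(base, site_type, error_rate)
    return result


def estimate_priors(sites, error_rate=0.01, rounds=4):
    """Run a fixed number of EM rounds to estimate site-type priors."""
    # Assumed starting point: uniform priors (not the values of Table 26.1).
    priors = {st: 1.0 / len(SITE_TYPES) for st in SITE_TYPES}
    for _ in range(rounds):
        expected = {st: 0.0 for st in SITE_TYPES}
        for bases in sites:
            # E-step: posterior probability of each site type at this site.
            weights = {
                st: priors[st] * site_likelihood(bases, st, error_rate)
                for st in SITE_TYPES
            }
            total = sum(weights.values())
            for st in SITE_TYPES:
                expected[st] += weights[st] / total
        # M-step: the new prior is the average posterior across all sites.
        priors = {st: expected[st] / len(sites) for st in SITE_TYPES}
    return priors


# Toy data: the bases observed in the reads covering each of six sites.
sites = ["AAAAAAAA", "AAAAAAAA", "CCCCCCCC", "ACACACAC", "GGGGGGGG", "TTTTTTTT"]
priors = estimate_priors(sites)
for site_type, prior in sorted(priors.items(), key=lambda kv: -kv[1]):
    print("/".join(site_type), round(prior, 4))
```

Because each site contributes posteriors that sum to 1 and the M-step averages over sites, the updated priors sum to 1 after every round, matching the property stated above. No reference sequence is used anywhere in the sketch.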
Error probabilities are calculated alongside the priors for each observed allele and assumed reference allele, before the reference sequence is incorporated into the analysis. Table 26.2 shows an example of the values in an error probability matrix.
If quality values are available, an error matrix is calculated for each quality value.
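As a rough illustration of what one error matrix per quality value could look like, the sketch below tallies observed alleles against assumed alleles separately for each quality value and normalizes each row into probabilities. It is a simplification and not the software's method: it uses plain counts rather than estimating the matrix alongside the priors, and the input format, the `pseudocount` parameter, and the function name `error_matrices` are hypothetical.

```python
from collections import defaultdict

ALLELES = "ACGT"


def error_matrices(observations, pseudocount=1.0):
    """Build one row-normalized error matrix per quality value.

    observations: iterable of (quality, assumed_allele, observed_allele).
    Returns {quality: {assumed_allele: {observed_allele: probability}}}.
    """
    # Assumed smoothing: every cell starts at `pseudocount` to avoid zeros.
    counts = defaultdict(
        lambda: {a: {o: pseudocount for o in ALLELES} for a in ALLELES}
    )
    for quality, assumed, observed in observations:
        counts[quality][assumed][observed] += 1.0

    matrices = {}
    for quality, table in counts.items():
        matrices[quality] = {}
        for assumed, row in table.items():
            row_total = sum(row.values())
            # Each row becomes P(observed allele | assumed allele, quality).
            matrices[quality][assumed] = {o: c / row_total for o, c in row.items()}
    return matrices


# Toy data: high-quality bases mostly agree, low-quality bases disagree more.
observations = (
    [(30, "A", "A")] * 98 + [(30, "A", "C")] * 2
    + [(10, "A", "A")] * 8 + [(10, "A", "G")] * 2
)
matrices = error_matrices(observations)
print(matrices[30]["A"])
print(matrices[10]["A"])
```

Keeping a separate matrix per quality value lets a mismatch at a low quality value count as weaker evidence for a variant than the same mismatch at a high quality value.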