The prior probabilities are estimated from the mapped reads alone, through four rounds of Expectation Maximization, and are calculated for each potential combination of alleles (site type). The prior probabilities thus reflect the likelihood of observing each combination of alleles in the genome studied. The reference sequence is not taken into account during this first part of the analysis. More about Maximum Likelihood estimation (MLE) can be found at http://en.wikipedia.org/wiki/Maximum_likelihood.
For a diploid organism, the initial parameters for the priors, which are then updated, are shown in Table 31.1. The sum of the probabilities for all site types is always 1.
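The update of the priors can be illustrated with a generic Expectation Maximization loop for mixture proportions. This is only a minimal sketch and not the tool's actual implementation: the function name em_update_priors, the three toy site types, and the per-site likelihood values are all assumptions made for illustration. Each round weights the current priors by how well each site type explains the reads at a site, then averages those weights over all sites to obtain the new priors.

```python
def em_update_priors(priors, site_likelihoods, rounds=4):
    """Generic EM sketch for site-type priors (hypothetical helper).

    priors: dict mapping site type -> initial prior probability.
    site_likelihoods: list of dicts, one per site, mapping
        site type -> P(reads at that site | site type).
    """
    priors = dict(priors)  # work on a copy, leave the input untouched
    for _ in range(rounds):
        totals = {t: 0.0 for t in priors}
        for lik in site_likelihoods:
            # E-step: posterior weight of each site type at this site
            weights = {t: priors[t] * lik[t] for t in priors}
            norm = sum(weights.values())
            if norm == 0.0:
                continue  # no site type explains this site; skip it
            for t in priors:
                totals[t] += weights[t] / norm
        total = sum(totals.values())
        if total == 0.0:
            break  # nothing to learn from; keep the current priors
        # M-step: renormalize so the priors sum to 1
        priors = {t: totals[t] / total for t in priors}
    return priors


# Toy input: made-up likelihoods, three sites favour A|A and one favours C|C.
initial = {"A|A": 0.4, "A|C": 0.2, "C|C": 0.4}
sites = [{"A|A": 0.9, "A|C": 0.09, "C|C": 0.01}] * 3 + [
    {"A|A": 0.01, "A|C": 0.09, "C|C": 0.9}
]
updated = em_update_priors(initial, sites)
print(updated)
```

In this toy run the prior for A|A grows at the expense of C|C, and the updated priors still sum to 1.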
If the expected ploidy is set to 1, values analogous to those in Table 31.1 are calculated, but only for the homozygous site types A, C, G, T and -.
If the expected ploidy is set to 3, analogous values are calculated for three-allele site types such as A|A|A, A|C|G, G|G|- and so on.
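To make the notion of site types concrete, the sketch below enumerates them for a given ploidy. It assumes that a site type is an unordered combination of the five symbols A, C, G, T and -, which matches the examples given here; the function name site_types is hypothetical and not part of the software.

```python
from itertools import combinations_with_replacement

# Assumed allele alphabet: four bases plus the gap symbol.
ALLELES = ["A", "C", "G", "T", "-"]


def site_types(ploidy):
    """Return all unordered allele combinations for the given ploidy."""
    return ["|".join(combo)
            for combo in combinations_with_replacement(ALLELES, ploidy)]


for ploidy in (1, 2, 3):
    types = site_types(ploidy)
    print(ploidy, len(types), types[:4])
```

Under this assumption there are 5 site types for ploidy 1, 15 for ploidy 2 and 35 for ploidy 3, and one prior is needed per site type.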
Error probabilities are calculated alongside the priors, for each combination of observed allele and assumed reference allele, before the reference sequence is incorporated into the analysis. Table 31.2 shows an example of the values calculated in an error probability matrix.
If quality values are available, an error matrix is calculated for each quality value.
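The structure of one error matrix per quality value can be sketched as follows. This only illustrates the data layout, not how the software estimates the probabilities: the function name error_matrices, the input format of (assumed allele, observed allele, quality) tuples, and the simple count-and-normalize estimate are all assumptions.

```python
from collections import defaultdict

# Assumed allele alphabet: four bases plus the gap symbol.
ALLELES = ["A", "C", "G", "T", "-"]


def error_matrices(observations):
    """Build one row-normalized error matrix per quality value.

    observations: iterable of (assumed_allele, observed_allele, quality).
    Returns {quality: {assumed: {observed: probability}}}.
    """
    # One 5x5 count table per quality value, created on first use.
    counts = defaultdict(
        lambda: {a: {o: 0 for o in ALLELES} for a in ALLELES}
    )
    for assumed, observed, quality in observations:
        counts[quality][assumed][observed] += 1

    matrices = {}
    for quality, table in counts.items():
        matrices[quality] = {}
        for assumed, row in table.items():
            total = sum(row.values())
            if total:  # rows without observations are left out
                matrices[quality][assumed] = {
                    o: n / total for o, n in row.items()
                }
    return matrices


# Made-up observations at two quality values.
obs = ([("A", "A", 30)] * 9 + [("A", "C", 30)]
       + [("A", "A", 10)] * 3 + [("A", "G", 10)])
matrices = error_matrices(obs)
print(matrices[30]["A"])
print(matrices[10]["A"])
```

Each row of a matrix sums to 1, so it can be read as the distribution of observed alleles given the assumed allele at that quality value.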