Estimating the parameters in the model for the Fixed Ploidy Variant Detection tool
The Fixed Ploidy Variant Detection tool uses the Expectation Maximization (EM) procedure to estimate the unknown parameters of the model, that is, the prior site type probabilities and the error rates. EM is an iterative procedure: it starts with a set of initial prior site type frequencies and a set of initial error probabilities. It then alternates between two updates, first re-estimating the prior site type frequencies, then the error probabilities, then the site type frequencies again, and so on (four rounds in total), in such a manner that the observed nucleotide patterns at the sites in the alignment become increasingly likely. To give an example of the forces at play in this iteration: as you increase the error rates, you decrease the likelihood of observing 'clean' patterns (patterns that contain only the nucleotides belonging to the site type) and increase the likelihood of observing 'noisy' patterns (patterns that also contain nucleotides not belonging to the site type). If, on the other hand, you decrease the error rates, you increase the likelihood of observing 'clean' patterns and decrease the likelihood of observing 'noisy' patterns. The EM procedure ensures that the balance between these two is optimized relative to the data observed (assuming, of course, that the ploidy assumption is valid).
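The alternating updates can be illustrated with a small Python sketch. It is not the tool's implementation, and it rests on several simplifying assumptions that are not taken from the description above: ploidy is two, only the four nucleotides are modelled (no gaps), all substitution errors share a single error rate `e`, the initial site type frequencies are uniform, and the initial error rate is 0.05. All function names are hypothetical.

```python
import math
from itertools import combinations_with_replacement

NUCS = "ACGT"
# Assumption: diploid model, so a site type is an unordered pair of nucleotides.
SITE_TYPES = list(combinations_with_replacement(NUCS, 2))

def read_prob(obs, true, e):
    """P(read shows obs | true nucleotide), with one shared error rate e."""
    return 1.0 - e if obs == true else e / 3.0

def emit_prob(obs, site_type, e):
    """P(read shows obs | site type): each allele is sampled with probability 1/2."""
    return sum(0.5 * read_prob(obs, t, e) for t in site_type)

def posteriors(counts, f, e):
    """Posterior over site types for one site, plus its (unnormalised) likelihood."""
    weights = []
    for s in SITE_TYPES:
        log_lik = sum(n * math.log(emit_prob(m, s, e)) for m, n in counts.items())
        weights.append(f[s] * math.exp(log_lik))
    total = sum(weights)
    return [w / total for w in weights], total

def log_likelihood(sites, f, e):
    return sum(math.log(posteriors(c, f, e)[1]) for c in sites)

def update_priors(sites, f, e):
    """New site type frequencies: the average posterior over all sites."""
    new = dict.fromkeys(SITE_TYPES, 0.0)
    for counts in sites:
        post, _ = posteriors(counts, f, e)
        for s, p in zip(SITE_TYPES, post):
            new[s] += p / len(sites)
    return new

def update_error(sites, f, e):
    """New error rate: expected number of miscalled reads divided by all reads."""
    expected_errors, n_reads = 0.0, 0
    for counts in sites:
        post, _ = posteriors(counts, f, e)
        for s, p in zip(SITE_TYPES, post):
            for m, n in counts.items():
                # Probability mass of the paths where the read is an error.
                wrong = sum(0.5 * read_prob(m, t, e) for t in s if t != m)
                expected_errors += p * n * wrong / emit_prob(m, s, e)
        n_reads += sum(counts.values())
    return expected_errors / n_reads

def fit(sites, e0=0.05, rounds=4):
    """Alternate the two updates for a fixed number of rounds."""
    f = {s: 1.0 / len(SITE_TYPES) for s in SITE_TYPES}  # assumed uniform start
    e = e0
    history = [log_likelihood(sites, f, e)]
    for _ in range(rounds):
        f = update_priors(sites, f, e)  # first the site type frequencies
        e = update_error(sites, f, e)   # then the error rate
        history.append(log_likelihood(sites, f, e))
    return f, e, history

# Made-up nucleotide counts per site, for illustration only.
sites = [{"A": 20}, {"A": 19, "C": 1}, {"A": 10, "C": 10},
         {"G": 20}, {"G": 18, "T": 2}, {"C": 9, "T": 11}]
f, e, history = fit(sites)
print(e, history)
```

In this sketch each update recomputes the site type posteriors before maximizing over one parameter group, so the log-likelihood recorded in `history` should not decrease from one round to the next. This mirrors the statement that the observed patterns become increasingly likely.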