Updating the choice of favored Multinomial model for each site
Given a set of error rates, we can find the maximum likelihood estimates of the underlying frequencies for each possible hypothesis for a given site. This also gives us the maximum likelihood value that can be obtained for the site under that hypothesis. Since the hypotheses with few free parameters are special cases of hypotheses with more free parameters, the hypotheses with the most free parameters will always reach a maximum likelihood at least as high as that of the simpler hypotheses.
We will only favor a hypothesis with many free parameters if it offers a significantly higher likelihood than a hypothesis with fewer parameters. Let us consider a simple case where we have a hypothesis, $H_0$, with no free parameters and an alternative hypothesis, $H_1$, which has one free parameter and contains $H_0$ as a special case (the hypotheses are nested). Writing $L(H)$ for the maximum likelihood obtained under hypothesis $H$, we can calculate the log likelihood ratio:
$$LR = \log \frac{L(H_1)}{L(H_0)}$$
If this ratio is high, we tend to prefer hypothesis $H_1$, and if it is low (i.e. close to 0, meaning the likelihood ratio itself is close to 1), we prefer $H_0$. It turns out that twice the log likelihood ratio is often $\chi^2$ distributed with a parameter given by the difference between the number of free parameters in the hypotheses, $df$. In our example $df = 1$ so:
$$2\,LR = 2 \log \frac{L(H_1)}{L(H_0)} \sim \chi^2_1$$
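As a rough illustration, the following sketch computes twice the log likelihood ratio for a single site where two nucleotides are observed. It is not the variant caller's implementation: the read counts, the fixed error rate `ERROR_RATE`, and the simplified two-outcome likelihood are all assumptions made for this example, and the constant multinomial coefficient is dropped because it cancels in the ratio.

```python
import math

# Hypothetical read counts at one site (assumed for illustration).
REF_COUNT = 90      # reads showing the reference nucleotide
ALT_COUNT = 10      # reads showing another nucleotide
ERROR_RATE = 0.01   # assumed probability that a read shows the wrong nucleotide


def log_likelihood(alt_probability: float) -> float:
    """Log likelihood of the counts when each read shows the alternative
    nucleotide with probability alt_probability (coefficient omitted)."""
    return (REF_COUNT * math.log(1.0 - alt_probability)
            + ALT_COUNT * math.log(alt_probability))


# H0: no free parameters; every alternative read is explained as an error.
log_l0 = log_likelihood(ERROR_RATE)

# H1: one free parameter; its maximum likelihood estimate is the observed fraction.
alt_fraction = ALT_COUNT / (REF_COUNT + ALT_COUNT)
log_l1 = log_likelihood(alt_fraction)

two_lr = 2.0 * (log_l1 - log_l0)
print(f"2 * LR = {two_lr:.2f}")
```

Because the value 0.01 used under H0 is one of the values the free parameter of H1 can take, the two hypotheses in this sketch are nested and `two_lr` can never be negative.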
If we write $\chi^2_{df}(p)$ for the inverse cumulative distribution function of a $\chi^2$ distribution with $df$ degrees of freedom, evaluated at $1-p$, then $\chi^2_1(p)$ is the cutoff value above which we prefer $H_1$ over $H_0$ at the significance level given by $p$.
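For one degree of freedom the cutoff can be computed with the Python standard library alone, because a chi-square variable with one degree of freedom is the square of a standard normal variable. The sketch below relies on that identity; the function name `chi2_cutoff_df1` is hypothetical, and the code does not cover other degrees of freedom.

```python
from statistics import NormalDist


def chi2_cutoff_df1(p: float) -> float:
    """Inverse CDF of the chi-square distribution with one degree of
    freedom, evaluated at 1 - p.

    If Z is standard normal then Z**2 is chi-square(1), so
    P(Z**2 <= x) = 1 - p exactly when sqrt(x) is the (1 - p/2) normal quantile.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    return z * z


cutoff = chi2_cutoff_df1(0.05)
print(f"Prefer the one-parameter hypothesis when 2 * LR > {cutoff:.3f}")
```

A statistics library that provides a chi-square quantile function would be the natural choice for arbitrary degrees of freedom.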
We generalize this to apply to any two Multinomial model hypotheses $H_a$ and $H_b$. For these two, calculate the values (where $df(H)$ is the degrees of freedom in a hypothesis and $\chi^2_{df(H)}(p)$ is the corresponding cutoff value):
$$V(H_a) = 2 \log L(H_a) - \chi^2_{df(H_a)}(p)$$
$$V(H_b) = 2 \log L(H_b) - \chi^2_{df(H_b)}(p)$$
(use $\chi^2_0(p) = 0$ for zero degrees of freedom). We now prefer the hypothesis with the highest value of $V$. When comparing a hypothesis with zero free parameters to another hypothesis, we get exactly the same results as with the log likelihood ratio approach.
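To see why the two approaches agree, consider a hypothesis $H_0$ with zero degrees of freedom and a hypothesis $H_1$ with one, and score each hypothesis $H$ as $V(H) = 2 \log L(H) - \chi^2_{df(H)}(p)$, with the cutoff taken to be zero for zero degrees of freedom. The symbols are the ones assumed in this sketch; the comparison then reduces to the likelihood ratio test:

```latex
\begin{align*}
V(H_1) > V(H_0)
  &\iff 2\log L(H_1) - \chi^2_{1}(p) > 2\log L(H_0) - 0 \\
  &\iff 2\log\frac{L(H_1)}{L(H_0)} > \chi^2_{1}(p)
\end{align*}
```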
We use this approach when comparing the many hypotheses that are present in the low frequency variant caller. For each one we calculate a $V$ value as twice the log likelihood minus a cutoff value, which is based on the $p$ value and the degrees of freedom for that hypothesis. We then choose the hypothesis with the highest $V$ as the one that best describes the site in question.
For stringent $p$ values (i.e. values close to zero) we tend to prefer hypotheses with few free parameters, which means that more sites tend to be called as homozygous.
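The selection rule and the effect of the $p$ value can be sketched as follows. This is a minimal illustration, not the caller's code: the hypothesis names, the log likelihoods in `site`, and the helpers `chi2_cutoff` and `best_hypothesis` are hypothetical, and the cutoff is implemented only for zero, one, and two degrees of freedom, where closed forms exist.

```python
import math
from statistics import NormalDist


def chi2_cutoff(df: int, p: float) -> float:
    """Chi-square inverse CDF at 1 - p, for df in {0, 1, 2} only.

    Closed forms: 0 by convention for df = 0; the squared normal quantile
    for df = 1; -2 * ln(p) for df = 2 (chi-square(2) is exponential, mean 2).
    """
    if df == 0:
        return 0.0
    if df == 1:
        return NormalDist().inv_cdf(1.0 - p / 2.0) ** 2
    if df == 2:
        return -2.0 * math.log(p)
    raise ValueError("this sketch only handles df = 0, 1 or 2")


def best_hypothesis(hypotheses: dict, p: float) -> str:
    """Return the name with the highest score 2 * logL - cutoff(df, p)."""
    def score(name: str) -> float:
        log_l, df = hypotheses[name]
        return 2.0 * log_l - chi2_cutoff(df, p)
    return max(hypotheses, key=score)


# Hypothetical (maximum log likelihood, degrees of freedom) per hypothesis.
site = {
    "homozygous": (-50.0, 0),
    "two alleles": (-46.0, 1),
    "three alleles": (-45.5, 2),
}

for p in (0.05, 0.001):
    print(p, best_hypothesis(site, p))
```

With these made-up numbers the one-parameter hypothesis has the highest score at $p = 0.05$, while at $p = 0.001$ the larger cutoffs outweigh the likelihood gain and the hypothesis without free parameters is chosen.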
The approach used here is similar to the Akaike Information Criterion, except that we have introduced a way to use a $p$ value in the comparisons.
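The similarity can be made explicit by placing the two scores side by side. The standard Akaike Information Criterion is $2k - 2 \log L$ for a model with $k$ free parameters, and the model with the lowest value is preferred, which is the same as preferring the highest value of its negative. Writing $V(H)$ for the score used here and $df(H)$ for the number of free parameters (notation assumed for this comparison):

```latex
\begin{align*}
-\mathrm{AIC}(H) &= 2\log L(H) - 2\,df(H) \\
V(H) &= 2\log L(H) - \chi^2_{df(H)}(p)
\end{align*}
```

The only difference is the penalty term: a fixed $2\,df(H)$ in one case and a cutoff that depends on $p$ in the other. For a single degree of freedom the two penalties coincide when $p$ is approximately 0.157.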

