Updating equations for the Multinomial model frequency parameters

Consider a site, $h$, and let $n_i^h$ be the nucleotide observed in read $i$ at this site, $i = 1,\ldots,k$. Each of the Multinomial models that may explain the data at the site has a number of frequency parameters. For simplicity, we consider the model which states that two alleles are present at the site: the reference allele, $y$, and another allele, $x$. Let $f$ be the frequency parameter for the non-reference allele (hence the frequency of the reference allele, $f_y$, is $1-f$). Models with more alleles are treated in a similar manner.

We want to estimate the frequency parameter $f$ of the $x$ allele at site $h$ by the expected fraction of true nucleotides at this site that are $x$, given the observed data. Writing $r_i^h$ for the true nucleotide in read $i$ at site $h$, the estimate is:

$\displaystyle f^* = \frac{\sum_{i=1}^k P(r_i^h = x \mid n_i^h)}{k}.$ (20.20)

To calculate this we apply Bayes' theorem to each term in the numerator:


$\displaystyle P(r_i^h = x \mid n_i^h) = \frac{P(r_i^h = x, n_i^h)}{P(n_i^h)}$ (20.21)
$\displaystyle \phantom{P(r_i^h = x \mid n_i^h)} = \frac{P(x) \times e(x \rightarrow n_i^h)}{P(x) \times e(x \rightarrow n_i^h) + P(y) \times e(y \rightarrow n_i^h)}$ (20.22)
$\displaystyle \phantom{P(r_i^h = x \mid n_i^h)} = \frac{f \times e(x \rightarrow n_i^h)}{f \times e(x \rightarrow n_i^h) + (1-f) \times e(y \rightarrow n_i^h)}$ (20.23)
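
As a worked illustration of the last expression, take some made-up values (assumptions chosen only for this example, not taken from any data set): a current frequency $f = 0.2$, a read in which the observed nucleotide is $x$, and error rates $e(x \rightarrow x) = 0.97$ and $e(y \rightarrow x) = 0.01$. The posterior probability that the true nucleotide in this read is $x$ is then

$\displaystyle P(r_i^h = x \mid n_i^h = x) = \frac{0.2 \times 0.97}{0.2 \times 0.97 + 0.8 \times 0.01} = \frac{0.194}{0.202} \approx 0.960.$

Even though the current estimate of $f$ is low, a single read showing $x$ is assigned a high probability of truly being $x$, because the assumed error rate from $y$ to $x$ is small.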

Inserting the current value of the frequency parameter $f$ under the model, together with the error rates $e(x \rightarrow n_i^h)$ and $e(y \rightarrow n_i^h)$, in 20.23 gives the posterior probability $P(r_i^h = x \mid n_i^h)$ for each read. Inserting these values in 20.20 then gives the updated value, $f^*$, of the frequency parameter.
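
The update can be sketched in a few lines of Python. This is a minimal illustration for the two-allele model only, not the implementation used by the software. The function names (`posterior_x`, `update_frequency`, `simple_error_rate`), the symmetric error model with a single rate `eps`, the example reads, and the fixed number of iterations are all assumptions introduced here for illustration.

```python
from typing import Callable, Sequence


def posterior_x(f: float, e_x: float, e_y: float) -> float:
    """Posterior probability that the true nucleotide is x (cf. 20.23).

    f   : current frequency of the non-reference allele x
    e_x : e(x -> n), probability of observing n when the truth is x
    e_y : e(y -> n), probability of observing n when the truth is y
    """
    numerator = f * e_x
    denominator = numerator + (1.0 - f) * e_y
    return numerator / denominator


def update_frequency(
    f: float,
    reads: Sequence[str],
    x: str,
    y: str,
    error_rate: Callable[[str, str], float],
) -> float:
    """One update of f: the mean posterior over the k reads (cf. 20.20)."""
    posteriors = [
        posterior_x(f, error_rate(x, n), error_rate(y, n)) for n in reads
    ]
    return sum(posteriors) / len(posteriors)


def simple_error_rate(true_nt: str, observed_nt: str, eps: float = 0.01) -> float:
    """Hypothetical symmetric error model, assumed only for this sketch:
    a nucleotide is read correctly with probability 1 - eps and as each
    of the three other nucleotides with probability eps / 3."""
    return 1.0 - eps if true_nt == observed_nt else eps / 3.0


# Made-up example: 10 reads at one site, reference allele A, other allele G.
reads = ["A"] * 7 + ["G"] * 3
f = 0.5  # arbitrary starting value
for _ in range(50):  # fixed iteration count, chosen arbitrarily
    f = update_frequency(f, reads, x="G", y="A", error_rate=simple_error_rate)
print(f)
```

With these example reads the estimate settles close to the observed fraction of `G` reads, pulled slightly away from it by the assumed error rate. The sketch keeps the error rates fixed and updates only the frequency parameter; it also assumes the denominator in `posterior_x` is non-zero, which holds whenever $0 < f < 1$ and the error rates are positive.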