Updating equations for the prior site type probabilities

We first derive the updating equations for the prior site type probabilities $ f_s, s \in S$. The probability that the site is of type $ t$ given that we observe the nucleotides $ n_1,...,n_k$ in the reads at the site is:


$\displaystyle P(t\vert n_1,...,n_k) = \frac{P(t, n_1,...,n_k)}{\sum_{s \in S} P(s, n_1,...,n_k)} = \frac{P(t) P(n_1,...,n_k\vert t)}{\sum_{s \in S} P(s) P(n_1,...,n_k\vert s)}$ (31.4)

For $ P(t)$ we use the current value of $ f_t$. Inserting the expression for $ P(n_1,...,n_k\vert t)$ from equation 31.2 then gives:

$\displaystyle P(t\vert n_1,...,n_k) = \frac{f_t \prod_{i=1}^k \sum_{N \in \{A, C, G, T, -\}}P_t(N) \times e_q(N \rightarrow n_i)}{\sum_{s \in S} f_s \prod_{i=1}^k \sum_{N \in \{A, C, G, T, -\}}P_s(N) \times e_q(N \rightarrow n_i)}$ (31.5)
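As a rough illustration of equation 31.5, the following Python sketch computes the posterior site type probabilities for a single site. The names `site_likelihood`, `site_posteriors` and `error_prob` are hypothetical, as are the two example site types. The error model is an assumption made only for this example: a single fixed error rate `eps` stands in for the quality-dependent error probabilities $ e_q(N \rightarrow n_i)$.

```python
from math import prod

ALPHABET = "ACGT-"  # the four nucleotides plus the gap symbol


def error_prob(true_nuc, observed, eps=0.01):
    """Assumed toy error model: a fixed error rate eps, spread evenly
    over the four other symbols. It stands in for e_q(N -> n_i)."""
    if true_nuc == observed:
        return 1.0 - eps
    return eps / (len(ALPHABET) - 1)


def site_likelihood(type_dist, reads):
    """P(n_1..n_k | t) = prod_i sum_N P_t(N) * e(N -> n_i)."""
    return prod(
        sum(type_dist.get(N, 0.0) * error_prob(N, n) for N in ALPHABET)
        for n in reads
    )


def site_posteriors(priors, type_dists, reads):
    """P(t | n_1..n_k) for every site type t, as in equation 31.5."""
    joint = {t: f_t * site_likelihood(type_dists[t], reads)
             for t, f_t in priors.items()}
    total = sum(joint.values())  # denominator: sum over all site types s
    return {t: j / total for t, j in joint.items()}


# Hypothetical example: a homozygous "A" type and a heterozygous "AC" type.
type_dists = {"A": {"A": 1.0}, "AC": {"A": 0.5, "C": 0.5}}
priors = {"A": 0.9, "AC": 0.1}
posteriors = site_posteriors(priors, type_dists, "AAAA")
```

With four reads that all show `A`, the posterior for the homozygous type rises above its prior, since the heterozygous type explains each `A` with only about half the probability. For sites with many reads, the product can underflow, so an implementation would likely work with log probabilities instead.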

We get the updating equation for the prior site type probabilities, $ f_t, t \in S$, from equation 31.5. Let $ h$ index the sites in the alignment ($ h=1,...,H$), and let $ n_i^h$ denote the nucleotide observed in read $ i$ at site $ h$. Given the current values for the set of site frequencies, $ f_t, t \in S$, and the current values for the set of error probabilities, we obtain updated values for the site frequencies, $ f_t^*, t \in S$, by summing the site type probabilities given the data (as given by equation 31.5) across all sites in the alignment and dividing by the number of sites, $ H$:

$\displaystyle f_t^* = \frac{\sum_{h=1}^H \frac{f_t \prod_{i=1}^k \sum_{N \in \{A, C, G, T, -\}}P_t(N) \times e_q(N \rightarrow n_i^h)}{\sum_{s \in S} f_s \prod_{i=1}^k \sum_{N \in \{A, C, G, T, -\}}P_s(N) \times e_q(N \rightarrow n_i^h)}}{H}$ (31.6)
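The update in equation 31.6 can be sketched in Python as one averaging step over the per-site posteriors. This is a minimal sketch under stated assumptions, not the actual implementation: the function names, the two example site types and the toy alignment are hypothetical, and a fixed error rate `eps` replaces the quality-dependent error probabilities $ e_q(N \rightarrow n_i^h)$.

```python
from math import prod

ALPHABET = "ACGT-"  # the four nucleotides plus the gap symbol


def error_prob(true_nuc, observed, eps=0.01):
    """Assumed toy error model with a fixed error rate eps."""
    if true_nuc == observed:
        return 1.0 - eps
    return eps / (len(ALPHABET) - 1)


def posterior(priors, type_dists, reads):
    """Site type probabilities given the reads at one site (equation 31.5)."""
    joint = {}
    for t, f_t in priors.items():
        likelihood = prod(
            sum(type_dists[t].get(N, 0.0) * error_prob(N, n) for N in ALPHABET)
            for n in reads
        )
        joint[t] = f_t * likelihood
    total = sum(joint.values())
    return {t: j / total for t, j in joint.items()}


def update_priors(priors, type_dists, sites):
    """One update f_t -> f_t^*: sum the per-site posteriors over all
    H sites and divide by H (equation 31.6)."""
    H = len(sites)
    per_site = [posterior(priors, type_dists, reads) for reads in sites]
    return {t: sum(p[t] for p in per_site) / H for t in priors}


# Hypothetical alignment: each string holds the reads covering one site.
type_dists = {"A": {"A": 1.0}, "AC": {"A": 0.5, "C": 0.5}}
priors = {"A": 0.9, "AC": 0.1}
sites = ["AAAA", "AAAA", "ACAC", "AAAA"]
updated = update_priors(priors, type_dists, sites)
```

Because each per-site posterior sums to one over the site types, the updated values $ f_t^*$ also sum to one. In this toy alignment one of the four sites looks heterozygous, so the updated value for the `AC` type moves above its starting value of 0.1.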