Updating equations for the error rates
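Consider a site $i$ and a read $j$. The joint probability that the true nucleotide in the read, $Y_{ij}$, at the site is $k$ and that the observed nucleotide at the site, $X_{ij}$, is $l$ is:

$$P(Y_{ij}=k,\,X_{ij}=l) = P(Y_{ij}=k)\,P(X_{ij}=l \mid Y_{ij}=k) = P(Y_{ij}=k)\,\epsilon_{kl} \tag{25.16}$$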
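Using Bayes' theorem, the probability that the true nucleotide in the read, $Y_{ij}$, at the site is $k$, given that we observe $l$, is:

$$P(Y_{ij}=k \mid X_{ij}=l) = \frac{P(Y_{ij}=k,\,X_{ij}=l)}{\sum_{m} P(Y_{ij}=m,\,X_{ij}=l)} \tag{25.17}$$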
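Inserting Eq. 25.16 into Eq. 25.17, we get:

$$P(Y_{ij}=k \mid X_{ij}=l) = \frac{P(Y_{ij}=k)\,\epsilon_{kl}}{\sum_{m} P(Y_{ij}=m)\,\epsilon_{ml}} \tag{25.18}$$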
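As a minimal sketch of Eq. 25.18 for one read at one site, the hypothetical function `posterior_true` below computes the posterior probability of each true nucleotide. It assumes NumPy, nucleotides indexed 0 to 3 in the order A, C, G, T, current frequencies `freqs` used in place of $P(Y_{ij}=k)$, and an error matrix laid out as `err[k, l]` $=\epsilon_{kl}$; none of these names or layouts come from the text.

```python
import numpy as np


def posterior_true(freqs, err, obs):
    """Hypothetical helper: P(Y_ij = k | X_ij = obs) for k = 0..3 (Eq. 25.18).

    freqs : length-4 array of current frequencies, used in place of P(Y_ij = k)
    err   : 4x4 array, assumed layout err[k, l] = epsilon_kl = P(X_ij = l | Y_ij = k)
    obs   : index l of the observed nucleotide (assumed order A, C, G, T)
    """
    # Numerator of Eq. 25.18 for every k: P(Y_ij = k) * epsilon_{k, obs} (Eq. 25.16).
    joint = np.asarray(freqs, dtype=float) * np.asarray(err, dtype=float)[:, obs]
    # Divide by the sum over all true nucleotides m (Bayes' theorem, Eq. 25.17).
    return joint / joint.sum()


# Example: symmetric 1% error rate, site with frequencies A = C = 0.5, an "A" is observed.
e = 0.01
err = np.full((4, 4), e / 3)
np.fill_diagonal(err, 1 - e)
post = posterior_true([0.5, 0.5, 0.0, 0.0], err, 0)
```

Here the observed A is almost certainly a true A (posterior about 0.997), while G and T receive zero weight because their frequency at the site is zero.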
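Equation 25.18 gives, for a given read $j$ and site $i$ at which nucleotide $l$ is observed, the probability that the true nucleotide is $k$, $P(Y_{ij}=k \mid X_{ij}=l)$, given our current values for the frequency $f_{ik}$ of nucleotide $k$ at site $i$ (inserted for $P(Y_{ij}=k)$) and for the error rates. Since we know the sequenced nucleotide in each read at each site, we can obtain updated values for the error rate of producing an $l$ nucleotide when the true nucleotide is $k$, $\epsilon_{kl}$, for $k \neq l$. We sum the probabilities of the true nucleotide being $k$ over all reads across all sites at which the sequenced nucleotide is $l$, and divide by the sum of all probabilities of the true nucleotide being $k$ across all reads and all sites:

$$\hat{\epsilon}_{kl} = \frac{\sum_{i}\sum_{j:\,X_{ij}=l} P(Y_{ij}=k \mid X_{ij}=l)}{\sum_{i}\sum_{j} P(Y_{ij}=k \mid X_{ij}=x_{ij})}$$

where $x_{ij}$ denotes the nucleotide actually observed in read $j$ at site $i$.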
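The update can be sketched as one pass over all reads at all sites: each read adds its posterior vector to the numerator column of its observed nucleotide and to the shared denominator. This is a minimal NumPy sketch under the same assumptions as before (indices 0 to 3 for A, C, G, T; `err[k, l]` $=\epsilon_{kl}$); the function names and the `sites` input format are hypothetical. Keeping the old row when a true nucleotide receives no posterior weight is also an assumption added to avoid division by zero.

```python
import numpy as np


def posterior_true(freqs, err, obs):
    # P(Y_ij = k | X_ij = obs) for k = 0..3, as in Eq. 25.18.
    joint = np.asarray(freqs, dtype=float) * err[:, obs]
    return joint / joint.sum()


def update_error_rates(sites, err):
    """Hypothetical helper: one update of the error-rate matrix.

    sites : list of (freqs, observed) pairs, one per site i; freqs is the
            length-4 frequency vector f_i, observed lists the nucleotide index
            seen in each read j covering the site.
    err   : current 4x4 matrix, assumed layout err[k, l] = epsilon_kl.
    """
    err = np.asarray(err, dtype=float)
    num = np.zeros((4, 4))  # num[k, l]: summed P(Y = k | X = l) over reads showing l
    den = np.zeros(4)       # den[k]: summed P(Y = k | X = x_ij) over all reads
    for freqs, observed in sites:
        for l in observed:
            post = posterior_true(freqs, err, l)
            num[:, l] += post
            den += post
    # Assumption: keep the old row for any k that received no posterior weight.
    has_weight = den > 0
    new_err = err.copy()
    new_err[has_weight] = num[has_weight] / den[has_weight][:, None]
    return new_err
```

Because every read's posterior sums to one across $k$, each updated row still sums to one, so the diagonal entry $\hat{\epsilon}_{kk}$ equals one minus the summed off-diagonal error rates. Alternating this step with an update of the frequencies gives an EM-style iteration.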
