Consensus nucleotide calculation

The consensus nucleotide calculation follows the method described in [Hiatt et al., 2013]. At each position, the consensus base is the base that maximizes the posterior probability given the observed read bases.

To call a base, we maximize its posterior probability, which by Bayes' theorem is:

$\displaystyle P(C\vert O_1O_2\ldots O_k) = \frac{P(O_1O_2\ldots O_k\vert C)P(C)}{\sum_{x \in B}P(O_1O_2\ldots O_k\vert x)P(x)}$

where $O_i$ is the $i$-th observed read base at a given position, $C$ is the candidate consensus base, and the denominator sums over all possible bases $x \in B$, i.e., $B = \{A, C, G, T\}$.
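As an illustration of the formula above, the following sketch computes the posterior for every candidate base from a joint likelihood $P(O_1O_2\ldots O_k\vert x)$ and a prior $P(x)$. The function name `bayes_posterior` and the numbers used are hypothetical and purely illustrative; how the joint likelihood is obtained is left open here.

```python
def bayes_posterior(likelihood, prior):
    """Posterior P(C | O_1..O_k) for every candidate base C (illustrative sketch).

    likelihood: dict base -> P(O_1 O_2 ... O_k | base)
    prior:      dict base -> P(base)
    """
    # Denominator: sum over all bases x in B of P(O_1..O_k | x) * P(x)
    evidence = sum(likelihood[x] * prior[x] for x in likelihood)
    # Numerator P(O_1..O_k | C) * P(C), divided by the shared denominator
    return {c: likelihood[c] * prior[c] / evidence for c in likelihood}


# Hypothetical values, not taken from any real data set
likelihood = {"A": 0.6, "C": 0.001, "G": 0.004, "T": 0.002}
prior = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
post = bayes_posterior(likelihood, prior)
print(max(post, key=post.get))  # A
```

The likelihood values need not sum to one across bases, since each is the probability of the same observations under a different hypothesis; only the posteriors are normalized.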

Assuming equal prior probabilities for all bases, i.e., P(A)=P(T)=P(C)=P(G), the prior terms cancel and the posterior probability simplifies to:

$\displaystyle P(C\vert O_1O_2\ldots O_k) = \frac{P(O_1O_2\ldots O_k\vert C)}{\sum_{x \in B}P(O_1O_2\ldots O_k\vert x)}$
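Written out for the four bases in $B$, equal priors mean $P(x) = \tfrac{1}{4}$ for every $x$, a constant factor shared by the numerator and by every term of the denominator:

$\displaystyle P(C\vert O_1O_2\ldots O_k) = \frac{\tfrac{1}{4}P(O_1O_2\ldots O_k\vert C)}{\sum_{x \in B}\tfrac{1}{4}P(O_1O_2\ldots O_k\vert x)} = \frac{P(O_1O_2\ldots O_k\vert C)}{\sum_{x \in B}P(O_1O_2\ldots O_k\vert x)}$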

Further assuming that the read base observations are independent given $C$, the joint likelihood factorizes into a product over reads:

$\displaystyle P(C\vert O_1O_2\ldots O_k) = \frac{P(O_1\vert C)P(O_2\vert C) \ldots P(O_k\vert C)}{\sum_{x \in B}P(O_1\vert x)P(O_2\vert x) \ldots P(O_k\vert x)}$
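The factorized posterior can be sketched in a few lines once a per-read likelihood $P(O_i\vert x)$ is chosen. This section does not specify that model, so the sketch ASSUMES a common Phred-based one: with base quality $Q_i$, the error probability is $e_i = 10^{-Q_i/10}$, $P(O_i\vert x) = 1 - e_i$ if $O_i = x$, and $e_i/3$ otherwise (errors spread evenly over the other three bases). The names `read_likelihood` and `posterior_independent` are hypothetical.

```python
from math import prod

BASES = "ACGT"


def read_likelihood(observed, quality, base):
    """P(O_i | base) under an ASSUMED Phred error model.

    Error probability e = 10**(-Q/10); a wrong call is assumed to be
    equally likely to be any of the three other bases.
    """
    e = 10 ** (-quality / 10)
    return 1 - e if observed == base else e / 3


def posterior_independent(observed, qualities):
    """Posterior for every base, assuming equal priors and independent reads."""
    # Numerator for each base x: product over reads of P(O_i | x)
    numerators = {
        x: prod(read_likelihood(o, q, x) for o, q in zip(observed, qualities))
        for x in BASES
    }
    # Denominator: the same products summed over all bases x in B
    total = sum(numerators.values())
    return {x: n / total for x, n in numerators.items()}


# Three reads covering one position: A (Q30), A (Q30), G (Q20)
post = posterior_independent("AAG", [30, 30, 20])
print(max(post, key=post.get))  # A
```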

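Because the denominator is the same for every candidate base $C$, the consensus base is obtained by maximizing the numerator alone, i.e., by choosing the base $C$ with the largest product $P(O_1\vert C)P(O_2\vert C)\ldots P(O_k\vert C)$.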
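A minimal sketch of this argmax step follows. It sums log-likelihoods rather than multiplying probabilities, which gives the same ranking but avoids floating-point underflow when many reads cover a position. The per-read likelihood is again an ASSUMED Phred model ($e_i = 10^{-Q_i/10}$, $P(O_i\vert x) = 1 - e_i$ on a match, $e_i/3$ otherwise), and `consensus_base` is a hypothetical name, not an API from the cited method.

```python
from math import log10

BASES = "ACGT"


def consensus_base(observed, qualities):
    """Return the base x maximizing the numerator prod_i P(O_i | x).

    The denominator is identical for every candidate, so it is never computed.
    Per-read likelihoods use an ASSUMED Phred model: P(O_i | x) = 1 - e_i if
    O_i == x, else e_i / 3, with e_i = 10**(-Q_i / 10). Qualities are assumed
    to be > 0 so that 1 - e_i > 0 and the logarithm is defined.
    """
    def log_numerator(x):
        # log10 of the numerator: sum of per-read log-likelihoods
        total = 0.0
        for o, q in zip(observed, qualities):
            e = 10 ** (-q / 10)
            total += log10(1 - e if o == x else e / 3)
        return total

    # max() keeps the earliest base in BASES order if two candidates tie
    return max(BASES, key=log_numerator)


# Three C reads at moderate quality outweigh one high-quality T read
print(consensus_base("CCTC", [25, 30, 40, 20]))  # C
```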
