Remove weak edges

The de Bruijn graph is expected to contain artifacts caused by errors in the data. The number of reads agreeing on an error is likely to be low, especially compared to the number of error-free reads covering the same region. When this relative difference is large enough, the weakly supported edge can be treated as an error.

In the remove weak edges phase, we consider each node and calculate the number $ c_1$ of edges connected to the node and the total number $ k_1$ of times a read passes through these edges. The average number of reads per edge, $ avg_1 = k_1 / c_1$, is calculated, and the process is then repeated using only those edges that have at least $ avg_1$ reads going through them. Let $ c_2$ be the number of edges that meet this requirement and $ k_2$ the number of reads passing through these edges. A second average, $ avg_2 = k_2 / c_2$, is used to calculate a limit,

$\displaystyle limit = \frac{\log(avg_2)}{2} + \frac{avg_2}{40}$

and each edge connected to the node that has at most $ limit$ reads passing through it is removed in this phase.
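
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the original implementation: the function names `weak_edge_limit` and `weak_edges` are hypothetical, the edges of a node are represented simply as a list of read counts, and the logarithm is assumed to be the natural logarithm, since the text does not state the base.

```python
import math


def weak_edge_limit(read_counts):
    """Compute the removal limit for one node.

    read_counts: list with one entry per edge connected to the node,
    giving the number of reads passing through that edge
    (a hypothetical representation chosen for this sketch).
    Returns None when no limit can be computed.
    """
    c1 = len(read_counts)
    if c1 == 0:
        return None
    k1 = sum(read_counts)
    avg1 = k1 / c1

    # Second pass: keep only edges with at least avg1 reads.
    # This list is never empty, because the maximum is >= the average.
    strong = [k for k in read_counts if k >= avg1]
    c2 = len(strong)
    k2 = sum(strong)
    avg2 = k2 / c2
    if avg2 <= 0:
        return None  # log is undefined; nothing sensible to remove

    # Assumption: natural logarithm (the base is not given in the text).
    return math.log(avg2) / 2 + avg2 / 40


def weak_edges(read_counts):
    """Return indices of edges with at most `limit` reads."""
    limit = weak_edge_limit(read_counts)
    if limit is None:
        return []
    return [i for i, k in enumerate(read_counts) if k <= limit]


if __name__ == "__main__":
    counts = [40, 38, 2, 1]
    print(weak_edge_limit(counts))
    print(weak_edges(counts))
```

For a node whose edges carry 40, 38, 2 and 1 reads, $ avg_1 = 81 / 4 = 20.25$, so only the first two edges enter the second pass and $ avg_2 = 39$. Under the natural-logarithm assumption the limit is roughly 2.8, and the two edges with 2 and 1 reads are marked for removal.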