Resolve repeats with conflicts
In the previous section, repeats were resolved without excluding any of the reads that go through the window. While this leads to a simpler graph, the graph will still contain artifacts, which have to be removed. The next phase removes most of these errors and is similar to the previous phase:
1. A node is selected as the initial window.
2. The border nodes are divided into sets using the reads going through the window. If we have multiple sets, the repeat is resolved.
3. If the repeat cannot be resolved, the border nodes are divided into sets using the reads going through the window, with reads containing errors excluded. If we have multiple sets, the repeat is resolved.
4. The window is expanded with nodes if possible, and step 2 is repeated.
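The division of border nodes into sets can be pictured with a small sketch. It assumes that two border nodes belong to the same set whenever at least one read going through the window connects them, directly or through a chain of other border nodes; this grouping rule, the function name divide_border and the representation of a read as a pair of border nodes are all illustrative assumptions, not the assembler's actual implementation.

```python
def divide_border(border_nodes, read_pairs):
    """Group border nodes into sets (hypothetical helper).

    Assumption: each read going through the window is given as a pair
    (entry border node, exit border node), and border nodes joined by
    a read end up in the same set.
    """
    parent = {node: node for node in border_nodes}

    def find(node):
        # Follow parent links to the representative of the node's set,
        # shortening the path along the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in read_pairs:
        parent[find(a)] = find(b)  # merge the two sets

    groups = {}
    for node in border_nodes:
        groups.setdefault(find(node), []).append(node)
    return sorted(sorted(group) for group in groups.values())


# Two independent paths give two sets, so the repeat can be resolved.
print(divide_border(["A", "B", "C", "D"], [("A", "C"), ("B", "D")]))
# A single read between B and C merges everything into one set.
print(divide_border(["A", "B", "C", "D"],
                    [("A", "C"), ("B", "D"), ("B", "C")]))
```

Under this assumption a single erroneous read is enough to keep a repeat unresolved, which is why the reads containing errors are excluded before the border nodes are divided a second time.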
The algorithm described above is similar to the algorithm used in the previous section, except for step 3, where the reads with errors are excluded. This is done by calculating an average: the number of reads going through the window divided by the number of distinct pairs of border nodes having one (or more) of these reads connecting them. A second average is calculated in the same way, but counting only the reads whose border nodes are connected by at least as many reads as the first average, and only the distinct pairs of border nodes connected by that many reads or more. Then, a read between two border nodes B and C is excluded if the number of reads going through B and C is less than or equal to a limit derived from these averages.
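The two averages and the exclusion test can be sketched as follows. This is a minimal illustration under stated assumptions: reads are given as pairs of border nodes, the threshold used for the second average is taken to be the first average, and the limit is supplied by the caller through a hypothetical limit_fn parameter, because the limit formula itself is not reproduced here. The read counts in the usage lines are invented for illustration and are not taken from the figures.

```python
from collections import Counter


def exclude_weak_pairs(read_pairs, limit_fn):
    """Return (avg1, avg2, kept_reads) for reads going through a window.

    read_pairs: one (border node, border node) pair per read.
    limit_fn:   hypothetical callable (avg1, avg2) -> limit; the real
                formula is not given in this sketch.
    """
    if not read_pairs:
        return 0.0, 0.0, []

    # Number of reads connecting each distinct pair of border nodes.
    counts = Counter(frozenset(pair) for pair in read_pairs)

    # First average: reads through the window per connected pair.
    avg1 = sum(counts.values()) / len(counts)

    # Second average: the same ratio, restricted to pairs connected by
    # at least avg1 reads (assumed reading of the text). At least one
    # pair always reaches the average, so this set is never empty.
    strong = [n for n in counts.values() if n >= avg1]
    avg2 = sum(strong) / len(strong)

    # A read is excluded when its pair has limit or fewer reads.
    limit = limit_fn(avg1, avg2)
    kept = [pair for pair in read_pairs if counts[frozenset(pair)] > limit]
    return avg1, avg2, kept


# Invented counts: 10 reads A-C, 10 reads B-D, 1 read B-C.
reads = [("A", "C")] * 10 + [("B", "D")] * 10 + [("B", "C")]
avg1, avg2, kept = exclude_weak_pairs(reads, lambda a1, a2: 1)
print(avg1, avg2, len(kept))  # 7.0 10.0 20
```

With the placeholder limit of 1 used here, the single B-C read is dropped while the well-supported A-C and B-D reads are kept.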
An example where we resolve a repeat with conflicts is given in figure 28.9, where we have a total of 21 reads going through the window. The number of reads between border nodes B and C is at or below the limit, so all reads between B and C are excluded, resulting in two sets of border nodes: A, C and B, D. The resolved repeat is shown in figure 28.10.
Figure 28.9: A repeat with conflicts.
Figure 28.10: Resolving a repeat with conflicts.