Resolve repeats with conflicts
In the previous section, repeats were resolved without excluding any of the reads that go through the window. While this led to a simpler graph, the graph still contains artifacts that have to be removed. The next phase removes most of these errors and is similar to the previous phase:
1. A node is selected as the initial window.
2. The border is divided into sets using reads going through the window. If we have multiple sets, the repeat is resolved.
3. If the repeat cannot be resolved, the border nodes are divided into sets using reads going through the window, where reads containing errors are excluded. If we have multiple sets, the repeat is resolved.
4. The window is expanded with nodes if possible and step 2 is repeated.
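The set-division step can be illustrated with a minimal sketch. It assumes that two border nodes belong to the same set whenever a read connects them through the window, and it represents each read as a hypothetical `(entry_node, exit_node)` pair; the function names are illustrative and not part of the assembler.

```python
from collections import defaultdict


def partition_border(border_nodes, reads):
    """Group border nodes into sets joined by reads through the window.

    `reads` holds one (entry_node, exit_node) pair per read; this is an
    assumed, simplified representation of a read crossing the window.
    """
    parent = {node: node for node in border_nodes}

    def find(node):
        # Walk up to the set representative, shortening the path as we go.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in reads:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two sets

    groups = defaultdict(set)
    for node in border_nodes:
        groups[find(node)].add(node)
    return list(groups.values())


def is_resolved(border_nodes, reads):
    """The repeat counts as resolved when the border splits into several sets."""
    return len(partition_border(border_nodes, reads)) > 1


border = ["A", "B", "C", "D"]
clean_reads = [("A", "C"), ("B", "D")]
conflicting_reads = clean_reads + [("A", "D")]  # one erroneous read
```

With `clean_reads` the border splits into two sets, so the repeat is resolved. The single extra read in `conflicting_reads` merges everything into one set, which is the situation the error-exclusion step is meant to handle.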
The algorithm described above is similar to the algorithm used in the previous section, except for step 3, where the reads with errors are excluded. This is done by calculating an average $avg_1 = m_1 / c_1$, where $m_1$ is the number of reads going through the window and $c_1$ is the number of distinct pairs of border nodes having one (or more) of these reads connecting them. A second average $avg_2 = m_2 / c_2$ is calculated, where $m_2$ is the number of reads going through the window having at least $avg_1$ reads connecting their border nodes and $c_2$ is the number of distinct pairs of border nodes having $avg_1$ or more reads connecting them. Then, a read between two border nodes B and C is excluded if the number of reads going through B and C is less than or equal to a limit determined from the second average.
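The two averages can be sketched as follows. This is a rough illustration only: reads are again represented as hypothetical `(entry_node, exit_node)` pairs, the function names are invented for the example, and the exclusion limit is passed in by the caller rather than computed, since its formula is not reproduced here.

```python
from collections import Counter


def pair_support(reads):
    """Count the reads connecting each unordered pair of border nodes."""
    return Counter(frozenset(pair) for pair in reads)


def window_averages(reads):
    """Return (first average, second average) for the reads through a window."""
    support = pair_support(reads)
    if not support:
        raise ValueError("no reads go through the window")
    reads_total = sum(support.values())   # reads going through the window
    pairs_total = len(support)            # distinct connected border pairs
    first_avg = reads_total / pairs_total
    # Keep only pairs supported by at least the first average.
    strong = {pair: n for pair, n in support.items() if n >= first_avg}
    second_avg = sum(strong.values()) / len(strong)
    return first_avg, second_avg


def exclude_weak_reads(reads, limit):
    """Drop reads whose border pair has `limit` or fewer supporting reads."""
    support = pair_support(reads)
    return [pair for pair in reads if support[frozenset(pair)] > limit]


# Ten reads A-C, eight reads B-D, and one conflicting read A-D.
reads = [("A", "C")] * 10 + [("B", "D")] * 8 + [("A", "D")]
first_avg, second_avg = window_averages(reads)
kept = exclude_weak_reads(reads, limit=1)  # limit=1 is an arbitrary example value
```

In this example the first average is 19/3 (19 reads over 3 pairs), only the A-C and B-D pairs reach it, and the second average is 9. With the illustrative limit of 1, the single A-D read is excluded and the 18 well-supported reads remain.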
Figure 30.9: A repeat with conflicts.
Figure 30.10: Resolving a repeat with conflicts.