In the second step of the InDels and Structural Variants detection algorithm, the unaligned end 'breakpoint signatures' (identified in step 1) are used to derive 'structural variant signatures'. This is done by:
- Generating a consensus sequence of the reads with unaligned ends at each identified breakpoint.
- Mapping the generated consensus sequences against the reference sequence in the regions around other identified breakpoints ('cross-mapping').
- Mapping the generated consensus sequences of breakpoints that are near each other against each other ('aligning').
- Mapping the generated consensus sequences against the reference sequence in the region around the breakpoint itself ('self-mapping').
- Considering together the breakpoints whose unaligned end consensus sequences are found to cross-map against each other, and comparing their mapping patterns to the set of theoretically expected 'structural variant signatures' (see here).
- Creating a 'structural variant signature' for each group of breakpoints whose mapping patterns are in accordance with one of the expected 'structural variant signatures'.
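To make the first step in the list above concrete, the following is a minimal sketch of how a consensus sequence could be built from the unaligned ends at one breakpoint. It is not the algorithm's actual implementation: the function name `consensus_of_unaligned_ends` is hypothetical, and the sketch assumes that all unaligned ends start at the breakpoint position and that a simple per-position majority vote is used.

```python
from collections import Counter


def consensus_of_unaligned_ends(ends):
    """Per-position majority vote over unaligned ends.

    Assumption (not from the manual): every string in `ends` starts at the
    breakpoint, so position i in each string refers to the same base.
    Ties are resolved in favour of the base seen first at that position.
    """
    if not ends:
        return ""
    consensus = []
    for i in range(max(len(end) for end in ends)):
        # Only reads that are long enough contribute to this position.
        column = [end[i] for end in ends if len(end) > i]
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Four hypothetical unaligned ends observed at the same breakpoint.
ends = ["ACGTT", "ACGTA", "ACCTT", "ACG"]
print(consensus_of_unaligned_ends(ends))
```

The resulting consensus string is what would then be mapped in the 'cross-mapping', 'aligning' and 'self-mapping' steps.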
The steps above require a number of decisions to be made: (1) when is the consensus sequence reliable enough to work with, and (2) when does an unaligned end map well enough to be called a match? The algorithm uses a number of hard-coded values when making those decisions. The values are described below.
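As an illustration of how such hard-coded values could enter the two decisions, here is a minimal sketch. All names (`Thresholds`, `consensus_is_reliable`, `is_match`) and all numeric values are placeholders chosen for the example, not the values used by the algorithm, and the ungapped comparison is a simplification of real read mapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Placeholder values for illustration only; not the algorithm's values.
    min_supporting_reads: int = 3
    min_consensus_length: int = 10
    min_match_fraction: float = 0.9


def consensus_is_reliable(consensus, n_reads, t=Thresholds()):
    """Decision (1): is the consensus supported well enough to work with?"""
    return (n_reads >= t.min_supporting_reads
            and len(consensus) >= t.min_consensus_length)


def is_match(consensus, reference_window, t=Thresholds()):
    """Decision (2): does the consensus map well enough to call a match?

    Simplification: tries every ungapped placement of the consensus inside
    the reference window and keeps the one with the most identical bases.
    """
    n = len(consensus)
    if n == 0 or n > len(reference_window):
        return False
    best = max(
        sum(a == b for a, b in zip(consensus, reference_window[i:i + n]))
        for i in range(len(reference_window) - n + 1)
    )
    return best / n >= t.min_match_fraction


print(consensus_is_reliable("ACGTACGTAC", n_reads=5))
print(is_match("ACGTACGTAC", "TTTACGTACGTACTTT"))
```

Keeping the thresholds in one immutable object makes it explicit which values drive each decision.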