Structural Variant Caller

The Structural Variant Caller identifies structural variants in read mappings based on evidence from unaligned read ends and coverage information. It builds on the same ideas about unaligned read end signatures as the existing InDels and Structural Variants tool, but relies more heavily on statistical reasoning and on more refined components for consensus generation, mapping, and alignment of the unaligned end sequences.

The tool,

The tool has the following limitations:

The tool processes each chromosome in a genome individually, through several steps:

Breakpoint estimation: The tool looks for unaligned read ends at each chromosome position. At each breakpoint, two consensus sequences are constructed across the reads: one for the unaligned ends and one for the aligned region. The consensus for the unaligned end is based on a majority count of k-mers, while the consensus for the aligned region is based on the nucleotide count in each column. Breakpoints are labeled as either 'left' or 'right'. This labeling is from the perspective of a deletion: a left breakpoint is on the left side of a deletion (which means the reads have a right unaligned end), and a right breakpoint is on the right side of a deletion (which means the reads have a left unaligned end). For WGS applications, the tool makes a probabilistic assessment of how likely the breakpoint is to support a structural variant, based on the coverage, the unaligned end read count, and the specified ploidy of the sample.
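The column-based consensus for the aligned region can be illustrated with a minimal sketch. This is not the tool's implementation: the function name column_consensus, the assumption that the sequences are already aligned at the same start column, and the tie-breaking rule are all hypothetical choices made for illustration. The k-mer based consensus for the unaligned ends is not shown.

```python
from collections import Counter


def column_consensus(sequences):
    """Return the majority nucleotide in each column.

    Assumes (for illustration only) that all sequences start at the
    same column; shorter sequences simply do not vote in later columns.
    """
    if not sequences:
        return ""
    length = max(len(seq) for seq in sequences)
    consensus = []
    for col in range(length):
        # Count the nucleotides observed in this column.
        counts = Counter(seq[col] for seq in sequences if col < len(seq))
        # most_common resolves ties by first-seen order (an arbitrary choice here).
        base, _ = counts.most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


reads = ["ACGT", "ACGA", "ACCT"]
print(column_consensus(reads))
```

In this example, the third and fourth columns each contain one disagreeing read, and the majority nucleotide is chosen in both.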

Coverage and complexity estimation: each chromosome is divided into bins. The tool then calculates the coverage and the complexity of the reference region in each bin.
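As a rough sketch of the binning idea, the following computes the mean coverage per bin from read intervals and a simple complexity score for a reference sequence. The bin size, the interval representation (0-based, half-open), and the use of Shannon entropy over k-mers as the complexity measure are assumptions made for this example; the manual does not state how the tool defines complexity.

```python
import math
from collections import Counter


def binned_coverage(read_intervals, chrom_length, bin_size):
    """Mean per-base coverage in each bin.

    read_intervals: iterable of (start, end), 0-based and half-open.
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size
    covered = [0] * n_bins
    for start, end in read_intervals:
        pos, end = max(0, start), min(chrom_length, end)
        while pos < end:
            b = pos // bin_size
            # Add the part of the read that falls inside bin b.
            segment_end = min((b + 1) * bin_size, end)
            covered[b] += segment_end - pos
            pos = segment_end
    result = []
    for b in range(n_bins):
        # The last bin may be shorter than bin_size.
        width = min(bin_size, chrom_length - b * bin_size)
        result.append(covered[b] / width)
    return result


def kmer_entropy(sequence, k=2):
    """Shannon entropy (bits) of k-mer frequencies; low values suggest low complexity."""
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    if not kmers:
        return 0.0
    total = len(kmers)
    counts = Counter(kmers)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


print(binned_coverage([(0, 10), (5, 15)], chrom_length=20, bin_size=10))
print(kmer_entropy("AAAAAAAA"), kmer_entropy("ACGTTGCA"))
```

A homopolymer stretch gets an entropy of zero under this measure, while a mixed sequence scores higher.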

Resolving structural variants: after breakpoints have been established, different combinations of left and right breakpoints are paired together. For each pair, the unaligned and aligned consensus sequences from one breakpoint are aligned to the other breakpoint. The alignment scores from each possible pairing are then stored in a matrix, and a dynamic programming algorithm is used to identify which breakpoints to pair together. Each breakpoint that was not matched in this step is then used as a single breakpoint to search for additional smaller insertions or deletions inferred from self-mapping evidence (where the unaligned consensus maps back close to the breakpoint's own location).
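The pairing step can be sketched as a dynamic programming pass over a score matrix. The manual does not specify the algorithm the tool uses, so the following is only one possible formulation under stated assumptions: rows are left breakpoints and columns are right breakpoints, both sorted by position; pairings must preserve that order; and a pair is only allowed when its score exceeds a hypothetical min_score threshold. The function name pair_breakpoints is also hypothetical.

```python
def pair_breakpoints(scores, min_score=0.0):
    """Order-preserving pairing that maximizes the summed score.

    scores[i][j] is the alignment score for pairing left breakpoint i
    with right breakpoint j. Returns (total_score, list of (i, j) pairs).
    """
    n_left = len(scores)
    n_right = len(scores[0]) if scores else 0
    # best[i][j]: best total using the first i left and first j right breakpoints.
    best = [[0.0] * (n_right + 1) for _ in range(n_left + 1)]
    for i in range(1, n_left + 1):
        for j in range(1, n_right + 1):
            skip = max(best[i - 1][j], best[i][j - 1])
            s = scores[i - 1][j - 1]
            match = best[i - 1][j - 1] + s if s > min_score else float("-inf")
            best[i][j] = max(skip, match)

    # Trace back through the table to recover the chosen pairs.
    pairs = []
    i, j = n_left, n_right
    while i > 0 and j > 0:
        s = scores[i - 1][j - 1]
        if s > min_score and best[i][j] == best[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j]:
            i -= 1  # left breakpoint i-1 stays unmatched
        else:
            j -= 1  # right breakpoint j-1 stays unmatched
    pairs.reverse()
    return best[n_left][n_right], pairs


scores = [[1, 8],
          [0, 0]]
total, pairs = pair_breakpoints(scores)
unmatched_left = [i for i in range(len(scores)) if i not in {p[0] for p in pairs}]
print(total, pairs, unmatched_left)
```

In this example the second left breakpoint has no acceptable partner and is left unmatched; in the tool, such breakpoints are the ones subsequently examined as single breakpoints.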

Running the Structural Variant Caller tool

To run the Structural Variant Caller tool, go to:

        Toolbox | Resequencing Analysis | Variant Detection | Structural Variant Caller

Once the tool wizard has opened (figure 11.15), choose the read mapping you would like to analyze. The Structural Variant Caller tool accepts read mappings as either reads tracks or stand-alone read mappings.

Figure 11.15: Select one or several reads tracks or stand-alone read mappings.

In the next wizard step, specify the ploidy and application for the sample you are analyzing (figure 11.16). You can also choose to ignore broken pair reads. Ignoring broken pairs will typically reduce the computational time of the analysis. It may reduce sensitivity, but it may also improve precision, depending on the source of the broken pair reads.

Figure 11.16: Set the application parameters for the tool and specify if broken pair reads should be ignored.

In the next steps you are asked to specify filter settings. The settings depend on whether you have specified the whole genome sequencing or the targeted application. The filter settings for the whole genome sequencing application (figure 11.17) are:

Figure 11.17: Set filters for the whole genome sequencing application.

For the targeted application the filters are (figure 11.18):

Figure 11.18: Set filters for the targeted sequencing application.


