How to run the InDels and Structural Variants tool
To start the structural variant detection:
Toolbox | Resequencing | InDels and Structural Variants tool
This will open a dialog. Select the read mapping of interest as shown in Figure 21.65 and click Next.
Figure 21.65: Select the read mapping of interest.
The next wizard step (Figure 21.66) is concerned with specifying parameters related to the algorithm used for calling structural variants. The algorithm first identifies positions in the mapping(s) with an excess of reads with left (or right) unaligned ends. Once these positions and the consensus sequences of the unaligned ends are determined, the algorithm maps the determined consensus sequences to the reference sequence around other positions with unaligned ends. If mappings are found that are in accordance with a 'signature' of a structural variant, a structural variant is called. For further details about the algorithm see Section 21.20.3.
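The first step of the algorithm, collecting positions where reads have unaligned ends, can be illustrated with a small Python sketch. This is not the tool's implementation. The read representation (a tuple of aligned start, aligned end, and the lengths of the left and right unaligned ends) and the function name are assumptions made only for illustration, with half-open coordinates assumed.

```python
from collections import Counter


def tally_unaligned_ends(reads):
    """Count left and right unaligned ends per reference position.

    Each read is a hypothetical tuple:
    (aligned_start, aligned_end, left_unaligned_len, right_unaligned_len),
    with the aligned part covering [aligned_start, aligned_end).
    """
    left = Counter()   # position -> number of reads with a left unaligned end there
    right = Counter()  # position -> number of reads with a right unaligned end there
    for start, end, left_len, right_len in reads:
        if left_len > 0:
            left[start] += 1
        if right_len > 0:
            right[end] += 1
    return left, right


reads = [
    (100, 150, 0, 20),  # right unaligned end at position 150
    (110, 150, 0, 15),  # right unaligned end at position 150
    (150, 200, 12, 0),  # left unaligned end at position 150
    (90, 140, 0, 0),    # fully aligned read, contributes nothing
]
left_counts, right_counts = tally_unaligned_ends(reads)
```

Positions with high counts in either tally are the candidates whose unaligned-end consensus sequences would then be mapped back to the reference.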
Figure 21.66: Select the relevant settings.
The 'Significance of unaligned end breakpoints' parameters are concerned with when a position with unaligned ends should be considered by the algorithm, and when it should be ignored:
- P-value threshold: Only positions in which the fraction of reads with unaligned ends is sufficiently high will be considered. The 'P-value threshold' determines the cut-off value in a binomial distribution for this fraction. The higher the P-value threshold is set, the more unaligned end breakpoints will be identified.
- Maximum number of mismatches: The 'Maximum number of mismatches' parameter determines which reads should be considered when inferring unaligned end breakpoints. Poorly mapped reads tend to have many mismatches and unaligned ends, and it may be preferable to let the algorithm ignore reads with too many mismatches in order to avoid false positives and reduce computational time. On the other hand, if the allowed number of mismatches is set too low, unaligned end breakpoints in the proximity of other variants (e.g. SNVs) may be lost. Again, the higher the number of mismatches allowed, the more unaligned end breakpoints will be identified.
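How the two parameters above interact can be shown with a minimal Python sketch. It is an illustration under stated assumptions, not the tool's actual test: the background rate of unaligned ends (`background_rate`), the read representation, and the function names are all hypothetical.

```python
from math import comb


def binomial_tail_p(k, n, rate):
    """P(X >= k) for X ~ Binomial(n, rate)."""
    return sum(comb(n, i) * rate**i * (1 - rate) ** (n - i) for i in range(k, n + 1))


def is_candidate_breakpoint(reads, p_threshold, max_mismatches, background_rate=0.01):
    """Decide whether a position is kept as an unaligned end breakpoint.

    Each read is a hypothetical tuple (has_unaligned_end, n_mismatches).
    background_rate is an assumed per-read probability of an unaligned end
    arising by chance; the real tool's null model is not documented here.
    """
    # Ignore reads with too many mismatches (likely poorly mapped).
    kept = [r for r in reads if r[1] <= max_mismatches]
    if not kept:
        return False
    k = sum(1 for has_end, _ in kept if has_end)
    # Keep the position if this many unaligned ends is unlikely by chance.
    return binomial_tail_p(k, len(kept), background_rate) <= p_threshold


# 5 of 20 reads have an unaligned end; the unaligned reads carry 5 mismatches each.
reads = [(True, 5)] * 5 + [(False, 0)] * 15
```

With these example reads, a position that passes at a P-value threshold of 1e-4 fails at 1e-8, and lowering `max_mismatches` below 5 discards all the reads carrying unaligned ends, so the position is lost.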
The 'Filter variants' parameters are concerned with the amount of evidence for each structural variant required for it to be called:
- Filter variants: When the Filter variants box is checked, only variants inferred from breakpoints that together are supported by at least the specified Minimum number of reads will be called.
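The effect of this filter can be sketched in Python as follows. The variant representation (a dict with a `breakpoint_reads` list holding the read support of each breakpoint) is hypothetical, and summing the per-breakpoint counts is an assumption about what "together" means; the tool may count shared reads differently.

```python
def filter_variants(variants, min_reads):
    """Keep variants whose breakpoints together have at least min_reads supporting reads.

    Assumption: combined support is the plain sum of per-breakpoint read counts.
    """
    return [v for v in variants if sum(v["breakpoint_reads"]) >= min_reads]


variants = [
    {"name": "deletion_A", "breakpoint_reads": [6, 5]},   # combined support 11
    {"name": "insertion_B", "breakpoint_reads": [2, 1]},  # combined support 3
]
called = filter_variants(variants, min_reads=10)
```

Here only `deletion_A` is kept. Lowering `min_reads` to 3 would keep both.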
'Reference masking' allows specification of target regions:
- Restrict calling to target regions: When a target region track is specified, only reads that overlap at least one of the targets are examined when the unaligned end breakpoints are identified. Hence only breakpoints that fall within, or in close proximity to, the targets will be identified. (A read may overlap a target but have an unaligned end outside it; such ends are also examined, so breakpoints outside, but in the proximity of, a target can be identified.) Specifying a target track reduces the runtime compared to running without one.
Note! As the set of identified unaligned end breakpoints differs between runs where a target region track has been specified and where it has not, the set of predicted InDels and structural variants is also likely to differ. This is because the InDels and structural variants are predicted from the mapping patterns of the unaligned ends at the set of identified breakpoints. This is also the case even if you restrict the comparison to only involve the InDels and structural variants detected within the target regions. You cannot expect these to be exactly the same but you can expect a large overlap.
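The read selection implied by the target-region restriction described above can be sketched as an interval-overlap test. This Python example is illustrative only. Reads and targets are represented as hypothetical half-open `(start, end)` intervals on a single reference sequence.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def reads_to_examine(reads, targets):
    """Return the reads that overlap at least one target region."""
    return [
        r for r in reads
        if any(overlaps(r[0], r[1], t[0], t[1]) for t in targets)
    ]


targets = [(100, 200)]
reads = [
    (50, 120),   # overlaps the target; its left end lies outside it
    (190, 260),  # overlaps the target; its right end lies outside it
    (300, 350),  # no overlap, not examined
]
examined = reads_to_examine(reads, targets)
```

The first two reads are examined even though one end of each lies outside the target, which is why breakpoints just outside a target region can still be reported.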
Specify these settings and click Next. The "Results handling" dialog (Figure 21.67) will be opened. The InDels and Structural Variants tool has the following output options:
- Create report: When ticked, a report that summarizes information about the inferred breakpoints and variants is created.
- Create breakpoints: When ticked, a track containing the detected breakpoints is created.
- Create InDel variants: When ticked, a variant track containing the detected InDels that fulfill the requirements for being 'variants' is created. These include the detected insertions for which the allele sequence is inferred, but not those for which it is not, or only partly, known. Also, only deletions of six up to 200 bp are included in the variant track. See Variant tracks for a definition of the requirements for 'variants'. Note that insertions and deletions that are not included in the InDel track will be present in the 'Structural variants track' (described below).
- Create structural variations: When ticked, a track containing the detected structural variants is created.
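The rule for which insertions and deletions end up in the InDel variant track can be summarized in a short Python sketch. It encodes only the two conditions stated above, not the general requirements for 'variants'. The function name and arguments are hypothetical, and treating the six to 200 bp range as inclusive at both ends is an assumption.

```python
MIN_DELETION_BP = 6    # lower bound stated in the text (assumed inclusive)
MAX_DELETION_BP = 200  # upper bound stated in the text (assumed inclusive)


def goes_in_indel_track(kind, length_bp, allele_fully_known=True):
    """Return True if a detected event would be placed in the InDel variant track.

    Events for which this returns False are expected to appear only in the
    structural variants track.
    """
    if kind == "insertion":
        # Insertions need a fully inferred allele sequence.
        return allele_fully_known
    if kind == "deletion":
        return MIN_DELETION_BP <= length_bp <= MAX_DELETION_BP
    # Other event types (e.g. inversions) are not InDels.
    return False
```

For example, a 50 bp deletion would be placed in the InDel track under these assumptions, while a 500 bp deletion or an insertion with a partly known sequence would be reported only in the structural variants track.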
Figure 21.67: Select output formats.
An example of the output from the InDels and Structural Variants tool is shown in Figure 21.68. The output is described in detail in the next section (Section 21.20.2).
Figure 21.68: Example of the result of an analysis on a standalone read mapping (to the left) and on a reads track (to the right).