The Identify Variants (WGS-HD) tool takes trimmed sequencing reads as input and returns identified variants in a Track List.
The tool runs an internal workflow that starts by mapping the sequencing reads to the human reference sequence. The Structural Variant Caller then analyzes the resulting read mapping and infers indels and other structural variants from patterns of reads with unaligned ends. Next, the mapping is locally realigned, guided by the indels detected by the Structural Variant Caller, and the realigned read mapping is analyzed by the Fixed Ploidy Variant Detection tool. This tool produces a track of unfiltered variants, which are post-filtered to remove variants that are likely due to artifacts or noise; the variants that pass the post-filtering criteria are reported in the Identified variants track. Variants inferred by the Structural Variant Caller but not detected by the Fixed Ploidy Variant Detection tool are also subjected to a number of post-filters, and those that pass are reported in the Indels indirect evidence track.
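How the two variant tracks are populated can be sketched as follows. This is a minimal Python illustration, not the tool's implementation: the `Variant` record, the function `split_output_tracks`, and the two filter callables are hypothetical. The actual post-filter criteria are not described here, so they are passed in as placeholders, and treating "not detected by the Fixed Ploidy Variant Detection tool" as "no exact match (same position and alleles) among its unfiltered calls" is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Variant:
    # Hypothetical minimal variant record; frozen so it is hashable.
    chrom: str
    pos: int
    ref: str
    alt: str


def split_output_tracks(
    fixed_ploidy_calls: Iterable[Variant],
    sv_indels: Iterable[Variant],
    passes_variant_filters: Callable[[Variant], bool],
    passes_indel_filters: Callable[[Variant], bool],
) -> tuple[list[Variant], list[Variant]]:
    """Return (identified_variants, indels_indirect_evidence)."""
    fixed_ploidy_calls = list(fixed_ploidy_calls)
    # Identified variants: unfiltered Fixed Ploidy calls that pass post-filtering.
    identified = [v for v in fixed_ploidy_calls if passes_variant_filters(v)]
    # Assumption: "detected" means an exact match among the unfiltered calls.
    detected = set(fixed_ploidy_calls)
    # Indels indirect evidence: SV Caller indels not detected by the Fixed
    # Ploidy tool that pass the (placeholder) indel post-filters.
    indirect = [v for v in sv_indels if v not in detected and passes_indel_filters(v)]
    return identified, indirect


# Usage with pass-everything placeholder filters:
fp = [Variant("chr1", 100, "A", "G"), Variant("chr1", 200, "CT", "C")]
sv = [Variant("chr1", 300, "AT", "A"), Variant("chr1", 200, "CT", "C")]
identified, indirect = split_output_tracks(fp, sv, lambda v: True, lambda v: True)
print(indirect)  # only the chr1:300 deletion; chr1:200 was already detected
```

Keeping the two filter callables separate mirrors the text, which describes the two sets of variants as being post-filtered independently.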
In addition, a detailed mapping report with summaries of the mapping and coverage is created.
To run the Identify Variants (WGS-HD) workflow, go to:
Toolbox | Template Workflows | Biomedical Workflows | Whole Genome Sequencing | Hereditary Disease | Identify Variants (WGS-HD)
- Double-click on the Identify Variants (WGS-HD) tool to start the analysis. If you are connected to a server, you will first be asked where you would like to run the analysis.
- Select the trimmed sequencing reads you want to analyze (figure 20.33).
- In the next dialog, select the reference data set to be used for the analysis (figure 20.34).
- Specify the parameters for the Fixed Ploidy Variant Detection tool (figure 20.35).
The parameters used by the Fixed Ploidy Variant Detection tool can be adjusted. We have optimized the parameters for the individual analyses, but you may want to tweak some of them to fit your particular sequencing data. A good starting point is to run an analysis with the default settings.
The parameters that can be set are:
- Required variant probability: The minimum probability of the 'variant site' required for a variant to be called. Note that this is not the minimum probability of the individual variant. For the Fixed Ploidy Variant Detection tool, if a variant site (rather than the variant itself) passes the variant probability threshold, the variant with the highest probability at that site is reported, even if the probability of that particular variant is below the threshold. For example, if the required variant probability is set to 0.9, the called variant may have an individual probability below 0.9, as long as the probability of the entire variant site is greater than 0.9.
- Ignore broken pairs: When ticked, reads from broken pairs are ignored. Broken pairs may arise for a number of reasons, one being erroneous mapping of the reads. In general, variants based on broken pair reads are likely to be less reliable, so ignoring them may reduce the number of spurious variants called. However, broken pairs may also arise for biological reasons (e.g. due to structural variants) and if they are ignored some true variants may go undetected. Please note that ignored broken pair reads will not be considered for any non-specific match filters.
- Minimum coverage: Only variants in regions covered by at least this many reads are called.
- Minimum count: Only variants that are present in at least this many reads are called.
- Minimum frequency: Only variants that are present at or above the specified frequency (calculated as 'count'/'coverage') are called.
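The interplay between the site-level probability threshold and the read-based filters listed above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the tool's implementation: the names `Read`, `Params`, and `call_site` are hypothetical; the site probability and per-variant probabilities are taken as given inputs, since how the tool computes them is not described here; the minimum frequency is expressed as a fraction rather than a percentage; only the single most probable non-reference allele is considered; ignored broken-pair reads are assumed to count towards neither coverage nor count; and the order in which the filters are applied is assumed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Read:
    allele: str        # allele observed in this read at the site
    broken_pair: bool  # True if the read comes from a broken pair


@dataclass(frozen=True)
class Params:
    required_variant_probability: float
    ignore_broken_pairs: bool
    minimum_coverage: int
    minimum_count: int
    minimum_frequency: float  # assumption: a fraction, e.g. 0.2 for 20%


def call_site(
    reads: list[Read],
    reference_allele: str,
    site_probability: float,
    variant_probabilities: dict[str, float],
    params: Params,
) -> Optional[str]:
    """Return the allele reported at one site, or None if nothing is called."""
    # The threshold is a minimum on the probability of the whole variant site,
    # not on the probability of any individual variant.
    if site_probability < params.required_variant_probability:
        return None
    # Assumption: ignored broken-pair reads count towards neither coverage nor count.
    if params.ignore_broken_pairs:
        reads = [r for r in reads if not r.broken_pair]
    coverage = len(reads)
    if coverage == 0 or coverage < params.minimum_coverage:
        return None
    candidates = {a: p for a, p in variant_probabilities.items() if a != reference_allele}
    if not candidates:
        return None
    # Report the most probable variant, even if its own probability is below the threshold.
    best = max(candidates, key=candidates.get)
    count = sum(1 for r in reads if r.allele == best)
    if count < params.minimum_count:
        return None
    if count / coverage < params.minimum_frequency:  # frequency = count / coverage
        return None
    return best


# Usage: the site passes at 0.95 although the reported variant "T" has probability 0.6.
params = Params(0.9, True, 10, 2, 0.2)
reads = [Read("T", False)] * 6 + [Read("A", False)] * 5 + [Read("T", True)] * 2
print(call_site(reads, "A", 0.95, {"T": 0.6, "G": 0.35}, params))  # prints T
print(call_site(reads, "A", 0.85, {"T": 0.6, "G": 0.35}, params))  # prints None
```

In the first call, the two broken-pair reads are dropped, leaving a coverage of 11 and a count of 6 for "T" (frequency of about 0.55), so all read-based filters pass.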
- In the last wizard step, you can check the selected settings by clicking the button labeled Preview All Parameters.
In the Preview All Parameters wizard, you can only review the settings. To make changes, use the Previous button to go back and edit the parameters in the relevant windows.
- Choose to Save your results and click on the button labeled Finish.
The following outputs are generated:
- Read Mapping: The mapped sequencing reads. The reads are shown in different colors depending on their orientation, whether they are single reads or paired reads, and whether they map unambiguously (see http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Coloring_mapped_reads.html).
- Read Mapping Report: A report consisting of a number of tables and graphs that provide different kinds of information about the mapped reads.
- Two variant tracks: the Identified variants track, containing the variants identified by the Fixed Ploidy Variant Detection tool that passed post-filtering, and the Indels indirect evidence track, containing the indels inferred by the Structural Variant Caller that were not detected by the Fixed Ploidy Variant Detection tool and that passed post-filtering. When you hover the mouse over a detected variant in the Track List, a tooltip appears with information about that variant. You will have to zoom in on the variants to see the detailed tooltip.
- Genome Browser View: A collection of tracks presented together. It shows the annotated variant track together with the human reference sequence, genes, transcripts, coding regions, the mapped reads, the identified variants, and the structural variants (see figure 20.5).