QIAGEN Bioinformatics Manuals

Identify Variants (WGS)

The Identify Variants (WGS) tool takes trimmed sequencing reads as input and returns identified variants in a Track List.

The tool runs an internal workflow that starts by mapping the sequencing reads to the human reference sequence. The Structural Variant Caller then analyzes the resulting read mapping to infer indels and other structural variants from patterns of unaligned read ends. Next, the mapping is locally realigned, guided by the indels detected by the Structural Variant Caller. The Low Frequency Variant Detection tool analyzes the locally realigned read mapping and produces a track of unfiltered variants, which are post-filtered to remove variants that are likely due to artifacts or noise. Variants called by the Low Frequency Variant Detection tool that pass the post-filtering criteria are placed in the Identified variants track. Variants inferred by the Structural Variant Caller but not detected by the Low Frequency Variant Detection tool are also subjected to a number of post-filters; those that pass are placed in the Indels indirect evidence track.
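
The way variants are routed to the two output tracks can be sketched in a few lines of Python. This is only an illustration of the logic described above, not the Workbench's implementation: the `Variant` class, the `split_into_tracks` function, and the 5% frequency threshold are all hypothetical, and the actual post-filter criteria are not listed in this section, so they are passed in as placeholder functions. The sketch also assumes that "not detected" means absent from the unfiltered Low Frequency Variant Detection calls.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record used only for this sketch."""
    chrom: str
    pos: int
    ref: str
    alt: str
    frequency: float  # allele frequency in percent (illustrative field)

    def key(self) -> Tuple[str, int, str, str]:
        return (self.chrom, self.pos, self.ref, self.alt)


def split_into_tracks(
    lfvd_calls: Iterable[Variant],
    sv_indels: Iterable[Variant],
    passes_lfvd_filter: Callable[[Variant], bool],
    passes_sv_filter: Callable[[Variant], bool],
) -> Tuple[List[Variant], List[Variant]]:
    """Return (identified_variants, indels_indirect_evidence)."""
    lfvd_calls = list(lfvd_calls)
    # Low Frequency Variant Detection calls that survive post-filtering.
    identified = [v for v in lfvd_calls if passes_lfvd_filter(v)]
    # Assumption: an indel counts as "detected" if it appears among the
    # unfiltered Low Frequency Variant Detection calls.
    detected = {v.key() for v in lfvd_calls}
    indirect = [
        v for v in sv_indels
        if v.key() not in detected and passes_sv_filter(v)
    ]
    return identified, indirect


# Illustrative data and placeholder filters (the 5% threshold is made up).
lfvd = [
    Variant("chr1", 100, "A", "G", 12.0),
    Variant("chr1", 200, "C", "T", 0.5),
    Variant("chr2", 50, "AT", "A", 30.0),
]
sv = [
    Variant("chr2", 50, "AT", "A", 30.0),     # also called by LFVD
    Variant("chr3", 900, "G", "GTTTT", 20.0),  # seen only by the SV caller
]
identified, indirect = split_into_tracks(
    lfvd, sv,
    passes_lfvd_filter=lambda v: v.frequency >= 5.0,
    passes_sv_filter=lambda v: True,
)
```

In this example the chr2 deletion ends up only in the identified variants list, because it was already called by the Low Frequency Variant Detection step, while the chr3 insertion is reported as indirect evidence.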

A detailed mapping report is created with summaries on the mapping and coverage.

Run the Identify Variants (WGS) workflow

To run the Identify Variants (WGS) workflow, go to:

Template Workflows | Biomedical Workflows | Whole Genome Sequencing | Somatic Cancer | Identify Variants (WGS)

Select the trimmed sequencing reads from the sample that should be analyzed (figure 22.24).

Figure 22.24: Please select trimmed sequencing reads from the sample to be analyzed.

To analyze several samples, run the tool in batch mode: check "Batch" and select the folder that holds the data you wish to analyze.

In the next dialog, select the reference data set to use for identifying variants (figure 22.25).

Figure 22.25: Choose the relevant reference Data Set to identify variants in your sample.

In the Low Frequency Variant Detection dialog (figure 22.26), you can specify the parameters for variant detection.

Figure 22.26: Specify the parameters that should be used to detect variants.

In the last wizard step, you can review the selected settings by clicking the Preview All Parameters button. The Preview All Parameters view is read-only: to change a setting, use the Previous button to return to the relevant wizard step.

Choose to Save your results and click Finish.

Output from the Identify Variants (WGS) workflow

The Identify Variants (WGS) tool produces the following outputs:

Read Mapping: The mapped sequencing reads. The reads are shown in different colors depending on their orientation, whether they are single reads or paired reads, and whether they map unambiguously (see http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Coloring_mapped_reads.html).
Read Mapping Report: The report consists of a number of tables and graphs that in different ways provide information about the mapped reads.
Two variant tracks: The Identified Variants track containing the variants identified by the Low Frequency Variant Detection tool after the post-filtering has been applied, and the Indels indirect evidence track which contains the indels inferred by the Structural Variant Caller. When holding the mouse over the detected variants in the Track List, a tooltip appears with information about the individual variants. You will have to zoom in on the variants to be able to see the detailed tooltip.
Genome Browser View: A Genome Browser view containing the collection of tracks presented together. Shows the annotated variant track together with the human reference sequence, genes, transcripts, coding regions, the mapped reads, the identified variants, and the Indels indirect evidence variants (see figure 22.5).

Before looking at the identified variants, we recommend that you first review the mapping report to assess the performance of the mapping. For example, check that at least 90% of the reads map to the human reference sequence.
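
The 90% check amounts to a simple calculation on two numbers taken from the mapping report. A minimal sketch, assuming you have read the mapped and total read counts off the report by hand (the function names and the example counts are hypothetical and not part of the Workbench):

```python
def mapped_percentage(mapped_reads: int, total_reads: int) -> float:
    """Percentage of reads that mapped to the reference."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= mapped_reads <= total_reads:
        raise ValueError("mapped_reads must be between 0 and total_reads")
    return 100.0 * mapped_reads / total_reads


def mapping_looks_ok(mapped_reads: int, total_reads: int,
                     min_percent: float = 90.0) -> bool:
    """True if at least min_percent of the reads mapped (default 90%)."""
    return mapped_percentage(mapped_reads, total_reads) >= min_percent


# Illustrative counts only; use the values from your own mapping report.
print(mapping_looks_ok(950, 1000))
print(mapping_looks_ok(850, 1000))
```

A sample that falls below the threshold is worth investigating before interpreting its variant tracks.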

Next, open the Genome Browser View (see figure 22.27). It shows the identified variants track in the context of the human reference sequence, genes, transcripts, coding regions, and mapped sequencing reads.

Figure 22.27: The Genome Browser View allows easy inspection of the identified variants in the context of the human genome.

Double-clicking the Indels indirect evidence variant track in the Genome Browser View opens a table that lists all inferred larger insertions and deletions (see figure 22.28).

Figure 22.28: This figure shows a Genome Browser View with an open track table. The table allows deeper inspection of the identified variants.
