Identify Variants (TAS)

The "Identify Variants" tool takes sequencing reads as input and returns identified variants as part of a Genome Browser View.

The tool runs an internal workflow that first maps the sequencing reads to the human reference sequence. It then performs a local realignment of the mapped reads to improve the variant detection that follows. Finally, variants with an average base quality below 20 are filtered out.
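
As a rough illustration of the final filtering step, the Python sketch below applies the same rule: it keeps only variants whose average base quality is at least 20. The `Variant` record, its field names and the example positions are hypothetical and made up for illustration; they do not describe how the Workbench stores variants internally.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical variant record for illustration only; the Workbench's own
# variant tracks use their own internal representation.
@dataclass
class Variant:
    chrom: str
    position: int  # 1-based position (made-up values below)
    ref: str
    alt: str
    average_base_quality: float

# Threshold from the workflow description: variants with an average
# base quality smaller than 20 are filtered away.
MIN_AVERAGE_BASE_QUALITY = 20.0

def filter_by_base_quality(variants: List[Variant],
                           threshold: float = MIN_AVERAGE_BASE_QUALITY) -> List[Variant]:
    """Keep variants whose average base quality is at least `threshold`."""
    return [v for v in variants if v.average_base_quality >= threshold]

# Made-up example calls.
calls = [
    Variant("chr1", 1000, "A", "T", 34.5),
    Variant("chr1", 2000, "C", "T", 18.0),  # below 20: removed
    Variant("chr2", 3000, "G", "A", 20.0),  # exactly 20: kept
]
kept = filter_by_base_quality(calls)
print([(v.chrom, v.position) for v in kept])  # [('chr1', 1000), ('chr2', 3000)]
```

A variant with an average base quality of exactly 20 is kept, because only values smaller than 20 are removed.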

In addition, a targeted region report is created that lets you inspect the overall coverage and the mapping specificity in the targeted regions.

Import your targeted regions

The vendor of the amplicon or hybridization kit provides a file with the genomic regions targeted by the kit. To obtain this file, contact the vendor and ask them to send you the target regions file. It will be in either .bed or .gff format.
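
If you want to inspect a .bed file before importing it, the sketch below reads it according to the commonly used BED definition: tab-separated columns with the chromosome, a 0-based start position and an end position that is not itself part of the region. `parse_bed` is a hypothetical helper, not part of the Workbench, and the example lines are made up. GFF files use 1-based, inclusive coordinates instead and would need a different parser.

```python
from typing import Iterable, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, 0-based start, exclusive end)

def parse_bed(lines: Iterable[str]) -> List[Region]:
    """Parse BED lines into (chrom, start, end) tuples.

    BED coordinates are 0-based and half-open: start is included, end is not.
    Header lines ("track", "browser"), comments ("#") and blank lines are skipped.
    """
    regions: List[Region] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        regions.append((fields[0], int(fields[1]), int(fields[2])))
    return regions

# Made-up example content of a target regions file.
example = [
    "track name=targets",
    "chr1\t100\t200\tamplicon_1",
    "chr1\t500\t650\tamplicon_2",
]
targets = parse_bed(example)
print(targets)  # [('chr1', 100, 200), ('chr1', 500, 650)]

# With half-open intervals, the region length is simply end - start.
print(sum(end - start for _, start, end in targets))  # 250
```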

Import your file into the Cancer Research Workbench using the Tracks import option of the Import tool in the toolbar.

How to run the "Identify Variants" ready-to-use workflow

  1. Go to the toolbox and double-click on the "Identify Variants" ready-to-use workflow (figure 14.1).

    Figure 14.1: The ready-to-use workflows are found in the toolbox.

    This opens the wizard shown in figure 14.2, where you select the sequencing reads from the sample that should be analyzed.

    Figure 14.2: Please select all sequencing reads from the sample to be analyzed.

    Select all sequencing reads from your sample. If several samples should be analyzed, the tool has to be run in batch mode. To do this, tick "Batch" at the bottom of the wizard (as shown in figure 14.43) and select the folder that holds the data you wish to analyze. If your sequencing data are stored in separate folders, you should also run the analysis in batch mode.

    When you have selected the sample(s) you wish to analyze, click on the button labeled Next.

  2. In the next wizard step (figure 14.3), specify the track with the targeted regions from the experiment. You can also specify the minimum read coverage that should be present in the targeted regions.

    Figure 14.3: Select the track with the targeted regions from your experiment.

  3. Click on the button labeled Next to go to the next wizard step (figure 14.4), where you can specify the parameters for variant detection.

    Figure 14.4: Please specify the parameters for variant detection.

  4. Click on the button labeled Next, which will take you to the next wizard step (figure 14.5).

    Figure 14.5: Select the targeted region track. Variants found outside the targeted region will be removed.

  5. Click on the button labeled Next to go to the last wizard step (figure 14.6).

    Figure 14.6: Choose to save the results. In this wizard step you get the chance to preview the settings used in the ready-to-use workflow.

    In this wizard step you can check the selected settings by clicking on the button labeled Preview All Parameters. The Preview All Parameters wizard step only lets you review the settings; it is not possible to make any changes at this point. At the bottom of this wizard there are two export buttons: one specifies the export format, and the other (labeled "Export Parameters") specifies the export destination. When you select an export location, the analysis parameter settings specified for this particular experiment are exported.

  6. Click on the button labeled OK to go back to the previous wizard step and choose Save.

    Note! If you choose to open the results instead, they will not be saved automatically. You can always save the results at a later point.
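
Step 4 of the wizard removes variants found outside the targeted regions. The Python sketch below shows one way such a filter can work. It assumes the regions are 0-based, half-open BED intervals and the variant positions are 1-based. The function names are hypothetical, the data are made up, and this is not the Workbench's implementation. The example deliberately places variants on the region borders, where an off-by-one mistake would show up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]   # (chrom, 0-based start, exclusive end), as in BED
VariantPos = Tuple[str, int]    # (chrom, 1-based position)

def index_regions(regions: List[Region]) -> Dict[str, List[Tuple[int, int]]]:
    """Group the target regions by chromosome."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    return by_chrom

def is_on_target(by_chrom: Dict[str, List[Tuple[int, int]]], chrom: str, pos1: int) -> bool:
    """True if the 1-based position pos1 falls inside a target region."""
    pos0 = pos1 - 1  # convert to the 0-based BED coordinate system
    return any(start <= pos0 < end for start, end in by_chrom.get(chrom, []))

def keep_on_target(variants: List[VariantPos], regions: List[Region]) -> List[VariantPos]:
    """Keep variants inside the target regions and drop the rest."""
    by_chrom = index_regions(regions)
    return [v for v in variants if is_on_target(by_chrom, v[0], v[1])]

targets = [("chr1", 100, 200), ("chr1", 500, 650)]
variants = [("chr1", 100), ("chr1", 101), ("chr1", 200), ("chr1", 201), ("chr2", 150)]
print(keep_on_target(variants, targets))  # [('chr1', 101), ('chr1', 200)]
```

The BED interval chr1 100 to 200 covers 1-based positions 101 to 200, so the variants at 100 and 201 are removed, as is the variant on chr2, which has no target regions at all. A linear scan over the regions keeps the sketch short; for large panels an interval tree or sorted search would be faster.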

Output from the Identify Variants workflow

The "Identify Variants" tool produces the following outputs:

  1. Read Mapping: The mapped sequencing reads. The reads are shown in different colors depending on their orientation, whether they are single reads or paired reads, and whether they map unambiguously. For the color codes, please see the description of sequence colors in the CLC Genomics Workbench manual, which can be found here: http://www.clcsupport.com/clcgenomicsworkbench/current/index.php?manual=View_settings_in_Side_Panel.html.
  2. Target Regions Coverage: This track shows the coverage of the targeted regions. Detailed information about coverage and read counts is available in the table view, which you open by clicking the table icon in the lower left corner of the View Area.
  3. Target Regions Coverage Report: The report consists of a number of tables and graphs that provide different kinds of information about the targeted regions.
  4. Identified Variants: A variant track holding the identified variants. The variants can be shown in track format or in table format. When you hover the mouse over a detected variant in the Genome Browser View, a tooltip with information about that variant appears. You have to zoom in on the variants to see the detailed tooltip.
  5. Genome Browser View Identify Variants: A collection of tracks presented together. It shows the annotated variants track together with the human reference sequence, genes, transcripts, coding regions, the mapped reads, the identified variants, and the structural variants (see figure 14.12).

It is important not to delete any of the produced files individually, as some of the outputs are linked to other outputs. If you want to delete the outputs, always delete all of them at the same time.

First, look at the mapping report to check that the coverage in your regions of interest is sufficient (e.g. > 30). Also check that at least 90% of the reads are mapped to the human reference sequence. For a targeted experiment, please also check that the majority of reads map to the targeted regions.
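
The three checks above amount to simple thresholds. The sketch below is a hypothetical helper that takes summary numbers you read off the report. It makes two simplifying assumptions: the mean coverage over the targeted regions stands in for "coverage in regions of interest", and on-target reads are measured as a fraction of the mapped reads. The read counts in the example are made up.

```python
from typing import Dict

def qc_checks(total_reads: int, mapped_reads: int, on_target_reads: int,
              mean_target_coverage: float,
              min_coverage: float = 30.0,
              min_mapped_fraction: float = 0.90) -> Dict[str, bool]:
    """Evaluate the coverage, mapping and on-target checks (hypothetical helper)."""
    mapped_fraction = mapped_reads / total_reads
    # Assumption: "majority of reads" is measured against the mapped reads.
    on_target_fraction = on_target_reads / mapped_reads if mapped_reads else 0.0
    return {
        "coverage_sufficient": mean_target_coverage > min_coverage,
        "enough_reads_mapped": mapped_fraction >= min_mapped_fraction,
        "majority_on_target": on_target_fraction > 0.5,
    }

# Made-up summary numbers for two runs.
good = qc_checks(1_000_000, 950_000, 700_000, 45.2)
poor = qc_checks(1_000_000, 850_000, 400_000, 25.0)
print(good)  # {'coverage_sufficient': True, 'enough_reads_mapped': True, 'majority_on_target': True}
print(poor)  # {'coverage_sufficient': False, 'enough_reads_mapped': False, 'majority_on_target': False}
```

A mean value can hide individual regions with poor coverage, so the per-region table of the Target Regions Coverage track remains the better place to look for gaps.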

Afterwards, open the Genome Browser View file (see figure 14.7).

The Genome Browser View shows the track of identified variants in the context of the human reference sequence, genes, transcripts, coding regions, targeted regions and mapped sequencing reads.

Figure 14.7: The Genome Browser View allows you to inspect the identified variants in the context of the human genome.

Double-clicking the variant track in the Genome Browser View opens a table with information about all identified variants (see figure 14.8).

Figure 14.8: Genome Browser View with an open track table to inspect identified variants more closely in the context of the human genome.

If you would like to change the reference sequence used for mapping or the human genes, please use "Data Management".