Filter Somatic Variants (WES)

If you are analyzing variants detected in a tumor or blood sample and no control sample from the same subject is available, you can use the Filter Somatic Variants (WES) ready-to-use workflow to identify potential somatic variants. The workflow uses publicly available (or your own) databases of variants that are common in a population to remove such variants, leaving potential somatic variants.

This workflow accepts variant tracks (e.g. the output from the Identify Variants ready-to-use workflow) as input. For heterozygous variants, the reference allele is filtered away first. Variants outside the targeted region are then removed, and finally variants found in the Common dbSNP, 1000 Genomes Project, and HapMap databases are removed, on the assumption that these databases do not contain relevant somatic variants.
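The order of the three filtering steps can be illustrated with a minimal Python sketch. This is not the workflow's implementation: the Variant class, the in_targets and filter_somatic_candidates functions, and the example data are all hypothetical, and each database is represented simply as a set of (chromosome, position, reference, alternate) tuples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    # Hypothetical, simplified variant record (not the workbench's data model).
    chrom: str
    pos: int                      # 1-based position
    ref: str
    alt: str
    is_reference_allele: bool = False


def in_targets(variant, targets):
    """Return True if the variant lies inside any (chrom, start, end) target, ends inclusive."""
    return any(
        chrom == variant.chrom and start <= variant.pos <= end
        for chrom, start, end in targets
    )


def filter_somatic_candidates(variants, targets, known_sets):
    """Apply the three filtering steps in the order described in the text."""
    kept = []
    for v in variants:
        if v.is_reference_allele:          # step 1: drop reference alleles
            continue
        if not in_targets(v, targets):     # step 2: drop variants outside target regions
            continue
        key = (v.chrom, v.pos, v.ref, v.alt)
        if any(key in known for known in known_sets):  # step 3: drop known common variants
            continue
        kept.append(v)
    return kept


# Made-up example data, for illustration only.
variants = [
    Variant("chr1", 100, "A", "A", is_reference_allele=True),
    Variant("chr1", 100, "A", "G"),
    Variant("chr1", 5000, "C", "T"),   # outside the target regions
    Variant("chr2", 250, "G", "A"),    # present in the common-variant set
    Variant("chr2", 300, "T", "C"),
]
targets = [("chr1", 50, 200), ("chr2", 200, 400)]
common_dbsnp = {("chr2", 250, "G", "A")}
thousand_genomes = set()
hapmap = set()

candidates = filter_somatic_candidates(
    variants, targets, [common_dbsnp, thousand_genomes, hapmap]
)
for v in candidates:
    print(v.chrom, v.pos, v.ref, v.alt)
```

In this sketch a variant is treated as known only when chromosome, position, and both alleles match exactly. How the workflow itself matches variants against the databases is not described here.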

Please note that this tool will likely also remove inherited cancer variants that are present at a low percentage in a population.

Next, the remaining somatic variants are annotated with gene names, amino acid changes, conservation scores and information from ClinVar (known variants with medical impact) and dbSNP (all known variants).

Run the Filter Somatic Variants (WES) workflow

To run the Filter Somatic Variants (WES) workflow, go to:

        Toolbox | Ready-to-Use Workflows | Whole Exome Sequencing | Somatic Cancer | Filter Somatic Variants

  1. Double-click on the Filter Somatic Variants tool to start the analysis. If you are connected to a server, you will first be asked where you would like to run the analysis.

  2. Next, you will be asked to select the variant track you would like to use for filtering somatic variants (figure 14.15).

    Image filter_somatic_variants_step2_wes
    Figure 14.15: Select the variant track from which you would like to filter somatic variants.

  3. In the next dialog, you have to select which data set should be used to filter somatic variants (figure 14.16).

    Image filter_somatic_variants_wes
    Figure 14.16: Choose the relevant reference Data Set to annotate.

  4. In the next step you will be asked to specify which of the 1000 Genomes populations should be used for annotation (figure 14.17).

    Image filter_somatic_variants_step3_wes
    Figure 14.17: Specify which 1000 Genomes population to use for annotation.

  5. The next wizard step will once again allow you to specify the 1000 Genomes population that should be used, this time for filtering out variants found in the 1000 Genomes project.

  6. Finally, the next wizard step (figure 14.18) concerns removal of variants found in the HapMap database. Select the population you would like to use from the drop-down list. Please note that the populations available from the drop-down list can be specified with the Reference Data Manager.

    Image filter_somatic_variants_step6_wes
    Figure 14.18: Specify which HapMap population to use for filtering out known variants.

  7. In the last wizard step, you can review the selected settings by clicking the Preview All Parameters button. The preview is read-only: to change a setting, use the Previous button to return to the relevant wizard step and edit it there.

  8. Choose to Save your results and click Finish.

Output from the Filter Somatic Variants (WES) workflow

Two types of output are generated:

Image filter_somatic_variants_genome_browser_view1_wes
Figure 14.19: The Track List showing the annotated somatic variants together with a range of other tracks.

To see the level of nucleotide conservation (from a multiple alignment with many vertebrates) in the region around each variant, a track with conservation scores is added as well. Mapped sequencing reads as well as other tracks can be easily added to this Track List. Open the variant track as a table showing all variants and the added information/annotations (see figure 14.20).

Image filter_somatic_variants_genome_browser_view2_wes
Figure 14.20: The Track List showing the annotated somatic variants together with a range of other tracks.

Adding information from other sources may help you identify interesting candidate variants for further research. For example, common genetic variants (present in the HapMap database) or variants known to play a role in drug response or other relevant phenotypes (present in the ClinVar database) can easily be identified. Further, variants not found in the ClinVar database can be prioritized based on amino acid changes, if the variant causes a change at the amino acid level.

A high level of conservation between different vertebrates or mammals in the region containing the variant can also hint that the variant lies in a region with an important functional role. If you would like to use the conservation scores to identify interesting variants, we recommend that variants with a conservation score of more than 0.9 (PhastCons score) are prioritized over variants with lower conservation scores.
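As a rough illustration of this recommendation, the following Python sketch splits an exported variant table into a high-priority group (score above 0.9) and the rest. The column name conservation_score, the function split_by_conservation, and the example rows are assumptions made for this sketch. They do not reflect the actual column names in the variant table.

```python
PHASTCONS_THRESHOLD = 0.9  # threshold recommended in the text


def split_by_conservation(variants, threshold=PHASTCONS_THRESHOLD):
    """Split variants into (high, low) priority groups by conservation score.

    Variants with a missing score are placed in the low-priority group.
    """
    high, low = [], []
    for v in variants:
        score = v.get("conservation_score")
        if score is not None and score > threshold:
            high.append(v)
        else:
            low.append(v)
    # Show the most conserved positions first within the high-priority group.
    high.sort(key=lambda v: v["conservation_score"], reverse=True)
    return high, low


# Made-up example rows, for illustration only.
rows = [
    {"name": "v1", "conservation_score": 0.95},
    {"name": "v2", "conservation_score": 0.40},
    {"name": "v3", "conservation_score": None},
    {"name": "v4", "conservation_score": 0.99},
    {"name": "v5", "conservation_score": 0.90},
]

high_priority, low_priority = split_by_conservation(rows)
print([v["name"] for v in high_priority])
print([v["name"] for v in low_priority])
```

The comparison is strict, so a variant with a score of exactly 0.9 falls into the low-priority group, matching the wording "more than 0.9".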

It is possible to filter variants based on their annotations, using the table filter at the top of the table. If you are performing multiple experiments where you would like to use the exact same filter criteria, you can create a filter that can be saved and reused. To do this, use the following tool:

        Toolbox | Resequencing Analysis | Variant Filtering | Filter Variants on Custom Criteria
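The idea of defining filter criteria once and applying them unchanged to several experiments can be sketched in Python as follows. This only illustrates the concept and does not represent how Filter Variants on Custom Criteria stores or evaluates its filters. The function make_filter, the column names frequency and in_clinvar, and the example tables are all hypothetical.

```python
import operator

# Comparison operators supported by this sketch.
OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    "!=": operator.ne,
}


def make_filter(criteria):
    """Build a reusable filter from (column, operator, value) criteria.

    A row passes only if it satisfies every criterion; rows with a missing
    value in a filtered column are rejected.
    """
    def apply(rows):
        return [
            row for row in rows
            if all(
                row.get(col) is not None and OPS[op](row[col], value)
                for col, op, value in criteria
            )
        ]
    return apply


# Define the criteria once...
saved_filter = make_filter([
    ("frequency", ">=", 20.0),
    ("in_clinvar", "=", False),
])

# ...and reuse them on tables from different experiments (made-up data).
experiment_a = [
    {"id": "a1", "frequency": 35.0, "in_clinvar": False},
    {"id": "a2", "frequency": 10.0, "in_clinvar": False},
]
experiment_b = [
    {"id": "b1", "frequency": 50.0, "in_clinvar": True},
    {"id": "b2", "frequency": 22.5, "in_clinvar": False},
]

filtered_a = saved_filter(experiment_a)
filtered_b = saved_filter(experiment_b)
print([row["id"] for row in filtered_a])
print([row["id"] for row in filtered_b])
```

Keeping the criteria as plain data means the same filter yields comparable results across experiments, which is the motivation for saving and reusing a filter.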