Filter against Known Variants

The Filter against Known Variants tool filters experimental variants based on a known variant track to remove common variants.

Any variant track can be used as the "known variants" track. It may be produced by the CLC Genomics Workbench, imported, or downloaded from variant database resources such as dbSNP, 1000 Genomes, or HapMap (see Import tracks from file and Download Genomes).

To get started, go to:

        Toolbox | Resequencing Analysis | Variant Filtering | Filter against Known Variants

This opens a dialog where you can select a variant track with experimental data that should be filtered.

Clicking Next will display the dialog shown in figure 29.1.

Figure 29.1: Specifying a variant track to filter against.

Select one or more tracks of known variants to compare against. The tool then compares each variant in the input track with the variants in the track of known variants. The output is a variant track; which variants remain depends on the mode of filtering chosen.

Since many databases do not report a succession of SNVs as one MNV, variants called with CLC Genomics Workbench cannot be compared directly with these databases. To support filtering against such databases anyway, the option Join adjacent SNVs and MNVs can be enabled. With this option, an MNV in the experimental data gets an exact match if a set of SNVs and MNVs in the database can be combined to provide the same allele.

Note! This assumes that the adjacent SNVs and MNVs in the track of known variants occur on the same allele, although the track of known variants provides no evidence for this.
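
The idea behind joining adjacent SNVs and MNVs can be illustrated with a minimal Python sketch. This is not the Workbench's implementation: the Variant type, the 0-based start coordinate, and the function name matches_by_joining are hypothetical and chosen only for illustration. The sketch checks whether an experimental MNV can be covered end to end, without gaps, by known variants whose alleles spell out the same sequence.

```python
from typing import Dict, List, NamedTuple, Sequence, Tuple


class Variant(NamedTuple):
    """Hypothetical substitution variant (SNV or MNV), for illustration only."""
    chrom: str
    start: int   # assumed 0-based position of the first substituted base
    allele: str  # substituted bases: length 1 for an SNV, >1 for an MNV


def matches_by_joining(exp: Variant, known: Sequence[Variant]) -> bool:
    """Return True if adjacent known variants can be joined into exp's allele."""
    # Index known alleles by (chromosome, start) so lookups are direct.
    by_start: Dict[Tuple[str, int], List[str]] = {}
    for k in known:
        if k.allele:  # ignore empty alleles; they cannot cover any base
            by_start.setdefault((k.chrom, k.start), []).append(k.allele)

    end = exp.start + len(exp.allele)

    def tile(pos: int) -> bool:
        # The whole experimental allele has been covered.
        if pos == end:
            return True
        offset = pos - exp.start
        for allele in by_start.get((exp.chrom, pos), []):
            # The known allele must agree with exp's bases at this offset,
            # and the rest of exp must be coverable from where it ends.
            if exp.allele.startswith(allele, offset) and tile(pos + len(allele)):
                return True
        return False

    return tile(exp.start)


experimental = Variant("chr1", 100, "ACG")
database = [Variant("chr1", 100, "A"), Variant("chr1", 101, "CG")]
print(matches_by_joining(experimental, database))
```

As the note above points out, a match found this way treats the joined known variants as if they occurred together on one allele, which the known variants track itself does not confirm.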

This tool will create a new track where common variants have been removed. The annotations that are left are marked in three different ways:

Exact match
This means that the variant position and allele both have to be identical in the input and the known variants track (however, note the extra option for joining adjacent SNVs and MNVs described above).
Partial MNV match
This applies to MNVs, which are annotated with a partial match if an SNV or a shorter MNV in the database has an allele sequence that is contained in the allele sequence of the annotated MNV.
Overlap
This is reported if the known variants track has a variant that overlaps the experimental variant.
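
The three match types can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's actual algorithm: the Variant type and the classify function are hypothetical, coordinates are assumed to be 0-based, "contained" is interpreted as the known allele agreeing with the experimental MNV at the corresponding positions, and the precedence (exact match, then partial MNV match, then overlap) is an assumption.

```python
from typing import NamedTuple, Optional, Sequence


class Variant(NamedTuple):
    """Hypothetical substitution variant (SNV or MNV), for illustration only."""
    chrom: str
    start: int   # assumed 0-based position of the first substituted base
    allele: str  # substituted bases: length 1 for an SNV, >1 for an MNV


def classify(exp: Variant, known: Sequence[Variant]) -> Optional[str]:
    """Label exp against the known variants, or return None if nothing matches."""
    exp_end = exp.start + len(exp.allele)
    label: Optional[str] = None
    for k in known:
        if k.chrom != exp.chrom:
            continue
        # Exact match: same position and same allele.
        if k.start == exp.start and k.allele == exp.allele:
            return "Exact match"
        k_end = k.start + len(k.allele)
        if k_end <= exp.start or k.start >= exp_end:
            continue  # the two variants do not overlap at all
        offset = k.start - exp.start
        # Partial MNV match: exp is an MNV and a shorter known allele
        # agrees with exp's bases at the same positions (assumed reading).
        contained = (
            len(exp.allele) > 1
            and len(k.allele) < len(exp.allele)
            and offset >= 0
            and exp.allele.startswith(k.allele, offset)
        )
        if contained:
            label = "Partial MNV match"
        elif label is None:
            label = "Overlap"
    return label


mnv = Variant("chr1", 100, "ACG")
print(classify(mnv, [Variant("chr1", 101, "C")]))
```

Here the known SNV at position 101 carries the base C, which is also the second base of the experimental MNV, so under the assumptions above the MNV is labelled as a partial MNV match rather than an exact match.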