This section uses the filter tool as an example, since the core of the tools is the same:
Toolbox | Resequencing | Annotate and Filter | Filter against Known Variants
This opens a dialog where you can select a variant track with experimental data that should be filtered.
Clicking Next will display the dialog shown in figure 26.20.
Select one or more tracks of known variants to compare against. The tool will then check whether each variant in the input track is reported in the track of known variants. There are three modes of filtering:
- Keep variants found among known variants
- This will filter away all variants that are not found in the track of known variants. This mode can be useful for filtering against tracks with known disease-causing mutations (e.g. COSMIC), where the result will only include the variants that match the known mutations. For SNVs, the criteria for matching are simple: the variant position and allele both have to be identical in the input and the known variants track. For insertions and deletions, it is taken into account that they cannot always be placed unambiguously. As an example, AA->A can be a deletion of either the first or the second A, and both will be recognized as a match. For each variant found, the result track will include information from the known variant.
- Keep variants overlapping with known variants
- The first mode is based on exact matching of the variants. This means that if the allele is reported differently in the set of known variants, it will not be identified as a known variant. This is typically not the case with isolated SNVs, but for more complex variants it can be a problem. Instead of requiring a strict match, this mode will keep variants that overlap with a variant in the set of known variants. This is a more conservative approach and will allow you to inspect the annotations on the variants instead of removing them when they do not match. For each variant, the result track will include information about overlapping or strictly matched variants to allow for more detailed exploration.
- Keep variants not found among known variants
- This mode can be used for filtering away common variants if they are not of interest. For example, you can download a variant track from 1000 Genomes and use that for filtering away common variants. This mode is based on exact matching. If you wish to filter based on overlap, please use the Filter against overlapping annotations tool.
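The three modes above can be illustrated with a minimal Python sketch. It is not the tool's actual implementation: the `Variant` class, the mode names (`"found"`, `"overlapping"`, `"not_found"`) and the equivalence test are assumptions made for illustration. Two variants are treated here as an exact match if applying each of them to the reference yields the same sequence, which covers the ambiguous placement of insertions and deletions such as AA->A.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    pos: int  # 0-based start position on the reference (hypothetical convention)
    ref: str  # reference allele, "" for an insertion
    alt: str  # variant allele, "" for a deletion

    @property
    def end(self) -> int:
        # Exclusive end position on the reference.
        return self.pos + len(self.ref)


def apply_variant(reference: str, v: Variant) -> str:
    """Return the reference sequence with the variant applied."""
    if reference[v.pos:v.end] != v.ref:
        raise ValueError("reference allele does not match the reference")
    return reference[:v.pos] + v.alt + reference[v.end:]


def exact_match(reference: str, a: Variant, b: Variant) -> bool:
    # Equivalent variants produce the same sequence, wherever they are placed.
    return apply_variant(reference, a) == apply_variant(reference, b)


def overlaps(a: Variant, b: Variant) -> bool:
    # An insertion has no reference bases, so it is treated as one position wide.
    a_end = max(a.end, a.pos + 1)
    b_end = max(b.end, b.pos + 1)
    return a.pos < b_end and b.pos < a_end


def filter_variants(reference, sample, known, mode):
    """Keep sample variants according to one of the three (hypothetical) mode names."""
    kept = []
    for v in sample:
        exact = any(exact_match(reference, v, k) for k in known)
        if mode == "found":
            keep = exact
        elif mode == "overlapping":
            keep = exact or any(overlaps(v, k) for k in known)
        elif mode == "not_found":
            keep = not exact
        else:
            raise ValueError(f"unknown mode: {mode}")
        if keep:
            kept.append(v)
    return kept


reference = "GAATC"
deletion_first_a = Variant(1, "A", "")   # AA->A, first A removed
deletion_second_a = Variant(2, "A", "")  # AA->A, second A removed
snv_sample = Variant(4, "C", "T")
snv_known = Variant(4, "C", "G")         # same position, different allele

sample = [deletion_first_a, snv_sample]
known = [deletion_second_a, snv_known]

found = filter_variants(reference, sample, known, "found")
overlapping = filter_variants(reference, sample, known, "overlapping")
not_found = filter_variants(reference, sample, known, "not_found")
```

In this example the deletion is kept in the first mode even though it is placed differently in the two tracks, while the SNV is only kept in the overlap mode because its allele differs from the known one.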
The option to Keep linked variants comes into play for variants that are linked (see Linking adjacent variants in linkage groups). As an example, you may have a variant like this: AC->GT. This is reported in the variant track as two separate variants in the same linkage group. If just one of the variants is found among the known variants, they will both be retained if the option to keep linked variants is checked. If the option is unchecked, the linkage group in this situation will be broken and one of the variants will be removed.
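The effect of the Keep linked variants option can be sketched as follows. This is an illustration under stated assumptions, not the tool's code: variants are plain tuples, the linkage group is a hypothetical label (`None` for unlinked variants), known variants are matched by simple tuple equality, and the function name `keep_found` is made up for this example.

```python
def keep_found(variants, known, keep_linked):
    """Keep variants found among the known variants.

    variants: list of (pos, ref, alt, linkage_group) tuples
    known: set of (pos, ref, alt) tuples
    keep_linked: if True, a match rescues every variant in the same linkage group
    """
    matched = [(pos, ref, alt) in known for pos, ref, alt, _ in variants]

    # Linkage groups in which at least one member matched a known variant.
    rescued_groups = {
        v[3] for v, is_match in zip(variants, matched)
        if is_match and v[3] is not None
    }

    kept = []
    for v, is_match in zip(variants, matched):
        if is_match or (keep_linked and v[3] in rescued_groups):
            kept.append(v)
    return kept


# AC->GT reported as two SNVs in the same linkage group, plus one unlinked SNV.
variants = [
    (10, "A", "G", "group1"),
    (11, "C", "T", "group1"),
    (20, "G", "A", None),
]
known = {(10, "A", "G")}

with_linked = keep_found(variants, known, keep_linked=True)
without_linked = keep_found(variants, known, keep_linked=False)
```

With the option enabled, both members of the linkage group are retained. With it disabled, only the variant at position 10 remains and the linkage group is broken.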
- ...sec:downloadreferencegenome). 26.2
- Please note that there is also a plug-in for annotating with data from HGMD and other databases via Biobase Genome Trax: http://www.clcbio.com/clc-plugin/biobase-genome-trax/