Create Consensus Sequences from Variants

Using the Create Consensus Sequences from Variants tool, consensus sequences can be created from a variant track (Image variant_track_16_n_p) and the matching reference genome (Image sequence_dna). Optionally, when a track of low coverage regions (Image annotation_track_16_n_p) is provided, those regions can be masked with Ns.
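The idea of combining a reference, a set of variants and a low coverage mask can be illustrated with a small sketch. This is not the tool's implementation. The function name build_consensus, the tuple layout of the variants, the 0-based half-open coordinates and the rule that masking takes precedence over variants are all assumptions made for illustration. Variants are assumed to have a non-empty reference allele (VCF-style anchoring) and not to overlap each other.

```python
def build_consensus(reference, variants, low_coverage=()):
    """Apply variants to a reference and mask low coverage regions with N.

    reference    -- reference sequence as a string
    variants     -- iterable of (start, ref_allele, alt_allele), 0-based start,
                    ref_allele non-empty (assumption of this sketch)
    low_coverage -- iterable of (start, end) half-open reference intervals
    """
    # One string per reference position, so reference coordinates stay
    # valid even after insertions and deletions have been applied.
    pieces = list(reference)

    for start, ref, alt in variants:
        end = start + len(ref)
        if not ref or reference[start:end] != ref:
            raise ValueError(f"variant at {start} does not match the reference")
        pieces[start] = alt                 # the whole alternative allele
        for i in range(start + 1, end):     # remaining reference bases are consumed
            pieces[i] = ""

    # Masking is applied last, so it overrides any variant in a masked
    # position (a design choice of this sketch, not a documented behaviour).
    for start, end in low_coverage:
        for i in range(max(start, 0), min(end, len(pieces))):
            pieces[i] = "N"

    return "".join(pieces)


if __name__ == "__main__":
    # An SNV at position 2, a one-base deletion after position 5,
    # and a low coverage region covering the last two bases.
    print(build_consensus("ACGTACGTAC", [(2, "G", "T"), (5, "CG", "C")], [(8, 10)]))
```

Keeping one entry per reference position avoids having to shift coordinates after each insertion or deletion, at the cost of holding the whole sequence as a list in memory.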

Note: To create a low coverage region track, first use the Create Mapping Graph (Image graph_track_16_n_p) tool to identify coverage in your sample, then filter on a specified frequency using the Identify Graph Threshold Areas (Image resequencing) tool.
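Conceptually, the two steps in the note amount to computing per-position coverage and then collecting the stretches that fall below a threshold. The following sketch shows that second step on a plain list of coverage values. The function name low_coverage_regions, the min_coverage parameter and the 0-based half-open intervals are assumptions for illustration and do not reflect how the tools store or process graph tracks.

```python
def low_coverage_regions(coverage, min_coverage):
    """Return (start, end) half-open intervals where coverage < min_coverage.

    coverage     -- sequence of per-position coverage values
    min_coverage -- positions with coverage below this value are reported
    """
    regions = []
    region_start = None  # start of the low coverage run currently being extended

    for position, depth in enumerate(coverage):
        if depth < min_coverage:
            if region_start is None:
                region_start = position
        elif region_start is not None:
            regions.append((region_start, position))
            region_start = None

    # Close a run that extends to the end of the sequence.
    if region_start is not None:
        regions.append((region_start, len(coverage)))

    return regions


if __name__ == "__main__":
    print(low_coverage_regions([12, 9, 3, 0, 2, 15, 20, 1], min_coverage=5))
```

Intervals in this form could be passed directly to a masking step that replaces the covered positions with N.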

The Create Consensus Sequences from Variants tool offers a number of filtering options:

Running the Create Consensus Sequences from Variants tool

To run the Create Consensus Sequences from Variants tool, go to:

        Toolbox | Resequencing Analysis (Image resequencing) | Create Consensus Sequences from Variants (Image consensus_from_variants_16_h_p)

In the first step, select the variant track that contains the variants from which the consensus should be created.

The next step provides options for how to construct the consensus sequences (see figure 32.37).

Image consensus_wizard
Figure 32.37: Options available to configure when running the Create Consensus Sequences from Variants tool.

In the Tracks parameters, specify the reference genome and optionally provide an annotation track containing low coverage regions.

Variant handling parameters include:

In the final step, specify the output type and the location where the data should be saved.

The tool offers two types of consensus sequence format:

To get an overview of the variants and masking in the consensus sequences, one option for smaller genomes is to map the Consensus Sequence List against the reference sequence using Map Reads to Reference (Image read_mapping_16_n_p) and inspect the resulting mapping and its table. An example for the SARS-CoV-2 (MN908947.3) consensus and reference is given in figures 32.38 and 32.39.

Image consensus_mapping
Figure 32.38: Alignment of a mapped SARS-CoV-2 consensus against the MN908947.3 reference genome.

Image consensus_diff
Figure 32.39: Variant representation of the alignment shown in figure 32.38.

The Map Long Reads to Reference (Image map_long_to_reference_16_h_p) tool should be used for bigger genomes, with the limitation that inspecting the mapping in track view can be slow for consensus sequences longer than 100,000 base pairs. Map Long Reads to Reference (Image map_long_to_reference_16_h_p) is part of the Long Read Support Plugin, which can be downloaded from the Plugins Manager (see http://resources.qiagenbioinformatics.com/manuals/longreadsupport/current/index.php?manual=Map_Long_Reads_Reference.html for further details).
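Where inspecting a mapping is slow, a rough text-level summary of the masking can also be obtained outside the Workbench from an exported consensus sequence. The sketch below lists the runs of N in a sequence string and the fraction of masked positions. The function name summarize_masking is hypothetical, and the sketch assumes that every N stems from masking; Ns already present in the reference would be counted as well.

```python
import re


def summarize_masking(consensus):
    """Return (runs, masked_count, masked_fraction) for a consensus string.

    runs is a list of (start, end) half-open intervals, 0-based, covering
    each uninterrupted stretch of N in the sequence.
    """
    runs = [(m.start(), m.end()) for m in re.finditer(r"N+", consensus.upper())]
    masked_count = sum(end - start for start, end in runs)
    masked_fraction = masked_count / len(consensus) if consensus else 0.0
    return runs, masked_count, masked_fraction


if __name__ == "__main__":
    runs, count, fraction = summarize_masking("ACNNNGTNAC")
    print(runs, count, fraction)
```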