Create Consensus Sequences from Variants

The Create Consensus Sequences from Variants tool creates consensus sequences from a variant track and the matching reference genome.
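The core idea of building a consensus from variants is to copy the reference and substitute each variant allele for the reference allele it replaces. The Python sketch below illustrates this for SNVs, insertions and deletions. The Variant record (0-based position, reference allele, alternative allele) and the apply_variants function are hypothetical and are not part of the Workbench. Overlapping variants are simply skipped here, which is an assumption of the sketch and not the tool's own overlap rules.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not a Workbench data type)."""
    pos: int   # 0-based start on the reference
    ref: str   # reference allele replaced by the variant ("" for an insertion)
    alt: str   # allele placed in the consensus ("" for a deletion)


def apply_variants(reference: str, variants: list[Variant]) -> str:
    """Build a consensus by replacing each variant's ref allele with its alt allele.

    Variants are processed left to right. A variant that starts inside an
    already applied variant is skipped (a simplification for this sketch).
    """
    pieces: list[str] = []
    cursor = 0  # first reference position not yet copied into the consensus
    for v in sorted(variants, key=lambda v: v.pos):
        if v.pos < cursor:
            continue  # overlaps a previously applied variant
        if reference[v.pos:v.pos + len(v.ref)] != v.ref:
            raise ValueError(f"ref allele {v.ref!r} does not match the reference at {v.pos}")
        pieces.append(reference[cursor:v.pos])  # unchanged stretch before the variant
        pieces.append(v.alt)                    # the variant allele itself
        cursor = v.pos + len(v.ref)             # skip over the replaced reference bases
    pieces.append(reference[cursor:])           # unchanged tail of the reference
    return "".join(pieces)


# Usage: one SNV, one deletion and one insertion on a toy reference.
consensus = apply_variants(
    "ACGTACGTAC",
    [Variant(1, "C", "T"), Variant(4, "AC", ""), Variant(8, "", "GG")],
)
print(consensus)  # ATGTGTGGAC
```

Processing variants in sorted order with a single cursor keeps the coordinates of later variants valid even though earlier indels change the length of the consensus.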

Low coverage regions can be masked with N's when a track of low coverage regions is provided. To create such a track, first use the Create Mapping Graph tool (see Create Mapping Graph) to compute the coverage of your sample, and then use the Identify Graph Threshold Areas tool (see Identify Graph Threshold Areas) to extract the regions where the coverage falls below a specified threshold.
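The two-step workflow above (compute per-position coverage, then keep the areas below a threshold) and the subsequent masking can be illustrated with a small Python sketch. The function names, the plain list of per-position coverage values and the half-open 0-based intervals are assumptions made for illustration only; they are not the Workbench's graph or annotation track formats. Masking is applied in reference coordinates, so the sketch assumes a sequence without indels.

```python
def low_coverage_regions(coverage: list[int], min_coverage: int) -> list[tuple[int, int]]:
    """Return half-open (start, end) intervals where coverage < min_coverage.

    Hypothetical stand-in for thresholding a coverage graph.
    """
    regions: list[tuple[int, int]] = []
    start = None  # start of the current low coverage run, if any
    for i, depth in enumerate(coverage):
        if depth < min_coverage and start is None:
            start = i                      # a low coverage run begins
        elif depth >= min_coverage and start is not None:
            regions.append((start, i))     # the run ends just before position i
            start = None
    if start is not None:
        regions.append((start, len(coverage)))  # run extends to the end
    return regions


def mask_regions(sequence: str, regions: list[tuple[int, int]]) -> str:
    """Replace every base inside the given intervals with 'N'."""
    chars = list(sequence)
    for start, end in regions:
        for i in range(start, end):
            chars[i] = "N"
    return "".join(chars)


# Usage: mask positions covered by fewer than 10 reads.
coverage = [0, 2, 15, 30, 30, 4, 3, 25, 25, 1]
regions = low_coverage_regions(coverage, min_coverage=10)
print(regions)                                # [(0, 2), (5, 7), (9, 10)]
print(mask_regions("ACGTACGTAC", regions))    # NNGTANNTAN
```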

A number of variant filtering options defining criteria for inclusion in the consensus sequence are available:

For variants remaining after filtering, the following rules apply for sites where variants overlap:

Running the Create Consensus Sequences from Variants tool

To run the Create Consensus Sequences from Variants tool, go to:

        Tools | Resequencing Analysis | Create Consensus Sequences from Variants

In the first step, select the variant track that contains the variants from which the consensus should be created.

The next step provides options for how to construct the consensus sequences (see figure 32.35).

Figure 32.35: Options available to configure when running the Create Consensus Sequences from Variants tool.

In the Tracks section, specify the reference genome and, optionally, an annotation track containing low coverage regions.

Variant handling parameters include:

In the final step, specify the output type and where to save the results.

The tool offers two consensus sequence output formats:

For smaller genomes, one way to get an overview of the variants and masked regions in the consensus sequences is to map the Consensus Sequence List against the reference sequence using Map Reads to Reference and inspect the resulting sample mapping and its table view. An example for a SARS-CoV-2 consensus mapped against the MN908947.3 reference is shown in figures 32.36 and 32.37.
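When a consensus contains only substitutions and masking (no indels), a rough numeric summary can also be obtained without a mapping by comparing it position by position with the reference. The Python sketch below is a hypothetical helper, not a Workbench feature; it deliberately refuses sequences of different lengths, because indels require a real alignment such as the mapping described above.

```python
def summarize_consensus(reference: str, consensus: str) -> dict:
    """Count N-masked positions and list substitutions in an ungapped comparison.

    Assumes the consensus and reference have the same length (no indels).
    """
    if len(reference) != len(consensus):
        raise ValueError("this sketch only handles ungapped, equal-length sequences")
    masked = sum(1 for c in consensus if c.upper() == "N")
    substitutions = [
        (i, r, c)                        # 0-based position, reference base, consensus base
        for i, (r, c) in enumerate(zip(reference, consensus))
        if c.upper() != "N" and r.upper() != c.upper()
    ]
    return {
        "length": len(consensus),
        "masked": masked,
        "masked_fraction": masked / len(consensus) if consensus else 0.0,
        "substitutions": substitutions,
    }


# Usage: three masked positions and one C->T substitution.
summary = summarize_consensus("ACGTACGTAC", "NNGTATGTAN")
print(summary["masked"], summary["substitutions"])  # 3 [(5, 'C', 'T')]
```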

Figure 32.36: Alignment of a mapped SARS-CoV-2 consensus against MN908947.3 reference genome.

Figure 32.37: Variant representation of the alignment shown in figure 32.36.

For bigger genomes, use the Map Long Reads to Reference tool instead. Note that inspecting the mapping in track view can be slow for consensus sequences longer than 100,000 base pairs. Map Long Reads to Reference is part of the Long Read Support Plugin, which can be downloaded from the Plugins Manager (see https://resources.qiagenbioinformatics.com/manuals/longreadsupport/current/index.php?manual=Map_Long_Reads_Reference.html for further details).