Create Consensus Sequences from Variants

The Create Consensus Sequences from Variants tool creates consensus sequences from a variant track and the matching reference genome.

Low coverage regions can be masked with N's when a track of low coverage regions is provided. To create such a track, first use the Create Mapping Graph tool (see Create Mapping Graph) to compute the coverage across your sample, then use the Identify Graph Threshold Areas tool (see Identify Graph Threshold Areas) to extract the regions where coverage falls below a specified threshold.
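The idea behind a low coverage track and N-masking can be illustrated with a minimal Python sketch, assuming per-base coverage is available as a plain list of integers. The function names and the `min_coverage` parameter are hypothetical and are not part of the Workbench; in practice the Create Mapping Graph and Identify Graph Threshold Areas tools produce the region track. For brevity, the mask is applied to a reference string rather than to a finished consensus.

```python
def low_coverage_regions(coverage, min_coverage):
    """Return 0-based, half-open (start, end) intervals where coverage < min_coverage."""
    regions = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth < min_coverage:
            if start is None:
                start = pos  # open a new low coverage interval
        elif start is not None:
            regions.append((start, pos))  # close the interval before this position
            start = None
    if start is not None:
        regions.append((start, len(coverage)))  # interval runs to the end of the sequence
    return regions


def mask_regions(sequence, regions):
    """Replace every base inside the given intervals with 'N'."""
    bases = list(sequence)
    for start, end in regions:
        for i in range(start, end):
            bases[i] = "N"
    return "".join(bases)


reference = "ACGTACGTAC"
coverage = [12, 15, 3, 2, 20, 25, 1, 0, 30, 30]
regions = low_coverage_regions(coverage, min_coverage=5)
print(regions)                          # [(2, 4), (6, 8)]
print(mask_regions(reference, regions))  # ACNNACNNAC
```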

A number of variant filtering options defining criteria for inclusion in the consensus sequence are available:

Minimum frequency for inclusion: Include variants above a selected frequency in the consensus.
Ambiguity threshold: Mask out positions where the most commonly observed nucleotide is seen in fewer than the specified fraction of reads.
Ignore frameshift variants: Filter out indels of size 1, 2, 4, 5, etc., that is, indels whose length is not a multiple of three. Alternatively, a minimum frequency can be specified for including such variants.
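The frameshift filter can be sketched in Python as follows. This is an illustration, not the Workbench implementation: the `Variant` record, its empty-string convention for missing alleles, and the function names are hypothetical, and treating a frequency equal to the cut-off as passing (`>=`) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # Hypothetical record: reference allele, observed allele and allele frequency (0.0-1.0).
    # An empty string stands for "no bases" (the ref side of an insertion,
    # the alt side of a deletion).
    ref: str
    alt: str
    frequency: float


def indel_length(v):
    """Net number of bases inserted or deleted (0 for SNVs and MNVs)."""
    return abs(len(v.alt) - len(v.ref))


def is_frameshift(v):
    """Simplest-form definition: an indel whose length is not a multiple of three."""
    n = indel_length(v)
    return n > 0 and n % 3 != 0


def passes_frameshift_filter(v, ignore_frameshift=True, frameshift_min_frequency=1.0):
    """Decide whether a variant survives the frameshift filter."""
    if not is_frameshift(v):
        return True  # SNVs and in-frame indels are unaffected
    if ignore_frameshift:
        return False  # option ticked: drop every frameshift variant
    # Assumption: a frequency equal to the cut-off counts as passing.
    return v.frequency >= frameshift_min_frequency


print(is_frameshift(Variant("A", "", 0.9)))    # True  (1-base deletion)
print(is_frameshift(Variant("", "ACG", 0.9)))  # False (3-base insertion, in frame)
print(is_frameshift(Variant("C", "T", 0.9)))   # False (SNV)
```

With the default cut-off of 1.0, only frameshift variants observed in every read would be kept when the option is not ticked.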

For variants remaining after filtering, the following rules apply for sites where variants overlap:

If multiple SNVs overlap, the relevant IUPAC ambiguity code is inserted in the consensus sequence. For example, if a site has the variants G and A, the ambiguity code R is inserted.
If a deletion overlaps an insertion or an SNV, only the variant with the higher frequency is included in the consensus sequence. If the frequencies are equal, the deletion is included.
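Both overlap rules can be sketched in a few lines of Python. The IUPAC table is the standard nucleotide ambiguity code set; the function names are hypothetical, and reducing each variant to its frequency is a simplification for illustration.

```python
# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def ambiguity_code(alleles):
    """IUPAC code for a set of overlapping single-base SNV alleles."""
    return IUPAC[frozenset(a.upper() for a in alleles)]


def resolve_deletion_overlap(deletion_frequency, other_frequency):
    """Keep the more frequent of a deletion and an overlapping insertion/SNV.

    Ties go to the deletion.
    """
    return "deletion" if deletion_frequency >= other_frequency else "other"


print(ambiguity_code(["G", "A"]))          # R
print(resolve_deletion_overlap(0.5, 0.5))  # deletion
print(resolve_deletion_overlap(0.3, 0.6))  # other
```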

Running the Create Consensus Sequences from Variants tool

To run the Create Consensus Sequences from Variants tool, go to:

Tools | Resequencing Analysis | Create Consensus Sequences from Variants

In the first step, select the variant track that contains the variants from which the consensus should be created.

The next step provides options for how to construct the consensus sequences (see figure 33.32).

Figure 33.32: Options available to configure when running the Create Consensus Sequences from Variants tool.

In the Tracks parameters, specify the reference genome and optionally provide an annotation track containing low coverage regions.

Variant handling parameters include:

Minimum frequency for inclusion: ranges from 0.0 to 1.0 (default 0.8).
Ambiguity threshold for masking variants with N's: ranges from 0.0 to the minimum frequency for inclusion (default 0.5). When generating the consensus, positions with variants whose frequency lies between the ambiguity threshold and the minimum frequency for inclusion are replaced with N's. Positions with variants below the ambiguity threshold use the nucleotide from the reference. Note: only non-indel variants can be masked using this threshold.
Ignore frameshift variants: when ticked, frameshift variants are excluded. Otherwise, a minimum frequency can be specified for including frameshift variants (default 1.0). Note that frameshift variants are defined here in their simplest form, as indels whose length is not a multiple of three and therefore does not fit the three-base codon structure. This option is especially relevant for viral consensus creation, where frameshift variants are unlikely.
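The interplay between the two frequency thresholds for non-indel variants can be sketched in Python. This is a hedged illustration, not the Workbench code: the function names are hypothetical, variants are reduced to a position, an alternative base and a frequency, and whether a frequency exactly on a threshold falls above or below it is an assumption.

```python
def consensus_base(ref_base, alt_base, frequency, min_frequency=0.8, ambiguity_threshold=0.5):
    """Base written to the consensus for a single non-indel variant position."""
    if not 0.0 <= ambiguity_threshold <= min_frequency <= 1.0:
        raise ValueError("require 0.0 <= ambiguity threshold <= minimum frequency <= 1.0")
    if frequency >= min_frequency:
        return alt_base  # frequent enough: include the variant
    if frequency >= ambiguity_threshold:
        return "N"  # between the two thresholds: mask the position
    return ref_base  # below the ambiguity threshold: keep the reference base


def apply_snvs(reference, snvs, **thresholds):
    """Apply {position: (alt_base, frequency)} to a reference string (0-based positions)."""
    bases = list(reference)
    for pos, (alt, freq) in snvs.items():
        bases[pos] = consensus_base(reference[pos], alt, freq, **thresholds)
    return "".join(bases)


print(apply_snvs("ACGTAC", {1: ("T", 0.95), 3: ("C", 0.6), 4: ("G", 0.1)}))  # ATGNAC
```

The check at the top mirrors the documented constraint that the ambiguity threshold ranges only up to the minimum frequency for inclusion.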

In the final step, specify the output type and where to save the results.

The tool offers two consensus sequence output formats:

Sequence Track: a consensus genome track.
Sequence List: a consensus sequence list.
