Extract consensus sequence

A consensus sequence can be extracted from all kinds of read mappings, including those generated from de novo assembly or RNA-seq analyses. You can also extract a consensus sequence from a BLAST result. The consensus sequence extraction tool can be run in batch and as part of workflows.

To start the tool:

        Toolbox | NGS Core Tools (Image ngsfolder) | Extract Consensus Sequence (Image extract_consensus)

This opens a dialog where you can select mappings, either in the form of tracks or read mappings, or BLAST results. Click Next to specify how the consensus sequence should be created (see figure 25.34).

Image extract_consensus_step2
Figure 25.34: Specifying how the consensus sequence should be extracted.

It is also possible to extract a consensus sequence from a mapping view: right-click the name of the consensus or reference sequence, or a selection on the reference sequence, and select Extract Consensus Sequence (Image extract_consensus).

When extracting a consensus sequence, you can decide how to handle regions with low coverage (a definition of coverage can be found in Reference sequence statistics). The first step is to define a threshold for when coverage is considered low. The default value is 0, which means that low coverage is defined as no coverage (i.e. no reads align to the reference at this position). This means that if only one read covers a given position, that read alone determines the consensus sequence at that position. If you need higher confidence that the consensus sequence is correct, we advise raising this value so that a consensus sequence is only constructed where more reads support it.
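The effect of the threshold can be illustrated with a minimal Python sketch. This is not the tool's implementation: the function name `low_coverage_regions` and the example coverage values are hypothetical, and the rule that a position counts as low coverage when its coverage is at or below the threshold is an assumption inferred from the default of 0 meaning "no coverage".

```python
from typing import List, Tuple


def low_coverage_regions(coverage: List[int], threshold: int = 0) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals of low-coverage positions.

    Assumption: a position is low coverage when its coverage is at or
    below the threshold, so threshold=0 flags only uncovered positions.
    """
    regions = []
    start = None  # start of the low-coverage run currently being tracked
    for pos, depth in enumerate(coverage):
        if depth <= threshold:
            if start is None:
                start = pos
        elif start is not None:
            # Coverage rose above the threshold: close the current run.
            regions.append((start, pos))
            start = None
    if start is not None:
        # The sequence ends inside a low-coverage run.
        regions.append((start, len(coverage)))
    return regions


# Hypothetical per-position read counts along a reference.
coverage = [0, 0, 1, 3, 5, 2, 0, 1]
print(low_coverage_regions(coverage))               # [(0, 2), (6, 7)]
print(low_coverage_regions(coverage, threshold=2))  # [(0, 3), (5, 8)]
```

With the default threshold, the position covered by a single read is not flagged, so that read alone would decide the consensus there. Raising the threshold to 2 extends the flagged regions to every position supported by two reads or fewer.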

When the low coverage threshold is defined, there are several options for handling the low coverage regions:

In addition to deciding how to handle low coverage regions, you can also decide how to handle conflicts or disagreement between the reads:

Click Next to set the output options (see figure 25.35).

Image extract_consensus_step3
Figure 25.35: Choose to add annotations to the consensus sequence.

The annotations that can be added to the consensus sequence produced by this tool show both conflicts that have been resolved and low coverage regions (unless you have chosen to split the consensus sequence). Please note that for large data sets, this can amount to a very high number of annotations, which will cause the tool to take longer to complete and the result to take up much more disk space.

Click Next if you wish to adjust how to handle the results. If not, click Finish.