Detect Amplicon Sequence Variants

The Detect Amplicon Sequence Variants tool infers sequence variants from amplicon data. The tool uses error profiling to distinguish biological nucleotide differences from sequencing errors, making it possible to resolve amplicon sequence variants (ASVs) down to the level of single nucleotide differences. The algorithm is inspired by DADA2 [Callahan et al., 2016].

The Detect Amplicon Sequence Variants analysis includes the following steps:

Initial filtering and length trimming ensures that reads are of the same length and optimized for the subsequent analysis:
- Length trimming: Reads are trimmed from the 3' end to the user-defined length. Reads shorter than this length are removed.
- Ambiguity filter: Reads containing ambiguous bases are discarded.
- Expected Errors filter: Reads with more expected errors than the user-defined threshold are discarded.
Dereplication: Produces an intermediate list of unique sequences.
Denoising: This iterative process estimates a sample-specific error model. The error model is then used to distinguish biological nucleotide differences from likely sequencing errors and to generate the list of candidate amplicon sequence variants.
Remove chimeras: Sequences that are assessed as being chimeras are discarded.
Merging unique read pairs: For paired-read datasets, unique read pairs are merged. Pairs with insufficient overlap (<12 bases) are discarded.
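
The initial filtering and dereplication steps above can be illustrated with a minimal Python sketch. This is not the tool's implementation: the function names, the input format (sequence plus a list of Phred scores), and the order in which the filters are applied are assumptions made for illustration. The expected number of errors is computed using the common definition, the sum of per-base error probabilities 10^(-Q/10), which is assumed here to match what the tool reports.

```python
from collections import Counter

def expected_errors(quals):
    """Sum of per-base error probabilities, p = 10^(-Q/10) (assumed definition)."""
    return sum(10 ** (-q / 10) for q in quals)

def filter_and_trim(reads, trim_length, max_expected_errors):
    """Yield trimmed sequences that pass the three filters.

    reads: iterable of (sequence, phred_scores) pairs (hypothetical input format).
    """
    for seq, quals in reads:
        if len(seq) < trim_length:
            continue  # shorter than the user-defined length: removed
        # Trim from the 3' end down to the user-defined length.
        seq, quals = seq[:trim_length], quals[:trim_length]
        if any(base not in "ACGT" for base in seq):
            continue  # contains an ambiguous base such as N
        if expected_errors(quals) > max_expected_errors:
            continue  # too many expected errors
        yield seq

def dereplicate(sequences):
    """Collapse identical sequences into unique sequences with abundances."""
    return Counter(sequences)

reads = [
    ("ACGTACGTAC", [40] * 10),    # passes unchanged
    ("ACGTACGTACGG", [40] * 12),  # trimmed to ACGTACGTAC, then passes
    ("ACGTAC", [40] * 6),         # too short
    ("ACGTNCGTAC", [40] * 10),    # ambiguous base
    ("TTGTACGTAC", [3] * 10),     # roughly 5 expected errors
]
unique = dereplicate(filter_and_trim(reads, trim_length=10, max_expected_errors=1.0))
print(dict(unique))  # {'ACGTACGTAC': 2}
```

Only the unique sequences and their counts are carried forward, which is what makes the subsequent denoising step tractable. Denoising, chimera removal, and read-pair merging are not covered by this sketch.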

A template workflow with a proposed analysis pipeline - trimming reads, detecting amplicon sequence variants, merging ASV tables, and assigning taxonomies - is available at:

For more information, see Detect Amplicon Sequence Variants and Assign Taxonomies workflow.
