Annotate with DIAMOND

The Annotate with DIAMOND tool allows you to annotate a DNA sequence using a set of known protein reference sequences. This tool can be used on sequences without any pre-existing annotations: it is not necessary to annotate the DNA sequences with genes or coding regions. For more information about the DIAMOND aligner, see Annotate CDS with Best DIAMOND Hit.

The tool can be used for various purposes, e.g. transferring annotations from a known reference, annotating the presence of AMR or virulence markers in a genome, or filtering contigs or sequences based on the presence of a set of genes.

For annotating DNA sequences from a set of non-coding reference sequences, the Annotate with BLAST tool may be used instead. However, the Annotate with DIAMOND tool is in general the fastest option when working with coding regions.

If the input sequences are already annotated with CDS annotations, it is also possible to use the Annotate CDS with Best BLAST Hit and Annotate CDS with Best DIAMOND Hit tools; see Annotate CDS with Best BLAST Hit for more information.

To start the tool, go to:

        Toolbox | Microbial Genomics Module | Functional Analysis | Annotate with DIAMOND

The first wizard step (figure 11.5) specifies the reference and search parameters.

Image annotate_diamond_step_1
Figure 11.5: Selecting references and specifying search parameters.

The following sources can be used to annotate the input sequences:

As can be seen above, metadata (such as GO terms and taxonomy information) is handled differently depending on the database source:

The search parameters can be modified using the following settings:

The annotation hits can be adjusted using the following setting:

The next step (figure 11.6) determines how multiple overlapping hits on the input query sequence are handled.

Image annotate_diamond_step_2
Figure 11.6: Settings for handling overlapping hits.
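
To illustrate what resolving overlapping hits can mean in practice, the following Python fragment is a minimal sketch of one common strategy: rank hits by score and keep a hit only if it does not overlap a hit that has already been kept. It is not the tool's implementation. The Hit class, the use of bit score as the only ranking criterion, and the reference names in the example data are all assumptions made for illustration; the criteria actually used by the tool are those listed for this wizard step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """A hypothetical protein hit placed on the query DNA sequence."""
    reference: str    # name of the matched reference (illustrative only)
    start: int        # 1-based, inclusive start on the query
    end: int          # 1-based, inclusive end on the query
    bit_score: float  # assumed ranking criterion for this sketch


def overlaps(a: Hit, b: Hit) -> bool:
    """Return True if the two hits share at least one query position."""
    return a.start <= b.end and b.start <= a.end


def keep_best_non_overlapping(hits):
    """Greedily keep the highest-scoring hits that do not overlap each other."""
    kept = []
    for hit in sorted(hits, key=lambda h: h.bit_score, reverse=True):
        if not any(overlaps(hit, other) for other in kept):
            kept.append(hit)
    # Report the surviving hits in query order.
    return sorted(kept, key=lambda h: h.start)


# Made-up example data: two overlapping hits and one separate hit.
hits = [
    Hit("blaTEM", 100, 900, 510.0),
    Hit("blaSHV", 150, 880, 320.0),
    Hit("tetA", 1200, 2300, 640.0),
]
best = keep_best_non_overlapping(hits)
print([h.reference for h in best])  # → ['blaTEM', 'tetA']
```

In this example the lower-scoring of the two overlapping hits is discarded, while the hit located elsewhere on the query is unaffected. A tool may instead keep all overlapping hits or apply additional tie-breaking criteria.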

The following options are available:

Best hits are determined by:

The output options step (figure 11.7) has the following options:

Image annotate_diamond_step_3
Figure 11.7: Specifying output options.

The following sequence output options are available:

The final step controls which outputs are created. Note that reports can be aggregated using the Combine Reports tool.