Annotate with BLAST

The Annotate with BLAST tool allows you to annotate a DNA sequence using a set of either protein reference sequences or nucleotide sequences. This tool can be used on sequences without any pre-existing annotations: it is not necessary to annotate the DNA sequences with genes or coding regions.
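To annotate unannotated DNA with protein references, a search has to compare proteins with DNA. BLAST's translated searches (for example tblastn, which searches a protein query against a translated nucleotide database) do this by translating the DNA in all six reading frames. The minimal Python sketch below shows six-frame translation with the standard genetic code (NCBI translation table 1). The function names are hypothetical, and this is an illustration of the general idea, not the tool's implementation.

```python
# Standard genetic code (NCBI translation table 1); codons ordered by bases T, C, A, G
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate complete codons only; codons containing ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(dna[p:p + 3], "X") for p in range(0, len(dna) - 2, 3))


def six_frame_translate(dna: str) -> dict[str, str]:
    """Return the translation of the three forward and three reverse-complement frames."""
    dna = dna.upper()
    rev_comp = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rev_comp[offset:])
    return frames


print(six_frame_translate("ATGGCC"))
# {'+1': 'MA', '-1': 'GH', '+2': 'W', '-2': 'A', '+3': 'G', '-3': 'P'}
```

A protein reference can then be aligned against each translated frame, which is why no existing gene or CDS annotation is needed on the input.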

The tool can be used for various purposes, e.g. transferring annotations from a known reference, annotating the presence of antimicrobial resistance (AMR) or virulence markers in a genome, or filtering contigs or sequences based on the presence of a set of genes.
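The contig-filtering use case can be illustrated in plain Python. This sketch assumes the annotation hits have already been collected per contig as sets of reference gene names; the function name, data layout and example gene names are hypothetical and not part of the tool.

```python
def filter_contigs(
    genes_by_contig: dict[str, set[str]],
    required: set[str],
    require_all: bool = True,
) -> list[str]:
    """Return the names of contigs annotated with all (or any) of the required genes."""
    if require_all:
        # Keep a contig only if every required gene was found on it
        keep = (c for c, genes in genes_by_contig.items() if required <= genes)
    else:
        # Keep a contig if at least one required gene was found on it
        keep = (c for c, genes in genes_by_contig.items() if required & genes)
    return sorted(keep)


# Hypothetical annotation results per contig
annotations = {
    "contig_1": {"blaTEM-1", "tetA"},
    "contig_2": {"tetA"},
    "contig_3": set(),
}
print(filter_contigs(annotations, {"blaTEM-1", "tetA"}))                     # ['contig_1']
print(filter_contigs(annotations, {"blaTEM-1", "tetA"}, require_all=False))  # ['contig_1', 'contig_2']
```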

If the reference sequences are protein sequences, the Annotate with DIAMOND tool can be used instead as a faster alternative.

If the input sequences are already annotated with CDS annotations, the Annotate CDS with Best BLAST Hit and Annotate CDS with Best DIAMOND Hit tools can also be used; see Annotate CDS with Best BLAST Hit for more information.

To start the analysis, go to:

        Tools | Microbial Genomics Module | Functional Analysis | Annotate with BLAST

The first wizard step (figure 12.2) specifies the reference sequences and the search parameters.

Figure 12.2: Selecting references and specifying search parameters

The following sources can be used to annotate the input sequences:

Metadata (such as GO terms and taxonomy information) is handled differently depending on the database source:

The search parameters can be modified using the following settings:

The annotation hits can be adjusted with the following setting:

The next step (figure 12.3) determines how multiple overlapping hits on the input query sequence are handled.
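A common general strategy for resolving overlapping hits is greedy best-first selection: rank the hits by score, keep the best one, and discard any lower-scoring hit that overlaps a hit already kept. The sketch below illustrates only that general idea. The `Hit` class, the 1-based inclusive coordinates and the use of bit score as the ranking criterion are assumptions, and the options offered in this wizard step may behave differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical hit on the query sequence (1-based, inclusive coordinates)."""
    reference: str
    start: int
    end: int
    bit_score: float


def overlaps(a: Hit, b: Hit) -> bool:
    """True if the two hits share at least one query position."""
    return a.start <= b.end and b.start <= a.end


def keep_best_non_overlapping(hits: list[Hit]) -> list[Hit]:
    """Greedily keep the highest-scoring hits, dropping any hit that overlaps one already kept."""
    kept: list[Hit] = []
    for hit in sorted(hits, key=lambda h: h.bit_score, reverse=True):
        if not any(overlaps(hit, k) for k in kept):
            kept.append(hit)
    # Report the surviving hits in query order
    return sorted(kept, key=lambda h: h.start)


hits = [
    Hit("geneA", 100, 900, 850.0),
    Hit("geneA_variant", 120, 880, 790.0),
    Hit("geneB", 1000, 1600, 610.0),
]
print([h.reference for h in keep_best_non_overlapping(hits)])  # ['geneA', 'geneB']
```

Here the lower-scoring `geneA_variant` hit is discarded because it overlaps `geneA`, while the non-overlapping `geneB` hit is kept.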

Figure 12.3: Settings for handling overlapping hits

The following options are available:

Best hits are determined by:

The output options step (figure 12.4) has the following options:

Figure 12.4: Specifying output options

The following sequence output options are available:

The final step controls which outputs are created. Note that reports can be aggregated using the Combine Reports tool.