Find Best Matches using K-mer Spectra

The Find Best Matches using K-mer Spectra tool is inspired by [Hasman et al., 2013] and [Larsen et al., 2014] and identifies the best matching reference from a specified list of reference sequences. This section describes how to run the tool on its own. However, we recommend using the Type a Known Species or Type among Multiple Species template workflows instead, as described in the chapter Workflow templates for typing and epidemiology.

To identify the best matching bacterial genome reference, go to:

        Toolbox | Microbial Genomics Module | Typing and Epidemiology | Find Best Matches using K-mer Spectra

Select the sequences for which you want to find a best match (figure 10.1).

Image find_best_match1
Figure 10.1: Specifying the read file is the first step in identifying the best matching reference.

Then select a reference database and specify the settings shown in figure 10.2.

Image find_best_match2
Figure 10.2: Specify the reference list to search across.

In the last wizard window, choose among the output options the tool provides (figure 10.3).

Image find_best_match4
Figure 10.3: Choose your output option before saving your results.

If the tool stops with a warning that good references were not found, download a new set of references for the organisms of interest and re-run the workflow.

To add the obtained best match to a Result Metadata Table, see the section Extend Result Metadata Table. Note that best match results are added automatically to the Result Metadata Table when using the Type a Known Species or Type among Multiple Species template workflows, or customized versions of them.

Note that in rare instances, the lists of references reported by the Output Best Matching Sequences as a List and Output Quality Report options may differ. The former list is compiled from a "winner takes all" count of K-mers, which attributes all uniquely found K-mers only to the reference with the highest Z-score. The latter list is produced by removing all reads mapping to the best matching reference and using the remaining reads to determine the next best match. Thus, in the second round the pool of K-mers has been altered, and some K-mers that determined the Z-score of the original second-best match may have been removed.
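
The difference between the two lists can be illustrated with a small Python sketch. It is a toy model and not the tool's implementation: plain shared K-mer counts stand in for Z-scores, a read is treated as "mapping" to a reference if it shares at least one K-mer with it, and all names and sequences (`ref_A`, `rank_winner_takes_all`, `rank_by_read_removal`, and so on) are hypothetical.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_reference(pool: Set[str], ref_kmers: Dict[str, Set[str]]) -> str:
    """Toy score: the reference sharing the most k-mers with the pool wins.
    Ties are broken alphabetically. The real tool uses Z-scores instead."""
    return max(sorted(ref_kmers), key=lambda name: len(pool & ref_kmers[name]))


def rank_winner_takes_all(reads: List[str], refs: Dict[str, str],
                          k: int, rounds: int = 2) -> List[str]:
    """After each round, remove only the winner's k-mers from the pool."""
    ref_kmers = {name: kmers(seq, k) for name, seq in refs.items()}
    pool = set().union(*(kmers(r, k) for r in reads))
    ranking = []
    for _ in range(min(rounds, len(ref_kmers))):
        winner = best_reference(pool, ref_kmers)
        ranking.append(winner)
        pool -= ref_kmers.pop(winner)
    return ranking


def rank_by_read_removal(reads: List[str], refs: Dict[str, str],
                         k: int, rounds: int = 2) -> List[str]:
    """After each round, remove whole reads that 'map' to the winner
    (assumption: a read maps if it shares any k-mer with the winner)."""
    ref_kmers = {name: kmers(seq, k) for name, seq in refs.items()}
    remaining = list(reads)
    ranking = []
    for _ in range(min(rounds, len(ref_kmers))):
        pool = set().union(*(kmers(r, k) for r in remaining))
        winner = best_reference(pool, ref_kmers)
        ranking.append(winner)
        winner_kmers = ref_kmers.pop(winner)
        remaining = [r for r in remaining if not (kmers(r, k) & winner_kmers)]
    return ranking


# Hypothetical toy data: the second read overlaps ref_A by one k-mer
# but also carries most of the k-mers that support ref_B.
refs = {"ref_A": "AAAACCCCA", "ref_B": "GGGGTTTT", "ref_C": "ACGTACGA"}
reads = ["AAAACCCCA", "AACCGGGGTTT", "ACGTAC"]

wta = rank_winner_takes_all(reads, refs, k=4)
removal = rank_by_read_removal(reads, refs, k=4)
print(wta)      # ['ref_A', 'ref_B']
print(removal)  # ['ref_A', 'ref_C']
```

In this toy data set both procedures agree on the best match, but the second read is discarded in the read-removal procedure because it shares one K-mer with `ref_A`. The K-mers that supported `ref_B` disappear with it, so `ref_C` becomes the next best match.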

Once results from the Find Best Matches using K-mer Spectra tool are added to the Result Metadata Table, the table contains extra columns, including the taxonomy of the best matching references. In addition, if quality control was activated, the table includes the percentage of reads mapping to the best reference and the most probable contaminating species (see figure 10.4).

Image find_best_match3
Figure 10.4: Taxonomy of the best matching reference and quality information are shown in the Result Metadata Table.