Find Best Matches using K-mer Spectra

The Find Best Matches using K-mer Spectra tool is inspired by [Hasman et al., 2013] and [Larsen et al., 2014] and identifies the best matching reference from a specified list of reference sequences.
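
The general idea of matching on k-mer spectra can be illustrated with a minimal Python sketch. It assumes a simplified score (the number of read k-mer occurrences found in each reference) and a standard Z-score computed across the candidate references. The function names, the default k=16 and the scoring scheme are hypothetical illustrations, not the tool's actual implementation or the exact method of [Hasman et al., 2013] or [Larsen et al., 2014].

```python
from collections import Counter
from statistics import mean, pstdev


def kmers(seq, k):
    """Return all overlapping k-mers of seq, uppercased."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def rank_references(reads, references, k=16):
    """Rank references by how many read k-mer occurrences they contain.

    reads: list of read sequences.
    references: non-empty dict mapping reference name -> sequence.
    Returns (name, shared_kmer_count, z_score) tuples, best first.
    Hypothetical scoring, for illustration only.
    """
    # The sample's k-mer spectrum: how often each k-mer occurs in the reads.
    read_spectrum = Counter(km for r in reads for km in kmers(r, k))
    scores = {}
    for name, seq in references.items():
        ref_kmers = set(kmers(seq, k))
        # Sum the read k-mer occurrences that are present in this reference.
        scores[name] = sum(n for km, n in read_spectrum.items() if km in ref_kmers)
    values = list(scores.values())
    mu = mean(values)
    sd = pstdev(values) or 1.0  # avoid division by zero if all scores are equal
    ranked = [(name, s, (s - mu) / sd) for name, s in scores.items()]
    ranked.sort(key=lambda t: t[1], reverse=True)
    return ranked


references = {"refA": "ACGTACGTGGA", "refB": "TTTTTTTTTTT"}
reads = ["ACGTACGTGG", "CGTACGTGGA"]
for name, score, z in rank_references(reads, references, k=4):
    print(name, score, round(z, 2))
# refA 14 1.0
# refB 0 -1.0
```

A real k-mer database would use much longer k-mers and many more references; the short k=4 in the usage example only keeps the numbers easy to follow.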

Template workflows for typing and epidemiology analysis are available at:

        Toolbox | Template Workflows | Microbial Workflows | Typing and Epidemiology

For more information, see Typing and Epidemiology template workflows.

To identify the best matching bacterial genome reference, go to:

        Toolbox | Microbial Genomics Module | Typing and Epidemiology | Find Best Matches using K-mer Spectra

Select the sequences for which you want to find the best matching reference (figure 7.1).

Figure 7.1: The first step in identifying the best matching reference is to specify the read file.

Next, select a reference database and specify the settings shown in figure 7.2.

Figure 7.2: Specify the reference list to search.

In the last wizard window, the tool provides the output options shown in figure 7.3.

Figure 7.3: Choose your output options before saving your results.

If the tool stops with a warning that no good references were found, download a new set of references for the organisms of interest and re-run the workflow.

To add the obtained best match to a Result Metadata Table, see Extend Result Metadata Table.

Note that in rare instances, the lists of references found in the Output Best Matching Sequences as a List and Output Quality Report outputs may differ. The former list is compiled from a "winner takes all" count of K-mers, which attributes each uniquely found K-mer only to the reference with the highest Z-score. The latter list, however, is produced by removing all reads that map to the best matching reference and using the remaining reads to determine the next best match. Thus, in the second round the pool of K-mers has changed, and some K-mers that determined the Z-score of the original second-best match may have been removed.
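
The difference between the two lists can be illustrated with a toy Python sketch. Both functions are simplified, hypothetical stand-ins: references are ranked by raw shared K-mer counts rather than Z-scores, and a read is treated as "mapping" to a reference when at least a hypothetical fraction `min_frac` of its K-mers occur in that reference, in place of a real read mapper. None of the names below are part of the tool.

```python
from collections import Counter


def kmers(seq, k):
    """All overlapping k-mers of seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def shared_counts(reads, ref_sets, k):
    """For each reference, sum the read k-mer occurrences it contains."""
    spectrum = Counter(km for r in reads for km in kmers(r, k))
    return {name: sum(n for km, n in spectrum.items() if km in s)
            for name, s in ref_sets.items()}


def winner_takes_all(reads, references, k):
    """Give each read k-mer only to the best-ranked reference containing it."""
    ref_sets = {name: set(kmers(seq, k)) for name, seq in references.items()}
    initial = shared_counts(reads, ref_sets, k)
    order = sorted(initial, key=initial.get, reverse=True)  # best first
    spectrum = Counter(km for r in reads for km in kmers(r, k))
    wins = Counter()
    for km, n in spectrum.items():
        for name in order:
            if km in ref_sets[name]:
                wins[name] += n
                break  # winner takes all: stop at the first (best) match
    return [name for name in order if wins[name] > 0]


def iterative_removal(reads, references, k, min_frac=0.8, rounds=3):
    """Pick the best reference, drop the reads that 'map' to it, repeat."""
    ref_sets = {name: set(kmers(seq, k)) for name, seq in references.items()}
    remaining, found = list(reads), []
    for _ in range(rounds):
        candidates = {n: s for n, s in ref_sets.items() if n not in found}
        counts = shared_counts(remaining, candidates, k)
        if not counts or max(counts.values()) == 0:
            break
        best = max(counts, key=counts.get)
        found.append(best)
        # Hypothetical mapping rule: a read maps if >= min_frac of its
        # k-mers occur in the chosen reference; mapped reads are removed.
        remaining = [r for r in remaining
                     if sum(km in ref_sets[best] for km in kmers(r, k))
                     < min_frac * max(1, len(kmers(r, k)))]
    return found


references = {"refA": "GATCCGTAAGCT", "refB": "GATCCGTAATCT"}
reads = ["GATCCGTAAG", "TCCGTAAGCT", "GATCCGTAAT"]
print(winner_takes_all(reads, references, k=4))   # ['refA', 'refB']
print(iterative_removal(reads, references, k=4))  # ['refA']
```

In this example the third read has 6 of its 7 K-mers in refA, so it is removed together with the refA reads in the first round. Its single refB-only K-mer still counts for refB under winner takes all, but in the second round refB has no remaining evidence, so it appears only in the first list.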

Once results from the Find Best Matches using K-mer Spectra tool are added to the Result Metadata Table, the table contains extra columns, including the taxonomy of the best matching references. In addition, if quality control was activated, the table includes the percentage of reads mapping to the best reference and the most probable contaminating species (see figure 7.4).

Figure 7.4: The taxonomy of the best matching reference and quality information are shown in the Result Metadata Table.
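
The quality columns described above amount to simple bookkeeping over read assignments. The following Python sketch assumes that each read has already been assigned to a species (or to none), and treats the most frequent species other than the best match as the most probable contaminant. Both the input format and the contaminant rule are hypothetical simplifications, not the tool's documented method.

```python
from collections import Counter


def qc_summary(read_assignments, best):
    """Summarize hypothetical per-read species assignments.

    read_assignments: one entry per read, the species the read was
    assigned to, or None if it was not assigned to any reference.
    best: name of the best matching reference.
    Returns (percent of reads on best, most frequent other species or None).
    """
    total = len(read_assignments)
    if total == 0:
        return 0.0, None
    on_best = sum(1 for a in read_assignments if a == best)
    # Count assigned reads that did not go to the best match.
    others = Counter(a for a in read_assignments if a is not None and a != best)
    contaminant = others.most_common(1)[0][0] if others else None
    return 100.0 * on_best / total, contaminant


assignments = ["E. coli"] * 90 + ["S. enterica"] * 7 + [None] * 3
print(qc_summary(assignments, "E. coli"))  # (90.0, 'S. enterica')
```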
