Find Best Matches using K-mer Spectra

The Find Best Matches using K-mer Spectra tool is inspired by [Hasman et al., 2013] and [Larsen et al., 2014]. It identifies the best matching reference among the sequences in a specified reference list.
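The basic idea behind k-mer based matching can be illustrated with a minimal sketch. Each reference is reduced to its set of k-mers (substrings of length k), and the reference that shares the most k-mers with the reads is reported as the best match. This is a simplified illustration, not the tool's implementation. It assumes raw shared-k-mer counts as the score, whereas the tool ranks references by Z-score. The function names, the toy sequences, and the small k are hypothetical.

```python
from collections import Counter


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_match(reads: list[str], references: dict[str, str], k: int = 5) -> tuple[str, Counter]:
    """Score each reference by how many distinct read k-mers it contains.

    Hypothetical simplification: the score is a raw count of shared k-mers,
    not the Z-score used by the tool.
    """
    # Pool the distinct k-mers from all reads into one set.
    read_kmers: set[str] = set()
    for read in reads:
        read_kmers |= kmers(read, k)

    # Count how many of the pooled read k-mers occur in each reference.
    scores = Counter()
    for name, seq in references.items():
        scores[name] = len(read_kmers & kmers(seq, k))

    best, _ = scores.most_common(1)[0]
    return best, scores


# Usage with toy data: both reads are substrings of refA.
refs = {"refA": "ACGTACGTGGCCTTAAGG", "refB": "TTTTGGGGCCCCAAAATT"}
best, scores = best_match(["ACGTACGTGG", "GGCCTTAAGG"], refs, k=5)
print(best)                            # refA
print(scores["refA"], scores["refB"])  # 12 0
```

Using sets keeps the sketch short. A fuller k-mer spectrum would also track how often each k-mer occurs.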

To identify a best matching bacterial genome reference, go to:

        Tools | Microbial Genomics Module | Typing and Epidemiology | Find Best Matches using K-mer Spectra

Select the sequences for which you want to find a best match (figure 8.1).

Image find_best_match1
Figure 8.1: To identify a best matching reference, first select the read file(s).

Then specify the settings, including the reference list(s) to search across (figure 8.2).

Image find_best_match2
Figure 8.2: Specify the reference list(s) to search across.

In the last wizard window, choose which outputs the tool should produce (figure 8.3).

Image find_best_match4
Figure 8.3: Choose your output options before saving the results.

If the tool stops with a warning that no good references were found, download a new set of references for the organisms of interest and re-run the analysis.

To add the obtained best match to a Result Metadata Table, see Extend Result Metadata Table.

Note that in rare instances, the lists of references reported in Output Best Matching Sequences as a List and in Output Quality Report may differ. This is because the former list is compiled from a "winner takes all" count of K-mers, which attributes all uniquely found K-mers only to the reference with the highest Z-score. The latter list is produced by removing all reads that map to the best matching reference and using the remaining reads to determine the next best match. The pool of K-mers in this second round has therefore changed, and some K-mers that determined the Z-score of the original second-best match may have been removed.
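Why the two lists can disagree can be illustrated with a toy sketch. This is not the tool's implementation, and several simplifications are assumed. References and reads are modelled directly as sets of k-mer labels. Raw k-mer counts stand in for Z-scores. A read is assumed to "map" to a reference when at least half of its k-mers occur in it. All data and function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy data: references and reads as sets of k-mer labels.
REFERENCES = {
    "A": {"a1", "a2", "a3", "a4", "s"},
    "B": {"b1", "b2"},
    "C": {"c1", "c2", "c3", "s"},
}
READS = [
    {"a1", "a2", "a3", "c1"},  # mostly A, plus one C-only k-mer
    {"a4", "c2"},              # half A, half C
    {"s"},                     # k-mer shared by A and C
    {"b1", "b2"},
    {"c3"},
]


def score(reads: list[set[str]], references: dict[str, set[str]]) -> Counter:
    """Count distinct read k-mers found in each reference (stand-in for a Z-score)."""
    pool = set().union(*reads)
    return Counter({name: len(pool & kms) for name, kms in references.items()})


def winner_takes_all(reads: list[set[str]], references: dict[str, set[str]]) -> list[str]:
    """Credit each read k-mer only to the highest-scoring reference that contains it."""
    initial = score(reads, references)
    credited = Counter({name: 0 for name in references})
    for kmer in set().union(*reads):
        holders = [name for name, kms in references.items() if kmer in kms]
        if holders:
            credited[max(holders, key=lambda n: initial[n])] += 1
    return [name for name, _ in credited.most_common()]


def maps_to(read: set[str], ref_kmers: set[str], min_fraction: float = 0.5) -> bool:
    """Hypothetical mapping rule: enough of the read's k-mers occur in the reference."""
    if not read:
        return False
    return len(read & ref_kmers) / len(read) >= min_fraction


def iterative_removal(reads: list[set[str]], references: dict[str, set[str]], rounds: int = 2) -> list[str]:
    """Pick the best reference, drop the reads mapping to it, then repeat on the rest."""
    remaining = list(reads)
    candidates = dict(references)
    order = []
    for _ in range(rounds):
        if not remaining or not candidates:
            break
        best = score(remaining, candidates).most_common(1)[0][0]
        order.append(best)
        remaining = [r for r in remaining if not maps_to(r, candidates[best])]
        del candidates[best]
    return order


print(winner_takes_all(READS, REFERENCES)[:2])  # ['A', 'C']
print(iterative_removal(READS, REFERENCES))     # ['A', 'B']
```

In this toy case, the C-specific k-mers c1 and c2 occur in reads that mostly match A. They support C in the winner-takes-all ranking, but they disappear once the reads mapping to A are removed, so B becomes the next best match.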

Once results from the Find Best Matches using K-mer Spectra tool are added to the Result Metadata Table, the table contains extra columns, including the taxonomy of the best matching references. If Check for low quality and contamination was selected, the table also includes the percentage of reads mapping to the best reference and the most probable contaminating species (see figure 8.4). If Check for low quality and contamination was not selected, the table still includes the percentage of reads mapping to the best reference, provided that either Output quality report or Output read mapping to best match was selected.
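The rule for which extra columns appear can be summarized as a small function. This is a hypothetical sketch of the logic described above, not part of the tool. The column labels are illustrative assumptions and may not match the labels shown in the Result Metadata Table.

```python
def added_columns(check_quality: bool, output_quality_report: bool, output_read_mapping: bool) -> list[str]:
    """Return the extra Result Metadata Table columns for a set of wizard options.

    Column labels are hypothetical placeholders.
    check_quality         -> "Check for low quality and contamination"
    output_quality_report -> "Output quality report"
    output_read_mapping   -> "Output read mapping to best match"
    """
    columns = ["Best match taxonomy"]  # always added
    # The mapping percentage needs any option that triggers a read mapping.
    if check_quality or output_quality_report or output_read_mapping:
        columns.append("% reads mapping to best reference")
    # The contaminant column is only available from the quality/contamination check.
    if check_quality:
        columns.append("Most probable contaminating species")
    return columns


print(added_columns(False, True, False))
# ['Best match taxonomy', '% reads mapping to best reference']
```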

Image find_best_match3
Figure 8.4: Taxonomy of the best matching reference and quality information are shown in the Result Metadata Table.


