Type With MLST Scheme

The Type With MLST Scheme tool is used for assigning a sequence type to an isolate.

To run the Type With MLST Scheme tool choose:

        Toolbox | Microbial Genomics Module | Typing and Epidemiology | MLST Typing | Type With MLST Scheme

The tool takes a sequence list as input and will work with either raw NGS reads or an assembled genome. Note that if the input is raw NGS reads, and the tool reports multiple ambiguous sequence types, performing a standard De Novo Assembly might help to reduce noise and provide a more conclusive typing result.

Image mlst_type_step1
Figure 12.12: Specifying scheme and typing parameters.

In the next dialog step (figure 12.12), specify the scheme and the typing parameters.

The tool works by comparing the kmers in the input to the kmers in the alleles for the different loci.
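As a rough illustration of kmer-based comparison, the sketch below extracts all overlapping kmers from an input sequence and reports the fraction of an allele's kmers found in it. The function names and toy sequences are hypothetical and only show the general idea; they are not the tool's actual implementation.

```python
def kmers(sequence, k):
    """Return the set of all overlapping substrings of length k."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def matched_fraction(allele, input_kmers, k):
    """Fraction of the allele's distinct kmers that also occur in the input."""
    allele_kmers = kmers(allele, k)
    if not allele_kmers:
        return 0.0
    return len(allele_kmers & input_kmers) / len(allele_kmers)


# Toy data (hypothetical): one input sequence and two candidate alleles.
input_kmers = kmers("ACGTACGTTGCA", 5)
exact = matched_fraction("CGTACGTT", input_kmers, 5)    # allele present in the input
variant = matched_fraction("CGTACCTT", input_kmers, 5)  # allele with one mismatch
```

In this toy case a single mismatch removes most of the allele's kmers from the match, which shows why longer kmers tend to trade sensitivity for specificity.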

The Kmer size determines the number of nucleotides in the kmer. Raising this setting might increase specificity at the cost of some sensitivity.

The Typing threshold determines how many of the kmers in a sequence type need to be identified before a typing is considered conclusive. The default setting of 1.0 means that all kmers in all alleles must be matched. Lowering the setting to 0.99 would mean that on average 99% of the kmers in all the alleles of a given sequence type must be detected before the sequence type is considered conclusive.
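The threshold check can be sketched as follows. This assumes that "on average" means the mean of the per-allele matched fractions across the loci of a sequence type; the exact averaging used by the tool is not specified here, and the function name is hypothetical.

```python
def is_conclusive(allele_fractions, typing_threshold=1.0):
    """Return True if the mean matched-kmer fraction reaches the threshold.

    allele_fractions: one matched fraction (0.0 to 1.0) per locus of the
    sequence type. Averaging per allele is an assumption of this sketch.
    """
    mean_fraction = sum(allele_fractions) / len(allele_fractions)
    return mean_fraction >= typing_threshold


# Hypothetical seven-locus scheme where one allele is matched at 95%.
fractions = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.95]
strict = is_conclusive(fractions)          # default threshold of 1.0
relaxed = is_conclusive(fractions, 0.99)   # lowered threshold
```

With these illustrative numbers the mean is roughly 0.993, so the typing is inconclusive at the default of 1.0 but conclusive at 0.99.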

When working with reads, the Type With MLST Scheme tool classifies allele calls as high-confidence or low-confidence in order to remove alternative allele calls for the same locus. The Minimum kmer ratio threshold adjusts the balance between high-confidence and low-confidence allele calls. Decreasing this number will result in more high-confidence allele calls and thus more ambiguity in how an ST is assigned to the sample. Conversely, increasing this number will result in fewer high-confidence calls and may lead to no allele being called for a particular locus, which can make sequence type assignments less confident. Specifically, the kmer ratio is calculated as the number of observations for the least occurring kmer in an allele divided by the average number of observations for all kmers.
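The kmer ratio described above can be written out as a small calculation. The sketch assumes the average is taken over the kmers of the same allele and that a call is high-confidence when its ratio is at least the Minimum kmer ratio; both are interpretations of the text, and the counts and the threshold of 0.2 are made-up values.

```python
def kmer_ratio(kmer_counts):
    """Lowest kmer observation count divided by the mean observation count."""
    if not kmer_counts:
        return 0.0
    mean_count = sum(kmer_counts) / len(kmer_counts)
    if mean_count == 0:
        return 0.0
    return min(kmer_counts) / mean_count


def is_high_confidence(kmer_counts, min_kmer_ratio):
    """Assumed rule: a call is high-confidence if its ratio meets the threshold."""
    return kmer_ratio(kmer_counts) >= min_kmer_ratio


# Hypothetical observation counts for the kmers of two alleles at one locus.
evenly_covered = [30, 28, 32, 30]      # all kmers seen about equally often
one_rare_kmer = [30, 2, 32, 28, 33]    # one kmer is barely observed

ratio_even = kmer_ratio(evenly_covered)
ratio_rare = kmer_ratio(one_rare_kmer)
```

An allele whose kmers are all well supported has a ratio close to 1, while a single poorly supported kmer pulls the ratio towards 0, so a lower threshold lets more alleles count as high-confidence.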

Image mlst_type_step2
Figure 12.13: Specifying novel allele detection parameters.

The next step in the dialog determines how to handle novel alleles (figure 12.13): if the input isolate has loci with alleles that are not part of the scheme, it is possible to still detect the novel alleles. The novel alleles and the resulting new sequence type can then be added to the scheme using the Add Typing Results to MLST Scheme tool.

Novel alleles are detected as close hits to existing alleles in a locus. The Minimum required fraction of kmers determines how close a match must be: the default setting of 0.9 means that at least 90% of the kmers for an allele in a locus must be identified before the novel allele detection is initiated.
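A minimal sketch of this trigger is shown below. It assumes that an exact match (fraction 1.0) means the allele is already known, so no novel allele detection is needed; the function name and allele names are hypothetical.

```python
def novel_allele_candidate(allele_fractions, min_fraction=0.9):
    """Return the closest existing allele if novel allele detection should start.

    allele_fractions maps allele names in one locus to the fraction of their
    kmers found in the input. Returns None if an allele matches exactly
    (assumed to be a known allele) or if no allele is close enough.
    """
    if not allele_fractions:
        return None
    best_name, best_fraction = max(allele_fractions.items(), key=lambda kv: kv[1])
    if best_fraction == 1.0:
        return None
    if best_fraction >= min_fraction:
        return best_name
    return None


# Hypothetical locus where the closest allele is matched at 93%.
closest = novel_allele_candidate({"adk_1": 0.62, "adk_2": 0.93})
```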

If the input to the tool is raw NGS reads, the tool will assemble the reads containing the kmers for the possible novel allele. If the input is already an assembled genome, the existing alleles for a locus will be mapped to the assembly to extract a novel allele.

After a candidate novel allele has been identified, it is aligned to the other alleles in the locus.

If the scheme has been built with the Check codon positions option of the Create MLST Scheme tool enabled (see Create MLST Scheme), or if the scheme was imported with a specified genetic code (see Download MLST Scheme), the start and stop codons in the novel allele sequence are identified, and the sequence is trimmed to the start and stop codons that most closely match the length of the existing alleles in the locus. Alleles that contain both a start codon at the beginning and a stop codon at the end, and that pass the acceptance parameters (see below), will be marked as Complete in the output table from the tool.
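The trimming step can be illustrated with the sketch below. It is not the tool's implementation: it assumes the standard genetic code (ATG as start; TAA, TAG and TGA as stops), searches the forward strand only, and ends each reading frame at its first in-frame stop codon. The function name and the example sequence are hypothetical.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def trim_to_codons(sequence, target_length):
    """Return the start-to-stop subsequence whose length is closest to target_length.

    Returns None if no start codon is followed by an in-frame stop codon.
    """
    best = None
    for start in range(len(sequence) - 2):
        if sequence[start:start + 3] != START:
            continue
        # Walk codon by codon from the start until the first in-frame stop.
        for end in range(start + 3, len(sequence) - 2, 3):
            if sequence[end:end + 3] in STOPS:
                candidate = sequence[start:end + 3]
                if best is None or (abs(len(candidate) - target_length)
                                    < abs(len(best) - target_length)):
                    best = candidate
                break
    return best


# Candidate with two flanking bases on each side; existing alleles are 15 nt long.
trimmed = trim_to_codons("CCATGAAATTTGGGTAACC", 15)
```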

The acceptance parameters describe the final consistency check. Before a novel allele is accepted, it must not contain a stop codon, its length must be at least the Minimum length in nucleotides, and its length must be at least the specified Minimum length fraction of the length of the shortest allele in the locus.
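The consistency check might be sketched as follows. The sketch interprets "must not contain a stop codon" as no in-frame stop codon before the final codon, which is an assumption, as are the standard stop codons, the function names, and the parameter values used in the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_internal_stop(allele):
    """True if any in-frame codon before the last one is a stop codon."""
    codons = [allele[i:i + 3] for i in range(0, len(allele) - 2, 3)]
    return any(codon in STOP_CODONS for codon in codons[:-1])


def passes_acceptance(allele, existing_lengths, min_length, min_length_fraction):
    """Apply the stop codon check and the two length checks to a novel allele."""
    if has_internal_stop(allele):
        return False
    if len(allele) < min_length:
        return False
    return len(allele) >= min_length_fraction * min(existing_lengths)


# Hypothetical 15 nt candidate in a locus whose existing alleles are 15 and 18 nt.
accepted = passes_acceptance("ATGAAATTTGGGTAA", [15, 18],
                             min_length=12, min_length_fraction=0.8)
```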