Type with MLST Scheme
The Type with MLST Scheme tool is used for assigning a sequence type to an isolate.
Type with MLST Scheme is available from:
Tools | Microbial Genomics Module | Typing and Epidemiology | MLST Typing | Type with MLST Scheme
The tool accepts a sequence list as input, which can contain either raw NGS reads or an assembled genome. If the input is raw NGS reads and the tool reports multiple ambiguous sequence types, assembling the reads de novo first and typing the resulting assembly may reduce noise and yield a more conclusive typing result.
Some MLST Schemes contain alleles with ambiguous bases. Type with MLST Scheme does not support ambiguous bases, so such alleles are effectively ignored.
The tool works by comparing the kmers in the input to the kmers in the alleles for the different loci.
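The kmer comparison can be illustrated with a minimal Python sketch. This is not the tool's implementation: the function names, the toy sequences, and the scoring by "fraction of allele kmers found in the input" are assumptions made for illustration, and reverse-complement matching is left out for brevity. Alleles containing ambiguous bases are skipped, mirroring the behavior described above.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def allele_kmer_fractions(sample_seqs, alleles, k):
    """For each allele, compute the fraction of its kmers found in the sample.

    sample_seqs: list of read or contig sequences (hypothetical input format).
    alleles: dict mapping an allele name to its sequence.
    """
    sample_kmers = set()
    for seq in sample_seqs:
        sample_kmers |= kmers(seq, k)

    fractions = {}
    for name, allele_seq in alleles.items():
        # Alleles with ambiguous bases are not supported, so skip them.
        if set(allele_seq.upper()) - set("ACGT"):
            continue
        allele_kmers = kmers(allele_seq, k)
        if not allele_kmers:
            continue  # allele shorter than k
        fractions[name] = len(allele_kmers & sample_kmers) / len(allele_kmers)
    return fractions


# Toy data: one short "read" and three alleles of a hypothetical locus.
sample = ["ACGTACGTTGCA"]
alleles = {
    "locusA_1": "ACGTACGT",
    "locusA_2": "ACGTTCGT",
    "locusA_3": "ACGTNCGT",  # contains an ambiguous base, so it is ignored
}
fractions = allele_kmer_fractions(sample, alleles, k=4)
```

In this toy example all kmers of `locusA_1` occur in the input, while only some kmers of `locusA_2` do, so `locusA_1` would be the better candidate for that locus.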
Figure 10.12: Specifying scheme and typing parameters.
In the wizard step "Typing parameters" (figure 10.12), specify the scheme and the typing parameters. MLST Schemes can be obtained as described in Getting started with the MLST Typing tools.
The typing parameters are:
- Kmer size. Determines the number of nucleotides in each kmer. Raising this setting might increase specificity at the cost of some sensitivity.
- Minimum locus presence. Specifies the proportion of loci that must be detected in a sample for the analysis to be considered valid. Both known and novel alleles are included in this calculation. The appropriate value depends on the MLST Scheme being used, as different schemes vary in their tolerance for missing loci. This parameter applies only to cgMLST and wgMLST schemes. For traditional 7-locus MLST schemes, all loci are always required.
- Comparing a known to a missing allele. Defines how the tool handles missing alleles when comparing the input sample's profile to sequence types in the MLST Scheme. If set to "Counted as same allele", a missing locus is treated as matching the known allele, so it does not increase the allelic distance. If set to "Counted as different alleles", a missing locus is treated as non-matching, adding one to the allelic distance. This choice affects similarity scoring and determines which sequence types appear as closest matches when typing is conclusive.
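The effect of the last two parameters can be sketched in Python. This is only an illustration under stated assumptions: profiles are represented as dictionaries from locus name to allele identifier, `None` marks a missing locus, the string `"novel"` marks a novel allele, and the function names are hypothetical rather than part of the tool.

```python
def locus_presence(profile):
    """Proportion of loci with a detected allele (known or novel)."""
    detected = sum(1 for allele in profile.values() if allele is not None)
    return detected / len(profile)


def allelic_distance(sample, sequence_type, missing_counts_as_same=True):
    """Count loci where the sample differs from a sequence type in the scheme.

    missing_counts_as_same=True  models "Counted as same allele".
    missing_counts_as_same=False models "Counted as different alleles".
    """
    distance = 0
    for locus, st_allele in sequence_type.items():
        sample_allele = sample.get(locus)
        if sample_allele is None:
            # Missing locus: only adds to the distance in the second mode.
            if not missing_counts_as_same:
                distance += 1
        elif sample_allele != st_allele:
            distance += 1
    return distance


# Toy profile: locus2 is missing and locus4 carries a novel allele.
sample_profile = {"locus1": 3, "locus2": None, "locus3": 7, "locus4": "novel"}
scheme_st = {"locus1": 3, "locus2": 5, "locus3": 8, "locus4": 2}

presence = locus_presence(sample_profile)
dist_same = allelic_distance(sample_profile, scheme_st, missing_counts_as_same=True)
dist_diff = allelic_distance(sample_profile, scheme_st, missing_counts_as_same=False)
```

With these toy values, three of four loci are detected, and the missing `locus2` adds one to the allelic distance only in the "Counted as different alleles" mode.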
Figure 10.13: Specifying novel allele detection parameters.
In the wizard step "Novel allele detection parameters", specify how to handle novel alleles (figure 10.13). If the input isolate has loci with alleles that are not part of the scheme, it is possible to still detect the novel alleles. The novel alleles and the resulting new sequence type can then be added to the scheme using the Add Typing Results to MLST Scheme tool. Novel alleles are detected as close hits to existing alleles in a locus.
To search for novel alleles, tick the box of the same name, then set the parameters:
- Minimum required fraction of kmers. Determines how close a match must be: the default setting of 0.9 means that at least 90% of the kmers for an allele in a locus must be identified before the novel allele detection is initiated.
- Minimum length. The minimum length a detected allele must have to be accepted as a novel allele.
- Minimum length fraction. The minimum length a detected allele must have, expressed as a fraction of the length of the shortest known allele in the given locus, to be accepted as a novel allele.
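The three thresholds above can be read as simple checks, sketched here in Python. The function names are hypothetical, and only the 0.9 default for the kmer fraction comes from the text; the length values in the example are arbitrary illustrations, not the tool's defaults.

```python
def should_search_for_novel_allele(matched_kmers, total_kmers, min_kmer_fraction=0.9):
    """True if enough kmers of an existing allele were found to start detection."""
    if total_kmers == 0:
        return False
    return matched_kmers / total_kmers >= min_kmer_fraction


def passes_length_filters(candidate_length, known_allele_lengths,
                          min_length, min_length_fraction):
    """True if a candidate satisfies both the absolute and relative length limits.

    The relative limit is measured against the shortest known allele in the locus.
    """
    shortest_known = min(known_allele_lengths)
    return (candidate_length >= min_length
            and candidate_length >= min_length_fraction * shortest_known)


# Illustrative values only.
search_close_hit = should_search_for_novel_allele(95, 100)   # 95% of kmers found
search_weak_hit = should_search_for_novel_allele(80, 100)    # 80% of kmers found
accept_long = passes_length_filters(450, [480, 500], min_length=100,
                                    min_length_fraction=0.8)
accept_short = passes_length_filters(300, [480, 500], min_length=100,
                                     min_length_fraction=0.8)
```

Here the 95% hit triggers novel allele detection while the 80% hit does not, and a 300 bp candidate is rejected because it is shorter than 80% of the shortest known allele (480 bp).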
If the input to the tool is raw NGS reads, the tool will assemble the reads containing the kmers for the possible novel allele. If the input is already an assembled genome, the existing alleles for a locus will be mapped to the assembly to extract a novel allele.
After a candidate novel allele has been identified, it is aligned to the other alleles in the locus.
If the scheme has been built with the Check codon positions option of the Create MLST Scheme tool enabled, or if the scheme was imported with a specified genetic code using Import MLST Scheme, the start and stop codons in the novel allele sequence are identified, and the sequence is trimmed to the start and stop codons that most closely match the length of the existing alleles in the locus.
Alleles that contain both a start and a stop codon at the beginning and end, respectively, and pass the length parameters, will be marked as "Complete" in the output table from the tool.
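The trimming and the "Complete" check can be sketched as follows. This is a simplified illustration, not the tool's algorithm: it assumes `ATG` as the only start codon and `TAA`, `TAG`, `TGA` as stop codons (the scheme's actual genetic code may differ), considers the forward strand only, ends each reading frame at its first in-frame stop codon, and takes a single hypothetical `target_length` to represent the length of the existing alleles.

```python
START_CODONS = {"ATG"}                # assumption: standard start codon only
STOP_CODONS = {"TAA", "TAG", "TGA"}   # assumption: standard stop codons


def trim_to_codons(seq, target_length):
    """Return the start-to-stop subsequence whose length is closest to target_length.

    Returns None if no in-frame start/stop pair is found.
    """
    seq = seq.upper()
    best = None
    for start in range(len(seq) - 2):
        if seq[start:start + 3] not in START_CODONS:
            continue
        # Walk codon by codon from the start until the first in-frame stop.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                candidate = seq[start:pos + 3]
                if (best is None or
                        abs(len(candidate) - target_length) < abs(len(best) - target_length)):
                    best = candidate
                break
    return best


def is_complete(allele, min_length, min_length_fraction, shortest_known_length):
    """True if the allele starts with a start codon, ends with a stop codon,
    and passes both length parameters."""
    allele = allele.upper()
    has_codons = allele[:3] in START_CODONS and allele[-3:] in STOP_CODONS
    long_enough = (len(allele) >= min_length
                   and len(allele) >= min_length_fraction * shortest_known_length)
    return has_codons and long_enough


# Toy candidate with flanking bases around a short open reading frame.
candidate = "CCATGAAACCCTAAGG"
trimmed = trim_to_codons(candidate, target_length=12)
complete = is_complete(trimmed, min_length=9, min_length_fraction=0.8,
                       shortest_known_length=12)
no_orf = trim_to_codons("CCCCCC", target_length=6)
```

In the toy example the flanking bases are removed, leaving `ATGAAACCCTAA`, which starts with a start codon, ends with a stop codon, and passes the illustrative length limits.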