Import MLST Scheme

To run the Import MLST Scheme tool, choose:

        Toolbox | Microbial Genomics Module | Databases | MLST Typing | Import MLST Scheme

Image mlst_import_step1
Figure 13.9: The MLST Scheme import parameters.

The Allele folder (FASTA) must contain a set of FASTA files, one for each locus. To be recognized, each file must have one of the following extensions: "fa", "fas", "fsa", "fasta", "tfa". Each sequence header must consist of the locus name and the allele name separated by an underscore, as in this example:

		
		>pheS_1
		AGAGAAAAGAACGATACTTTCTATATGGCCCGTGATAATCAAGGCAAGCGTGTTGTCTTA
		>pheS_2
		AGAGAAAAGAACGATACTTTCTATATGGCCCGTGATAATCAAGGCAAGCGTGTTGTCTTA
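
The naming rules above can be checked before import with a short script. The following is a minimal Python sketch, not part of the tool: the function names are hypothetical, and it assumes that extensions are matched case-sensitively and that the header is split at the last underscore (so that locus names containing underscores still work). Neither assumption is confirmed by the tool's documentation.

```python
from pathlib import Path

# Extensions listed in the text above; case-sensitive matching is an assumption.
ALLOWED_EXTENSIONS = {".fa", ".fas", ".fsa", ".fasta", ".tfa"}


def is_allele_file(path: str) -> bool:
    """Return True if the file name has one of the recognized extensions."""
    return Path(path).suffix in ALLOWED_EXTENSIONS


def split_allele_header(header: str) -> tuple[str, str]:
    """Split a FASTA header such as '>pheS_1' into (locus, allele).

    Splitting at the LAST underscore is an assumption made for this sketch.
    """
    name = header.lstrip(">").strip()
    locus, separator, allele = name.rpartition("_")
    if not separator or not locus or not allele:
        raise ValueError(f"header {header!r} is not of the form locus_allele")
    return locus, allele
```

For the example above, `split_allele_header(">pheS_2")` yields the locus `pheS` and the allele name `2`.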

The Sequence types (TSV) file must be a tab-separated file listing a sequence type and its alleles in the following format:

ST  pheS  glyA  fumC  mdh  sucA  dnaN  atpA  clonal_complex
1   30    1     1     1    1     1     1     6
3   6     8     7     3    4     3     1
4   7     9     8     3    5     2     1
6   47    3     10    4    7     2     2

Arbitrary metadata can be added as additional columns after the loci columns (e.g. the 'clonal_complex' column above). If multiple isolates share the same sequence type but have different metadata, add one line per isolate, repeating the sequence type name and allele IDs and varying only the metadata entries.
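
To illustrate how the columns of such a file divide into allele profile and metadata, here is a minimal Python sketch. It is an illustration only and does not reflect how the tool parses the file. The function name `read_sequence_types` is hypothetical, and the sketch assumes that the caller supplies the list of locus names and that every other column except "ST" is metadata.

```python
import csv
import io


def read_sequence_types(tsv_text, loci):
    """Parse tab-separated sequence type text into (ST, profile, metadata) tuples.

    `loci` is the list of locus column names; all remaining columns other
    than 'ST' are treated as metadata (an assumption of this sketch).
    """
    # restval="" fills in metadata cells that are missing at the end of a line.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", restval="")
    headers = reader.fieldnames or []
    missing = [name for name in ["ST", *loci] if name not in headers]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    records = []
    for row in reader:
        profile = {locus: row[locus] for locus in loci}
        metadata = {key: value for key, value in row.items()
                    if key != "ST" and key not in loci}
        records.append((row["ST"], profile, metadata))
    return records
```

Because each line becomes its own record, several lines with the same sequence type and allele IDs but different metadata are kept as separate entries.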

The Loci (TXT) file must be a tab-separated file listing each locus name and its corresponding metadata. The only recognized headers for this file are "Locus", "Known name", "Type name" and "Locus type". The name of each locus in the MLST scheme must match the name in the Locus column of this file.

		
		Locus	Known name	Type name	Locus type
		locus5	FALSE	Unknown	ST1
		fliR	TRUE	fli	ST2
		flgL	TRUE	flg	ST3
		hpaB	TRUE	hpa	ST4
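
A Loci file can be checked against these two rules before import. The Python sketch below is hypothetical and not part of the tool: `check_loci_file` is an illustrative name, and the set of scheme locus names is assumed to be supplied by the caller (for example, collected from the allele FASTA files).

```python
import csv
import io

# Headers named in the text above.
RECOGNIZED_HEADERS = ("Locus", "Known name", "Type name", "Locus type")


def check_loci_file(loci_text, scheme_loci):
    """Return (unrecognized_headers, unmatched_locus_names) for a Loci file.

    `scheme_loci` is the collection of locus names in the MLST scheme.
    """
    reader = csv.DictReader(io.StringIO(loci_text), delimiter="\t")
    headers = reader.fieldnames or []
    if "Locus" not in headers:
        raise ValueError("the Loci file needs a 'Locus' column")
    unrecognized = [h for h in headers if h not in RECOGNIZED_HEADERS]
    unmatched = [row["Locus"] for row in reader
                 if row["Locus"] not in scheme_loci]
    return unrecognized, unmatched
```

Two empty lists indicate that every header is recognized and every listed locus has a counterpart in the scheme.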

The Clustering parameters and Minimum Spanning Tree parameters are similar to the options for the Download MLST Scheme tool (see Download MLST Scheme).

The specified genetic code is used during novel allele detection to make sure that each allele starts with an initiation codon and ends with a stop codon. If "No code specified" is selected, these requirements are not checked when searching for novel alleles. Instead, the part of the existing alleles that aligns to the assembly is used to define the allele length. This option is useful for 7-gene MLST schemes, which generally use fragments of genes, but it is sensitive to unaligned ends and may return alleles that are too short in some cases.
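
The start and stop codon requirement can be pictured with a small Python sketch. It only illustrates the idea and is not the tool's implementation. The codon sets below are assumptions (a commonly used bacterial subset); the sets actually applied depend on the genetic code selected in the tool.

```python
# Assumed codon sets for illustration; the real sets depend on the genetic code.
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_start_and_stop(allele: str) -> bool:
    """Return True if the allele begins with a start codon and ends with a stop codon."""
    sequence = allele.upper()
    if len(sequence) < 6:
        return False  # too short to hold both a start and a stop codon
    return sequence[:3] in START_CODONS and sequence[-3:] in STOP_CODONS
```

A gene fragment such as the pheS allele shown earlier starts with AGA and ends with TTA, so it would fail this check. This is why "No code specified" suits 7-gene schemes built from gene fragments.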