Type Among Multiple Species

The Type Among Multiple Species workflow is designed for typing a sample among multiple predefined species.

It allows for identification of the closest matching reference species among the specified reference list(s) that may represent multiple species. The workflow identifies the associated MLST scheme and type, determines variants found when mapping the sample data against the identified best matching reference, and finds occurring resistance genes if they match genes within the specified resistance database.

The workflow also automatically associates the analysis results with the specified Result Metadata Table. For details about searching and quick filtering among the sample metadata and generated analysis result data, see Filtering in Result Metadata Table.

Preliminary steps to run the Type Among Multiple Species workflow

Before starting the workflow,

How to run the Type Among Multiple Species workflow

To run the workflow, go to:

        Workflows | Template Workflows | Microbial Workflows | Typing and Epidemiology | Type Among Multiple Species

  1. Specify the sample(s) or folder(s) of samples you would like to type and click Next. Remember that if you select several items, they will be run as batch units.
  2. Specify the Result Metadata Table you want to add your results to and click Next.

  3. Define batch units. For details, see https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Running_part_workflow_multiple_times.html.

  4. Check that batching is as intended.

  5. If your reads contain adapters, add an appropriate Trim adapter list. Click Next.

  6. Choose the species-specific References to be used by the Find Best Matches using K-mer Spectra tool (figure 2.22). Click Next.

    Image typeamongmultiplespecies_spectra
    Figure 2.22: Specify the references for the Find Best Matches using K-mer Spectra tool.

  7. Specify the MLST Schemes to be used for the Identify MLST Scheme from Genomes tool so they correspond to the chosen reference list(s) (figure 2.23).

    Image typeamongmultiplespecies_identify
    Figure 2.23: Specify the schemes that best describe your sample(s).

  8. Specify the parameters for the Fixed Ploidy Variant Detection tool (figure 2.24) before clicking Next. For detailed information about all the filters, see https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Fixed_Ploidy_Variant_Detection.html and https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Variant_Detection_filters.html.

    Image typeamongmultiplespecies_fixedploidy
    Figure 2.24: Specify the parameters to be used for the Fixed Ploidy Variant Detection tool.

  9. Specify the parameters for the Type with MLST Scheme tool (figure 2.25).

    Image typeamongmultiplespecies_parameters
    Figure 2.25: Specify the parameters for MLST typing.

    The parameters that can be set are:

    • Kmer size. Determines the number of nucleotides in each kmer. Raising this setting might increase specificity at the cost of some sensitivity.
    • Minimum kmer ratio. The minimum ratio between the hit count of the least occurring kmer and the average kmer hit count. If an allele scores higher than this threshold, it is classified as a high-confidence call.
    • Typing threshold. Determines the fraction of the kmers in a sequence type that must be identified before a typing is considered conclusive. The default setting of 1.0 means that all kmers in all alleles must be matched. Lowering the setting to 0.99 would mean that on average 99% of all kmers in all the alleles of a given sequence type must be detected before the sequence type is considered conclusive.
    Click Next.

  10. Specify the Resistance Database (figure 2.26) and set the parameters for the Find Resistance with Nucleotide Database tool.

    Image typeamongmultiplespecies_resistanceDB
    Figure 2.26: Specify the resistance database to be used for the Find Resistance with Nucleotide Database tool.

    The parameters that can be set are:

    • Minimum identity %. The threshold for the minimum percentage of nucleotides that are identical between the best matching resistance gene in the database and the corresponding sequence in the genome.
    • Minimum length %. The minimum portion of a resistance gene that a sequence must overlap to count as a hit for that gene, expressed as a percentage of the total resistance gene length.
    • Filter overlaps. Applies extra filtering of results per contig: where one hit is contained by another, the hit with the higher number of aligned nucleotides (length * identity) is preferred.
    Click Next.

  11. In the "Create Sample Report" step, various summary items have been set. These are guidelines to help evaluate the quality of the results (see https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Create_Sample_Report.html).

  12. In the Result handling window, clicking the Preview All Parameters button allows you to preview, but not change, all parameters. Choose to save the results (we recommend creating a new folder for them) and click Finish.
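
To make the MLST typing parameters from step 9 more concrete, the following is a minimal Python sketch of how a kmer size, a minimum kmer ratio, and a typing threshold could interact. It is not the implementation used by the Type with MLST Scheme tool. The function names, the exact-match kmer counting, and the example values for kmer size and minimum kmer ratio are all assumptions made for illustration; only the default typing threshold of 1.0 comes from the description above.

```python
from collections import Counter


def kmers(seq, k):
    """Return all overlapping kmers of length k in seq (empty if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_ratio(hit_counts):
    """Ratio of the least occurring kmer's hit count to the average kmer hit count."""
    if not hit_counts:
        return 0.0
    mean = sum(hit_counts) / len(hit_counts)
    return min(hit_counts) / mean if mean > 0 else 0.0


def evaluate_sequence_type(alleles, reads, k, min_kmer_ratio, typing_threshold=1.0):
    """Hypothetical helper: score the alleles of one sequence type against reads.

    Returns (conclusive, high_confidence_per_allele, matched_fraction).
    """
    # Count how often each kmer occurs in the reads (exact matches only, an assumption).
    read_counts = Counter(km for read in reads for km in kmers(read, k))

    # Hit counts for every kmer of every allele in the sequence type.
    per_allele = [[read_counts.get(km, 0) for km in kmers(a, k)] for a in alleles]

    # Fraction of all kmers in all alleles that were seen at least once.
    total = sum(len(hits) for hits in per_allele)
    matched = sum(1 for hits in per_allele for count in hits if count > 0)
    fraction = matched / total if total else 0.0

    # An allele is a high-confidence call if its ratio exceeds the threshold.
    high_confidence = [kmer_ratio(hits) > min_kmer_ratio for hits in per_allele]

    return fraction >= typing_threshold, high_confidence, fraction


# Toy data: two short "alleles" and reads that cover them completely.
alleles = ["ACGTACGA", "TTGACCAT"]
full = evaluate_sequence_type(alleles, ["ACGTACGA", "TTGACCAT"], k=4, min_kmer_ratio=0.2)

# Reads missing the last base of the first allele, so one of ten kmers is not found.
partial = evaluate_sequence_type(alleles, ["ACGTACG", "TTGACCAT"], k=4, min_kmer_ratio=0.2)
relaxed = evaluate_sequence_type(alleles, ["ACGTACG", "TTGACCAT"], k=4,
                                 min_kmer_ratio=0.2, typing_threshold=0.85)
```

In this toy setup, a single missing kmer is enough to make the typing inconclusive at the default threshold of 1.0, while a lower threshold tolerates it. The affected allele also loses its high-confidence status because its least occurring kmer has no hits.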

The output will be saved in the location you chose, and eligible results will also be added automatically to the Result Metadata Table.
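
The resistance parameters from step 10 can be illustrated in a similar way. The sketch below applies a minimum identity, a minimum length, and an overlap filter to a list of hits. It is an assumption-laden illustration and not the algorithm of the Find Resistance with Nucleotide Database tool: the `Hit` record, its fields, the gene names, and the greedy containment filter are hypothetical, and the hit length on the contig is used as a stand-in for the overlap with the gene.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one resistance gene hit on a contig."""
    gene: str
    contig: str
    start: int          # 0-based start on the contig
    end: int            # exclusive end on the contig
    identity: float     # percent identity of the alignment
    gene_length: int    # full length of the database gene

    @property
    def length(self):
        return self.end - self.start

    @property
    def aligned_nucleotides(self):
        # Score used to rank overlapping hits: length * identity.
        return self.length * self.identity / 100.0


def passes_thresholds(hit, min_identity, min_length):
    """Check Minimum identity % and Minimum length % for a single hit."""
    coverage = 100.0 * hit.length / hit.gene_length
    return hit.identity >= min_identity and coverage >= min_length


def contains(outer, inner):
    """True if inner lies entirely within outer on the same contig."""
    return (outer.contig == inner.contig
            and outer.start <= inner.start
            and inner.end <= outer.end)


def filter_hits(hits, min_identity, min_length, filter_overlaps=True):
    """Apply the thresholds, then optionally drop hits nested in a better hit."""
    kept = [h for h in hits if passes_thresholds(h, min_identity, min_length)]
    if not filter_overlaps:
        return kept
    # Visit the best-scoring hits first so they win against nested hits.
    ranked = sorted(kept, key=lambda h: h.aligned_nucleotides, reverse=True)
    result = []
    for hit in ranked:
        if not any(contains(r, hit) or contains(hit, r) for r in result):
            result.append(hit)
    return result


# Toy hits with made-up gene names and coordinates.
hits = [
    Hit("geneA", "contig1", 100, 1000, 99.0, 900),
    Hit("geneB", "contig1", 200, 700, 95.0, 500),   # nested inside geneA
    Hit("geneC", "contig2", 0, 300, 85.0, 600),     # fails both thresholds
    Hit("geneD", "contig2", 500, 1300, 98.0, 800),
]
filtered = filter_hits(hits, min_identity=90.0, min_length=60.0)
unfiltered = filter_hits(hits, min_identity=90.0, min_length=60.0, filter_overlaps=False)
```

With the overlap filter enabled, the nested hit is dropped in favor of the containing hit with more aligned nucleotides. With it disabled, both hits are reported.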

The batch-specific outputs provided by this workflow are:

The combined outputs provided by this workflow are:

Through the Result Metadata Table, it is possible to filter among sample metadata and analysis results. By clicking Find Associated Data and optionally performing additional filtering, it is possible to perform additional analyses on a selected subset directly from this table, such as: