Type a Known Species

The Type a Known Species workflow is designed for typing samples that represent a single known species. It identifies the associated MLST sequence type, determines the variants found when mapping the sample data against the specified reference, and finds resistance genes in the sample that match genes in the specified resistance database.

Preliminary steps to run the Type a Known Species workflow

Before starting the workflow,

How to run the Type a Known Species workflow

To run the workflow, go to:

        Workflows | Template Workflows | Microbial Workflows | Typing and Epidemiology | Type a Known Species

  1. Specify the sample(s) or folder(s) of samples you would like to type and click Next. Remember that if you select several items, they will be run as batch units.

  2. Specify the Result Metadata Table you want to add your results to and click Next.

  3. Define batch units. For details, see https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Running_part_workflow_multiple_times.html.

  4. Check that batching is as intended.

  5. If your reads contain adapters, add an appropriate Trim adapter list. Click Next.

  6. Choose the Reference for Map Reads to Reference (figure 2.27). Click Next.

    Figure 2.27: Specify the reference for the Map Reads to Reference tool.

  7. Specify the Resistance Database (figure 2.28) and set the parameters for the Find Resistance with Nucleotide Database tool.

    Figure 2.28: Specify the resistance database to be used for the Find Resistance with Nucleotide Database tool.

    The parameters that can be set are:

    • Minimum identity %. The threshold for the minimum percentage of nucleotides that are identical between the best matching resistance gene in the database and the corresponding sequence in the genome.
    • Minimum length %. The minimum percentage of the resistance gene length that a sequence must overlap to count as a hit for that gene.
    • Filter overlaps. Performs extra filtering of results per contig: when one hit is contained within another, the hit with the higher number of aligned nucleotides (length * identity) is preferred.
    Click Next.

  8. Specify the MLST Scheme and set the parameters for the Type with MLST Scheme tool (figure 2.29).

    Figure 2.29: Specify the parameters for MLST typing.

    The parameters that can be set are:

    • Kmer size. Determines the number of nucleotides in each kmer. Raising this setting might increase specificity at the cost of some sensitivity.
    • Minimum kmer ratio. The minimum ratio between the hit count of the least occurring kmer and the average kmer hit count. If an allele scores higher than this threshold, it is classified as a high-confidence call.
    • Typing threshold. Determines how many of the kmers in a sequence type need to be identified before a typing is considered conclusive. The default setting of 1.0 means that all kmers in all alleles must be matched. Lowering the setting to 0.99 means that, on average, 99% of all kmers in all the alleles of a given sequence type must be detected before the sequence type is considered conclusive.

    Click Next.

  9. Specify the parameters for the Fixed Ploidy Variant Detection tool (figure 2.30) before clicking Next. For detailed information about all the filters, see https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Fixed_Ploidy_Variant_Detection.html and https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Variant_Detection_filters.html.

    Figure 2.30: Specify the parameters to be used for the Fixed Ploidy Variant Detection tool.

  10. In the "Create Sample Report" step, various summary items have been set. These serve as guidelines to help evaluate the quality of the results (see https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Create_Sample_Report.html).

  11. In the Result handling window, clicking Preview All Parameters lets you preview, but not change, all parameters. Choose to save the results (we recommend creating a new folder for them) and click Finish.
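The three Find Resistance with Nucleotide Database parameters from step 7 can be illustrated with a small Python sketch. It is not the tool's implementation: the `Hit` record and all function names are hypothetical, "Minimum length %" is assumed to mean aligned length divided by gene length, hits with equal numbers of aligned nucleotides are assumed to both be kept, and the gene and contig names in the usage lines are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    gene: str          # hypothetical resistance gene name
    contig: str        # contig the hit lies on
    start: int         # 0-based start on the contig (inclusive)
    end: int           # end on the contig (exclusive)
    identity: float    # fraction of identical nucleotides, 0..1
    gene_length: int   # full length of the database gene

    @property
    def aligned_length(self) -> int:
        return self.end - self.start

    @property
    def aligned_nucleotides(self) -> float:
        # "number of aligned nucleotides (length * identity)"
        return self.aligned_length * self.identity


def passes_thresholds(hit: Hit, min_identity_pct: float, min_length_pct: float) -> bool:
    # Assumption: length % = aligned length / gene length * 100.
    identity_ok = hit.identity * 100 >= min_identity_pct
    length_ok = hit.aligned_length / hit.gene_length * 100 >= min_length_pct
    return identity_ok and length_ok


def contains(outer: Hit, inner: Hit) -> bool:
    """True if inner lies entirely within outer on the same contig."""
    return (outer.contig == inner.contig
            and outer.start <= inner.start
            and inner.end <= outer.end)


def filter_overlaps(hits: list[Hit]) -> list[Hit]:
    """Drop a hit if it contains, or is contained by, a hit on the same contig
    with more aligned nucleotides. Ties are kept (an assumption)."""
    kept = []
    for hit in hits:
        beaten = any(
            other is not hit
            and (contains(other, hit) or contains(hit, other))
            and other.aligned_nucleotides > hit.aligned_nucleotides
            for other in hits
        )
        if not beaten:
            kept.append(hit)
    return kept


if __name__ == "__main__":
    hits = [
        Hit("geneA", "contig1", 100, 961, 0.99, 861),
        Hit("geneB", "contig1", 200, 800, 0.98, 861),
        Hit("geneC", "contig2", 0, 400, 0.85, 1200),
    ]
    passing = [h for h in hits
               if passes_thresholds(h, min_identity_pct=90, min_length_pct=60)]
    print([h.gene for h in passing])                   # threshold filtering
    print([h.gene for h in filter_overlaps(passing)])  # overlap filtering
```

In this toy run, geneC fails the 90% identity threshold. geneB passes both thresholds but lies within geneA on the same contig with fewer aligned nucleotides, so the overlap filter keeps only geneA.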
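The kmer-based parameters from step 8 can likewise be sketched in Python. This toy example is not the Type with MLST Scheme tool's algorithm. It assumes exact kmer matching on the forward strand only. It also takes "Minimum kmer ratio" to be the hit count of an allele's least occurring kmer divided by that allele's mean kmer hit count. Finally, it reads the typing threshold as the mean, over all alleles of a sequence type, of the fraction of each allele's kmers seen in the reads. These interpretations, the function names, and the sequences are all hypothetical.

```python
from statistics import mean


def kmers(seq: str, k: int) -> list[str]:
    """All overlapping kmers of length k in seq (forward strand only)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_kmers(reads: list[str], k: int) -> dict[str, int]:
    """Count how often each kmer occurs across all reads."""
    counts: dict[str, int] = {}
    for read in reads:
        for km in kmers(read, k):
            counts[km] = counts.get(km, 0) + 1
    return counts


def kmer_ratio(allele: str, counts: dict[str, int], k: int) -> float:
    """Hit count of the least occurring allele kmer divided by the mean hit count."""
    hits = [counts.get(km, 0) for km in kmers(allele, k)]
    avg = mean(hits)
    return min(hits) / avg if avg > 0 else 0.0


def is_high_confidence(allele: str, counts: dict[str, int], k: int,
                       min_kmer_ratio: float) -> bool:
    # Assumed rule: the call is high-confidence if the ratio exceeds the minimum.
    return kmer_ratio(allele, counts, k) > min_kmer_ratio


def detected_fraction(allele: str, counts: dict[str, int], k: int) -> float:
    """Fraction of the allele's kmers that occur at least once in the reads."""
    allele_kmers = kmers(allele, k)
    found = sum(1 for km in allele_kmers if counts.get(km, 0) > 0)
    return found / len(allele_kmers)


def is_conclusive(alleles: list[str], counts: dict[str, int], k: int,
                  typing_threshold: float = 1.0) -> bool:
    # Assumed reading: average the per-allele detected fractions of one sequence type.
    return mean(detected_fraction(a, counts, k) for a in alleles) >= typing_threshold


if __name__ == "__main__":
    k = 3  # toy value; a longer kmer is more specific to a single allele
    reads = ["ACGTAC", "ACGTAC", "TTGCA"]
    alleles = ["ACGTAC", "TTGCAA"]  # hypothetical alleles of one sequence type
    counts = count_kmers(reads, k)
    for allele in alleles:
        print(allele, kmer_ratio(allele, counts, k), detected_fraction(allele, counts, k))
    for threshold in (1.0, 0.99, 0.85):
        print(threshold, is_conclusive(alleles, counts, k, threshold))
```

With these toy data the second allele has one undetected kmer, so its kmer ratio is 0 and the average detected fraction is 0.875. The sequence type is therefore not conclusive at the default typing threshold of 1.0 or at 0.99, but it would be at 0.85.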

The output will be saved in the location you chose, and eligible results will also be added automatically to the Result Metadata Table.

The batch-specific outputs provided by this workflow are:

The combined outputs provided by this workflow are:

Through the Result Metadata Table, it is possible to filter among sample metadata and analysis results. By clicking Find Associated Data and optionally performing additional filtering, it is possible to run additional analyses on a selected subset directly from this table, such as: