Analyze QIAseq xHYB Mycobacterium Tuberculosis Panel Data (Human host)

The Analyze QIAseq xHYB Mycobacterium Tuberculosis Panel Data (Human host) template workflow performs spoligotyping for lineage detection and identifies high-frequency antimicrobial drug resistance variants. It is suitable for analysis of samples from human hosts generated with the QIAseq xHYB Mycobacterium tuberculosis Panel.

To analyze samples that are not from human hosts, create a copy of the workflow and edit it to fit your application, see https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Template_workflows.html. The workflow element Map Reads to Human Control Genes is relevant for human data only and should be deleted from the copy. In addition, if a host genome is not relevant for your application, open the Taxonomic Profiling workflow element and uncheck Filter host reads.
Once the workflow copy is customized, you can install it to make it available from the Toolbox, see https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Installing_workflow.html.

To run the workflow using a variant database other than the default one, you need to modify the workflow elements where the database name appears as a column header, such as Filter for WHO variants and WHO variant associated with resistance.

QIAGEN reference data set

The QIAseq xHYB Mycobacterium tuberculosis Panel reference data set is available from the QIAGEN Sets Reference Data Library, accessible via References in the top Toolbar. It includes the Mycobacterium tuberculosis reference genome H37Rv and the WHO Mycobacterium tuberculosis variant database, which is based on the WHO Mycobacterium tuberculosis mutation catalogue, see Reference Data Elements.

Like the template workflow, the reference data set is designed for human samples. It contains both a human host taxonomic profiling index, and a sequence list with human control genes for use in the workflow step Map Reads to Human Control Genes.

When analyzing samples that are not from human hosts, and a host is still relevant for your application, you can create a host taxonomic profiling index from your host reference genome using the Create Taxonomic Profiling Index tool, see Create Taxonomic Profiling Index.

The workflow analysis

The raw Mycobacterium tuberculosis whole genome sequencing reads are trimmed to remove low-quality sequence, read-through adapter sequences, and G homopolymers. The trimmed reads are used as input for the separate spoligotyping analysis.
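To illustrate what G homopolymer trimming means in practice, the following is a minimal Python sketch, not the trimming tool used by the workflow. The function name, the `min_run` parameter, and its default value are assumptions made for this example only.

```python
import re


def trim_trailing_g_homopolymer(read, quality, min_run=10):
    """Remove a run of at least `min_run` G bases from the 3' end of a read.

    `min_run` is a hypothetical threshold chosen for illustration.
    The quality string is cut at the same position as the sequence.
    """
    match = re.search(r"G{%d,}$" % min_run, read)
    if match is None:
        return read, quality
    cut = match.start()
    return read[:cut], quality[:cut]


# A read ending in 12 G bases loses the whole run; shorter runs are kept.
trimmed_seq, trimmed_qual = trim_trailing_g_homopolymer("ACGTACGT" + "G" * 12, "I" * 20)
```

Cutting the quality string together with the sequence keeps the two in sync, which any downstream step reading both would rely on.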

In the Taxonomic Profiling step, reads that map to the human host index are filtered out. As a quality control step, these human reads are subsequently mapped to the human control genes defined for the panel. In addition to human reads, reads identified as belonging to taxa other than Mycobacterium tuberculosis are excluded from downstream analysis.
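The routing of reads described above can be sketched as a simple three-way split. This is an illustrative simplification under stated assumptions: the taxon labels, the `(read_id, taxon)` input format, and the function name are hypothetical and do not reflect how the Taxonomic Profiling tool represents its results.

```python
HOST = "Homo sapiens"                   # assumed label for host reads
TARGET = "Mycobacterium tuberculosis"   # assumed label for target reads


def partition_reads(assignments):
    """Split read IDs into host, target and other bins by assigned taxon.

    `assignments` is an iterable of (read_id, taxon) pairs.
    """
    bins = {"host": [], "target": [], "other": []}
    for read_id, taxon in assignments:
        if taxon == HOST:
            bins["host"].append(read_id)      # mapped to human control genes for QC
        elif taxon == TARGET:
            bins["target"].append(read_id)    # kept for read mapping and variant calling
        else:
            bins["other"].append(read_id)     # excluded from downstream analysis
    return bins
```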

The remaining reads are mapped to the Mycobacterium tuberculosis reference genome, and variants are called from this read mapping. The reference genome may belong to a different lineage than the one reported by the spoligotyping step. Using the same reference genome for mapping and variant calling across samples ensures that variants are comparable between samples and facilitates alignment with variant databases, such as the WHO Mycobacterium tuberculosis mutation catalogue, which are based on a specific reference genome. Variant calling is optimized for calling resistance in the dominant strain of an infection: variants with a frequency below 50% will typically not be reported.
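The effect of the 50% frequency cutoff can be pictured with a short Python sketch. It assumes variants are available as simple records with a frequency given in percent; the `Variant` class, its fields, and the function name are hypothetical, and the actual variant caller applies additional criteria not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    position: int      # position in the reference genome (made-up values below)
    reference: str     # reference allele
    allele: str        # observed allele
    frequency: float   # percentage of reads supporting the allele


def dominant_variants(variants, min_frequency=50.0):
    """Keep variants at or above the frequency cutoff (in percent)."""
    return [v for v in variants if v.frequency >= min_frequency]


# Made-up example records: only the first would represent the dominant strain.
calls = [Variant(100, "A", "G", 92.5), Variant(200, "C", "T", 12.0)]
kept = dominant_variants(calls)
```

A minority variant from a mixed infection, such as the second record, is dropped by this kind of filter, which is why low-frequency resistance variants are typically absent from the results.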

Detected variants are compared to a drug resistance variant database and annotated with drug resistance information.
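Conceptually, this annotation step is a lookup of each detected variant in the resistance database. The sketch below assumes a catalogue keyed by gene and change; all names, the record layout, and the example entries are hypothetical placeholders and are not taken from the WHO variant database.

```python
def annotate_variants(variants, catalogue):
    """Attach drug resistance information to each (gene, change) variant.

    `catalogue` maps (gene, change) to a dict with "drug" and "grade" keys.
    Variants absent from the catalogue are kept with empty annotations.
    """
    annotated = []
    for gene, change in variants:
        entry = catalogue.get((gene, change))
        annotated.append({
            "gene": gene,
            "change": change,
            "drug": entry["drug"] if entry else None,
            "grade": entry["grade"] if entry else None,
        })
    return annotated


# Placeholder catalogue and variants, for illustration only.
catalogue = {("geneA", "changeA"): {"drug": "drugX", "grade": "gradeX"}}
result = annotate_variants([("geneA", "changeA"), ("geneB", "changeB")], catalogue)
```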

Launching the workflow

Before launching the workflow, make sure to download the QIAseq xHYB Mycobacterium tuberculosis Panel reference data set.

The Analyze QIAseq xHYB Mycobacterium Tuberculosis Panel Data (Human host) workflow is at:

        Toolbox | Template Workflows | Microbial Workflows | QIAseq Analysis | Analyze QIAseq xHYB Mycobacterium Tuberculosis Panel Data (Human host)

Launch the workflow and step through the wizard.

  1. Select the sequence list(s) containing the reads to analyze. If selecting multiple inputs from different samples, check the Batch option, see https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Running_workflows_in_batch_mode.html. Click on Next.
  2. Choose the option "Use the default reference data" (figure 2.68). Click on Next.
  3. If Batch was checked in step 1, choose whether batch units should be defined based on organization of the input data, or by provided metadata. In the next step, review the batch units resulting from your selections above. Click on Next.
  4. Specify the spoligotyping settings (figure 2.69). The default values are usually sufficient, but we recommend reviewing the spoligotyping report afterwards to make sure the results are as expected. Click on Next.
  5. Finally, select a location to save outputs to and click on Finish.

Image analyze_tb_reference
Figure 2.68: Select reference data set.

Image analyze_tb_spoligotype
Figure 2.69: Select the minimum threshold settings for spoligotyping.

Workflow outputs and how to interpret them

The outputs provided by the workflow are:

The sample report "QIAseq xHYB Mycobacterium Tuberculosis Analysis Report" is the main output of the workflow. It provides an overview of the analysis results for the sample, covering both quality control and detected drug resistance. An example of the report can be seen in figure 2.70.

Image analyze_tb_report
Figure 2.70: An example report from the Analyze QIAseq xHYB Mycobacterium Tuberculosis Panel Data (Human host) workflow.

The report contains the following sections:

The variant table reports contain the following columns:

If no variants are detected for a section of the report, that section states "No data available".

For more information on the WHO variant database, including the resistance grades, see Reference Data Elements.