Analyze QIAseq xHYB Mycobacterium Tuberculosis Panel Data (Human host)

The Analyze QIAseq xHYB Mycobacterium Tuberculosis Panel Data (Human host) template workflow performs spoligotyping for lineage detection and identifies high-frequency antimicrobial drug resistance variants. It is designed for data generated with the QIAseq xHYB Mycobacterium tuberculosis Panel from samples taken from human hosts. Optionally, the workflow also detects and types Mycobacteriaceae if the QIAseq xHYB NTM-ID Panel was used together with the QIAseq xHYB Mycobacterium tuberculosis Panel.

To analyze samples from non-human hosts, create a copy of the workflow and edit it to fit your application (see Template workflows). Because the workflow element Map Reads to Human Control Genes is relevant for human data only, delete this element from the copy. In addition, if a host genome is not relevant for your application, open the Taxonomic Profiling workflow element and uncheck Filter host reads.
Once the workflow copy is customized, you can install it to make it available from the Workflows menu (see Workflow installation).

To run the workflow using a variant database other than the default one, you need to modify the workflow elements where the database name appears as a column header, such as Filter for WHO variants and WHO variant associated with resistance.

QIAGEN Reference Data Set

The QIAseq xHYB Mycobacterium tuberculosis Panel Reference Data Set contains reference data relevant for this template workflow. It includes the Mycobacterium tuberculosis reference genome H37Rv and the WHO Mycobacterium tuberculosis variant database based on the WHO Mycobacterium tuberculosis mutation catalogue (see Reference Data Elements). Like the template workflow, the reference data set is designed for human samples. It contains both a human host taxonomic profiling index and a sequence list with human control genes for use in the workflow step Map Reads to Human Control Genes.

For Mycobacteriaceae typing analysis, a version of the QIAseq xHYB Mycobacterium tuberculosis Panel Reference Data Set that also contains the required hsp65 reference database is available (for more, see Mycobacteriaceae typing analysis).

Any data in the QIAseq xHYB Mycobacterium tuberculosis Panel Reference Data Set that has not already been downloaded can be downloaded when the workflow is launched. The data can also be downloaded and managed using the Reference Data Manager, which is opened by clicking the Manage Reference Data button in the Toolbar. In the Reference Data Manager, click the QIAGEN Sets Reference Data Library tab and search for the set by entering terms from its name in the search field.

For analysis of samples from non-human hosts: if a non-human host is relevant for your application, you can create a host taxonomic profiling index from your host reference genome (see Create Taxonomic Profiling Index).

The workflow analysis

The raw Mycobacterium tuberculosis whole genome sequencing reads are trimmed to remove low-quality bases, read-through adapter sequences, and G homopolymers. The trimmed reads are used as input for the separate spoligotyping analysis.

In the Taxonomic Profiling step, reads that map to the human host index are filtered out. As a quality control step, these reads are subsequently mapped to the human control genes defined for the panel. In addition to human reads, reads classified as belonging to taxa other than Mycobacterium tuberculosis are excluded from downstream analysis.

The remaining reads are mapped to the Mycobacterium tuberculosis reference genome (H37Rv), and variants are called from this read mapping. The reference genome may belong to a different lineage than the one reported by the spoligotyping step. Using the same reference genome for mapping and variant calling across all samples makes variants comparable between samples and simplifies matching them against variant databases, such as the WHO Mycobacterium tuberculosis mutation catalogue, which are based on a specific genome. Variant calling is optimized for detecting resistance in the dominant strain of an infection: variants with a frequency below 50% will typically not be reported.
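The effect of focusing on the dominant strain can be pictured with a minimal Python sketch of a frequency filter. It is an illustration only, not the workflow's variant caller: the Variant record, the MIN_FREQUENCY constant, and the hard 50% cutoff are assumptions made for this example, and the actual caller does not necessarily apply a hard cutoff (which is why low-frequency variants are only "typically" not reported). All positions and frequencies are made up.

```python
from dataclasses import dataclass

# Hypothetical, simplified variant record; not the data model used by the workflow.
@dataclass
class Variant:
    position: int      # position on the reference genome
    ref: str           # reference allele
    alt: str           # called allele
    frequency: float   # percentage of reads supporting the called allele

# Assumed hard cutoff for illustration only; the workflow text states that
# variants below 50% will *typically* not be reported.
MIN_FREQUENCY = 50.0

def dominant_strain_variants(variants, min_frequency=MIN_FREQUENCY):
    """Return the variants whose frequency is at or above min_frequency."""
    return [v for v in variants if v.frequency >= min_frequency]

# Usage with made-up positions and frequencies:
calls = [
    Variant(100, "C", "T", 98.0),  # present in the dominant strain: kept
    Variant(200, "A", "G", 35.0),  # minority variant: dropped
    Variant(300, "G", "A", 50.0),  # exactly at the cutoff: kept
]
print([v.position for v in dominant_strain_variants(calls)])  # prints [100, 300]
```

A consequence worth keeping in mind is that a resistance variant carried only by a minority subpopulation (for example in a mixed infection) may be absent from the report.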

Detected variants are compared to the WHO drug resistance variant database and annotated with drug resistance information. Larger InDels that cannot be matched exactly to the variant database (e.g. whole-gene deletions) but that overlap with possible resistance InDels are reported as candidate InDels and annotated with information from all resistance InDels that they overlap (for more, see WHO Candidate InDels).

The analysis can also detect and type Mycobacteriaceae (for more, see Mycobacteriaceae typing analysis).

Launching the workflow

Before launching the workflow, make sure to download the QIAseq xHYB Mycobacterium tuberculosis Panel reference data set.

The Analyze QIAseq xHYB Mycobacterium Tuberculosis Panel Data (Human host) workflow is available at:

        Workflows | Template Workflows | Microbial Workflows | QIAseq Analysis | Analyze QIAseq xHYB Mycobacterium Tuberculosis Panel Data (Human host)

Launch the workflow and step through the wizard.

  1. Select whether to perform Mycobacteriaceae typing analysis. If the QIAseq xHYB NTM-ID Panel was used in conjunction with the QIAseq xHYB Mycobacterium tuberculosis Panel, select "Yes" (for more, see Mycobacteriaceae typing analysis).
  2. Select the sequence list(s) containing the sample reads. If selecting multiple inputs from different samples, check the Batch option, see Running workflows in batch mode.
  3. Select a reference data set or select "Use specified data elements". The latter runs the workflow using default elements, which can be viewed by clicking the "workflow roles" text just above the option.
  4. If Batch was checked in step 2, choose whether batch units should be defined based on the organization of the input data or by provided metadata. In the next wizard step, review the batch units resulting from these selections.
  5. Specify the spoligotyping settings (figure 2.31). The default values are usually sufficient, but we recommend checking the spoligotyping report afterwards to make sure the results are as expected.
  6. If you selected "Yes" for performing Mycobacteriaceae typing analysis, the parameters for filtering references can be changed. This might be necessary if the expected Mycobacteriaceae species is present in the sample at a very low abundance. The default settings are expected to work in most cases. For more information about the filters, see Find Best References using Read Mapping.
  7. Finally, select a location to save outputs to.

Figure 2.31: Select the minimum threshold settings for spoligotyping.

Workflow outputs and how to interpret

The outputs provided by the workflow are:

If you selected "Yes" for performing Mycobacteriaceae typing analysis, some additional outputs are provided:

The sample report "QIAseq xHYB Mycobacterium Tuberculosis Analysis Report" is the main output of the workflow. It gives an overview of the analysis results for the sample, covering both quality control and detected drug resistance. An example of the report can be seen in figure 2.32.

Figure 2.32: An example report from the Analyze QIAseq xHYB Mycobacterium Tuberculosis Panel Data (Human host) workflow.

The report contains the following sections:

The variant table reports contain the following columns:

The candidate InDels table report contains the following unique columns (for more, see WHO Candidate InDels):

If no variants are detected in a section of the report, the section shows "No data available".

For more information on the WHO variant database, including the resistance grades, see Reference Data Elements.

WHO 2023 candidate InDels

Candidate InDels are structural variants that overlap, but do not exactly match, a WHO-graded variant. These include large deletions that may cause loss of function of a resistance-associated gene. Only deletions that overlap with a WHO deletion and insertions that overlap with a WHO insertion are included. Complexes are included if they overlap with either a WHO deletion or a WHO insertion.

Candidate InDels are called by the InDels and Structural Variants tool as Deletions, Insertions, or Complexes. A Complex is usually called in regions with more than two signature breakpoints (see Structural Variants and InDels output).

Because a candidate InDel may overlap with many resistance-associated variants, these variants are not listed individually. Instead, the "Candidate Drug(s)" column includes all drugs to which the overlapped variants may confer resistance. Similarly, the "Candidate Grade(s)" column includes all grades of resistance associated with those variants. To avoid redundancy, each drug and grade is reported only once in its column, even if several variants are associated with that drug or grade.
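The matching and de-duplication rules described above can be summarized in a minimal Python sketch. It is a hypothetical illustration, not the workflow's implementation: the Interval and WhoVariant records, the function names, the grade labels, and all coordinates are assumptions, and an insertion is simplified to a single reference position. It shows the type rule (deletions match WHO deletions, insertions match WHO insertions, complexes match either) and the reporting of each drug and grade only once.

```python
from dataclasses import dataclass

# Hypothetical records; not the workflow's internal data model.
@dataclass(frozen=True)
class Interval:
    kind: str   # "Deletion", "Insertion" or "Complex"
    start: int  # 1-based, inclusive (an insertion is simplified to one position)
    end: int    # 1-based, inclusive

@dataclass(frozen=True)
class WhoVariant:
    interval: Interval
    drug: str
    grade: str

def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two closed intervals share at least one position."""
    return a.start <= b.end and b.start <= a.end

def compatible(candidate_kind: str, who_kind: str) -> bool:
    """Deletions match WHO deletions, insertions match WHO insertions,
    and complexes may match either."""
    if candidate_kind == "Complex":
        return who_kind in ("Deletion", "Insertion")
    return candidate_kind == who_kind

def annotate_candidate(candidate: Interval, who_variants):
    """Return (candidate drugs, candidate grades), each listed once, in first-seen order."""
    drugs, grades = [], []
    for w in who_variants:
        if compatible(candidate.kind, w.interval.kind) and overlaps(candidate, w.interval):
            if w.drug not in drugs:
                drugs.append(w.drug)
            if w.grade not in grades:
                grades.append(w.grade)
    return drugs, grades

# Usage with made-up coordinates and grade labels:
who = [
    WhoVariant(Interval("Deletion", 1000, 1200), "Isoniazid", "Grade 1"),
    WhoVariant(Interval("Deletion", 1100, 1150), "Isoniazid", "Grade 2"),
    WhoVariant(Interval("Deletion", 1180, 1190), "Isoniazid", "Grade 1"),
    WhoVariant(Interval("Insertion", 1050, 1050), "Ethionamide", "Grade 2"),
    WhoVariant(Interval("Deletion", 5000, 5100), "Rifampicin", "Grade 1"),
]
print(annotate_candidate(Interval("Complex", 900, 1300), who))
# prints (['Isoniazid', 'Ethionamide'], ['Grade 1', 'Grade 2'])
```

In this example, a Deletion over the same region would report only Isoniazid, because the overlapping WHO insertion is ignored for deletions. As noted below, such an annotation only flags a location for closer inspection; it does not establish resistance.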

A candidate InDel does not guarantee resistance or susceptibility. It indicates that the read mapping at that location should be inspected more closely to evaluate whether the variant is of interest.

A good way to investigate a candidate InDel further is to open the "Genome Browser" track list output from the analysis and zoom in on the candidate InDel's location. In figure 2.33, the read mapping clearly shows that a large deletion is present where the "Complex" is called.

Figure 2.33: A candidate complex called in a region of the genome where the read mapping clearly lacks coverage, indicating that the complex is a deletion. In the filtered WHO resistance database track (bottom), it can be seen that the candidate complex, now confirmed to be a deletion, overlaps with multiple large WHO LoF deletions.

Candidate InDels can be annotated with both WHO insertions and deletions, so it is necessary to take a closer look at the variants to determine whether the candidate drug resistance in the report is supported. The "WHO_mycobacterium_tuberculosis_variant_database_v1.0 (filtered)" track in the Genome Browser can help to determine whether the InDel overlaps with a meaningful WHO variant. In figure 2.33, the candidate deletion overlaps with multiple WHO loss-of-function deletions, which confer resistance to the drug Isoniazid. It can therefore be inferred that the large deletion confers similar resistance (see also pages 88 and 102 about "feature_ablation" in [WHO, 2023]).

Mycobacteriaceae typing analysis

The Mycobacteriaceae typing analysis is intended for samples where the QIAseq xHYB NTM-ID Panel was used in conjunction with the QIAseq xHYB Mycobacterium tuberculosis Panel. It performs the analysis in the same way as the Analyze QIAseq xHYB NTM-ID Panel Data (Human host) template workflow. A description of the analysis is available under The workflow analysis.

The reads used as input for the Mycobacteriaceae analysis in this workflow are extracted from the hsp65 gene region of the H37Rv read mapping. Because hsp65 genes are highly similar across Mycobacteriaceae species, hsp65 reads are expected to map to this region even if they do not come from H37Rv.

The "QIAseq xHYB NTM-ID Analysis Report" contains the following sections: