Analyze QIAseq xHYB Mycobacterium tuberculosis and NTM-ID Panel Data (Human host)

The Analyze QIAseq xHYB Mycobacterium tuberculosis and NTM-ID Panel Data (Human host) template workflow is designed to analyze sample data from human hosts generated with the QIAseq xHYB Mycobacterium tuberculosis Panel and/or the QIAseq xHYB NTM-ID Panel. The workflow can analyze data from either or both, depending on settings chosen when running the workflow.

QIAGEN Reference Data Set

The QIAseq xHYB Mycobacterium tuberculosis Panel and QIAseq xHYB NTM-ID Panel Reference Data Sets contain reference data relevant for this template workflow, such as the Mycobacterium tuberculosis reference genome H37Rv, the WHO Mycobacterium tuberculosis variant database based on the WHO Mycobacterium tuberculosis mutation catalogue, and a non-redundant reference database of the hsp65 gene, used for detection and typing of Mycobacteriaceae. Like the template workflow, the reference data sets are designed for human samples, and additionally contain human host and human control gene references.

The QIAseq xHYB Mycobacterium tuberculosis Panel reference data set also comes in a version with an alternative reference. This reference includes the standard Mycobacterium tuberculosis H37Rv genome supplemented with experimental alternative regions derived from other strains. These regions allow variant calling outside H37Rv but are not guaranteed to be complete or fully accurate.

Reference data not already downloaded can be downloaded during the launch of the workflow. It can also be downloaded, as well as managed, using the Reference Data Manager, which can be opened by clicking on the Manage Reference Data button in the Toolbar. Click on the QIAGEN Sets Reference Data Library tab in the Reference Data Manager and search for the sets by entering terms from their names in the search field.

For analysis of samples not from human hosts: If a non-human host is relevant for your application, you can download a host genome using Download Custom Microbial Reference Database, and create a host taxonomic profiling index from your host genome using Create Taxonomic Profiling Index. Then, you can create a copy of the workflow and edit it to fit your specific application, see Template workflows. Since the workflow elements Map Reads to Human Control Genes and QC for Targeted Sequencing are relevant for human data only, you should delete these. In addition, if a host genome is not relevant for your application, you can remove the host inputs from Find Best References using Read Mapping and Taxonomic Profiling.
Once the workflow copy is customized, you can install it to make it available from the Workflows menu (see Workflow installation).

The workflow analysis

The raw reads are trimmed for low quality, read-through adapter sequences, and G homopolymers. As a quality control for successful hybrid capture, human host reads are mapped to the human control genes, based on the probes included in the panels.

For the Mycobacterium tuberculosis (M. tuberculosis) analysis, unfiltered trimmed reads are used as input for the separate spoligotyping analysis. Before mapping, the same reads are filtered using Taxonomic Profiling. Here, reads that map to the human host and reads belonging to phyla other than Actinobacteriota are filtered away.

The remaining reads are mapped to the M. tuberculosis H37Rv reference genome, and variants are called from this read mapping. Variant calling is optimized for calling resistance in the dominant strain of an infection: variants with a frequency below 50% will typically not be reported.
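As an illustration of the frequency cutoff described above, the following minimal Python sketch keeps only variants at or above 50%. The `Variant` class, its field names, and the example positions are hypothetical and are not part of the workflow or its outputs. The workflow's own variant calling applies further criteria not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # Hypothetical record; field names are illustrative only.
    position: int
    reference: str
    allele: str
    frequency: float  # percentage of reads supporting the allele


def dominant_strain_variants(variants, min_frequency=50.0):
    """Keep variants whose frequency is at or above the cutoff."""
    return [v for v in variants if v.frequency >= min_frequency]


# Made-up example calls, not real data.
calls = [
    Variant(100, "C", "T", 97.5),
    Variant(200, "A", "G", 12.0),
    Variant(300, "G", "GA", 50.0),
]
kept = dominant_strain_variants(calls)
print([v.position for v in kept])
```

In this sketch the variant at 12% is dropped, which mirrors how a minority-strain variant would typically not appear in the report.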

Detected variants are compared to the WHO drug resistance variant database and annotated with drug resistance information. Larger InDels that cannot be matched to the variant database exactly (e.g. whole-gene deletions), but that overlap with possible resistance InDels, are reported as candidate InDels and annotated with information from all resistance InDels that they overlap (for more, see WHO Candidate InDels).

To run the workflow using a variant database other than the default one, you need to modify the workflow elements where the database name appears as a column header, such as Filter for WHO variants and WHO variant associated with resistance.

For the Non-Tuberculous Mycobacteria-ID (NTM-ID) analysis, trimmed and filtered reads are mapped to the references of Mycobacteriaceae hsp65 genes using Find Best References using Read Mapping. Due to the high level of similarity between hsp65 genes from different Mycobacteriaceae species, the reads are mapped with stringent mapping parameters.

This results in an initial set of hsp65 reads and possible references. If more than one possible reference is detected for the sample reads, the analysis will try to refine the references by only looking at non-ambiguous reads mapping to this subset of the references. This helps to resolve false positive species calls as a result of the high level of similarity within the target gene.

While the detected species may contain a "variant" name (e.g. "Mycobacterium tuberculosis variant bovis"), be advised that the hsp65 gene is usually not specific enough for strain-level typing; it supports species-level typing only. For mixed infections involving more than one Mycobacteriaceae species, the lower detection limit is 3% abundance relative to the most abundant species.
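The 3% relative detection limit can be pictured with a short Python sketch. It assumes, purely for illustration, that abundance is approximated by the number of reads mapped per species. The function name, the species labels, and the counts are hypothetical.

```python
def species_above_detection_limit(read_counts, min_relative_abundance=0.03):
    """Return species whose abundance relative to the top species meets the limit.

    read_counts maps a species label to its mapped read count (an assumed
    proxy for abundance in this sketch).
    """
    if not read_counts:
        return {}
    top = max(read_counts.values())
    if top == 0:
        return {}
    relative = {species: count / top for species, count in read_counts.items()}
    return {s: r for s, r in relative.items() if r >= min_relative_abundance}


# Made-up counts: species_c sits at 2% of the most abundant species.
counts = {"species_a": 10000, "species_b": 500, "species_c": 200}
detected = species_above_detection_limit(counts)
print(sorted(detected))
```

Here `species_c` falls below the limit and would not be expected to be detected alongside the dominant species.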

After reference refinement, all of the hsp65 reads will be re-mapped to the final refined list of references, and the detected species and read mapping statistics are output in the report.

Launching the workflow

The Analyze QIAseq xHYB Mycobacterium tuberculosis and NTM-ID Panel Data (Human host) workflow is available at:

        Workflows | Template Workflows | Microbial Workflows | QIAseq Analysis | Analyze QIAseq xHYB Mycobacterium tuberculosis and NTM-ID Panel Data (Human host)

Launch the workflow and step through the wizard.

  1. Specify which QIAseq xHYB Panel(s) were used to generate the reads. The following options are available:
    • Mycobacterium tuberculosis. Select this option if the QIAseq xHYB Mycobacterium tuberculosis Panel was used on its own. Only the M. tuberculosis analysis will be performed.
    • NTM-ID. Select this option if the QIAseq xHYB NTM-ID Panel was used on its own. Only the NTM-ID analysis will be performed.
    • Both. Select this option if the QIAseq xHYB NTM-ID Panel was used in conjunction with the QIAseq xHYB Mycobacterium tuberculosis Panel. Both the M. tuberculosis and the NTM-ID analysis will be performed.
  2. Select the sequence list(s) containing the sample reads. If selecting multiple inputs from different samples, check the Batch option (see Running workflows in batch mode).
  3. Select a reference data set or select "Use specified data elements". The latter runs the workflow using default elements, which can be viewed by clicking the "workflow roles" text just above the option.
  4. If Batch was checked in step 2, choose whether batch units should be defined based on organization of the input data, or by provided metadata. In the next step, review the batch units resulting from your selections above.
  5. If you selected "Mycobacterium tuberculosis" or "Both" in the first step, specify the spoligotyping settings (figure 2.31). Using the default values is usually sufficient, but we recommend taking a look at the spoligotyping report afterwards to make sure the results are as expected.
  6. If you selected "NTM-ID" or "Both" in the first step, the parameters for filtering Mycobacteriaceae references can be changed (figure 2.32). This might be necessary if the expected Mycobacteriaceae species is present in the sample at a very low abundance. The default settings are expected to work in most cases. For more information about the filters, see Find Best References using Read Mapping.
  7. If you selected "NTM-ID" in the first step, additional summary items have been set. These are guidelines to help evaluate the quality of the results (see Create Sample Report). Thresholds can be changed, if the defaults are too stringent for the input sample(s).
  8. Finally, select a location to save outputs to.
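
Which of the conditional wizard steps appear depends on the panel option and on whether Batch is checked. The Python sketch below summarizes that dependency as described in the list above. The function and the step labels are hypothetical shorthand, not names used by the software.

```python
PANELS = ("Mycobacterium tuberculosis", "NTM-ID", "Both")


def conditional_wizard_steps(panel, batch=False):
    """List the conditional wizard steps for a given panel choice."""
    if panel not in PANELS:
        raise ValueError(f"Unknown panel option: {panel!r}")
    steps = []
    if batch:
        steps.append("batch units")
    if panel in ("Mycobacterium tuberculosis", "Both"):
        steps.append("spoligotyping settings")
    if panel in ("NTM-ID", "Both"):
        steps.append("reference filtering parameters")
    if panel == "NTM-ID":
        steps.append("summary item thresholds")
    return steps


for option in PANELS:
    print(option, "->", conditional_wizard_steps(option))
```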

Image analyze_tb_spoligotype
Figure 2.31: Select the minimum threshold settings for spoligotyping.

Image ntm_filter_refs
Figure 2.32: Parameters for filtering Mycobacteriaceae references can be changed.

Workflow outputs and how to interpret

The outputs differ depending on which analyses have been run. To jump to specific output sections, you can use the links below:

M. tuberculosis analysis outputs
NTM-ID analysis outputs (with M. tuberculosis analysis)
NTM-ID only analysis outputs (without M. tuberculosis analysis)

M. tuberculosis analysis outputs

The outputs provided by the M. tuberculosis analysis are:

The sample report "QIAseq xHYB Mycobacterium Tuberculosis Analysis Report" is the main output of the workflow. This allows for easy overview of the analysis results, both in terms of quality control and detected drug resistance for the sample. An example of the report can be seen in figure 2.33.

Image analyze_tb_report
Figure 2.33: An example report from the M. tuberculosis analysis part of the workflow.

The report contains the following sections:

The variant table reports contain the following columns:

The candidate InDels table report contains the following unique columns (for more, see WHO Candidate InDels):

If no variants are detected in a section of the report, it will say "No data available".

For more info on the WHO variant database, including the resistance grades, see Reference Data Elements.

WHO 2023 candidate InDels

Candidate InDels are structural variants that overlap, but do not exactly match, a WHO-graded variant. These include large deletions that may cause loss of function of a resistance-associated gene. Only deletions that overlap with a WHO deletion, and insertions that overlap with a WHO insertion are included. Complexes are included if they overlap with either.
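
The inclusion rule above can be sketched in Python as an interval-overlap check. This is a simplified illustration, not the workflow's implementation: the `Indel` class, the inclusive coordinate convention, and the example coordinates are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indel:
    # Hypothetical record with inclusive start/end coordinates.
    kind: str  # "Deletion", "Insertion" or "Complex"
    start: int
    end: int


def overlaps(a, b):
    """True if the two inclusive intervals share at least one position."""
    return a.start <= b.end and b.start <= a.end


def is_candidate(called, who_indels):
    """Apply the overlap rule: same-type overlap, or either type for complexes."""
    if called in who_indels:
        return False  # an exact match is a regular database hit, not a candidate
    if called.kind == "Complex":
        allowed = {"Deletion", "Insertion"}
    else:
        allowed = {called.kind}
    return any(w.kind in allowed and overlaps(called, w) for w in who_indels)


# Made-up database entries and calls.
who = [Indel("Deletion", 1000, 1010), Indel("Insertion", 2000, 2000)]
print(is_candidate(Indel("Deletion", 900, 1500), who))     # overlaps a WHO deletion
print(is_candidate(Indel("Insertion", 1005, 1005), who))   # overlaps only a deletion
print(is_candidate(Indel("Complex", 1990, 2005), who))     # overlaps a WHO insertion
```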

Candidate InDels are called by InDels and Structural Variants as Deletions, Insertions or Complexes. A complex is usually called in regions with more than 2 signature breakpoints (see Structural Variants and InDels output).

As candidate InDels may overlap with many resistance-associated variants, these are not listed individually. Instead the "Candidate Drug(s)" column includes all possible drugs to which the variants may confer resistance. Similarly, the "Candidate Grade(s)" column includes all possible grades of resistance associated with those variants. To avoid redundancy, each drug and grade will only be reported once in the column, even if multiple variants are associated with that drug and grade.
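
The de-duplication described above can be illustrated with a small Python sketch. The input format, the comma separator, the ordering, and the placeholder labels ("Drug B", grades "1" and "2") are assumptions made for the example. The actual column formatting in the report may differ.

```python
def summarize_candidate_columns(overlapping_annotations):
    """Collapse (drug, grade) pairs into two de-duplicated column strings.

    Each drug and each grade is listed once, in order of first appearance.
    """
    drugs, grades = [], []
    for drug, grade in overlapping_annotations:
        if drug not in drugs:
            drugs.append(drug)
        if grade not in grades:
            grades.append(grade)
    return ", ".join(drugs), ", ".join(grades)


# Made-up annotations from four overlapping database variants.
annotations = [
    ("Isoniazid", "1"),
    ("Isoniazid", "1"),
    ("Isoniazid", "2"),
    ("Drug B", "2"),
]
candidate_drugs, candidate_grades = summarize_candidate_columns(annotations)
print(candidate_drugs)
print(candidate_grades)
```

Note that the pairing between a specific drug and a specific grade is lost in this summary, which is one reason the candidate location should be inspected in the read mapping.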

A candidate InDel is not a guarantee of resistance or susceptibility, but an indicator that one should take a closer look at that location in the read mapping, to evaluate whether the variant is of interest.

A good way to investigate a candidate InDel further is to open up the "Genome Browser" track list output from the analysis and zoom into the candidate InDel's location. In figure 2.34 it is clear from the read mapping that a large deletion is present where the "Complex" is called.

Image candidate_indel_example
Figure 2.34: A candidate complex called in a region of the genome where the read mapping clearly lacks coverage, indicating that the complex is a deletion. In the filtered WHO resistance database track (bottom), it can be seen that the candidate complex, now confirmed to be a deletion, overlaps with multiple large WHO LoF deletions.

Candidate InDels are annotated with both WHO insertions and deletions, so it is necessary to take a closer look at the variants to determine whether candidate drug resistance from the report is supported. The "WHO mycobacterium tuberculosis variant database (filtered)" track in the Genome Browser can help to investigate whether the InDel overlaps with a meaningful WHO variant. In figure 2.34 the candidate deletion overlaps with multiple WHO loss of function deletions, which confer resistance to the drug Isoniazid. It can be inferred that a large deletion will confer similar resistance (see also pages 88 and 102 about "feature_ablation" in [WHO, 2023]).

NTM-ID analysis outputs (with M. tuberculosis analysis)

If you selected "NTM-ID" when running the analysis, see NTM-ID only analysis outputs (without M. tuberculosis analysis) instead.

The reads used as input for the NTM-ID analysis in this part of the workflow are extracted from the hsp65 gene region of the H37Rv read mapping. Due to the high level of similarity between hsp65 genes from different Mycobacteriaceae species, reads are expected to map to this region, even if they don't come from H37Rv.

The outputs provided by the NTM-ID analysis when performed together with the M. tuberculosis analysis are:

The "QIAseq xHYB NTM-ID Analysis Report" report contains the following sections:

NTM-ID only analysis outputs (without M. tuberculosis analysis)

If you selected "Both" when running the analysis, see NTM-ID analysis outputs (with M. tuberculosis analysis) instead.

The outputs provided by the NTM-ID only analysis are:

The Typing Report is the main output of the workflow. This allows for easy overview of the analysis results, both in terms of quality control and detected Mycobacteriaceae for the sample. An example of the report can be seen in figure 2.35.

Image ntm_report
Figure 2.35: An example report from the NTM-ID only analysis part of the workflow.

The report contains the following sections: