Analyze QIAseq xHYB Viral Panel Data (Human host)
The Analyze QIAseq xHYB Viral Panel Data (Human host) template workflow trims reads, performs taxonomic profiling, and calls viral variants. It is suitable for analysis of samples from human hosts generated with the QIAseq xHYB viral panels:
- QIAseq xHYB Respiratory Panel
- QIAseq xHYB Viral STI Panel
- QIAseq xHYB Adventitious Agent Panel
- QIAseq xHYB MPXV Panel
To analyze non-human samples, you can create a copy of the workflow and edit it to fit your specific application, see https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Template_workflows.html. Since the workflow element Map Reads to Human Control Genes is relevant for human data only, you should delete it from the copy. In addition, if a host genome is not relevant for your application, open the Taxonomic Profiling workflow element and uncheck Filter host reads.
Once the workflow copy is customized, you can install it to make it available from the Toolbox, see https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Installing_workflow.html.
QIAGEN reference data set
The QIAseq xHYB Viral Panels reference data set is available from the QIAGEN Sets Reference Data Library, accessible via References in the top Toolbar.
Like the template workflow, the reference data set is designed for human samples. It contains both a human host taxonomic profiling index, and a sequence list with human control genes for use in the workflow step Map Reads to Human Control Genes.
For analysis of non-human data, if a host is relevant for your application, you can create a host taxonomic profiling index from your host reference genome using Create Taxonomic Profiling Index, see Create Taxonomic Profiling Index.
Launching the workflow
The Analyze QIAseq xHYB Viral Panel Data (Human host) workflow is at:
Toolbox | Template Workflows | Microbial Workflows | QIAseq Analysis | Analyze QIAseq xHYB Viral Panel Data (Human host)
Launch the workflow and step through the wizard.
- Select the sequence list(s) containing the reads to analyze. Click on Next.
- Select a reference data set or select "Use the default reference data" to configure the reference data elements individually in subsequent wizard steps (figure 2.71). Click on Next.
- Choose whether batch units should be defined based on organization of the input data, or by provided metadata (figure 2.72). For information on how to use metadata when running part of a workflow multiple times, see https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Running_part_workflow_multiple_times.html.
- Next, you can review the batch units resulting from your selections above. Click on Next.
- Verify or select the viral taxonomic profiling index (figure 2.73) and click on Next.
- Verify or select the host taxonomic profiling index and click on Next.
- Select the viral reference database(s). If you previously selected the QIAseq xHYB Viral Panels reference set, you can now select which of the available viral reference databases from that set to apply (figure 2.74). If you chose to use the default reference data, select a reference database. Click on Next.
- Verify or select the control genes and click on Next.
- Specify the trim settings.
- Specify Taxonomic Profiling settings (figure 2.75).
- Specify Low Frequency Variant Detection settings (figure 2.76).
- Finally, select a location to save outputs to and click on Finish.
Figure 2.71: Select reference data set.
Figure 2.72: Define batch units.
Figure 2.73: Select viral taxonomic profiling index.
Figure 2.74: Select one or more viral reference databases.
Figure 2.75: Set taxonomic profiling parameters.
Figure 2.76: Low frequency variant detection settings.
Workflow tools and outputs
The Analyze QIAseq xHYB Viral Panel Data (Human host) template workflow consists of the following tools.
- QC for Sequencing Reads. Performs basic quality control of the sequencing reads. The output, which is included in a combined report, can be used to evaluate the quality of the sequencing reads. See https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=QC_Sequencing_Reads.html.
- Trim Reads. Removes adapter sequences and low quality nucleotides. The appropriate settings for the Trim Reads tool depend on the protocol used to generate the reads. See https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Trim_Reads.html.
- Taxonomic Profiling. Analyzes the taxonomic composition of samples and estimates the relative abundance of the detected taxa. See Taxonomic Profiling. Host reads, i.e., reads that map to the host taxonomic profiling index, do not count toward the taxonomic profiling result, but are used as input for Map Reads to Human Control Genes. Viral reads, i.e., reads that map to the viral taxonomic profiling index, are later used as input for Find Best References using Read Mapping.
- Map Reads to Human Control Genes. Maps the host reads output by Taxonomic Profiling, i.e., the reads that mapped to the host taxonomic profiling index, to a reference of human control genes. See https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Map_Reads_Reference.html. This serves as a QC step: for human samples, you expect to see reads mapping to all human control genes.
- Find Best References using Read Mapping. Maps the viral reads output from Taxonomic Profiling to the selected viral reference database to identify which reference sequence is the "Best match". See Find Best Reference using Read Mapping.
- Remove Duplicate Mapped Reads. Removes duplicate reads from the mapping, i.e., reads derived from PCR amplification (or other enrichment) during sample preparation. See https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Remove_Duplicate_Mapped_Reads.html. The output reads track is used as input for Local Realignment.
- Local Realignment. Improves the alignment of the reads in the reads track. See https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Local_Realignment.html.
- Low Frequency Variant Detection. Calls variants in the read mapping that are present at low frequencies. See https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Low_Frequency_Variant_Detection.html.
- Filter on Custom Criteria, Filter against Known Variants, and Remove Marginal Variants. Remove variants that fall below a set of thresholds. For this workflow, coverage >30 and frequency >20% are required. See https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Variant_filtering.html.
- Amino Acid Changes. Uses the called variants to generate a track of amino acid changes. See https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Amino_Acid_Changes.html.
- Create Mapping Graph and Identify Graph Threshold Areas. Creates a track with regions with coverage below a threshold. For this workflow, the threshold is set to 30. See https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Create_Mapping_Graph.html and https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Identify_Graph_Threshold_Areas.html.
- Extract Consensus Sequence. Makes a consensus sequence from the read tracks from Local Realignment. See https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Extract_Consensus_Sequence.html.
- QC for Read Mapping. Performs quality control of the read mapping. See https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=QC_Read_Mapping.html.
- Merge Abundance Tables. Merges the sample-specific abundance tables to one combined abundance table. See Merge Abundance Tables.
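The variant filtering thresholds described above (coverage >30 and frequency >20%) can be illustrated with a minimal Python sketch. This is not how the Workbench tools are implemented, and it does not call any CLC API. The Variant class, its field names, and the example values are hypothetical and exist only to show the effect of the two thresholds.

```python
from dataclasses import dataclass

# Thresholds stated for this workflow: coverage >30 and frequency >20%.
MIN_COVERAGE = 30
MIN_FREQUENCY = 20.0  # percent


@dataclass
class Variant:
    """Hypothetical variant record; field names are illustrative only."""
    position: int
    ref: str
    alt: str
    coverage: int      # number of reads covering the variant position
    frequency: float   # percentage of covering reads supporting the variant


def passes_thresholds(v: Variant) -> bool:
    # Both comparisons are strict, matching ">30" and ">20%" in the text.
    return v.coverage > MIN_COVERAGE and v.frequency > MIN_FREQUENCY


def filter_variants(variants):
    """Keep only the variants that satisfy both thresholds."""
    return [v for v in variants if passes_thresholds(v)]


# Made-up example values, not real results.
variants = [
    Variant(241, "C", "T", coverage=120, frequency=98.5),   # kept
    Variant(3037, "C", "T", coverage=25, frequency=100.0),  # coverage too low
    Variant(14408, "C", "T", coverage=80, frequency=12.0),  # frequency too low
]

kept = filter_variants(variants)
print([v.position for v in kept])  # [241]
```

Note that a variant with a coverage of exactly 30 or a frequency of exactly 20% would be removed in this sketch, because both comparisons are strict.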
The sample-specific outputs provided by this workflow are:
- QC Report Raw Reads. Quality control metrics for the raw reads.
- Abundance Table. The abundance of each identified taxon, along with its full taxonomy. See Taxonomic Profiling abundance table.
- Read Mapping Human Control Genes. The host reads mapped against the control gene reference.
- Viral Reads. List of reads that mapped to the viral taxonomic profiling index.
- Find Best Reference Report. Report of the "Best match" reference identified by Find Best References using Read Mapping.
- Best Match Sequence. The "Best match" reference sequence as identified by the Find Best References using Read Mapping tool.
- Read Mapping. Reads mapped to the "Best match" viral reference. Output from Local Realignment.
- Consensus Sequence. Viral consensus sequence(s), extracted from the above Read Mapping output.
- Annotated Variant Track. List of detected variants left after filtering, annotated with amino acid changes.
- Amino Acid Track. List of amino acid changes.
- Low Coverage Areas. List of low coverage regions in the Read Mapping output.
- Track List. Collection of the following viral tracks: Consensus sequence, reads, variants, amino acid changes, and low coverage regions.
- QC and Taxonomic Profiling Report. Combines QC Report Raw Reads and the Taxonomic Profiling report.
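To make the Low Coverage Areas output more concrete, the following Python sketch shows one way regions with coverage below a threshold of 30 could be derived from per-position coverage values. It is an illustration under stated assumptions, not the implementation of Create Mapping Graph or Identify Graph Threshold Areas: the function name, the 0-based half-open intervals, and the strict "below" comparison are all assumptions, and the coverage values are made up.

```python
def low_coverage_regions(coverage, threshold=30):
    """Return (start, end) intervals where coverage < threshold.

    Assumptions of this sketch: positions are 0-based, intervals are
    half-open (end is exclusive), and a position with coverage equal
    to the threshold does not count as low coverage.
    """
    regions = []
    start = None  # start of the low coverage run currently being tracked
    for pos, depth in enumerate(coverage):
        if depth < threshold:
            if start is None:
                start = pos          # a new low coverage run begins here
        elif start is not None:
            regions.append((start, pos))  # the run ended at the previous position
            start = None
    if start is not None:
        # The last run extends to the end of the reference.
        regions.append((start, len(coverage)))
    return regions


# Made-up per-position coverage values for a 7 bp reference.
coverage = [5, 10, 40, 50, 29, 30, 0]
print(low_coverage_regions(coverage))  # [(0, 2), (4, 5), (6, 7)]
```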
The combined outputs provided by this workflow are:
- Taxonomic Profiling Report. Combines taxonomic profiling report content across samples in the workflow run. See Taxonomic Profiling Report.
- Human Control Genes Read Mapping Report. Holds information about the mapping of host reads to the human control genes for all samples in the workflow run.
- Combined Report. Combines information from various tools, including QC, taxonomic profiling and mapping reports.
- Merged Abundance Table. Provides abundances for the detected taxa for all samples in the workflow run. See Taxonomic Profiling abundance table.
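Conceptually, the Merged Abundance Table combines the sample-specific abundance tables into one table with a column per sample. The Python sketch below illustrates that idea with plain dictionaries. It is not the Merge Abundance Tables implementation: the function name, the sample names, the counts, and the choice to fill in 0 for taxa not detected in a sample are assumptions made for illustration.

```python
def merge_abundance_tables(tables):
    """Merge per-sample abundance tables into one combined table.

    tables maps a sample name to a dict of {taxon: abundance}.
    The result maps each taxon to a dict of {sample name: abundance}.
    Assumption of this sketch: a taxon not detected in a sample gets 0.
    """
    samples = list(tables)
    # Collect every taxon seen in any sample, sorted for a stable row order.
    taxa = sorted({taxon for table in tables.values() for taxon in table})
    return {
        taxon: {sample: tables[sample].get(taxon, 0) for sample in samples}
        for taxon in taxa
    }


# Made-up sample names and read counts, not real results.
tables = {
    "sample_1": {"Influenza A virus": 1500, "Human rhinovirus A": 40},
    "sample_2": {"Human rhinovirus A": 900},
}

merged = merge_abundance_tables(tables)
for taxon, row in merged.items():
    print(taxon, row)
```

In this example, the taxon detected only in sample_1 still gets a row in the merged table, with an abundance of 0 for sample_2.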