Data QC and OTU Clustering
The Data QC and OTU Clustering workflow is meant for amplicon sequencing data. It trims reads and performs either reference-based or de novo OTU clustering. The resulting abundance table can optionally be filtered. The workflow additionally runs QC for Sequencing Reads, which can be used to assess the quality of the raw reads.
The Filter Samples Based on Number of Reads step removes samples with fewer than 100 reads. If multiple samples are used as input, samples with fewer than half of the median number of reads are also excluded.
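The two read-count rules can be illustrated with a short Python sketch. This is not the workbench's implementation. The function name `filter_samples` and the example sample names and counts are hypothetical. The sketch assumes that the median is taken over all input samples and that a sample is excluded if it fails either rule.

```python
from statistics import median

MIN_READS = 100  # absolute minimum number of reads stated in the text


def filter_samples(read_counts):
    """Return the samples that pass both read-count rules.

    read_counts maps a sample name to its number of reads.
    Assumption: the median is computed over all input samples.
    """
    # Rule 1: drop samples with fewer than 100 reads.
    kept = {name: n for name, n in read_counts.items() if n >= MIN_READS}

    # Rule 2: with multiple input samples, also drop samples with
    # fewer than half of the median number of reads.
    if len(read_counts) > 1:
        cutoff = median(read_counts.values()) / 2
        kept = {name: n for name, n in kept.items() if n >= cutoff}

    return kept


# Hypothetical read counts for four samples.
counts = {"A": 12000, "B": 9000, "C": 2500, "D": 80}
print(filter_samples(counts))
```

In this example the median is 5750, so the cutoff is 2875 reads. Sample D fails the 100-read rule and sample C falls below the median-based cutoff, leaving A and B.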
This template workflow is available from:
Workflows | Template Workflows | Microbial Workflows | Metagenomics | Amplicon-Based Analysis | Data QC and OTU Clustering
- Specify the sample(s) or folder(s) of samples you would like to analyze.
- Specify a Trim adapter list if your sequences contain adapters (https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Adapter_trimming.html).
- Choose whether to run the de novo or reference-based OTU clustering and set the available similarity parameters. If selecting Reference based OTU clustering, choose whether to allow creation of new OTUs and provide an OTU database. Reference databases can be downloaded using Download Amplicon-Based Reference Database (see Download Amplicon-Based Reference Database).
- Various options and filters can be set for refining the abundance table after clustering (see Refine Abundance Table). Note that if De novo OTU clustering was chosen in the previous step, then the Aggregation level must be set to "Do not aggregate".
- In the "Create Sample Report" step, various summary items have been preset. Their thresholds serve as guidelines to help evaluate the quality of the results (https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Create_Sample_Report.html).
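The dependencies between the clustering choices described above can be summarized as a small consistency check. The Python sketch below is only an illustration. The function `check_clustering_settings`, its parameter names, and the mode strings `"de_novo"` and `"reference_based"` are hypothetical and do not correspond to any workbench API. Only the "Do not aggregate" value is taken from the text.

```python
def check_clustering_settings(mode, aggregation_level, otu_database=None):
    """Return a list of problems with the chosen settings (empty if none).

    mode is "de_novo" or "reference_based" (hypothetical labels).
    """
    problems = []

    if mode not in ("de_novo", "reference_based"):
        problems.append(f"Unknown clustering mode: {mode}")

    # De novo clustering requires that the abundance table is not aggregated.
    if mode == "de_novo" and aggregation_level != "Do not aggregate":
        problems.append(
            'De novo OTU clustering requires Aggregation level "Do not aggregate".'
        )

    # Reference-based clustering requires an OTU database.
    if mode == "reference_based" and otu_database is None:
        problems.append("Reference-based OTU clustering requires an OTU database.")

    return problems


# Hypothetical example: de novo clustering with an aggregation level set.
for problem in check_clustering_settings("de_novo", "Genus"):
    print(problem)
```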
The workflow produces the following outputs:
- QC & Reports. Folder containing the individual reports generated during the analysis.
- OTU abundance table. An abundance table containing all the samples input into the workflow, and refined according to the parameters set in the Refine Abundance Table step. Rows in the table will have taxonomies if reference-based OTU clustering was chosen.
- Sample report. The sample report is curated to contain the most important information for analysis interpretation. All full reports are linked throughout the Sample report or can be found in the QC & Reports folder. The Sample report icon will be colored based on whether Summary item thresholds were met. See the "Quality control" section in the sample report for specifics.
The Sample report should be inspected in order to determine whether the quality of the sequencing reads and the analysis results are acceptable.