Data QC and OTU Clustering

The Data QC and OTU Clustering workflow is meant for amplicon sequencing data. It trims reads and performs either reference-based or de novo OTU clustering. The resulting abundance table can optionally be filtered. The workflow additionally runs QC for Sequencing Reads, which can be used to assess the quality of the raw reads.

Filter Samples Based on Number of Reads removes samples with fewer than 100 reads. If multiple samples are provided as input, samples with fewer than half of the median number of reads are also excluded.
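The filtering rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the tool's implementation. The function name `filter_samples` and its parameters are hypothetical, and the sketch assumes that the median is computed over all input samples before any are removed, which the description above does not specify.

```python
from statistics import median


def filter_samples(read_counts, min_reads=100, median_fraction=0.5):
    """Return the samples that pass the read-count filters (illustrative only).

    read_counts maps a sample name to its number of reads.
    """
    # Absolute threshold: samples with fewer than min_reads reads are removed.
    cutoff = min_reads

    # Relative threshold: only applied when more than one sample is given.
    # Assumption: the median is taken over all input samples.
    if len(read_counts) > 1:
        cutoff = max(cutoff, median_fraction * median(read_counts.values()))

    # Keep samples whose read count is not below the effective cutoff.
    return {name: n for name, n in read_counts.items() if n >= cutoff}


counts = {"A": 50, "B": 1000, "C": 1200, "D": 300}
kept = filter_samples(counts)
print(sorted(kept))
```

In this example the median read count is 650, so the relative cutoff is 325 reads. Sample A fails the absolute threshold of 100 reads, and sample D fails the relative threshold, leaving samples B and C.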

This template workflow is available from:

        Workflows | Template Workflows | Microbial Workflows | Metagenomics | Amplicon-Based Analysis | Data QC and OTU Clustering

  1. Specify the sample(s) or folder(s) of samples you would like to analyze.
  2. Specify a Trim adapter list if your sequences contain adapters (https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Adapter_trimming.html).
  3. Choose whether to run de novo or reference-based OTU clustering, and set the available similarity parameters. If you select Reference based OTU clustering, choose whether to allow creation of new OTUs and provide an OTU database. Reference databases can be downloaded using the Download Amplicon-Based Reference Database tool (see Download Amplicon-Based Reference Database).
  4. Various options and filters can be set for refining the abundance table after clustering (see Refine Abundance Table). Note that if De novo OTU clustering was chosen in the previous step, then the Aggregation level must be set to "Do not aggregate".
  5. In the "Create Sample Report" step, various summary items have already been set. These are guidelines to help evaluate the quality of the results (https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Create_Sample_Report.html).

The workflow produces the following outputs:

The Sample report should be inspected to determine whether the quality of the sequencing reads and the analysis results is acceptable.