Data QC and Remove Background Reads
The Data QC and Remove Background Reads workflow trims the reads, creates QC reports and removes background DNA from the dataset, keeping only the reads that match the reference genome(s).
To run the Data QC and Remove Background Reads workflow, go to:
Toolbox | Template Workflows | Microbial Workflows | Metagenomics | Taxonomic Analysis | Data QC and Remove Background Reads
In the "Trim Sequences" dialog, you can specify a trim adapter list and set the trimming parameters if you would like to remove adapters from your sequences (figure 2.1).
Figure 2.1: You can choose to trim adapter sequences from your sequencing reads.
The parameters that can be set are:
- Quality limit: the minimum Phred score a base must have to be kept; bases with lower scores are trimmed.
- Trim adapter list: the adapter sequences to trim (if any).
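To help choose a quality limit, it can be useful to recall what a Phred score means: a score Q corresponds to a base-calling error probability of 10^(-Q/10). The following minimal Python sketch shows that conversion only. The function names are hypothetical and are not part of the workbench, and the sketch does not reproduce the trimming algorithm the workflow uses.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert a base-calling error probability (0 < p <= 1) to a Phred quality score."""
    return -10 * math.log10(p)


# Print the error probability for a few commonly quoted Phred scores.
for q in (10, 20, 30):
    print(f"Q{q}: error probability {phred_to_error_probability(q):.4f}")
```

For example, a quality limit of 20 keeps bases whose estimated error probability is at most 1 in 100.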
In the Taxonomic Profiling dialog, select the "Species of interest taxpro index" that the reads will be mapped against (figure 2.2). Here, you can also choose to "Filter background reads", in which case you must specify the "Background taxpro index" (for human microbiota, for example, Homo sapiens GRCh38). The reference database can be obtained with the Download Curated Microbial Reference Database tool (see Download Curated Microbial Reference Database) or the Download Custom Microbial Reference Database tool (see Download Custom Microbial Reference Database). The host index and (if using the custom downloader) the microbial reference index are built with the Create Taxonomic Profiling Index tool (see Create Taxonomic Profiling Index).
Figure 2.2: Select the reference database and, optionally, a background taxpro index to remove possible contamination.
The workflow will output three folders:
- Cleaned reads: trimmed reads mapping to the reference genome(s) in the species of interest taxpro index.
- Background reads: reads mapping to the background taxpro index.
- Unmapped reads: reads not mapping to the species of interest or the background taxpro index.
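The three-way split described above can be illustrated with a short Python sketch. It is an illustration only, not the workflow's implementation: the function name and the use of mapping scores are hypothetical, and the handling of a read that maps to both indexes (assigned here to whichever index gives the higher score, with ties going to the species of interest) is an assumption that this section does not specify.

```python
from typing import Optional


def assign_output_folder(
    target_score: Optional[float],
    background_score: Optional[float],
) -> str:
    """Return the output folder for one read.

    A score of None means the read did not map to that index.
    """
    if target_score is None and background_score is None:
        return "Unmapped reads"
    if target_score is None:
        return "Background reads"
    if background_score is None:
        return "Cleaned reads"
    # Assumption (not stated in the manual): when a read maps to both
    # indexes, the better-scoring index wins; ties go to the species of interest.
    if background_score > target_score:
        return "Background reads"
    return "Cleaned reads"


# Hypothetical reads: (score against species of interest, score against background).
reads = {
    "read_1": (60.0, None),
    "read_2": (None, 55.0),
    "read_3": (None, None),
}
for name, (target, background) in reads.items():
    print(name, "->", assign_output_folder(target, background))
```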
In addition, the workflow generates three reports: a trimming report, a graphical QC report and a supplementary QC report. All of these should be inspected to determine whether the quality of the sequencing reads and the trimming are acceptable. For a detailed description of the QC reports and guidance on how to interpret the different values, see https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=QC_Sequencing_Reads.html.
For the trimming report, see https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Trim_output.html.