Data QC and Clean Host DNA

The Data QC and Clean Host DNA workflow trims the reads, creates a QC report, and removes host DNA from the dataset, keeping only the reads that do not match the host genome.

To run the Data QC and Clean Host DNA workflow, go to Microbial Genomics Module | Metagenomics | Taxonomic Analysis | Workflows | Data QC and Clean Host DNA.

You can select one or several read files to analyze (figure 5.9). Several read files are treated as a single sample unless the batch mode option is checked, in which case each file is treated as an individual sample.

Image clean_1
Figure 5.9: Select the reads.
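
The difference between the default and batch mode can be pictured with a small sketch. This is only an illustration of the grouping rule described above, not the Workbench's own code; the function name group_into_samples, the sample label "sample_1" and the file names are hypothetical.

```python
from typing import Dict, List


def group_into_samples(read_files: List[str], batch: bool) -> Dict[str, List[str]]:
    """Return a mapping from sample name to the read files it contains.

    Hypothetical helper: it only mirrors the rule stated in the text.
    """
    if batch:
        # Batch mode: every selected file is analyzed as its own sample.
        return {name: [name] for name in read_files}
    # Default: all selected files are pooled into one single sample.
    return {"sample_1": list(read_files)}


files = ["gut_A.fastq", "gut_B.fastq"]  # made-up file names
print(group_into_samples(files, batch=False))
print(group_into_samples(files, batch=True))
```

With batch mode off, both files end up under one sample; with batch mode on, each file becomes a separate sample and the workflow runs once per file.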

In the "Trim Sequences" dialog, you can specify a trim adapter list and set the trimming parameters if you want to remove adapters from your sequences (figure 5.10). Specifying a trim adapter list is optional but recommended to ensure the highest quality data for your typing analysis.

Image clean_2_wf
Figure 5.10: You can choose to trim adapter sequences from your sequencing reads.
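
To give an idea of what adapter trimming does to a read, here is a minimal sketch that removes a 3' adapter by exact matching. It is a simplified stand-in: the Workbench's own trimming algorithm is not described in this section and may well differ, for example by tolerating mismatches. The function trim_adapter, its min_overlap parameter and the example sequences are assumptions made for illustration.

```python
def trim_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter from a read using exact matching only.

    Hypothetical helper; min_overlap is the shortest partial adapter
    (counted from the adapter's start) that is trimmed at the read end.
    """
    # Case 1: the full adapter occurs in the read; cut at its start.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: only the beginning of the adapter is present at the read's
    # 3' end. Try the longest possible overlap first.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    # No adapter found: return the read unchanged.
    return read


adapter = "AGATCGGAAGAGC"  # example adapter sequence
print(trim_adapter("ACGTAGATCGGAAGAGCTTT", adapter))  # full adapter inside the read
print(trim_adapter("ACGTACGTAGATCGGAAG", adapter))    # partial adapter at the 3' end
print(trim_adapter("ACGTACGT", adapter))              # no adapter
```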

The parameters that can be set are:

In the "Taxonomic Profiling" dialog, select the reference database against which the reads will be mapped (figure 5.11). You can also enable "Filter host reads", in which case you must specify the host genome (for human microbiota, Homo sapiens hg19, for example).

Image clean_2
Figure 5.11: Select the reference database, and optionally choose to filter against a host genome to remove possible contamination.
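
The idea behind "Filter host reads" can be sketched as follows: reads that look like the host genome are discarded and the rest are kept. The workflow itself identifies host reads by mapping; the sketch below swaps in a much cruder shared k-mer test purely for illustration. The names kmers and remove_host_reads, the parameters k and min_shared, and the toy sequences are all assumptions.

```python
from typing import List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def remove_host_reads(reads: List[str], host_genome: str,
                      k: int = 8, min_shared: float = 0.5) -> List[str]:
    """Keep only the reads that do not resemble the host genome.

    Illustrative k-mer test, not a read mapper: a read is treated as
    host-derived when at least min_shared of its k-mers occur in the host.
    """
    host_index = kmers(host_genome, k)
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            # Read shorter than k: cannot be judged, so it is kept.
            kept.append(read)
            continue
        shared = len(read_kmers & host_index) / len(read_kmers)
        if shared < min_shared:
            kept.append(read)
    return kept


host = "ACGTTGCAAGGCTTAACCGGTTAACC"          # toy "host genome"
reads = [host[2:18], "TTTTTTTTTTTTTTTT"]     # one host read, one non-host read
print(remove_host_reads(reads, host))
```

Only the second read survives the filter, which corresponds to the cleaned sequence list the workflow outputs.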

The workflow outputs a sequence list containing the reads cleaned of host DNA. In addition, it generates three reports: a trimming report, a graphical QC report and a supplementary QC report. Inspect all three to determine whether the quality of the sequencing reads and of the trimming is acceptable. For a detailed description of the QC reports and guidance on interpreting the different values, see http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Qc_Sequencing_Report_Content.html. For the trimming report, see http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Trim_output.html.