QC, Assemble and Bin Pangenomes

The QC, Assemble and Bin Pangenomes template workflow guides you through the key steps for analyzing whole-genome shotgun metagenomic reads and assigning them to clusters of sequences (bins) using the tools Bin Pangenomes by Taxonomy and Bin Pangenomes by Sequence. The input to the workflow is short reads belonging to a single metagenome sample, which may be split across multiple sequence lists.

To run the workflow, go to:

        Workflows | Template Workflows | Microbial Workflows | Metagenomics | Taxonomic Analysis | QC, Assemble and Bin Pangenomes

  1. Specify the sample you would like to analyze.
  2. Specify a Trim adapter list if your sequences contain adapters (https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Adapter_trimming.html).
  3. Specify the minimum contig length, the type of de novo assembly you wish to perform (fast, or optimized for longer contigs), and whether you wish to perform scaffolding (figure 2.5).
  4. For taxonomic binning of the assembled contigs, provide a Taxonomic Profiling Index (figure 2.6). Reference databases can be obtained using the Download Curated Microbial Reference Database tool (see Download Curated Microbial Reference Database) or the Download Custom Microbial Reference Database tool (see Download Custom Microbial Reference Database). For custom reference databases, indexes can be built with the Create Taxonomic Profiling Index tool (see Create Taxonomic Profiling Index).
  5. In the next dialog (figure 2.7), configure the parameters for the Bin Pangenomes by Sequence tool. You can set the minimum contig length to exclude shorter contigs, as binning by sequence requires longer sequences for good results. You can also choose the maximum number of iterations that should be performed, and how to label singletons (bins with a single contig).
  6. In the "Create Sample Report" step, various summary items are already set. These serve as guidelines to help evaluate the quality of the results (https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Create_Sample_Report.html).
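
To illustrate what the minimum contig length and singleton settings in step 5 refer to, the following Python sketch filters a set of contigs by length and identifies bins that contain a single contig. It is not part of the workbench and does not reproduce the Bin Pangenomes by Sequence algorithm; the function names, contig names, and bin labels are hypothetical and chosen only for the example.

```python
from collections import Counter


def filter_contigs(contigs, min_length):
    """Keep only contigs whose sequence is at least min_length bases long."""
    return {name: seq for name, seq in contigs.items() if len(seq) >= min_length}


def find_singletons(bin_of_contig):
    """Return the labels of bins that contain exactly one contig."""
    bin_sizes = Counter(bin_of_contig.values())
    return {label for label, size in bin_sizes.items() if size == 1}


# Hypothetical contigs (name -> sequence); lengths are 2000, 400, 2400 and 2800.
contigs = {
    "contig_1": "ACGT" * 500,
    "contig_2": "ACGT" * 100,
    "contig_3": "GGCC" * 600,
    "contig_4": "TTAA" * 700,
}

# Exclude contigs shorter than the chosen minimum length.
kept = filter_contigs(contigs, min_length=1000)

# Hypothetical bin assignments for the contigs that passed the filter.
bins = {"contig_1": "bin_A", "contig_3": "bin_A", "contig_4": "bin_B"}
singletons = find_singletons(bins)

print(sorted(kept))
print(singletons)
```

In this example, contig_2 is excluded because it is shorter than the minimum length, and bin_B is a singleton because it holds only contig_4.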

Figure 2.5: Parameters for the De Novo Assembly Metagenome tool.

Figure 2.6: Select the reference index for Bin Pangenomes by Taxonomy.

Figure 2.7: Configure the Bin Pangenomes by Sequence tool.

The workflow produces the following outputs:

The Sample report should be inspected in order to determine whether the quality of the sequencing reads and the analysis results are acceptable.

Additionally, you will find the "De novo assemble metagenome report" in the "QC & Reports" subfolder. For a detailed description, see De Novo Assemble Metagenome output.

Individual bins can be extracted from the sequence and contig lists by filtering on the bin label in the "Assembly_ID" column. This can be done manually in the table view of the sequence list, or by using Filter on Custom Criteria (https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Filter_on_Custom_Criteria.html) or Split Sequence List (https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Split_Sequence_List.html). The contigs can then be used for downstream analyses such as reference-based assembly (or re-assembly), functional analysis, and typing.
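
If the table view of a sequence list has been exported to a comma-separated file, the same grouping by bin label can be done outside the workbench. The Python sketch below is one way to do this. It assumes the export contains an "Assembly_ID" column, as described above, and a "Name" column holding the contig name; the "Name" column, the function name, and the example data are assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict


def group_by_bin(csv_text, bin_column="Assembly_ID", name_column="Name"):
    """Group contig names by the bin label found in bin_column."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[bin_column]].append(row[name_column])
    return dict(groups)


# Hypothetical exported table; a real export may contain more columns.
example = """Name,Assembly_ID
contig_1,bin_A
contig_2,bin_B
contig_3,bin_A
"""

bins = group_by_bin(example)

# List the contigs assigned to one bin.
print(bins["bin_A"])
```

For a file on disk, the text can be read with open(path, newline="") and passed to the same function.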