Bin Pangenomes by Taxonomy

This tool assigns contigs, together with the reads they were assembled from, to bins of contigs that presumably share a closely related taxonomy. To do so, it uses a microbial reference (genome) sequence database whose sequences carry taxonomic information. To separate contigs that originate from plasmids from those of genomic origin, the Bin Pangenomes by Taxonomy tool additionally takes a plasmid database as input.

Binning occurs in 5 consecutive steps:

  1. Obtain taxonomic information for reads
  2. Obtain plasmid information for reads
  3. Map reads to contigs
  4. Assign taxonomic and plasmid labels to contigs
  5. Group and filter contigs according to labels (Contig purity)
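The data flow of steps 3 to 5 can be illustrated with a minimal Python sketch. This is not the tool's implementation: the toy inputs, the function names `label_contigs` and `group_into_bins`, and the simple "most common label wins" rule are all assumptions made for illustration, and the purity filter of step 5 is left out here.

```python
from collections import Counter, defaultdict

# Hypothetical toy inputs. In the real tool these come from the reference
# databases (steps 1 and 2) and from mapping reads to contigs (step 3).
read_taxon = {"r1": "Escherichia", "r2": "Escherichia",
              "r3": "Salmonella", "r4": "Salmonella"}
read_is_plasmid = {"r1": False, "r2": False, "r3": True, "r4": True}
read_contig = {"r1": "contig_1", "r2": "contig_1",
               "r3": "contig_2", "r4": "contig_2"}


def label_contigs(read_taxon, read_is_plasmid, read_contig):
    """Step 4 (simplified): label each contig with the most common
    (taxon, is_plasmid) pair among the reads mapped to it."""
    per_contig = defaultdict(Counter)
    for read, contig in read_contig.items():
        label = (read_taxon.get(read), read_is_plasmid.get(read, False))
        per_contig[contig][label] += 1
    return {contig: counts.most_common(1)[0][0]
            for contig, counts in per_contig.items()}


def group_into_bins(contig_labels):
    """Step 5 (simplified, no purity filter): contigs that share a label
    end up in the same bin."""
    bins = defaultdict(list)
    for contig, label in contig_labels.items():
        bins[label].append(contig)
    return dict(bins)


contig_labels = label_contigs(read_taxon, read_is_plasmid, read_contig)
bins = group_into_bins(contig_labels)
```

In this toy example, `contig_1` is labeled as genomic Escherichia and `contig_2` as plasmid-derived Salmonella, so the two contigs fall into separate bins.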

To start the tool, go to:

        Toolbox | Microbial Genomics Module | Metagenomics | Taxonomic Analysis | Bin Pangenomes by Taxonomy

The Bin Pangenomes by Taxonomy tool takes one or more single or paired-end read files as input (figure 6.1).

Image bintax1tool
Figure 6.1: Select the reads.

The tool is designed to work on contigs that were assembled from the same set of reads provided as input, using the De Novo Assembly Metagenome tool (as in the workflow, see QC, Assemble and Bin Pangenomes). In this step you can also specify the minimum contig length (figure 6.2).

Image bintax2tool
Figure 6.2: Select the references and specify the parameters needed for running the tool.

As reference databases, one or two Taxonomic Profiling index files can be provided: an index of the microbial reference database and an index of the plasmid database.

Both references can be obtained with the Download Curated Microbial Reference Database tool (see Download Curated Microbial Reference Database) or the Download Custom Microbial Reference Database tool (see Download Custom Microbial Reference Database). If you use the latter, the indexes can be built with the Create Taxonomic Profiling Index tool (see Create Taxonomic Profiling Index).

Depending on the dataset, it may be necessary to adapt the contig purity settings. "Maximum level" refers to the maximum level in the taxonomic tree at which a contig may be assigned, and "Minimum purity" is the fraction of reads mapped to a contig that must agree on the taxonomy for the contig to be considered part of a bin. For example, if Maximum level = Genus, Minimum purity = 0.8 and 512 reads map to a given contig, then 0.8 * 512 = 409.6, so at least 410 reads need to have the same Genus level taxonomy for the contig to become part of the respective bin. If more precise taxonomic information (e.g., on Species level) is available with the requested minimum purity, this information will be used instead.
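The purity rule can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not the tool's actual code: the rank list `RANKS`, the function name `assign_bin`, and the representation of each read as a rank-to-taxon dictionary are hypothetical. The sketch tries the most specific rank first and moves up towards the maximum level until the minimum purity is reached.

```python
from collections import Counter

# Assumed rank order for this sketch, from most general to most specific.
RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"]


def assign_bin(read_lineages, maximum_level="Genus", minimum_purity=0.8):
    """Return (rank, taxon) for the most specific rank that reaches the
    minimum purity, or None if no rank up to maximum_level does.

    read_lineages: one dict per read mapped to the contig, mapping a rank
    name to a taxon name (hypothetical input format).
    """
    total = len(read_lineages)
    if total == 0:
        return None
    first_allowed = RANKS.index(maximum_level)
    # Most specific rank first, then upwards to the maximum level.
    for rank in reversed(RANKS[first_allowed:]):
        counts = Counter(lin[rank] for lin in read_lineages if rank in lin)
        if not counts:
            continue
        taxon, supporting_reads = counts.most_common(1)[0]
        if supporting_reads / total >= minimum_purity:
            return rank, taxon
    return None


def make_reads(n_escherichia):
    """Build 512 toy reads, n_escherichia of which belong to Escherichia."""
    n_coli = 300
    reads = [{"Genus": "Escherichia", "Species": "Escherichia coli"}] * n_coli
    reads += [{"Genus": "Escherichia", "Species": "Escherichia albertii"}] * (n_escherichia - n_coli)
    reads += [{"Genus": "Salmonella", "Species": "Salmonella enterica"}] * (512 - n_escherichia)
    return reads


passing = assign_bin(make_reads(410))  # 410 of 512 reads agree on the genus
failing = assign_bin(make_reads(409))  # one read short of the threshold
```

With 410 of 512 reads sharing the genus, no single species reaches a purity of 0.8, so the contig is assigned at Genus level; with only 409 agreeing reads it is not assigned to any bin.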

The "Result handling" dialog allows you to specify the tool's outputs (figure 6.3):

Image bintax3tool
Figure 6.3: Specify the outputs needed.

The standard output of the Bin Pangenomes by Taxonomy tool consists of a list of (binned) contigs and one sequence list per input reads file (or two for paired reads). Each sequence is labeled with its most probable origin and the bin it ended up in; the bin is stored as an "Assembly ID" annotation so that it works seamlessly with other tools. In addition, a column called "isPlasmid" provides a true/false label indicating whether the contig or read was mapped to a plasmid (true) or to a genome (false). The tool can also output a Taxonomy binning report.