Bin Pangenomes by Sequence

Binning by sequence is performed without reference to a database and depends only on sequence content and coverage. To make both sources of information available, the Bin Pangenomes by Sequence tool takes read mappings to contigs as input. There should be one read mapping per technical replicate, each mapping to the same contigs, so that coverage information across all samples can be used. If read mappings are not available, the tool also accepts plain sequence lists of contigs as input.
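
To illustrate what "content and coverage" can look like as per-contig features, the following is a minimal Python sketch. It is not the tool's implementation: the function names, the use of tetranucleotide frequencies as the composition signal, and the example coverage values are all assumptions made for illustration. Composition-based binners often also merge reverse-complement k-mers, which this sketch omits for brevity.

```python
from collections import Counter
from itertools import product

# All 256 possible tetranucleotides over A, C, G, T, in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(sequence):
    """Return normalized 4-mer frequencies for one contig (composition signal)."""
    seq = sequence.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    # Only windows made up of A, C, G, T are counted; windows with
    # ambiguous bases such as N are ignored.
    total = sum(counts[k] for k in KMERS)
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[k] / total for k in KMERS]


def contig_features(sequence, mean_coverages):
    """Concatenate composition with one mean coverage value per read mapping.

    mean_coverages is a hypothetical input: the average read depth of this
    contig in each read mapping, computed elsewhere.
    """
    return tetranucleotide_frequencies(sequence) + list(mean_coverages)


# Hypothetical contig with mean coverage from two read mappings.
features = contig_features("ACGTACGTACGT", [12.5, 3.0])
```

With a sequence list as the only input, the coverage part of such a feature vector would be empty, which is why read mappings give the binning more information to work with.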

The Bin Pangenomes by Sequence algorithm is based on the MetaBAT [Kang et al., 2015] and SCIMM [Kelley and Salzberg, 2010] algorithms, with several modifications.

The tool was designed for sample sizes on the order of 100 000 contigs. It does not support substantially larger data sets.

To start the tool, go to:

        Tools | Microbial Genomics Module | Metagenomics | Taxonomic Analysis | Bin Pangenomes by Sequence

The Bin Pangenomes by Sequence tool takes one sequence list of contigs or one read mapping per sample as input (figure 6.4).

Figure 6.4: Select the contigs or read mappings.

In the next dialog (figure 6.5), several parameters can be specified.

Figure 6.5: Configuration of the Bin Pangenomes by Sequence tool.

The tool outputs a contig list with the assigned bin of each sequence listed in the Assembly_ID column. Additionally, the following outputs can be selected in the "Result handling" dialog: