The Taxonomy Binning Report
The taxonomy binning report has the following sections:
- Contigs
  - Accepted. The number of contigs that meet the required minimum contig length and can be assigned a taxonomy at least as specific as the "Maximum level", with at least the "Minimum purity".
  - Rejected. The remaining contigs.
  - Reads. The number of reads that are unmapped, that map to accepted contigs, and that map to rejected contigs. All reads mapping to a contig are counted, regardless of whether they support the assigned taxonomy.
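The acceptance rule above can be sketched as follows. This is an illustrative model only, not the tool's implementation. It assumes the following: each read carries a lineage stored as a `{level: taxon}` dictionary, or `None` if the read has no taxonomy. Purity is the share of *all* reads on the contig held by the most common taxon at a level. Levels are tried from most to least specific. The names `LEVELS`, `assign_taxonomy` and `is_accepted_contig` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical rank order, most specific first; the tool's own ranks may differ.
LEVELS = ["species", "genus", "family", "order", "class", "phylum", "kingdom"]

# A read's taxonomy as {level: taxon name}; None means the read has no taxonomy.
Lineage = Optional[Dict[str, str]]


def assign_taxonomy(read_lineages: List[Lineage],
                    min_purity: float) -> Tuple[Optional[str], str, float]:
    """Return (level, taxon, purity) for the most specific level whose most
    common taxon covers at least min_purity of all reads on the contig.
    Returns (None, "Unknown", 0.0) when no level reaches min_purity."""
    total = len(read_lineages)
    if total == 0:
        return None, "Unknown", 0.0
    for level in LEVELS:
        counts = Counter(lin[level] for lin in read_lineages if lin and level in lin)
        if not counts:
            continue
        taxon, n = counts.most_common(1)[0]
        purity = n / total  # reads without a taxonomy still count in the denominator
        if purity >= min_purity:
            return level, taxon, purity
    return None, "Unknown", 0.0


def is_accepted_contig(length: int, read_lineages: List[Lineage],
                       min_length: int, max_level: str, min_purity: float) -> bool:
    """Accepted = long enough AND assigned at a level at least as specific as max_level."""
    if length < min_length:
        return False
    level, _, _ = assign_taxonomy(read_lineages, min_purity)
    if level is None:
        return False
    # A lower index in LEVELS means a more specific level.
    return LEVELS.index(level) <= LEVELS.index(max_level)


# 89% of reads agree on a species, the rest have no taxonomy, Minimum purity 0.9.
ecoli = {"species": "Escherichia coli", "genus": "Escherichia"}
reads = [ecoli] * 89 + [None] * 11
print(assign_taxonomy(reads, 0.9))  # (None, 'Unknown', 0.0)
```

Walking from the most specific level upward means a contig whose reads disagree at species level can still receive a genus-level taxonomy. Under this sketch, it is then accepted only if the "Maximum level" permits genus.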
- Bins. A bin is created for each assigned taxonomy, and for each contig to which no taxonomy can be assigned.
  - Accepted. Bins that contain accepted contigs.
  - Rejected. The remaining bins.
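The binning rule can be modeled as in the minimal sketch below. It groups contigs that share an assigned taxonomy, gives every contig without a taxonomy its own bin, and splits bins into accepted and rejected. The `Contig` class and the functions `make_bins` and `split_bins` are hypothetical. Numbering the bins in order of first appearance is also an assumption. The report only states that numbers start at 0 and carry no meaning.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Contig:
    name: str
    taxonomy: Optional[str]  # None when no taxonomy could be assigned
    accepted: bool


Bins = Dict[str, List[Contig]]


def make_bins(contigs: List[Contig]) -> Bins:
    """One bin per assigned taxonomy, plus one bin per contig without a taxonomy."""
    groups: Dict[Tuple[str, str], List[Contig]] = {}
    for c in contigs:
        if c.taxonomy is not None:
            key = ("taxonomy", c.taxonomy)
        else:
            key = ("contig", c.name)  # unassigned contigs are never merged
        groups.setdefault(key, []).append(c)
    # Numbering starts at 0 and only follows encounter order; it has no meaning.
    return {f"TaxBin{i}": members for i, members in enumerate(groups.values())}


def split_bins(bins: Bins) -> Tuple[Bins, Bins]:
    """Accepted bins contain accepted contigs; all remaining bins are rejected."""
    accepted = {b: m for b, m in bins.items() if any(c.accepted for c in m)}
    rejected = {b: m for b, m in bins.items() if b not in accepted}
    return accepted, rejected


contigs = [
    Contig("c1", "Escherichia coli", True),
    Contig("c2", "Escherichia coli", True),
    Contig("c3", None, False),
    Contig("c4", None, False),
]
bins = make_bins(contigs)
accepted, rejected = split_bins(bins)
print(list(accepted), list(rejected))  # ['TaxBin0'] ['TaxBin1', 'TaxBin2']
```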
- Accepted contig bins / Rejected contig bins. These two tables are sorted by "Approximate completeness" and contain the same columns:
  - Bin. An identifier for the bin, of the form "TaxBinX", where X is a number starting from 0. The number itself has no significance. The identifier matches the "Assembly ID" on the binned contigs.
  - Taxonomy. The taxonomy of the bin. For "Accepted contig bins", this is always at least as specific as the "Maximum level". For "Rejected contig bins", it may be less specific, or Unknown. Unknown only means that no taxonomy was found at the required "Minimum purity". For example, with a Minimum purity of 0.9, a bin is labeled Unknown when no taxonomy accounts for 90% of its reads, even if 89% of its reads are assigned to the same species.
  - Taxonomic level (plasmid). The level at which the taxonomy is assigned, e.g. Species. If the taxonomy is assigned via the provided "Plasmid reference index", "(plasmid)" is appended, e.g. Species (plasmid).
  - Contigs. The number of contigs in the bin.
  - Nucleotides contigs. The total number of nucleotides in the bin's contigs.
  - Reads. The number of reads mapping to the bin's contigs.
  - Nucleotides reads. The total number of nucleotides in the mapped reads, regardless of how, or whether, each nucleotide aligned to the contig. This number therefore includes unaligned ends, and counts overlapping nucleotides of read pairs twice.
  - Approximate completeness. Nucleotides contigs divided by the average length of the reference index sequences that both (1) have the same taxonomy as the bin and (2) have reads mapped to them. This number can exceed 1 for two reasons. Multiple contigs may be assembled for one reference sequence. Alternatively, some reads may map to a short reference sequence with the same taxonomy for which no contig is assembled, which lowers the average length.
  - Taxonomic purity (read level). The purity as a percentage. This is either 0.00 or greater than or equal to the "Minimum purity" setting (expressed as a percentage), because taxonomies are not assigned when the purity is below the specified Minimum purity.
  - Average contig coverage. Calculated as Nucleotides reads / Nucleotides contigs. This is only a rough estimate of coverage, because Nucleotides reads does not take into account how the nucleotides in the reads aligned to the contig.
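The arithmetic behind the Nucleotides reads, Approximate completeness and Average contig coverage columns is sketched below. The function names are hypothetical. The sketch assumes the caller has already selected the reference sequences that share the bin's taxonomy and have reads mapped to them. Returning `None` when no such sequence exists is an assumption; the report does not say what is shown in that case.

```python
from typing import List, Optional


def nucleotides_reads(read_lengths: List[int]) -> int:
    # Full read lengths: unaligned ends are included, and each mate of a pair
    # is counted separately, so overlapping bases are counted twice.
    return sum(read_lengths)


def approximate_completeness(nucleotides_contigs: int,
                             reference_lengths: List[int]) -> Optional[float]:
    """reference_lengths: lengths of the reference index sequences that share
    the bin's taxonomy AND have reads mapped to them (assumed pre-filtered)."""
    if not reference_lengths:
        return None  # assumption: undefined when no reference sequence qualifies
    average_length = sum(reference_lengths) / len(reference_lengths)
    return nucleotides_contigs / average_length


def average_contig_coverage(nt_reads: int, nucleotides_contigs: int) -> float:
    # Rough estimate: ignores how (or whether) each read base aligned.
    return nt_reads / nucleotides_contigs


contig_nt = 4_500_000
read_nt = nucleotides_reads([150] * 300_000)  # 45,000,000 nucleotides
print(approximate_completeness(contig_nt, [5_000_000, 4_000_000]))  # 1.0
print(average_contig_coverage(read_nt, contig_nt))  # 10.0
print(approximate_completeness(6_000_000, [4_000_000]))  # 1.5, i.e. above 1
```

The last line illustrates the first reason completeness can exceed 1. Several contigs totalling 6 Mb are compared with a single 4 Mb reference sequence.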