OTU clustering

The OTU Clustering tool clusters a collection of fixed-length, trimmed reads into operational taxonomic units (OTUs).

To run the tool, go to

Microbial Genomics Module | Metagenomics | Amplicon-based OTU clustering | OTU clustering

The OTU Clustering tool expects the input reads to have the same length and orientation. The Fixed Length Trimming tool can be used to make sure the input reads have the same length, and the Optional Merge Paired Reads tool can be used to merge overlapping paired-end reads. The tool will not run if the reads differ in length.

The tool aligns each read to all OTUs to compute an "alignment score" for each OTU. If a read cannot be placed in an existing OTU (because no single OTU is similar enough, i.e., within the similarity percentage, 97% by default), the algorithm tries to improve the alignment score by allowing the alignment to "cross over" from one OTU to another at a cost (the chimera crossover cost). To speed up chimera crossover detection, the read is not aligned to all OTUs but only to some "promising candidates" found via a k-mer search. If the best match that can be constructed has at least one crossover and the "constructed alignment" is at least as good as the similarity percentage requires, the read is considered chimeric.
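The decision logic described above can be sketched roughly as follows. This is an illustration only, not the tool's implementation: it assumes equal-length sequences and counts mismatches position by position instead of computing a real alignment, it assumes the crossover cost is expressed in the same units as a mismatch, and all function names, the k-mer size, the number of candidates, and the "unassigned" label are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def kmers(seq: str, k: int) -> set:
    """Return the set of all k-mers contained in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def promising_candidates(read: str, otus: Dict[str, str], k: int = 4, top: int = 2) -> List[str]:
    """Rank OTUs by the number of k-mers shared with the read and keep the best few."""
    read_kmers = kmers(read, k)
    ranked = sorted(otus, key=lambda name: len(read_kmers & kmers(otus[name], k)), reverse=True)
    return ranked[:top]


def constructed_alignment(read: str, parents: List[str], crossover_cost: float) -> Tuple[float, int]:
    """Return (difference score, crossovers) of the cheapest path through the parents."""
    # state[p] holds the best (score, crossovers) for read[:i + 1]
    # when position i is taken from parent p.
    state = [(float(read[0] != p[0]), 0) for p in parents]
    for i in range(1, len(read)):
        best_score, best_crossovers = min(state)
        # Switching parents costs crossover_cost and counts as one crossover.
        switch = (best_score + crossover_cost, best_crossovers + 1)
        previous = [min(stay, switch) for stay in state]
        state = [
            (score + float(read[i] != p[i]), crossovers)
            for (score, crossovers), p in zip(previous, parents)
        ]
    return min(state)


def classify(read: str, otus: Dict[str, str], similarity: float = 0.97,
             crossover_cost: float = 1.0) -> Tuple[str, Optional[str]]:
    """Classify a read as belonging to an OTU, as chimeric, or as neither."""
    budget = (1.0 - similarity) * len(read)  # allowed differences
    # Step 1: compare against all OTUs (mismatch count as an assumed stand-in for alignment).
    mismatches = {name: sum(a != b for a, b in zip(read, seq)) for name, seq in otus.items()}
    best = min(mismatches, key=mismatches.get)
    if mismatches[best] <= budget:
        return ("otu", best)
    # Step 2: allow crossovers, but only among k-mer-selected candidates.
    parents = [otus[name] for name in promising_candidates(read, otus)]
    score, crossovers = constructed_alignment(read, parents, crossover_cost)
    if crossovers >= 1 and score <= budget:
        return ("chimera", None)
    return ("unassigned", None)


# Made-up example data: three OTU centroids and a read stitched from two of them.
otus = {"A": "ACGT" * 25, "B": "TTGCA" * 20, "C": "GGCC" * 25}
chimeric_read = otus["A"][:50] + otus["B"][50:]
print(classify(chimeric_read, otus))
```

In this sketch, ties between paths with equal score are resolved in favor of fewer crossovers, so a read is only reported as chimeric when a crossover is actually needed to stay within the similarity threshold.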

By default, the similarity percentage parameter is set to 97% in the OTU Clustering tool. Therefore, disregarding the chimera crossover cost, the constructed alignment's difference score can be at most 3%. The lower the chimera crossover cost, the more likely a read is to be deemed chimeric; setting the cost too high reduces chimera detection.
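As a rough worked example of this trade-off, the sketch below computes the difference budget implied by the similarity percentage and checks whether a constructed alignment fits within it. It assumes, for illustration only, that the crossover cost is counted in the same units as a mismatch; the function names and the read length of 250 are hypothetical.

```python
def max_differences(read_length: int, similarity_pct: float = 97.0) -> float:
    """Number of differences allowed by the similarity percentage."""
    return read_length * (100.0 - similarity_pct) / 100.0


def fits_as_chimera(mismatches: int, crossovers: int, crossover_cost: float,
                    read_length: int, similarity_pct: float = 97.0) -> bool:
    """True if an alignment with at least one crossover stays within the budget."""
    if crossovers < 1:
        return False  # without a crossover the read is not a chimera candidate
    # Assumption: each crossover consumes crossover_cost units of the budget.
    total = mismatches + crossovers * crossover_cost
    return total <= max_differences(read_length, similarity_pct)


# A 250 bp read at 97% similarity may differ in at most 7.5 positions.
print(max_differences(250))
# The same constructed alignment (2 mismatches, 1 crossover) under two costs.
print(fits_as_chimera(2, 1, crossover_cost=3, read_length=250))
print(fits_as_chimera(2, 1, crossover_cost=6, read_length=250))
```

Under these assumptions, raising the cost from 3 to 6 pushes the same alignment over the budget, which is why a high crossover cost leads to fewer reads being flagged as chimeric.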

The OTU Clustering tool produces several outputs: a sequence list of the OTU centroids and/or of the chimeras, and abundance tables with the newly created OTUs and/or the chimeras. Each table gives the abundance of the OTUs or chimeras at each site, as well as the total abundance across all samples.
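To illustrate the shape of such an abundance table, the following sketch counts reads per OTU and per sample and adds a total. It is a simplified illustration, not the tool's output format: the function name, the input layout of (sample, OTU) pairs, and the "Total" column name are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def abundance_table(assignments: List[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Build {otu: {sample: count, ..., "Total": count}} from (sample, otu) pairs."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sample, otu in assignments:
        counts[otu][sample] += 1  # one read assigned to this OTU in this sample
    samples = sorted({sample for sample, _ in assignments})
    table = {}
    for otu, per_sample in counts.items():
        row = {sample: per_sample[sample] for sample in samples}  # 0 if absent
        row["Total"] = sum(per_sample.values())
        table[otu] = row
    return table


# Made-up read assignments for two samples.
reads = [("gut", "OTU1"), ("gut", "OTU1"), ("soil", "OTU1"), ("soil", "OTU2")]
for otu, row in abundance_table(reads).items():
    print(otu, row)
```

A sample that happens to be named "Total" would collide with the summary column in this sketch, so a real table would need a reserved name or a separate field for the total.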


