Introduction to Metagenomics

Amplicon-based microbiome analysis uses molecular techniques and sequencing technology to retrieve specific regions of microbial genomic DNA that are useful for taxonomic identification. In a classic microbiome analysis workflow, total genomic DNA is extracted from the sample(s) of interest, and a chosen amplicon (often the 16S small-subunit ribosomal RNA locus or the fungal Internal Transcribed Spacer (ITS) region) is PCR-amplified and sequenced on an NGS machine. The bioinformatics task is then to assign taxonomy to the reads and tally their occurrences. The tools from Amplicon-Based Analysis are designed to cluster all reads within a certain percentage of similarity into Operational Taxonomic Units (OTUs), each of which is then represented by a single sequence.
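
The clustering idea described above can be illustrated with a minimal Python sketch. It is not the algorithm used by the Amplicon-Based Analysis tools; it assumes a simple greedy centroid scheme, toy reads that are already aligned and of equal length, and a position-by-position identity measure. The names identity and greedy_otu_clustering are hypothetical and exist only for this illustration.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of matching positions.

    Toy measure (an assumption of this sketch): reads are pre-aligned and
    of equal length. Reads of different length are treated as dissimilar.
    """
    if len(a) != len(b):
        return 0.0
    if not a:
        return 1.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def greedy_otu_clustering(reads, threshold=0.97):
    """Greedily cluster reads into OTUs.

    Returns a dict mapping each representative (centroid) sequence to the
    number of reads assigned to its OTU.
    """
    # Visit unique sequences from most to least abundant, so that common
    # sequences become the representatives of their OTUs.
    counts = Counter(reads)
    otus = {}
    for seq, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid] += n  # join the first OTU that is similar enough
                break
        else:
            otus[seq] = n  # no similar OTU found: start a new one
    return otus


if __name__ == "__main__":
    toy_reads = ["ACGTACGTAC"] * 3 + ["ACGTACGTAT"] + ["TTTTTTTTTT"] * 2
    for representative, size in greedy_otu_clustering(toy_reads, 0.9).items():
        print(representative, size)
```

In this sketch each OTU is represented by its most abundant member sequence, and the per-OTU counts correspond to the tally of read occurrences mentioned above.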

Although 16S and ITS are both taxonomically and phylogenetically informative markers, the resolution of studies based on them is limited. As an alternative to focusing on a single locus, shotgun genomic sequencing of entire communities has become viable thanks to the decreasing cost of sequencing. This approach is applicable to samples of uncultured microbiota and avoids some of the limitations of amplicon sequencing. The Taxonomic Analysis tools of the Microbial Genomics Module are designed to determine which known organisms are present in a sample, and how abundant they are, by mapping each input read to a reference database of complete genomes, as opposed to amplicon-based OTU databases.
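
To make the notion of abundance concrete, the following Python sketch tallies reads per reference organism once each read has been mapped. It is only an illustration under stated assumptions: the mapping step itself is taken as given, each read has at most one best hit, unmapped reads are marked with None, and no correction for genome length or ambiguous mappings is attempted. The function name abundance_table and the example organism labels are hypothetical.

```python
from collections import Counter


def abundance_table(read_hits):
    """Summarise per-read mapping results into an abundance table.

    read_hits: iterable with one entry per read, either the name of the
    reference genome the read mapped to, or None if the read is unmapped.
    Returns {organism: (read_count, fraction_of_mapped_reads)}.
    """
    counts = Counter(hit for hit in read_hits if hit is not None)
    total = sum(counts.values())
    if total == 0:
        return {}  # nothing mapped, so no abundances can be reported
    return {organism: (n, n / total) for organism, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical mapping results for five reads (one unmapped).
    hits = ["Escherichia coli", "Escherichia coli", None,
            "Bacillus subtilis", "Escherichia coli"]
    for organism, (n, fraction) in abundance_table(hits).items():
        print(f"{organism}\t{n}\t{fraction:.2f}")
```

A table of this shape (organisms as rows, counts or fractions as values) is the kind of input that per-sample abundance charts are built from.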

All abundance tables generated by the methods above can be visualized as stacked bar charts, stacked area charts, sunburst charts and heat maps. In addition, the tools in the Abundance Analysis folder perform various statistical analyses that highlight the results of the metagenomics study. The reference databases needed for clustering, profiling and annotating can be downloaded and restructured using the tools in the Databases folder of the Toolbox (see VI).