Introduction to Metagenomics

Amplicon-based microbiome analysis uses molecular DNA techniques and sequencing technology to comprehensively retrieve specific regions of microbial genomic DNA that are useful for taxonomic identification. In a classic microbiome analysis workflow, total genomic DNA is extracted from the sample(s) of interest, and a chosen amplicon (often the 16S small-subunit ribosomal RNA locus, or the fungal Internal Transcribed Spacer (ITS) region) is PCR amplified and sequenced on a next-generation sequencing (NGS) instrument. The bioinformatics task is then to assign taxonomy to the reads and tally their occurrences. The tools in the Amplicon-Based Analysis folder cluster reads that share at least a given percentage of sequence similarity into Operational Taxonomic Units (OTUs), each of which is then represented by a single sequence.
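As a rough illustration of threshold-based OTU clustering, the sketch below implements a generic greedy, centroid-based scheme: unique sequences are visited from most to least abundant, and each one joins the first existing centroid it matches at or above the similarity threshold; otherwise it founds a new OTU. This is not the algorithm used by the Amplicon-Based Analysis tools. The function names are hypothetical, and the identity measure assumes equal-length, pre-aligned reads, whereas real tools align sequences of differing lengths. The default threshold of 0.97 reflects the 97% similarity commonly used for bacterial OTUs.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length, pre-aligned reads."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length, pre-aligned reads")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def greedy_otu_clustering(reads: List[str], threshold: float = 0.97) -> Dict[str, List[str]]:
    """Cluster reads into OTUs keyed by centroid sequence (hypothetical helper)."""
    # Collapse identical reads and remember how often each one occurs.
    counts: Dict[str, int] = {}
    for read in reads:
        counts[read] = counts.get(read, 0) + 1
    # Visit unique sequences from most to least abundant (ties sorted alphabetically).
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    otus: Dict[str, List[str]] = {}
    for seq in ordered:
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                # Join the first centroid that is similar enough.
                otus[centroid].extend([seq] * counts[seq])
                break
        else:
            # No centroid is close enough: this sequence starts a new OTU.
            otus[seq] = [seq] * counts[seq]
    return otus


reads = ["ACGTACGTAC"] * 3 + ["ACGTACGTAA"] + ["TTTTGGGGCC"] * 2
otus = greedy_otu_clustering(reads, threshold=0.9)
print({centroid: len(members) for centroid, members in otus.items()})
# prints {'ACGTACGTAC': 4, 'TTTTGGGGCC': 2}
```

Visiting abundant sequences first is a common design choice: it makes it more likely that centroids are frequent, genuine sequences rather than rare variants carrying sequencing errors.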

Although 16S and ITS are both taxonomically and phylogenetically informative markers, the resolution of amplicon-based studies is limited. Rather than focusing on a single locus, shotgun sequencing of the genomic DNA of entire communities has therefore become a viable alternative, thanks to the decreasing cost of sequencing. This approach is applicable to samples of uncultured microbiota and avoids some of the limitations of amplicon sequencing. The Taxonomic Analysis tools of the Microbial Genomics Module determine which known organisms are present in a sample, and how abundant they are, by mapping each input read to a reference database of complete genomes, as opposed to the amplicon-based OTU databases.
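The sketch below illustrates reference-based taxonomic profiling in a deliberately simplified form. Instead of read mapping, it assigns each read to the reference genome with which it shares the most exact k-mers, leaves reads with no hits or tied hits unassigned, and tallies the assignments into counts and relative abundances. The k-mer matching is a swapped-in technique chosen for brevity and is not how the Taxonomic Analysis tools work; all function names, the toy genomes and the default k are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map each k-mer to the names of the reference genomes that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, genome in references.items():
        for kmer in kmers(genome, k):
            index[kmer].add(name)
    return index


def assign_read(read: str, index: Dict[str, Set[str]], k: int) -> Optional[str]:
    """Best-matching reference by shared k-mers; None if there is no hit or a tie."""
    hits: Counter = Counter()
    for kmer in kmers(read, k):
        for name in index.get(kmer, ()):
            hits[name] += 1
    if not hits:
        return None
    ranked = hits.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # ambiguous read: left unassigned in this sketch
    return ranked[0][0]


def abundance_table(
    reads: Iterable[str], references: Dict[str, str], k: int = 5
) -> Tuple[Dict[str, int], Dict[str, float], int]:
    """Read counts and relative abundances per reference, plus the unassigned count."""
    index = build_index(references, k)
    counts = Counter(assign_read(read, index, k) for read in reads)
    unassigned = counts.pop(None, 0)
    total = sum(counts.values())
    relative = {name: n / total for name, n in counts.items()} if total else {}
    return dict(counts), relative, unassigned


references = {"genomeA": "ACGTACGTTGCA", "genomeB": "TTGGCCAATTGG"}
reads = ["ACGTACGT", "ACGTACGT", "GGCCAATT", "AAAAAAAA"]
counts, relative, unassigned = abundance_table(reads, references, k=5)
print(counts, unassigned)  # prints {'genomeA': 2, 'genomeB': 1} 1
```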

It is also possible to annotate a whole-metagenome shotgun sequencing dataset with BLAST hits, or with Pfam protein families and GO terms, using the tools in the Functional Analysis folder and the third-party MetaGeneMark plugin. While GO is a hierarchy of higher-level functional categories, Pfam (Protein families) classifies proteins into families of related proteins with similar functions, which makes it possible to build the functional profile of a microbial community.
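To make the difference between the flat Pfam classification and the GO hierarchy concrete, the following sketch counts how many genes carry each protein family, then propagates those counts to the mapped GO terms and all of their ancestors, so that higher-level categories accumulate the counts of their more specific children. The family names, term names, family-to-term mapping and parent table are placeholders, not real Pfam or GO identifiers, and the code is an illustration rather than the module's implementation.

```python
from collections import Counter
from typing import Dict, List, Set


def pfam_profile(gene_annotations: Dict[str, List[str]]) -> Counter:
    """Count how many genes carry each protein family (a family counts once per gene)."""
    profile: Counter = Counter()
    for families in gene_annotations.values():
        profile.update(set(families))
    return profile


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """The term itself plus every term reachable by following parent links upward."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def go_rollup(
    profile: Counter, family_to_go: Dict[str, List[str]], parents: Dict[str, List[str]]
) -> Counter:
    """Propagate family counts to their mapped terms and all ancestor terms."""
    term_counts: Counter = Counter()
    for family, n in profile.items():
        terms: Set[str] = set()
        for term in family_to_go.get(family, []):
            terms |= ancestors(term, parents)
        for term in terms:  # each family contributes once per term, even via several paths
            term_counts[term] += n
    return term_counts


# Placeholder identifiers, not real Pfam or GO entries.
genes = {"gene1": ["famA"], "gene2": ["famA", "famB"], "gene3": ["famC"]}
family_to_go = {"famA": ["termX"], "famB": ["termY"]}
parents = {"termX": ["termParent"], "termY": ["termParent"], "termParent": ["termRoot"]}
profile = pfam_profile(genes)
rolled_up = go_rollup(profile, family_to_go, parents)
print(rolled_up["termParent"], rolled_up["termRoot"])  # prints 3 3
```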

All abundance tables generated by the methods above can be visualized as stacked bar charts, stacked area charts, sunburst charts and heat maps. In addition, the tools in the Abundance Analysis folder perform various statistical analyses on these tables to help interpret the results of a metagenomics study. The reference databases needed for clustering, profiling and annotating can be downloaded and restructured using the tools in the Databases folder of the Toolbox (see IV).
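A stacked bar chart of an abundance table typically shows, for each sample, the fraction of reads assigned to each taxon. The sketch below performs that per-sample normalization and, as an optional convenience, merges taxa that stay below a minimum fraction in every sample into an "Other" category. The table layout (taxon, then sample, then read count), the `min_fraction` parameter and all names are assumptions for illustration, not the module's data format or behavior.

```python
from typing import Dict

# Hypothetical layout: taxon -> {sample -> read count}
AbundanceTable = Dict[str, Dict[str, int]]


def relative_abundance(
    table: AbundanceTable, min_fraction: float = 0.0
) -> Dict[str, Dict[str, float]]:
    """Per-sample fractions; taxa below min_fraction in every sample merge into 'Other'."""
    samples = sorted({s for counts in table.values() for s in counts})
    totals = {s: sum(counts.get(s, 0) for counts in table.values()) for s in samples}
    result: Dict[str, Dict[str, float]] = {}
    other = {s: 0.0 for s in samples}
    for taxon, counts in table.items():
        # Divide by the sample total so each sample's bar sums to 1.
        fractions = {
            s: (counts.get(s, 0) / totals[s] if totals[s] else 0.0) for s in samples
        }
        if any(f >= min_fraction for f in fractions.values()):
            result[taxon] = fractions
        else:
            for s in samples:
                other[s] += fractions[s]
    if any(v > 0 for v in other.values()):
        result["Other"] = other
    return result


table = {
    "taxonA": {"s1": 80, "s2": 10},
    "taxonB": {"s1": 18, "s2": 88},
    "taxonC": {"s1": 2, "s2": 2},
}
for taxon, fractions in relative_abundance(table, min_fraction=0.05).items():
    print(taxon, fractions)
```

Here taxonC never reaches 5% in either sample, so its share is reported under "Other", keeping the chart readable when many rare taxa are present.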

Please note that the functionality of the Functional Analysis folder described in this section is in beta. Because this is still a very active research area, the software is also under active development and subject to change without notice.