Functional analysis
Two of the most widely used definitions of biological function are available in the form of the Gene Ontology (GO) and Pfam databases. While GO is a hierarchy of higher-level functional categories, Pfam (Protein families) classifies proteins into families of related proteins with similar function (see for example "An introduction to the Pfam protein families database", http://pid.nci.nih.gov/2011/110913/full/pid.2011.3.shtml for further information).
Several tools are available for functional analysis. Starting from a whole metagenome shotgun sequencing dataset in the form of reads, the first step is to assemble the reads using the De Novo Assemble Metagenome tool (see 4). The resulting contigs can then be annotated with coding sequences (CDS) using the third-party MetaGeneMark plugin. Given a set of contigs with CDS annotations, the Annotate CDS with Best BLAST Hit, Annotate CDS with Best DIAMOND Hit and Annotate CDS with Pfam Domains tools can be used to annotate all CDS in the contigs with BLAST hits, DIAMOND hits, or Pfam protein families and GO terms, respectively. The database needed for GO annotation can be downloaded using the Download GO Database tool, and the Pfam database using the built-in Download Pfam Database tool. BLAST databases can be downloaded or created using the built-in Download BLAST Databases and Create BLAST Database tools.
Once the contigs are annotated with Pfam annotations, GO terms and/or BLAST hits, the next step will often be to map the original reads back to the annotated contigs using the built-in Map Reads to Reference tool. The read mapping makes it possible to assess the abundance of the functional annotations, which is done in the final step using the Build Functional Profile tool (see 7.5).
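The idea of deriving abundances from a read mapping can be illustrated with a minimal Python sketch. It is not the implementation of the Build Functional Profile tool: the function name, the data layout (0-based, half-open coordinates), the rule that any overlap counts, and the example identifiers are all assumptions made for illustration.

```python
from collections import Counter


def build_functional_profile(cds_annotations, mapped_reads):
    """Count mapped reads per functional term (illustrative only).

    cds_annotations: list of (contig, start, end, terms) tuples
    mapped_reads:    list of (contig, start, end) tuples
    Coordinates are assumed to be 0-based and half-open.
    """
    profile = Counter()
    for read_contig, read_start, read_end in mapped_reads:
        for contig, cds_start, cds_end, terms in cds_annotations:
            # Assumption: a read counts towards a CDS if they overlap at all.
            overlaps = (
                contig == read_contig
                and read_start < cds_end
                and cds_start < read_end
            )
            if overlaps:
                for term in terms:
                    profile[term] += 1
    return dict(profile)


# Hypothetical annotated contig with two CDS regions.
cds = [
    ("contig_1", 0, 300, ["PF00005"]),
    ("contig_1", 400, 900, ["PF00072", "GO:0000160"]),
]
# Hypothetical read mappings; the last read maps to an unannotated contig.
reads = [
    ("contig_1", 10, 110),
    ("contig_1", 250, 350),
    ("contig_1", 450, 550),
    ("contig_2", 0, 100),
]

profile = build_functional_profile(cds, reads)
print(profile)
```

In this sketch a read that overlaps a CDS contributes one count to every term attached to that CDS, and reads that map outside any annotated CDS are ignored.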
All tools described above should be run separately for each sample (or in batch mode), resulting in one functional profile per sample. A set of functional profiles can then be joined using the Merge Abundance Tables tool (see 9.1). The functional profiles of multiple samples can then be visualized and compared as described in Section 5.2.3.
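Conceptually, joining per-sample profiles amounts to building one table with a row per functional term and a column per sample. The following Python sketch shows that idea only. It does not reflect how the Merge Abundance Tables tool works internally; the function name, the sample names, the counts, and the choice to fill missing terms with 0 are assumptions.

```python
def merge_abundance_tables(profiles):
    """Join per-sample profiles into one term-by-sample table (illustrative only).

    profiles: dict mapping sample name -> {term: abundance}
    Returns:  dict mapping term -> {sample name: abundance}
    """
    samples = list(profiles)
    # Collect every term seen in any sample so that all rows are complete.
    terms = sorted({term for counts in profiles.values() for term in counts})
    # Assumption: a term not observed in a sample gets abundance 0.
    return {
        term: {sample: profiles[sample].get(term, 0) for sample in samples}
        for term in terms
    }


# Hypothetical functional profiles for two samples.
per_sample = {
    "sample_A": {"PF00005": 12, "PF00072": 3},
    "sample_B": {"PF00005": 7, "GO:0000160": 5},
}

merged = merge_abundance_tables(per_sample)
for term, counts in merged.items():
    print(term, counts)
```

Filling absent terms with 0 keeps every row the same length, which makes the merged table straightforward to compare across samples.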
Please note that the functionality of the tools included in the Functional analysis folder is in beta. As this is still a very active research area, the software is also under active development and subject to change without notice.
Subsections
- Find Prokaryotic Genes (beta)
- Annotate CDS with Best BLAST Hit
- Annotate CDS with Best DIAMOND Hit
- Annotate CDS with Pfam Domains
- Build Functional Profile