Pfam domain search
Pfam Domain Search searches for domains in protein sequences using the Pfam database [Bateman et al., 2004], a large collection of multiple sequence alignments and hidden Markov models (HMMs) covering many common protein domains. It can add Region annotations on the input sequences where domains were found (figure 20.11) and it can output a table listing the domains found.
Why search against Pfam? Many proteins have a unique combination of domains, which can be responsible for the catalytic activities of enzymes. Annotating a sequence by pairwise alignment, that is, simply transferring the annotation from a known protein to its unknown match, does not take domain organization into account [Galperin and Koonin, 1998]. For example, a protein may be incorrectly annotated as an enzyme if the pairwise alignment only covers a regulatory domain.
After the Pfam database has been downloaded (see Download Pfam Database), start Pfam Domain Search by going to:
Toolbox | Classical Sequence Analysis | Protein Analysis | Pfam Domain Search
If you select several input sequences, the analysis is performed on all of them at once. The options that can be configured are shown in figure 20.10.
Figure 20.10: Setting parameters for Pfam Domain Search.
Pfam Domain Search options
- Database: Choose the database to use when searching for Pfam domains.
- Significance cutoff:
  - Use profile's gathering cutoffs: Use cutoffs specifically assigned to each family by the curator instead of manually assigning the Significance cutoff.
  - Significance cutoff: The E-value (expectation value) describes the number of hits one would expect to see by chance when searching a database of a particular size. Essentially, a hit with a low E-value is more significant than a hit with a high E-value. By lowering the significance threshold the domain search will become more specific and less sensitive, i.e. fewer hits will be reported but the reported hits will be more significant on average.
- Remove overlapping matches from the same clan: Perform post-processing of the results where overlaps between hits are resolved by keeping the hit with the smallest E-value.
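The effect of the Significance cutoff and of removing overlapping matches from the same clan can be illustrated with a minimal Python sketch. It is not the workbench's actual implementation: the DomainHit record, the function names, and the greedy strategy (visit hits from smallest to largest E-value and drop any hit that overlaps an already kept hit from the same clan) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DomainHit:
    """Hypothetical record for one domain hit (not a workbench data type)."""
    family: str
    clan: Optional[str]  # None when the family belongs to no clan
    start: int           # 1-based, inclusive
    end: int             # 1-based, inclusive
    evalue: float


def filter_by_evalue(hits: List[DomainHit], cutoff: float) -> List[DomainHit]:
    """Keep only hits whose E-value does not exceed the cutoff."""
    return [h for h in hits if h.evalue <= cutoff]


def overlaps(a: DomainHit, b: DomainHit) -> bool:
    """True if the two hits share at least one residue position."""
    return a.start <= b.end and b.start <= a.end


def resolve_clan_overlaps(hits: List[DomainHit]) -> List[DomainHit]:
    """Greedy sketch: among overlapping hits from the same clan,
    keep the one with the smallest E-value."""
    kept: List[DomainHit] = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        clash = any(
            hit.clan is not None and hit.clan == k.clan and overlaps(hit, k)
            for k in kept
        )
        if not clash:
            kept.append(hit)
    # Report the surviving hits in sequence order.
    return sorted(kept, key=lambda h: h.start)
```

In this sketch, hits from different clans (or from families without a clan) are never removed, even when they overlap, and lowering the cutoff passed to filter_by_evalue can only shrink the list of reported hits.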
If annotations were added but are not initially visible on your sequences, check under the "Annotation types" tab of the side panel settings to ensure the Region annotation type has been checked.
Figure 20.11: Annotations (in red) that were added by the Pfam search tool.
Detailed information for each domain annotation is available in the annotation tool tip as well as in the Annotation Table view of the sequence list.
The domain search is performed using the hmmsearch tool from the HMMER3 package, version 3.1b1 (http://hmmer.org/). Detailed information about the scores in the added Region annotations can be found in the HMMER User Guide (http://eddylab.org/software/hmmer/Userguide.pdf).
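For readers who want to run a comparable search outside the workbench, the following Python sketch assembles an hmmsearch command line. The exact arguments the workbench passes to hmmsearch are not documented here, so the mapping is an assumption: the gathering-cutoff option is mapped to HMMER's --cut_ga flag, and a manual significance cutoff to -E and --domE. The function name and all file names are hypothetical.

```python
from typing import List, Optional


def build_hmmsearch_command(
    hmm_db: str,
    fasta: str,
    out_table: str,
    evalue: Optional[float] = None,
) -> List[str]:
    """Assemble an hmmsearch argument list (assumed mapping, see lead-in).

    hmm_db    -- path to a Pfam HMM file (hypothetical, e.g. "Pfam-A.hmm")
    fasta     -- path to the protein sequences to search
    out_table -- path for the per-domain tabular output
    evalue    -- manual E-value cutoff; None means use gathering cutoffs
    """
    cmd = ["hmmsearch", "--domtblout", out_table]
    if evalue is None:
        # Use the curated per-family gathering (GA) thresholds.
        cmd.append("--cut_ga")
    else:
        # Apply the same E-value cutoff per sequence and per domain.
        cmd += ["-E", str(evalue), "--domE", str(evalue)]
    # hmmsearch takes the profile file first, then the sequence file.
    cmd += [hmm_db, fasta]
    return cmd
```

If HMMER is installed, the returned list can be passed to subprocess.run(cmd, check=True). Results from a standalone run may differ from the workbench output, for example when overlapping matches are removed in post-processing.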
Individual domain annotations can be removed manually, if desired. See Removing annotations.