The Motif Search tool
The Motif Search tool supports searching for motifs in nucleotide or peptide sequences, including in alignments. Motifs to search for can be provided as simple text, as regular expressions or in lists, and the accuracy required for a successful match can be defined. Searching the negative strand of nucleotide sequences is optional. Results can be added as annotations to the sequences, and a table of results can also be output.
To start the Motif Search tool, go to:
Tools | Classical Sequence Analysis | General Sequence Analysis | Motif Search
In the launch wizard, select the sequences, sequence lists, or alignments from the Navigation Area.
Search options are provided in the "Motif Search Parameters" wizard step (see figure 18.27).
Figure 18.27: Specifying the options for the Motif Search tool.
The Motif Search options are:
- Motif types. Choose the kind of motif to use:
- Simple motif. Choosing this option means that you enter a simple motif, e.g. ATGATGNNATG.
- Java regular expression. See Java regular expressions.
- Prosite regular expression. For proteins, you can enter protein patterns from the PROSITE database. These patterns use regular expressions to describe specific amino acid sequences. The PROSITE database contains a great number of patterns and has been used to identify related proteins (see https://prosite.expasy.org/cgi-bin/prosite/prosite-list.pl).
- Use motif list. Clicking the small button allows you to select a saved motif list (see Create motif lists).
- Motif. If you search with a simple motif, enter a literal string as your motif. Ambiguous amino acids and nucleotides are allowed. Example: ATGATGNNATG. If the motif type is Java regular expression, enter a regular expression that follows the Java syntax rules (see Java regular expressions). Press Shift + F1 for options. For proteins, you can search with a Prosite regular expression, in which case you enter a protein pattern from the PROSITE database.
- Accuracy. If you search with a simple motif, you can set how closely a subsequence must match the motif. For example, if you enter a simple motif and set the accuracy to 80%, the motif search algorithm runs through the input sequence and finds all subsequences of the same length as the simple motif for which the fraction of identity between the subsequence and the simple motif is at least 80%. Each motif match is added to the sequence as an annotation that records the exact fraction of identity between the subsequence and the simple motif. If you use a list of motifs, the accuracy applies only to the simple motifs in the list.
- Search for reverse motif. This enables searching on the negative strand of nucleotide sequences.
- Exclude unknown regions. Genome sequences often have large regions of unknown sequence, which are very often padded with N's. When this checkbox is ticked, hits found in N-regions are not displayed. Motif search treats two residues as different only if they have no residues in common. For example, for nucleotides, N matches any character and R matches A and G; for proteins, X matches any character and Z matches E and Q.
- Create report. This will create a report with summary information about motifs found.
- Create table. This will create an overview table of all the motifs found for all the input sequences.
- Add annotations to sequences. This will add an annotation to the sequence when a motif is found (an example is shown in figure 18.28). For details on viewing annotations see Viewing annotations.
Figure 18.28: Sequence view displaying the pattern found. The search string was 'tataaa'.
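The accuracy and ambiguity rules described for simple motifs can be illustrated with a short Python sketch. This is not the tool's implementation. It assumes only what the text states: every subsequence of the same length as the motif is compared with the motif, two symbols count as identical when the residues they stand for overlap, and a match is reported when the fraction of identity reaches the accuracy threshold. The names IUPAC_DNA, residues_match and simple_motif_search are hypothetical, the table covers nucleotides only, and only the positive strand is scanned.

```python
# Illustrative sketch only; not the Motif Search tool's own code.
# IUPAC nucleotide codes mapped to the bases each symbol can stand for.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def residues_match(a, b):
    """Two symbols match if the sets of bases they represent overlap."""
    return bool(set(IUPAC_DNA[a]) & set(IUPAC_DNA[b]))


def simple_motif_search(sequence, motif, accuracy=0.8):
    """Return (start, end, identity) for each window meeting the threshold.

    Positions are 0-based and `end` is exclusive. `accuracy` is a
    fraction between 0 and 1 (0.8 corresponds to 80%).
    """
    sequence, motif = sequence.upper(), motif.upper()
    hits = []
    # Slide a window of the motif's length along the sequence.
    for start in range(len(sequence) - len(motif) + 1):
        window = sequence[start:start + len(motif)]
        # Count positions where window and motif symbols are compatible.
        matches = sum(residues_match(s, m) for s, m in zip(window, motif))
        identity = matches / len(motif)
        if identity >= accuracy:
            hits.append((start, start + len(motif), identity))
    return hits


hits = simple_motif_search("CCTATAAAGGTATATAGG", "TATAAA", accuracy=0.8)
for start, end, identity in hits:
    print(f"{start}-{end}: identity {identity:.0%}")
```

With this example sequence and an accuracy of 80%, the sketch reports the exact match TATAAA at positions 2 to 8 and the window TATATA at positions 10 to 16, which matches in 5 of 6 positions (about 83%). Under the same overlap rule, a window consisting only of N's is compatible with every motif, which is consistent with the purpose of the Exclude unknown regions option.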