With CLC Genomics Workbench you can perform pattern discovery on both DNA and protein sequences. Advanced hidden Markov models can help to identify unknown sequence patterns across single or even multiple sequences.
In order to search for unknown patterns:
Toolbox | Classical Sequence Analysis | General Sequence Analysis | Pattern Discovery
If a sequence was selected before choosing the Toolbox action, the sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the selected elements.
You can perform the analysis on several DNA or several protein sequences at a time. If the analysis is performed on several sequences at a time, the method will search for patterns that are common to all the sequences. Annotations will be added to all the sequences and a view is opened for each sequence.
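To make the idea of a pattern "common to all the sequences" concrete, the following is a minimal Python sketch. It is not the hidden Markov model method used by the Workbench. It swaps in a much simpler technique, exact matching of fixed-length substrings (k-mers), and the function names `shared_kmers` and `locate` are hypothetical, introduced only for illustration.

```python
def shared_kmers(sequences, k):
    """Return the set of length-k substrings present in every sequence.

    Illustration only: exact k-mer matching, not an HMM-based search.
    """
    if not sequences:
        return set()
    common = None
    for seq in sequences:
        seq = seq.upper()
        # Collect every window of length k in this sequence.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        # Keep only the k-mers that all sequences seen so far share.
        common = kmers if common is None else common & kmers
    return common


def locate(pattern, seq):
    """Return 0-based start positions of pattern in seq (overlaps included)."""
    seq = seq.upper()
    positions = []
    start = seq.find(pattern)
    while start != -1:
        positions.append(start)
        start = seq.find(pattern, start + 1)
    return positions
```

Calling `shared_kmers(["ACGTTGCA", "TTGCACGT", "GGTTGCAA"], 4)` returns the 4-mers found in all three sequences, and `locate` gives the positions where such a pattern could be annotated on each sequence. Unlike this exact-match sketch, a probabilistic model can also capture patterns that vary between occurrences.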
Click Next to adjust parameters (see figure 14.24).
Figure 14.24: Setting parameters for the pattern discovery. See text for details.
In order to search unknown sequences with an already existing model:
Select the option to use an already existing model, as shown in figure 14.24. Models are represented by their own icon in the Navigation Area.