Gap costs
The alignment algorithm has three parameters
concerning gap costs: Gap open cost, Gap extension cost and End gap
cost. The precision of these parameters is one decimal place.
- Gap open cost: The penalty for introducing a gap in an alignment.
- Gap extension cost: The penalty for every extension past the initial gap position.
If you expect many small gaps in your alignment, the Gap open
cost should equal the Gap extension cost. If, on the other hand, you
expect few but long gaps, the Gap open cost should be set
significantly higher than the Gap extension cost.
For most alignments it is a good idea to set the Gap open
cost higher than the Gap extension cost. The default
values are 10.0 and 1.0 for the two parameters, respectively.
- End gap cost: The penalty for gaps at the beginning or the end of the alignment.
One of the advantages of the CLC Genomics Workbench alignment method is that it
provides flexibility in the treatment of gaps at the ends of the
sequences. There are three possibilities:
  - Free end gaps: Any number of gaps can be inserted at the ends of
    the sequences without any cost.
  - Cheap end gaps: All end gaps are treated as gap extensions, and
    any end gap positions beyond the tenth are free.
  - End gaps as any other: Gaps at the ends of the sequences are treated
    like gaps anywhere else in the sequences.
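The trade-off between the Gap open cost and the Gap extension cost can be illustrated with a minimal Python sketch. It assumes, following the definitions above, that a single gap of length k costs the Gap open cost plus (k - 1) times the Gap extension cost, and that the costs of separate gaps add up. The function names gap_cost and total_gap_cost are hypothetical and are not part of any CLC Genomics Workbench interface.

```python
def gap_cost(length, open_cost=10.0, extension_cost=1.0):
    """Cost of one gap of `length` positions (assumed model).

    The first position is charged the open cost; every position
    past it is charged the extension cost.
    """
    if length <= 0:
        return 0.0
    return open_cost + (length - 1) * extension_cost


def total_gap_cost(gap_lengths, open_cost=10.0, extension_cost=1.0):
    """Sum the costs of several independent gaps."""
    return sum(gap_cost(n, open_cost, extension_cost) for n in gap_lengths)


# The same ten gap positions, laid out in two different ways.
one_long_gap = [10]
five_short_gaps = [2, 2, 2, 2, 2]

for open_cost, ext_cost in [(10.0, 1.0), (1.0, 1.0)]:
    long_cost = total_gap_cost(one_long_gap, open_cost, ext_cost)
    short_cost = total_gap_cost(five_short_gaps, open_cost, ext_cost)
    print(f"open={open_cost}, ext={ext_cost}: "
          f"one gap of 10 -> {long_cost}, five gaps of 2 -> {short_cost}")
```

With the default values, the single long gap costs 19.0 while five short gaps cost 55.0, so few long gaps are strongly favored. With both costs set to 1.0, both layouts cost 10.0, so many small gaps are no longer penalized relative to one long gap.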
When aligning a long sequence with a short partial sequence, free end
gaps are the best choice, since they best approximate the situation:
the many gaps inserted at the ends are not due to evolutionary events
but to the data being partial.
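The effect of the three end gap options on a long sequence aligned with a short partial one can be sketched in Python. This is an illustrative cost model only, not the Workbench's actual scoring: it assumes internal gaps cost the Gap open cost plus (k - 1) times the Gap extension cost, that "cheap" charges each end gap position as an extension up to ten positions, and that any gap touching either edge of the alignment is an end gap. The names alignment_gap_cost, gap_runs and the mode strings are hypothetical.

```python
import re

# Assumed parameters; the defaults mirror the 10.0 / 1.0 defaults above.
OPEN_COST = 10.0
EXTENSION_COST = 1.0
CHEAP_LIMIT = 10  # end gap positions beyond this are free under "cheap"


def gap_runs(row):
    """Yield (start, length) for every run of '-' in an aligned row."""
    for match in re.finditer(r"-+", row):
        yield match.start(), match.end() - match.start()


def alignment_gap_cost(rows, end_mode):
    """Total gap cost of equal-length gapped rows under one end gap mode.

    end_mode is one of "free", "cheap" or "as_any_other".
    """
    if end_mode not in ("free", "cheap", "as_any_other"):
        raise ValueError(f"unknown end gap mode: {end_mode}")
    total = 0.0
    for row in rows:
        for start, length in gap_runs(row):
            # A gap touching either edge of the alignment is an end gap.
            is_end_gap = start == 0 or start + length == len(row)
            if is_end_gap and end_mode == "free":
                cost = 0.0
            elif is_end_gap and end_mode == "cheap":
                # Each position charged as an extension, capped at CHEAP_LIMIT.
                cost = min(length, CHEAP_LIMIT) * EXTENSION_COST
            else:
                # Internal gaps, and end gaps under "as_any_other".
                cost = OPEN_COST + (length - 1) * EXTENSION_COST
            total += cost
    return total


# A long sequence and a short partial sequence covering its middle.
long_row = "AAAAACCCCCGGGGGTTTTT"
short_row = "-----CCCCCGGGGG-----"

for mode in ("free", "cheap", "as_any_other"):
    print(mode, alignment_gap_cost([long_row, short_row], mode))
```

Under this model the two end gaps of the partial sequence cost 0.0 with free end gaps, 10.0 with cheap end gaps and 28.0 when treated like any other gap. In the last case the same ten gap positions would cost only 19.0 if they were all pushed to one end, so treating end gaps as ordinary gaps can pull a partial sequence toward one end of the long sequence even though its gaps only reflect missing data.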