Scoring Schemes
Alignments are scored using Smith-Waterman alignment with a linear gap cost. With a linear gap cost, an insertion or deletion of length two costs twice as much as an insertion or deletion of length one. This corresponds to individual insertion and deletion events occurring independently, even when they are adjacent.
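As an illustration only, the following Python sketch computes a Smith-Waterman local alignment score with a linear gap cost. It is not the program's implementation; the function name `smith_waterman_score` and its parameter names are assumptions made for this example, and the costs are passed as positive penalties.

```python
def smith_waterman_score(a, b, match=1, mismatch=2, gap=3):
    """Return the best local alignment score of a and b.

    Illustrative sketch: costs are positive penalties, and every gap
    position is charged the same amount (linear gap cost).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else -mismatch
            curr[j] = max(
                0,                   # start a new local alignment
                prev[j - 1] + sub,   # match or mismatch
                prev[j] - gap,       # gap position in b
                curr[j - 1] - gap,   # gap position in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


read = "AACCGGTTAACC"
# Same sequence with one or two extra bases inserted in the middle.
print(smith_waterman_score(read, "AACCGGTTAACC", gap=1))
print(smith_waterman_score(read, "AACCGGATTAACC", gap=1))
print(smith_waterman_score(read, "AACCGGAATTAACC", gap=1))
```

With a gap cost of 1, the one-base insertion lowers the score by 1 and the two-base insertion lowers it by 2, which is the linear behaviour described above.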
The parameters are:
Parameter | Option | Restrictions |
Match score | - | Always 1 |
Mismatch cost | `-x' | Between 1 and 3. Default is 2 |
Gap cost | `-g' | Between 1 and 3. Default is 3 (see footnote 4.1) |
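The restrictions in the table can be sketched as a small argument parser. This is a hypothetical illustration, not the program's own option handling: the program name `aligner`, the helper names, and the assumption that costs are integers are all invented for the example, and the gap cost default is taken to be 3 (the only reading consistent with the stated range).

```python
import argparse


def cost(value):
    """Parse a cost and enforce the 1..3 range (integer costs are an assumption)."""
    number = int(value)
    if not 1 <= number <= 3:
        raise argparse.ArgumentTypeError("cost must be between 1 and 3")
    return number


def build_parser():
    # "aligner" is a placeholder program name for this sketch.
    parser = argparse.ArgumentParser(prog="aligner")
    parser.add_argument("-x", dest="mismatch_cost", type=cost, default=2,
                        help="mismatch cost, between 1 and 3")
    parser.add_argument("-g", dest="gap_cost", type=cost, default=3,
                        help="gap cost, between 1 and 3")
    return parser


args = build_parser().parse_args(["-x", "1"])
print(args.mismatch_cost, args.gap_cost)
```

The match score has no option in this sketch because it is always 1.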
An ambiguous nucleotide aligned to any other nucleotide, including one of the same ambiguous type, is treated as a mismatch.
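The rule for ambiguous nucleotides can be written as a per-position scoring function. The sketch below is an assumption-laden illustration: the name `substitution_score` is hypothetical, and it treats only upper-case A, C, G and T as unambiguous.

```python
UNAMBIGUOUS = frozenset("ACGT")  # assumed set of unambiguous nucleotide codes


def substitution_score(x, y, mismatch_cost=2):
    """Score one aligned pair: +1 for an unambiguous match, else a mismatch."""
    if x in UNAMBIGUOUS and x == y:
        return 1
    # Different bases, or any pair involving an ambiguous code
    # (even two identical ambiguous codes), count as a mismatch.
    return -mismatch_cost


print(substitution_score("A", "A"))
print(substitution_score("N", "N"))
```

Here the pair A/A scores as a match, while N/N is charged the mismatch cost even though both symbols are identical.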
It is the relative scores and costs that determine an alignment, so multiplying all the scores by a common factor would give the same alignment. Thus, having the match score fixed to one does not significantly reduce the flexibility in the scoring scheme since the other values can be adjusted.
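A small numerical check of this point, using made-up candidate alignments: multiplying the match score, mismatch cost and gap cost by the same factor multiplies every alignment's score by that factor, so the preferred alignment does not change. The names `alignment_score`, `best_candidate` and the candidate counts are hypothetical.

```python
def alignment_score(matches, mismatches, gap_positions, match, mismatch, gap):
    """Total score of an alignment described by its column counts."""
    return matches * match - mismatches * mismatch - gap_positions * gap


# Hypothetical candidates: (matches, mismatches, gap positions).
CANDIDATES = {
    "with_mismatch": (10, 1, 0),
    "with_gaps": (11, 0, 2),
}


def best_candidate(match, mismatch, gap):
    """Name of the highest-scoring candidate under the given scheme."""
    return max(
        CANDIDATES,
        key=lambda name: alignment_score(*CANDIDATES[name], match, mismatch, gap),
    )


# (2, 4, 6) is (1, 2, 3) scaled by two, so the same candidate is preferred.
print(best_candidate(1, 2, 3), best_candidate(2, 4, 6))
# Changing the ratio between the costs can change the preferred candidate.
print(best_candidate(1, 2, 1))
```

A scheme such as match 2, mismatch 4, gap 6 is therefore equivalent to match 1, mismatch 2, gap 3, which is why fixing the match score at 1 loses little.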
The restricted values in the scoring scheme allow more efficient algorithms to be used, which can have a large impact on the time required when large data sets are being considered.
Footnotes
- 4.1: Note that if the gap cost is set to 1 and the mismatch cost is set to 2, then the algorithm will insert a gap in the reference sequence and then another gap in the reference target to avoid a mismatch.