Alignment scoring and match thresholds

For each read sequence in the input, a Smith-Waterman alignment [Smith and Waterman, 1981] is carried out with each adapter sequence. Alignment scores are computed and compared to the minimum scores provided for each adapter when setting up the Adapter List (figure 22.3). A lower minimum score is typically used for matches to adapters found at the end of sequences, where the adapter sequence may be incomplete. When the alignment score surpasses the relevant minimum score, the action specified for that adapter is taken.
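As an illustration of the kind of score being compared against the minimum, the following is a minimal Python sketch of a Smith-Waterman local alignment score with linear gap costs. It is not the workbench's implementation. The function name and the match score of 1 are assumptions made for the example; the mismatch cost of 2 and gap cost of 3 are the defaults given in this section.

```python
def smith_waterman_score(read, adapter, match_score=1, mismatch_cost=2, gap_cost=3):
    """Return the best local alignment score between read and adapter.

    match_score=1 is an assumption for this sketch; the mismatch and gap
    costs follow the defaults stated in the text.
    """
    best = 0
    # prev holds the previous row of the dynamic programming matrix.
    prev = [0] * (len(adapter) + 1)
    for r in read:
        curr = [0]
        for j, a in enumerate(adapter, start=1):
            diag = prev[j - 1] + (match_score if r == a else -mismatch_cost)
            up = prev[j] - gap_cost        # gap in the adapter
            left = curr[j - 1] - gap_cost  # gap in the read
            cell = max(0, diag, up, left)  # local alignment never drops below 0
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTTACGTACGTTTT", "ACGTACGT"))
```

Because the alignment is local, an adapter that only partly overlaps the read still receives a score for the overlapping part, which is why a lower minimum score can make sense for end matches.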

By default a mismatch costs 2 and a gap (insertion or deletion) costs 3. A few examples of adapter matches and corresponding scores are shown in figure 22.6.

Image read_and_adapter
Figure 22.6: Three examples showing a sequencing read (top) and an adapter (bottom). The examples are artificial, using the default settings with mismatch cost = 2 and gap cost = 3.
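The arithmetic behind such scores can be sketched in Python as follows. The helper name is hypothetical, and the sketch assumes that each matching position contributes +1, which this section does not state. Gaps are written as "-" in either aligned string.

```python
def score_alignment(read_aln, adapter_aln, match_score=1, mismatch_cost=2, gap_cost=3):
    """Score two already-aligned strings of equal length ('-' marks a gap).

    match_score=1 is an assumption; the costs are the stated defaults.
    """
    if len(read_aln) != len(adapter_aln):
        raise ValueError("aligned strings must have the same length")
    score = 0
    for r, a in zip(read_aln, adapter_aln):
        if r == "-" or a == "-":
            score -= gap_cost       # insertion or deletion
        elif r == a:
            score += match_score    # matching position
        else:
            score -= mismatch_cost  # mismatching position
    return score


if __name__ == "__main__":
    print(score_alignment("ACGTACGT", "ACGTACGT"))  # perfect match
    print(score_alignment("ACGAACGT", "ACGTACGT"))  # one mismatch
    print(score_alignment("ACG-ACGT", "ACGTACGT"))  # one gap
```

Under these assumptions, a single mismatch in an 8-base adapter lowers the score from 8 to 5, and a single gap lowers it to 4.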

Note that there is a difference between an internal match and an end match. The examples above are all internal matches where the alignment of the adapter falls within the read. Figure 22.7 shows a few examples with an adapter match at the end.

Image read_and_adapter2
Figure 22.7: Four examples showing a sequencing read (top) and an adapter (bottom). The examples are artificial.

In the first two examples (d and e), the adapter sequence extends beyond the end of the read. This is what typically happens when sequencing small RNAs, where part of the adapter is sequenced. The third example (f) shows a case that could be interpreted both as an end match and an internal match. However, the workbench will interpret this as an end match, because it starts at the beginning (5' end) of the read. Thus, the definition of an end match is that the alignment of the adapter starts at the read's 5' end. The last example (g) could also be interpreted as an end match, but because it is at the 3' end of the read, it counts as an internal match (this is because you would not typically expect partial adapters at the 3' end of a read). Also note that if Remove adapter is chosen for the last example, the full read will be discarded because everything 5' of the adapter is removed.
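The classification rule and the effect of Remove adapter described above can be sketched in Python as follows. All function and parameter names are hypothetical. The sketch assumes 0-based positions in the read, that a score equal to the minimum counts as a match, and that the adapter itself is removed together with everything 5' of it.

```python
def classify_match(alignment_start):
    """An end match is one whose alignment starts at the read's 5' end (position 0)."""
    return "end" if alignment_start == 0 else "internal"


def is_adapter_match(score, alignment_start, min_internal_score, min_end_score):
    """Compare the score to the minimum that applies to this kind of match.

    Treating score == minimum as a match is an assumption of this sketch.
    """
    if classify_match(alignment_start) == "end":
        return score >= min_end_score
    return score >= min_internal_score


def remove_adapter(read, alignment_end):
    """Keep only the part of the read 3' of the adapter alignment.

    alignment_end is the 0-based position just past the last aligned read base.
    """
    return read[alignment_end:]


if __name__ == "__main__":
    read = "AAAACCCCGGGG"
    print(remove_adapter(read, 8))          # adapter alignment ends inside the read
    print(repr(remove_adapter(read, 12)))   # adapter at the 3' end: nothing is retained
```

With an adapter aligned at the very 3' end of the read, as in example g, nothing remains after trimming, which corresponds to the read being discarded.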

Below (figure 22.8), the same examples are repeated, showing the results of applying different scoring schemes. In the first round, the settings are:

Image trim3a
Figure 22.8: The results of trimming with internal matches only. Red is the part that is removed and green is the retained part. Note that the read at the bottom is completely discarded.

A different set of adapter settings could be:

The results of such settings are shown in figure 22.9.

Image trim3
Figure 22.9: The results of trimming with both internal and end matches. Red is the part that is removed and green is the retained part.