When is there a match?

Whether an adapter matches a read is determined by a set of scoring thresholds, which can be adjusted for each adapter as shown in figure 23.2.

First, you can choose the costs for mismatches and gaps. A match scores one point (this cannot be changed), and by default a mismatch costs 2 and a gap (insertion or deletion) costs 3. A few examples of adapter matches and their corresponding scores are shown below.

Figure 23.3: Three examples showing a sequencing read (top) and an adapter (bottom). The examples are artificial and use the default settings (mismatch cost = 2, gap cost = 3).

    CGTATCAATCGATTACGCTATGAATG
a)      ||||||| ||||            11 matches - 2 mismatches = 7
       TTCAATCGGTTAC


    CGTATCAATCGATTACGCTATGAATG
       |||||||||| ||||          14 matches - 1 gap = 11
b)     ATCAATCGAT-CGCT


    CGTATCAATCGATTACGCTATGAATG
c)      |||||||                 7 matches - 3 mismatches = 1
       TTCAATCGGG
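The scores in these examples can be reproduced with a short Python sketch. The function name `score_alignment` and the convention of writing a gap as `-` are illustrative assumptions, not part of the Workbench. The sketch also assumes that every gap position is charged the full gap cost.

```python
def score_alignment(read_part, adapter_part, mismatch_cost=2, gap_cost=3):
    """Score two aligned strings of equal length; '-' marks a gap position.

    A match scores +1, a mismatch costs `mismatch_cost`, and each gap
    position costs `gap_cost` (per-position charging is an assumption).
    """
    if len(read_part) != len(adapter_part):
        raise ValueError("aligned strings must have the same length")
    score = 0
    for r, a in zip(read_part, adapter_part):
        if r == "-" or a == "-":
            score -= gap_cost
        elif r == a:
            score += 1
        else:
            score -= mismatch_cost
    return score


# Aligned regions read off the examples, using the default costs.
print(score_alignment("ATCAATCGATTAC", "TTCAATCGGTTAC"))  # example a: 7
print(score_alignment("ATCAATCGAT", "TTCAATCGGG"))        # example c: 1
print(score_alignment("CGTATCAATC", "CGTA-CAATC"))        # one gap: 9 - 3 = 6
```

Raising `mismatch_cost` or `gap_cost` lowers the score of imperfect alignments, so fewer of them reach the minimum score.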

In the panel below, you can set the Minimum score for a match to be accepted. Note that there is a difference between an internal match and an end match. The examples above are all internal matches where the alignment of the adapter falls within the read. Below are a few examples showing an adapter match at the end:

Figure 23.4: Four examples showing a sequencing read (top) and an adapter (bottom). The examples are artificial.

        CGTATCAATCGATTACGCTATGAATG
d)      |||||                         5 matches = 5 (as end match)
    GATTCGTAT


        CGTATCAATCGATTACGCTATGAATG
e)      || ||||                       6 matches - 1 mismatch = 4 (as end match)
    GATTCGCATCA


    CGTATCAATCGATTACGCTATGAATG
f)  |||| |||||                        9 matches - 1 gap = 6 (as end match)
    CGTA-CAATC


    CGTATCAATCGATTACGCTATGAATG
g)                  ||||||||||        10 matches = 10 (as internal match)
                    GCTATGAATG

In the first two examples, the adapter sequence extends beyond the end of the read. This typically happens when sequencing, for example, small RNAs, where part of the adapter is sequenced. The third example could be interpreted as either an end match or an internal match. The Workbench interprets it as an end match, because it starts at the beginning (5' end) of the read. Thus, an end match is defined as a match where the alignment of the adapter starts at the read's 5' end. The last example could also be interpreted as an end match, but because it is at the 3' end of the read, it counts as an internal match (you would not typically expect partial adapters at the 3' end of a read). Also note that if Remove adapter is chosen for the last example, the full read will be discarded, because everything 5' of the adapter is removed.
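The distinction between end and internal matches, and the effect of removing the adapter, can be sketched in a few lines of Python. The helper names `match_type` and `remove_adapter` are hypothetical, and positions are 0-based indices into the read as read off the examples. This is an illustration of the rule described above, not the Workbench's implementation.

```python
READ = "CGTATCAATCGATTACGCTATGAATG"


def match_type(read_start):
    """Classify a match by the read position where the adapter alignment starts.

    Per the definition above, a match is an end match only if the
    alignment starts at the read's 5' end (position 0).
    """
    return "end" if read_start == 0 else "internal"


def remove_adapter(read, adapter_end):
    """Drop the adapter and everything 5' of it.

    `adapter_end` is the read position just past the last aligned base.
    """
    return read[adapter_end:]


# Example d: alignment covers read positions 0-4.
print(match_type(0), remove_adapter(READ, 5))          # end CAATCGATTACGCTATGAATG
# Example g: alignment covers read positions 16-25, i.e. the 3' end.
print(match_type(16), repr(remove_adapter(READ, 26)))  # internal ''
```

The empty string for example g corresponds to the read being discarded entirely.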

Below, the same examples are repeated, showing the results of applying different scoring schemes. In the first round, the settings are:

The result would be the following (the retained parts are green):

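Figure 23.5: The results of trimming with internal matches only. Red is the part that is removed and green is the retained part. Where a read is trimmed, a space in the read marks the trim point: the part before the space is removed and the part after it is retained. Note that the read at the bottom is completely discarded.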

    CGTATCAATCGATTAC GCTATGAATG
a)      ||||||| ||||                 11 matches - 2 mismatches = 7
       TTCAATCGGTTAC


    CGTATCAATCGATTACGC TATGAATG
       |||||||||| ||||               14 matches - 1 gap = 11
b)     ATCAATCGAT-CGCT


    CGTATCAATCGATTACGCTATGAATG
c)      |||||||                       7 matches - 3 mismatches = 1
       TTCAATCGGG


        CGTATCAATCGATTACGCTATGAATG
d)      |||||                         5 matches = 5 (as end match)
    GATTCGTAT


        CGTATCAATCGATTACGCTATGAATG
e)      || ||||                       6 matches - 1 mismatch = 4 (as end match)
    GATTCGCATCA


    CGTATCAATCGATTACGCTATGAATG
f)  |||| |||||                        9 matches - 1 gap = 6 (as end match)
    CGTA-CAATC


    CGTATCAATCGATTACGCTATGAATG
g)                  ||||||||||       10 matches = 10 (as internal match)
                    GCTATGAATG

A different set of adapter settings could be:

The result would be:

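Figure 23.6: The results of trimming with both internal and end matches. Red is the part that is removed and green is the retained part. Where a read is trimmed, a space in the read marks the trim point: the part before the space is removed and the part after it is retained.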


    CGTATCAATCGATTACGCTATGAATG
a)      ||||||| ||||                 11 matches - 2 mismatches = 7
       TTCAATCGGTTAC


    CGTATCAATCGATTACGC TATGAATG
       |||||||||| ||||               14 matches - 1 gap = 11
b)     ATCAATCGAT-CGCT


    CGTATCAATCGATTACGCTATGAATG
c)      |||||||                       7 matches - 3 mismatches = 1
       TTCAATCGGG


        CGTAT CAATCGATTACGCTATGAATG
d)      |||||                         5 matches = 5 (as end match)
    GATTCGTAT


        CGTATCA ATCGATTACGCTATGAATG
e)      || ||||                       6 matches - 1 mismatch = 4 (as end match)
    GATTCGCATCA


    CGTATCAATC GATTACGCTATGAATG
f)  |||| |||||                        9 matches - 1 gap = 6 (as end match)
    CGTA-CAATC


    CGTATCAATCGATTACGCTATGAATG
g)                  ||||||||||       10 matches = 10 (as internal match)
                    GCTATGAATG
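The following Python sketch shows how minimum-score thresholds decide which of the seven examples are trimmed. The settings used for the two rounds are not given in the text, so the threshold values below are hypothetical, chosen only to be consistent with the scores and trim points visible in figures 23.5 and 23.6. The names `AdapterMatch` and `apply_settings` are also illustrative, and the alignment positions are read off the figures.

```python
from dataclasses import dataclass
from typing import Optional

READ = "CGTATCAATCGATTACGCTATGAATG"


@dataclass
class AdapterMatch:
    label: str
    score: int
    read_start: int  # 0-based read position where the alignment starts
    read_end: int    # read position just past the trim point


# Scores and positions taken from the examples in the figures.
MATCHES = [
    AdapterMatch("a", 7, 3, 16),
    AdapterMatch("b", 11, 3, 18),
    AdapterMatch("c", 1, 3, 13),
    AdapterMatch("d", 5, 0, 5),
    AdapterMatch("e", 4, 0, 7),
    AdapterMatch("f", 6, 0, 10),
    AdapterMatch("g", 10, 16, 26),
]


def apply_settings(read, match, min_internal: Optional[int], min_end: Optional[int]):
    """Return the retained part of the read; None disables that match type."""
    is_end = match.read_start == 0
    threshold = min_end if is_end else min_internal
    if threshold is None or match.score < threshold:
        return read                   # match rejected, read left untouched
    return read[match.read_end:]      # adapter and everything 5' of it removed


# Hypothetical round 1: internal matches only, minimum score 6.
round1 = {m.label: apply_settings(READ, m, 6, None) for m in MATCHES}
# Hypothetical round 2: internal minimum 11, end-match minimum 4.
round2 = {m.label: apply_settings(READ, m, 11, 4) for m in MATCHES}

print(round1["a"], repr(round1["g"]))  # GCTATGAATG ''
print(round2["d"], round2["a"] == READ)  # CAATCGATTACGCTATGAATG True
```

With these assumed values, round 1 trims a and b, discards g, and leaves the end matches d, e and f untouched. Round 2 leaves a and g untouched, because their scores fall below the higher internal threshold, and trims b, d, e and f.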