When is there a match?
Whether there is a match is determined by a set of scoring thresholds that can be adjusted for each adapter, as shown in figure 23.2.
First, you can choose the costs for mismatches and gaps. A match is rewarded one point (this cannot be changed), and by default a mismatch costs 2 and a gap (insertion or deletion) costs 3. A few examples of adapter matches and the corresponding scores are shown below.
Figure 23.3: Three examples showing a sequencing read (top) and an adapter (bottom). The examples are artificial, using the default settings with mismatch cost = 2 and gap cost = 3.
       CGTATCAATCGATTACGCTATGAATG
a)         ||||||| ||||             11 matches - 2 mismatches = 7
          TTCAATCGGTTAC
       CGTATCAATCGATTACGCTATGAATG
b)        |||||||||| ||||           14 matches - 1 gap = 11
          ATCAATCGAT-CGCT
       CGTATCAATCGATTACGCTATGAATG
c)         |||||||                  7 matches - 3 mismatches = 1
          TTCAATCGGG
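The scoring rule can be written out as a short Python sketch. The function name `alignment_score` and the convention of passing two aligned strings of equal length, with `-` marking a gap, are assumptions made for illustration and are not part of the Workbench. The read segments used below are one reading of how the adapters in examples a) and c) line up with the read.

```python
def alignment_score(read_part, adapter_part, mismatch_cost=2, gap_cost=3):
    """Score two aligned strings of equal length ('-' marks a gap).

    Each match adds 1; each mismatch or gap subtracts its cost.
    """
    if len(read_part) != len(adapter_part):
        raise ValueError("aligned strings must have the same length")
    score = 0
    for r, a in zip(read_part, adapter_part):
        if r == "-" or a == "-":
            score -= gap_cost
        elif r == a:
            score += 1
        else:
            score -= mismatch_cost
    return score


# Example a): 11 matches - 2 mismatches
print(alignment_score("ATCAATCGATTAC", "TTCAATCGGTTAC"))  # 7
# Example c): 7 matches - 3 mismatches
print(alignment_score("ATCAATCGAT", "TTCAATCGGG"))        # 1
```

Raising the mismatch or gap cost lowers the score of imperfect alignments, which makes it harder for them to reach the minimum score.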
In the panel below, you can set the Minimum score for a match to be accepted. Note that there is a difference between an internal match and an end match. The examples above are all internal matches where the alignment of the adapter falls within the read. Below are a few examples showing an adapter match at the end:
Figure 23.4: Four examples showing a sequencing read (top) and an adapter (bottom). The examples are artificial.
       CGTATCAATCGATTACGCTATGAATG
d)     |||||                        5 matches = 5 (as end match)
   GATTCGTAT
       CGTATCAATCGATTACGCTATGAATG
e)     || ||||                      6 matches - 1 mismatch = 4 (as end match)
   GATTCGCATCA
       CGTATCAATCGATTACGCTATGAATG
f)     |||| |||||                   9 matches - 1 gap = 6 (as end match)
       CGTA-CAATC
       CGTATCAATCGATTACGCTATGAATG
g)                     ||||||||||   10 matches = 10 (as internal match)
                       GCTATGAATG
In the first two examples, the adapter sequence extends beyond the end of the read. This is what typically happens when sequencing e.g. small RNAs, where part of the adapter is sequenced. The third example could be interpreted as either an end match or an internal match. However, the Workbench will interpret it as an end match, because it starts at the beginning (5' end) of the read. Thus, the definition of an end match is that the alignment of the adapter starts at the read's 5' end. The last example could also be interpreted as an end match, but because it is at the 3' end of the read, it counts as an internal match (this is because you would not typically expect partial adapters at the 3' end of a read). Also note that if Remove adapter is chosen for the last example, the full read will be discarded because everything 5' of the adapter is removed.
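The end-match rule and the effect of Remove adapter can be sketched in a few lines of Python. This is a minimal illustration, not the Workbench's implementation; the function names and the 0-based `start` and `end` coordinates (where the adapter alignment begins and ends within the read, end exclusive) are assumptions.

```python
READ = "CGTATCAATCGATTACGCTATGAATG"


def match_type(start):
    """An alignment that begins at the read's 5' end (index 0) is an end match."""
    return "end" if start == 0 else "internal"


def remove_adapter(read, end):
    """Drop the adapter and everything 5' of it; keep what follows index `end`."""
    return read[end:]


# Example f): the adapter aligns to read[0:10], so it is an end match.
print(match_type(0), remove_adapter(READ, 10))
# Example g): the adapter aligns to read[16:26]; an internal match, and nothing is left.
print(match_type(16), repr(remove_adapter(READ, 26)))
```

For example g) the retained sequence is empty, which corresponds to the read being discarded.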
Below, the same examples are repeated to show the results of applying different settings. In the first round, the settings are:
- Allowing internal matches with a minimum score of 6
- Not allowing end matches
- Action: Remove adapter
The result would be the following (the retained parts are green):
Figure 23.5: The results of trimming with internal matches only. The removed part (red) is shown in lowercase and the retained part (green) in uppercase. Note that the read at the bottom is completely discarded.
       cgtatcaatcgattacGCTATGAATG
a)         ||||||| ||||             11 matches - 2 mismatches = 7
          TTCAATCGGTTAC
       cgtatcaatcgattacgcTATGAATG
b)        |||||||||| ||||           14 matches - 1 gap = 11
          ATCAATCGAT-CGCT
       CGTATCAATCGATTACGCTATGAATG
c)         |||||||                  7 matches - 3 mismatches = 1
          TTCAATCGGG
       CGTATCAATCGATTACGCTATGAATG
d)     |||||                        5 matches = 5 (as end match)
   GATTCGTAT
       CGTATCAATCGATTACGCTATGAATG
e)     || ||||                      6 matches - 1 mismatch = 4 (as end match)
   GATTCGCATCA
       CGTATCAATCGATTACGCTATGAATG
f)     |||| |||||                   9 matches - 1 gap = 6 (as end match)
       CGTA-CAATC
       cgtatcaatcgattacgctatgaatg
g)                     ||||||||||   10 matches = 10 (as internal match)
                       GCTATGAATG
A different set of adapter settings could be:
- Allowing internal matches with a minimum score of 11
- Allowing end matches with a minimum score of 4
- Action: Remove adapter
The result would be:
Figure 23.6: The results of trimming with both internal and end matches. The removed part (red) is shown in lowercase and the retained part (green) in uppercase.
       CGTATCAATCGATTACGCTATGAATG
a)         ||||||| ||||             11 matches - 2 mismatches = 7
          TTCAATCGGTTAC
       cgtatcaatcgattacgcTATGAATG
b)        |||||||||| ||||           14 matches - 1 gap = 11
          ATCAATCGAT-CGCT
       CGTATCAATCGATTACGCTATGAATG
c)         |||||||                  7 matches - 3 mismatches = 1
          TTCAATCGGG
       cgtatCAATCGATTACGCTATGAATG
d)     |||||                        5 matches = 5 (as end match)
   GATTCGTAT
       cgtatcaATCGATTACGCTATGAATG
e)     || ||||                      6 matches - 1 mismatch = 4 (as end match)
   GATTCGCATCA
       cgtatcaatcGATTACGCTATGAATG
f)     |||| |||||                   9 matches - 1 gap = 6 (as end match)
       CGTA-CAATC
       CGTATCAATCGATTACGCTATGAATG
g)                     ||||||||||   10 matches = 10 (as internal match)
                       GCTATGAATG
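The two rounds of settings can be compared with a small Python sketch that applies the thresholds to the seven examples. It is an illustration under stated assumptions rather than the Workbench's own logic: the `trim` and `run` helpers are hypothetical, the scores are taken from the examples, and the alignment coordinates (0-based, end exclusive) are read off the figures.

```python
READ = "CGTATCAATCGATTACGCTATGAATG"

# label: (score, start, end) of the adapter alignment within the read.
# The coordinates are an assumption based on how the figures are drawn.
EXAMPLES = {
    "a": (7, 3, 16),
    "b": (11, 3, 18),
    "c": (1, 3, 13),
    "d": (5, 0, 5),
    "e": (4, 0, 7),
    "f": (6, 0, 10),
    "g": (10, 16, 26),
}


def trim(read, score, start, end, min_internal=None, min_end=None):
    """Return the retained part of the read.

    A threshold of None means that this type of match is not allowed.
    """
    # A match starting at the read's 5' end is an end match.
    threshold = min_end if start == 0 else min_internal
    if threshold is None or score < threshold:
        return read      # match rejected, read left untouched
    return read[end:]    # remove the adapter and everything 5' of it


def run(**settings):
    """Apply one set of thresholds to all examples."""
    return {label: trim(READ, *ex, **settings) for label, ex in EXAMPLES.items()}


first = run(min_internal=6)                # internal matches only
second = run(min_internal=11, min_end=4)   # internal and end matches

for label in EXAMPLES:
    print(label, first[label] or "(discarded)", second[label] or "(discarded)")
```

With the first settings only a), b) and g) are trimmed, and g) leaves nothing behind. With the second settings a) and g) fall below the internal threshold of 11, while the end matches d), e) and f) are now trimmed.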