Adapter trimming

Clicking Next will allow you to specify adapter trimming.

To trim adapters, you must first create an adapter list, which is then supplied to the Trim tool. A new trim adapter list can be created here:

        File | New | Trim Adapter List

This creates a new, empty trim adapter list. Add the adapter(s) you would like to use for trimming by clicking the Add Row button at the bottom of the View Area (figure 19.4). Provide the name and sequence of the adapter to be trimmed away, and adjust the parameters if relevant.

Guidelines to simplify the creation of trim adapter lists in some common cases

Sequencing adapters

When you are analyzing sequencing data, the adapters must be trimmed off before you proceed with further analysis. Adapters are often removed directly on the sequencing machine, but in some cases some adapter sequence remains on the sequenced reads. Remaining adapters can lead to misleading results, so we recommend trimming them off the reads.

If you have Illumina sequencing data for which adapter sequences have not been trimmed, or have been trimmed incompletely, the adapter sequences can be removed within the Biomedical Genomics Workbench using the Illumina adapter sequences that can be found here:

With Illumina adapters, the 5' end adapter is not sequenced, so adapters will always be found at the 3' end of the reads and must therefore be searched for on the Minus strand.

TruSeq adapters are ligated to the DNA fragments using TA overhangs. When creating the Trim Adapter List with TruSeq adapters, the Universal adapter can be entered in forward orientation and searched for on the Minus strand. The Universal adapter already ends with a "T" from the T-overhang in the TA ligation, so no additional "T" needs to be added to this sequence. The TruSeq Index adapter, however, must be reverse-complemented. To keep the list simple, copy and reverse-complement only the part of the adapter that is common to all adapter indexes. Then paste it into the Trim Adapter List and add a "T" to the right-hand end of the reverse-complemented adapter sequence. Finally, set the strand to Minus. A more detailed step-by-step description of how to remove TruSeq adapters can be found here.
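The TruSeq Index adapter recipe above (reverse-complement the common part, append a "T", set the strand to Minus) can be sketched in Python. This is a minimal illustration only: the helper names are hypothetical, the example sequence is an arbitrary placeholder rather than a real TruSeq sequence, and only unambiguous A/C/G/T bases are handled.

```python
# Complement table for unambiguous DNA bases (IUPAC ambiguity codes are not handled).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def truseq_index_list_entry(common_part: str) -> tuple[str, str]:
    """Hypothetical helper: build the (sequence, strand) pair to paste into
    the Trim Adapter List for the TruSeq Index adapter.

    The part shared by all index adapters is reverse-complemented, a "T" is
    appended to the right-hand end, and the strand is set to Minus.
    """
    return reverse_complement(common_part.upper()) + "T", "Minus"


if __name__ == "__main__":
    # Arbitrary placeholder, NOT a real TruSeq sequence.
    print(truseq_index_list_entry("GATTACAGG"))  # ('CCTGTAATCT', 'Minus')
```

Note that the Universal adapter needs none of this: it is entered as-is, with the strand set to Minus.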

Click the button labeled Finish to create the adapter trim list. You must then save the generated adapter trim list in the Navigation Area. You can do this by clicking the tab and dragging and dropping the adapter trim list to the desired destination, or by going to File in the menu bar and then choosing Save As.

Figure 19.4: Create a new Adapter Trim List by clicking on the button labeled "Add Row" found at the bottom of the New Trim Adapter view.

You can also create an adapter list by importing a comma-separated values (.csv) file of your adapters. This import can be performed with the standard import, using either the Automatic Import option or Force Import as Type: Trim Adapter List. To import a csv file, the names of all adapters must be unique; the Workbench cannot accept files in which multiple rows contain the same adapter name. Additionally, each field (the text between the commas that separate columns) should be enclosed in quotes. The expected import format for Adapter Lists is shown in figure 19.5.
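If you generate adapter lists programmatically, Python's standard csv module can enforce both import rules stated above: every field quoted, and no repeated adapter names. The column names below are hypothetical placeholders; take the actual columns and their order from figure 19.5. Whether the import expects a header row is also an assumption.

```python
import csv
import io

# Hypothetical columns for illustration; replace with the layout shown in figure 19.5.
COLUMNS = ["Name", "Sequence", "Strand"]


def adapter_list_csv(rows: list[dict[str, str]]) -> str:
    """Serialize adapter rows as CSV text, quoting every field.

    Raises ValueError if an adapter name occurs more than once, since the
    Workbench does not accept duplicate adapter names.
    """
    seen: set[str] = set()
    duplicates: set[str] = set()
    for row in rows:
        name = row["Name"]
        if name in seen:
            duplicates.add(name)
        seen.add(name)
    if duplicates:
        raise ValueError(f"adapter names must be unique; duplicated: {sorted(duplicates)}")

    buffer = io.StringIO()
    # QUOTE_ALL wraps every field (header included) in double quotes.
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS,
                            quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writeheader()  # assumption: the import expects a header row
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder adapter, not a real sequence.
    print(adapter_list_csv([{"Name": "MyAdapter", "Sequence": "ACGTACGT", "Strand": "Minus"}]), end="")
```

Checking for duplicates before writing anything means a rejected list never produces a partial file.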

Figure 19.5: The expected import format for Adapter Lists.

You can also provide the list as an Excel file (.xlsx or .xls). In this case, include the same information per column as described above, but do not add quotes in Excel.

At the bottom of the view, you have the following options:

Figure 19.6: Adding a new adapter for adapter trimming.

The information to be added for each adapter is explained in detail in the following sections on adapter trimming. Once the adapters have been added to the list, save it using the Save button, and you can then select it as shown in figure 19.12.

Action to perform when a match is found

When the first match for a given adapter is found on a read sequence, the Trim tool acts based on the action specified for that adapter. There are three actions to choose from:

How are adapters identified/detected?

For each read sequence in the input, a Smith-Waterman alignment [Smith and Waterman, 1981] is carried out with each adapter sequence. The alignment scores are computed and compared to the minimum scores provided for each adapter when setting up the Adapter List (figure 19.6). Matches to adapters found at the end of sequences, where the adapter sequence may be incomplete, usually score lower. Further details on the definition of internal and end matches are provided in the Alignment Scoring paragraph below. When the alignment score surpasses the relevant minimum score, the action specified for that adapter is taken.
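The detection step can be sketched as a textbook Smith-Waterman local alignment with a linear gap cost, using the default costs from the Alignment Scoring paragraph (mismatch 2, gap 3). The match score of +1 is an assumption, chosen to be consistent with the color-space note further down (a perfect 10-base match scoring 9 on 9 color residues). Treating the threshold comparison as inclusive is also an assumption, and this is not the Workbench's own implementation.

```python
def smith_waterman_score(read: str, adapter: str, match: int = 1,
                         mismatch_cost: int = 2, gap_cost: int = 3) -> int:
    """Best local alignment score of adapter against read (linear gap cost)."""
    cols = len(adapter) + 1
    prev = [0] * cols  # DP row for the previous read position
    best = 0
    for r in read:
        curr = [0] * cols
        for j in range(1, cols):
            diagonal = prev[j - 1] + (match if r == adapter[j - 1] else -mismatch_cost)
            from_above = prev[j] - gap_cost     # read base aligned to a gap in the adapter
            from_left = curr[j - 1] - gap_cost  # adapter base aligned to a gap in the read
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diagonal, from_above, from_left)
            best = max(best, curr[j])
        prev = curr
    return best


def adapter_detected(read: str, adapter: str, min_score: int) -> bool:
    """True when the best alignment score reaches the adapter's minimum score
    (inclusive comparison is an assumption)."""
    return smith_waterman_score(read, adapter) >= min_score


if __name__ == "__main__":
    print(smith_waterman_score("AAAACGTACGTAAAA", "CGTACGT"))  # 7: perfect 7-base match
    print(smith_waterman_score("TTTCGTTCGTTTT", "CGTACGT"))    # 4: 6 matches, 1 mismatch
```

Only two DP rows are kept because the score, not the alignment path, is all the threshold check needs.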

Alignment Scoring

By default a mismatch costs 2 and a gap (insertion or deletion) costs 3. A few examples of adapter matches and corresponding scores are shown in figure 19.7.
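The arithmetic behind such scores can be made explicit with a small helper that scores an already-computed alignment, written as two equal-length rows with "-" marking gaps. The mismatch cost 2 and gap cost 3 are the defaults given above; a match scoring +1 and each gapped position costing the full gap cost are assumptions, and the function name is hypothetical.

```python
def alignment_score(read_row: str, adapter_row: str, match: int = 1,
                    mismatch_cost: int = 2, gap_cost: int = 3) -> int:
    """Score a gapped pairwise alignment given as two equal-length rows ('-' = gap)."""
    if len(read_row) != len(adapter_row):
        raise ValueError("alignment rows must have equal length")
    score = 0
    for r, a in zip(read_row, adapter_row):
        if r == "-" or a == "-":
            score -= gap_cost       # insertion or deletion
        elif r == a:
            score += match
        else:
            score -= mismatch_cost
    return score


if __name__ == "__main__":
    print(alignment_score("ACGTACGT", "ACGTACGT"))  # 8: eight matches
    print(alignment_score("ACGTACGT", "ACGAACGT"))  # 5: seven matches, one mismatch
    print(alignment_score("ACG-ACGT", "ACGTACGT"))  # 4: seven matches, one gap
```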

Figure 19.7: Three examples showing a sequencing read (top) and an adapter (bottom). The examples are artificial, using the default settings of mismatch cost = 2 and gap cost = 3.

Note that there is a difference between an internal match and an end match. The examples above are all internal matches where the alignment of the adapter falls within the read. Figure 19.8 shows a few examples with an adapter match at the end.

Figure 19.8: Four examples showing a sequencing read (top) and an adapter (bottom). The examples are artificial.

In the first two examples (d and e), the adapter sequence extends beyond the end of the read. This is what typically happens when sequencing small RNAs, where you sequence part of the adapter. The third example (f) could be interpreted both as an end match and as an internal match. However, the Workbench interprets it as an end match, because it starts at the beginning (5' end) of the read. Thus, an end match is defined as an alignment of the adapter that starts at the read's 5' end. The last example (g) could also be interpreted as an end match, but because it is at the 3' end of the read, it counts as an internal match (you would not typically expect partial adapters at the 3' end of a read). Also note that if Remove adapter is chosen for the last example, the full read will be discarded, because everything 5' of the adapter is removed.
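The end-match rule and the effect of Remove adapter on the plus strand can be summarized in a few lines. The alignment coordinates (0-based, end-exclusive positions on the read) are assumed to come from the alignment step, the function names are hypothetical, and the separate minimum scores for internal and end matches stand for the two kinds of threshold discussed above.

```python
from typing import Optional


def match_type(read_start: int) -> str:
    """An alignment that starts at the read's 5' end (position 0) is an end
    match; every other alignment, including one at the 3' end, is internal."""
    return "end" if read_start == 0 else "internal"


def passes_threshold(score: int, read_start: int,
                     min_internal: int, min_end: int) -> bool:
    """Compare the score with the minimum score for this type of match
    (inclusive comparison is an assumption)."""
    threshold = min_end if match_type(read_start) == "end" else min_internal
    return score >= threshold


def remove_adapter(read: str, read_end: int) -> Optional[str]:
    """Remove the adapter and everything 5' of it; None means the read is
    discarded because nothing remains."""
    kept = read[read_end:]
    return kept if kept else None


if __name__ == "__main__":
    read = "ACGTACGTTTTT"
    print(match_type(0), remove_adapter(read, 4))           # end ACGTTTTT
    print(match_type(8), remove_adapter(read, len(read)))   # internal None
```

The second call mirrors example (g): a match at the 3' end counts as internal, and removing everything 5' of it leaves nothing.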

In figure 19.9, the same examples are repeated, showing the results of applying different scoring schemes. In the first round, the settings are:

Figure 19.9: The results of trimming with internal matches only. Red is the part that is removed and green is the retained part. Note that the read at the bottom is completely discarded.

A different set of adapter settings could be:

The results of such settings are shown in figure 19.10.

Figure 19.10: The results of trimming with both internal and end matches. Red is the part that is removed and green is the retained part.

Strand settings

Each adapter is defined as either Plus or Minus. Note that all the definitions above regarding the 3' end and 5' end also apply to the minus strand (i.e. selecting the Minus strand is equivalent to reverse complementing all the reads). In this case, the adapter should be defined as you would see it on the plus strand of the reverse-complemented read. Figure 19.11 shows a few hits for an adapter sequence defined as CTGCTGTACGGCCAAGGCG, searching on the minus strand.

Figure 19.11: An adapter defined as CTGCTGTACGGCCAAGGCG searching on the minus strand. Red is the part that is removed and green is the retained part. The retained part is 3' of the match on the minus strand, just like matches on the plus strand.

Note that if you had reverse complemented the adapter, you would find the hit on the plus strand, but you would then trim the wrong end of the read. It is therefore important to define the adapter as it is, without reverse complementing it.

Trimming of 3' ends of the reads

To trim an adapter and everything 3' of it, you need to search for the reverse complement of the adapter on the minus strand. To do this, create a new Trim Adapter List containing the reverse complement of your adapter sequence, set the strand to Minus, and run adapter trimming with the new Trim Adapter List as input.
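The strand logic can be checked with a small simulation. This sketch replaces the Smith-Waterman search with exact string matching for brevity, applies the Remove adapter behavior described above (keep what lies 3' of the match on the searched strand), and assumes that a read trimmed on the Minus strand is returned in its original orientation. The function names are hypothetical.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def trim(read: str, adapter: str, strand: str) -> Optional[str]:
    """Remove the first exact hit of `adapter` and everything 5' of it on the
    chosen strand. Minus-strand searching works on the reverse-complemented
    read, which is flipped back afterwards (assumed output orientation).
    Returns the read unchanged if there is no hit, None if nothing remains."""
    seq = reverse_complement(read) if strand == "Minus" else read
    hit = seq.find(adapter)
    if hit < 0:
        return read
    kept = seq[hit + len(adapter):]
    if strand == "Minus":
        kept = reverse_complement(kept)
    return kept or None


if __name__ == "__main__":
    adapter = "CTGCTGTACGGCCAAGGCG"  # adapter sequence used in figure 19.11
    # Read in which the adapter follows the insert on the plus strand.
    read = "AAAAACCCCC" + adapter + "GGG"
    # Reverse complement of the adapter on the Minus strand: the adapter and
    # everything 3' of it (plus-strand view) are removed.
    print(trim(read, reverse_complement(adapter), "Minus"))  # AAAAACCCCC
    # The adapter itself on the Plus strand trims the other end instead.
    print(trim(read, adapter, "Plus"))  # GGG
```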

Other adapter trimming options

When you run the trim, you specify the adapter settings as shown in figure 19.12.

Figure 19.12: Trimming your sequencing data for adapter sequences.

Select a trim adapter list (see Adapter trimming on how to create an adapter list) that defines the adapters to use.

You can specify whether the adapter trimming should be performed in color space. Note that this option is only available for sequencing data imported using the SOLiD import. When trimming in color space, the Smith-Waterman alignment is simply done using colors rather than bases. The adapter sequence is still entered in base space, and the Workbench then infers the color codes. Note that the scoring thresholds apply to the color space alignment (for example, a perfect match of 10 bases gets a score of 9, because 10 bases are represented by 9 color residues). Learn more about color space.

Checking the Search on both strands checkbox will search both the minus and the plus strand for the adapter sequence. Note! If a match is found on the reverse strand, the Trim action will reverse complement the read before trimming and output the trimmed reverse complement. This option is intended for removing multiplexing barcodes and primers.
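The Search on both strands behavior can be sketched as below: the read is searched as given and, if there is no hit, its reverse complement is searched, in which case the trimmed reverse complement is returned. Exact string matching stands in for the Smith-Waterman search, checking the plus strand first is an assumption, and the function name is hypothetical.

```python
from typing import Optional, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def trim_both_strands(read: str, adapter: str) -> Tuple[Optional[str], str]:
    """Return (trimmed sequence, strand of the hit); strand is "none" if no hit.
    Trimming removes the adapter and everything 5' of it; None means the
    whole read was removed."""
    for strand, seq in (("plus", read), ("minus", reverse_complement(read))):
        hit = seq.find(adapter)
        if hit >= 0:
            # On a minus-strand hit the output stays reverse-complemented.
            return seq[hit + len(adapter):] or None, strand
    return read, "none"


if __name__ == "__main__":
    adapter = "CTGCTGTACG"  # placeholder barcode/primer sequence
    print(trim_both_strands("GG" + adapter + "TTT", adapter))  # ('TTT', 'plus')
    print(trim_both_strands("AAACGTACAGCAGCC", adapter))       # ('TTT', 'minus')
```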

Below, a preview lists the results of trimming with the current settings on 1000 reads from the input file (reads 1001-2000 when the read file is long enough). This gives quick feedback on how changes in the parameters affect the trimming, rather than having to run the full analysis several times to identify a good parameter set. The following information is shown:

Note that the preview panel only shows how the adapter trim affects the results. If other kinds of trimming (quality or length trimming) are applied, they will not be reflected in the preview but will still influence the results.