Trim Sequences

CLC Cancer Research Workbench offers a number of ways to trim your sequence reads prior to assembly and mapping, including adapter trimming, quality trimming and length trimming. For each original read, the regions of the sequence to be removed for each type of trimming operation are determined independently according to choices made in the trim dialogs. The types of trim operations that can be performed are:

  1. Quality trimming based on quality scores
  2. Ambiguity trimming to trim off e.g. stretches of Ns
  3. Adapter trimming
  4. Base trim to remove a specified number of bases at either the 3' or the 5' end of the reads
  5. Length trimming to remove reads shorter or longer than a specified threshold

At each end of the original read, only the trim operation that removes the largest region is performed. The other trim operations are ignored, as they would just remove part of the same region.

Note that this may occasionally expose an internal region in a read that has now become subject to trimming. In such cases, trimming may have to be done more than once.
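The way independently determined trim regions are combined can be illustrated with a minimal Python sketch. This is not the Workbench's implementation: the function name `combine_trims` and the representation of each trim proposal as a pair (bases to remove from the 5' end, bases to remove from the 3' end) are assumptions made for illustration only.

```python
from typing import Iterable, Tuple

# Hypothetical representation of one trim operation's proposal:
# (bases to remove from the 5' end, bases to remove from the 3' end)
TrimRegion = Tuple[int, int]


def combine_trims(read: str, regions: Iterable[TrimRegion]) -> str:
    """Apply only the largest proposed trim at each end of the read."""
    left = 0
    right = 0
    for five_prime, three_prime in regions:
        # Smaller proposals cover part of the same region, so they are ignored.
        left = max(left, five_prime)
        right = max(right, three_prime)
    if left + right >= len(read):
        # Nothing is left of the read after trimming.
        return ""
    return read[left:len(read) - right]


read = "NNACGTACGTTTAGGC"
proposals = [
    (2, 0),  # ambiguity trimming: two leading Ns
    (0, 4),  # assumed adapter match covering the last four bases
    (1, 2),  # assumed quality trimming result
]
trimmed = combine_trims(read, proposals)
print(trimmed)
```

In this sketch the quality trimming proposal has no effect, because a larger region is already removed at both ends by the other two operations.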

The result of the trim is a list of sequences that have passed the trim (referred to as the trimmed list below) and, optionally, a list of the sequences that have been discarded and a summary report. The original data will not be changed.
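As a rough illustration of how a trim result separates into a trimmed list and a discarded list, the following Python sketch applies a length filter to already trimmed reads. The function name `partition_by_length` and its parameters are hypothetical and do not correspond to Workbench settings; the input list is left unmodified, mirroring the fact that the original data is not changed.

```python
from typing import List, Sequence, Tuple


def partition_by_length(
    reads: Sequence[str], min_length: int, max_length: int
) -> Tuple[List[str], List[str]]:
    """Split reads into (kept, discarded) using inclusive length thresholds."""
    kept: List[str] = []
    discarded: List[str] = []
    for read in reads:
        if min_length <= len(read) <= max_length:
            kept.append(read)
        else:
            discarded.append(read)
    return kept, discarded


reads = ["ACGT", "ACGTACGTAC", "AC"]
trimmed_list, discarded_list = partition_by_length(reads, min_length=4, max_length=8)
print(trimmed_list)
print(discarded_list)
```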

When you are analyzing sequencing data, the adapters must be trimmed off before you proceed with further analysis, because the presence of adapters will lead to misleading results. The removal of adapters is often done directly on the sequencing machine. If it has not been done, please trim the adapters off before proceeding with your analysis.

If you are working with sequences that still have adapters present, they can be trimmed using the Trim Sequences tool provided in the "NGS Core tools" folder in the toolbox.

Illumina Adapters
Illumina recently changed their adapter sequences. This may have consequences for the downstream data analysis if the new adapter sequences were used for sequencing but the old adapter sequences were used for trimming.

If you have Illumina sequencing data that have been generated with the new adapter sequences and have not been trimmed or have been trimmed incompletely, the adapter sequences can be removed within the CLC Cancer Research Workbench using the Illumina adapter sequences that can be found here:

and the tool Trim Sequences that is available in the Toolbox in the "Tools" section under Preparing Raw Data.

To start trimming:

        Toolbox | Tools | Preparing Raw Data | Trim Sequences

This opens a dialog where you can add sequences or sequence lists. If you add several sequence lists, each list will be processed separately and you will get a list of trimmed sequences for each input sequence list.

When the sequences are selected, click Next.