Trim Reads

CLC Genomics Workbench offers a number of ways to trim your sequence reads prior to assembly and mapping, including adapter trimming, quality trimming and length trimming. For each original read, the regions of the sequence to be removed for each type of trimming operation are determined independently according to choices made in the trim dialogs. The types of trim operations that can be performed are:

  1. Quality trimming based on quality scores
  2. Ambiguity trimming, for example to trim off stretches of Ns
  3. Adapter trimming (automatic, or also with a Trim Adapter List, see Adapter trimming)
  4. Homopolymer trimming
  5. Base trimming to remove a specified number of bases at either the 3' or the 5' end of the reads
  6. Length trimming to remove reads shorter or longer than a specified threshold

At each end of the original read, only the trim operation that removes the largest region is performed. The other trim operations are ignored, as they would just remove part of the same region.

Note that this may occasionally expose an internal region in a read that has now become subject to trimming. In such cases, trimming may have to be done more than once.
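The way independently determined trim regions are combined can be illustrated with a minimal Python sketch. This is not the Workbench's implementation; the function name `combine_trims` and the representation of each trim operation as a pair (bases to remove at the 5' end, bases to remove at the 3' end) are assumptions made for illustration only.

```python
from typing import Iterable, Tuple


def combine_trims(read_length: int,
                  proposals: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Combine independently computed trim proposals for one read.

    Each proposal is a hypothetical (five_prime, three_prime) pair giving
    the number of bases one trim operation would remove from each end.
    Only the largest removal at each end is applied; smaller ones fall
    inside the same region and are therefore ignored.

    Returns the retained region as a half-open interval (start, end).
    """
    trim5 = 0
    trim3 = 0
    for five_prime, three_prime in proposals:
        trim5 = max(trim5, five_prime)
        trim3 = max(trim3, three_prime)

    # Clamp so that an over-trimmed read yields an empty interval.
    start = min(trim5, read_length)
    end = max(start, read_length - trim3)
    return start, end


# Illustrative read and proposals (made-up values):
read = "NNACGTACGTACGAGATCGG"
proposals = [
    (0, 3),  # e.g. quality trimming wants 3 bases off the 3' end
    (2, 0),  # e.g. ambiguity trimming wants 2 Ns off the 5' end
    (0, 7),  # e.g. adapter trimming wants 7 bases off the 3' end
]
start, end = combine_trims(len(read), proposals)
retained = read[start:end]
```

In this sketch the 3-base quality trim at the 3' end is ignored because the 7-base adapter trim already covers it. Running the same procedure again on the retained sequence corresponds to the repeated trimming mentioned in the note.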

The result of the trim is a list of sequences that have passed the trim (referred to as the trimmed list below) and, optionally, a list of the sequences that have been discarded and a summary report. The original data will not be changed.
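As a rough illustration of how length trimming leads to a trimmed list and a discarded list while leaving the input untouched, consider the following Python sketch. The function name `partition_by_length`, the inclusive treatment of the thresholds, and the threshold values are assumptions for illustration, not the Workbench's actual behavior or defaults.

```python
from typing import List, Sequence, Tuple


def partition_by_length(reads: Sequence[str],
                        min_length: int,
                        max_length: int) -> Tuple[List[str], List[str]]:
    """Split reads into (trimmed, discarded) by length.

    Reads shorter than min_length or longer than max_length are
    discarded; all others pass. The input sequence is not modified.
    """
    trimmed: List[str] = []
    discarded: List[str] = []
    for read in reads:
        if min_length <= len(read) <= max_length:
            trimmed.append(read)
        else:
            discarded.append(read)
    return trimmed, discarded


# Hypothetical reads and thresholds:
reads = ["ACGT", "AC", "ACGTACGTAC"]
trimmed_list, discarded_list = partition_by_length(reads, min_length=3, max_length=8)
```

Returning two new lists rather than filtering in place mirrors the statement that the original data is not changed.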

To start trimming:

        Toolbox | Prepare Sequencing Data | Trim Reads

This opens a dialog where you can add sequences or sequence lists. If you add several sequence lists, each list will be processed separately and you will get a list of trimmed sequences for each input sequence list.

When the sequences are selected, click Next.


