The clc_adapter_trim Program
Trims adapters from sequences.
Many sequencing technologies may leave whole or partial adapter or linker sequences in the reads for various reasons. The clc_adapter_trim program is used to find and remove such adapters from the reads.
Adapter regions in reads may contain sequencing errors. With this in mind, the clc_adapter_trim tool identifies likely adapters by aligning the known adapter sequence with each read. Matching positions in these alignments score 1, while each mismatch costs 2 and each gap costs 3. By default, a region that aligns with a score of at least 10 is considered a possible adapter region.
The default cutoff score of 10 for identifying likely adapter regions in reads can be changed using the -c option.
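The scoring scheme can be illustrated with a short Python sketch. It assumes a standard local alignment (Smith-Waterman) with a linear gap cost; this section does not state which alignment algorithm clc_adapter_trim uses internally, so treat the code as an illustration of the scores only. The function names and the example adapter sequence are made up for this sketch.

```python
def best_local_score(adapter, read, match=1, mismatch=-2, gap=-3):
    """Best local alignment score between adapter and read.

    Assumption: plain Smith-Waterman with linear gap cost. The scores
    (match +1, mismatch -2, gap -3) are the ones given in the text.
    """
    prev = [0] * (len(read) + 1)
    best = 0
    for a in adapter:
        curr = [0]
        for j, r in enumerate(read, start=1):
            diag = prev[j - 1] + (match if a == r else mismatch)
            up = prev[j] + gap        # gap in the read
            left = curr[j - 1] + gap  # gap in the adapter
            cell = max(0, diag, up, left)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def is_possible_adapter(adapter, read, cutoff=10):
    """True if some region of the read aligns with a score >= cutoff."""
    return best_local_score(adapter, read) >= cutoff


adapter = "AGATCGGAAGAGC"                 # example sequence, 13 bases
exact = "TTTT" + adapter                  # full adapter, no errors
one_error = "TTTT" + "AGATCGCAAGAGC"      # one mismatch inside the adapter

print(best_local_score(adapter, exact))       # 13
print(best_local_score(adapter, one_error))   # 10 (12 matches, 1 mismatch)
print(is_possible_adapter(adapter, one_error, cutoff=12))  # False
```

With the default cutoff of 10, the read containing one sequencing error is still flagged as a possible adapter region. Raising the cutoff to 12 makes the check stricter, so that read no longer qualifies.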
By default, the clc_adapter_trim tool trims bases towards the 3´ end of reads in this way:
- For any read with just one region scoring equal to or higher than 10, that region is considered the likely adapter. The adapter region and all bases 3´ of it are removed.
- For any read where there is no alignment to the known adapter scoring 10 or greater, the 3´ end of the read is checked for any possible sign of adapter. If found, such bases will be removed. For example, if a single nucleotide at the end of a read is identical to the first nucleotide of the adapter, it will be removed, since it may have come from an adapter. The end match is defined as the longest match at the end of the read having a non-negative score when aligned to the adapter.
- For any read with more than one region aligning with a score equal to or higher than 10, the region closest to the 3´ end is considered the likely adapter. That region is removed along with any bases 3´ of it.
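The three default rules can be sketched in Python as follows. This is a simplified model, not the tool's implementation: candidate regions are passed in as hypothetical (start, end, score) tuples with half-open coordinates, and the end match is assumed to be an ungapped comparison of the read's last bases against the start of the adapter. All function names are illustrative.

```python
def end_match_length(read, adapter, match=1, mismatch=-2):
    """Length of the longest read suffix that has a non-negative score
    against the start of the adapter.

    Assumption: ungapped comparison only; the real tool may allow gaps.
    """
    best = 0
    for k in range(1, min(len(read), len(adapter)) + 1):
        suffix = read[-k:]
        score = sum(match if a == b else mismatch
                    for a, b in zip(suffix, adapter[:k]))
        if score >= 0:
            best = k
    return best


def trim_3prime(read, adapter, regions, cutoff=10):
    """Apply the default 3' trimming rules to one read.

    regions: list of (start, end, score) candidate adapter alignments,
    with end exclusive (a hypothetical representation for this sketch).
    """
    hits = [r for r in regions if r[2] >= cutoff]
    if hits:
        # One or several hits: use the one closest to the 3' end and
        # drop it together with everything 3' of it.
        start, _end, _score = max(hits, key=lambda r: r[1])
        return read[:start]
    # No hit: fall back to the end-match check.
    k = end_match_length(read, adapter)
    return read[:len(read) - k]


adapter = "AGATCGGAAGAGC"  # example sequence

# One full adapter hit at positions 8..21.
print(trim_3prime("ACGTACGT" + adapter + "GGGG", adapter, [(8, 21, 13)]))
# ACGTACGT

# No hit, but the read ends with the first four adapter bases.
print(trim_3prime("TTTTCCCCAGAT", adapter, []))
# TTTTCCCC
```

The second call shows the end-match rule: no region reaches the cutoff, but the trailing AGAT matches the start of the adapter and is therefore removed.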
With the -e option it is possible to change the behavior so the reads are trimmed towards the 5´ end of reads, rather than the 3´ end. In this case, the conditions described above are the same, with the directionality of the actions reversed.
The clc_adapter_trim program allows fine control over its behavior, including:
- Whether the read sequence before or after the adapter should be kept. The default action is to keep the sequence before the adapter, but this can be altered using the -e option.
- Which reads should be kept. For example, reads of a particular length or longer are kept if the -s option is used, reads where adapter was found are kept by using the -t option, and reads where the adapter was not found are kept by using the -u option.
- Which adapter sequences should be searched for. One or several adapter sequences can be used, using the -a and -d options, and for paired data, different adapters can be used for the first and second reads in the pairs by using the -j and -k options.
- Whether the data are paired and, if so, whether single reads should be kept. It is possible to keep paired input reads together as pairs throughout the trimming by using the -p, -f and -g options. During the trimming process, entire reads may be removed. This can leave only one member of a pair in the data set. If such single reads are to be kept, then the -s option can be used to indicate this.
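The effect of trimming on paired data can be pictured with a small Python sketch. It only models the bookkeeping described above: a pair stays together while both trimmed reads survive, and a read whose mate was removed becomes a single read. The function name, the min_length parameter, and the keep_singles switch are hypothetical and do not correspond to clc_adapter_trim options.

```python
def split_pairs(trimmed_pairs, min_length=1, keep_singles=True):
    """Separate trimmed read pairs into intact pairs and leftover singles.

    trimmed_pairs: list of (first_read, second_read) strings after trimming.
    A read shorter than min_length counts as removed (an assumption made
    for this sketch).
    """
    pairs, singles = [], []
    for first, second in trimmed_pairs:
        ok_first = len(first) >= min_length
        ok_second = len(second) >= min_length
        if ok_first and ok_second:
            pairs.append((first, second))
        elif keep_singles and ok_first:
            singles.append(first)
        elif keep_singles and ok_second:
            singles.append(second)
        # If neither read survives, the whole pair is dropped.
    return pairs, singles


trimmed = [("ACGT", "TTTT"), ("", "GGGG"), ("AC", "")]
pairs, singles = split_pairs(trimmed, min_length=3)
print(pairs)    # [('ACGT', 'TTTT')]
print(singles)  # ['GGGG']
```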
Further details can be found in Options for All Programs.