The clc_adapter_trim Program
Trims adapters from sequences.
Many sequencing technologies may leave whole or partial adapter or linker sequences in the reads for various reasons. The clc_adapter_trim program is used to find and remove such adapters from the reads.
The clc_adapter_trim tool identifies likely adapters in reads and removes them. To account for sequencing errors, each read is aligned with the known adapter sequences. Matching positions in these alignments score 1, while each mismatch costs 2 and each gap costs 3. By default, a region that aligns with a score of at least 10 is considered a possible adapter region. The -c option can be used to change this default score threshold of 10.
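The scoring scheme above (match +1, mismatch -2, gap -3, threshold 10) can be illustrated with a small local-alignment sketch in Python. This is an assumption for illustration only: the manual does not say which alignment algorithm clc_adapter_trim uses internally, and the function name best_local_alignment and the example adapter sequence are hypothetical. The sketch reports only the single best-scoring region, whereas the tool may consider several.

```python
MATCH, MISMATCH, GAP = 1, -2, -3  # scores described in the text
THRESHOLD = 10                    # default threshold (changed with -c in the real tool)


def best_local_alignment(read: str, adapter: str) -> tuple[int, int]:
    """Smith-Waterman style local alignment of an adapter against a read.

    Returns (best_score, read_end), where read_end is the exclusive end index
    in the read of the best-scoring alignment (hypothetical helper).
    """
    cols = len(adapter) + 1
    prev = [0] * cols
    best, best_end = 0, 0
    for i in range(1, len(read) + 1):
        cur = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (MATCH if read[i - 1] == adapter[j - 1] else MISMATCH)
            up = prev[j] + GAP        # read base left unaligned (gap in adapter)
            left = cur[j - 1] + GAP   # adapter base left unaligned (gap in read)
            cur[j] = max(0, diag, up, left)  # local alignment: never below 0
            if cur[j] > best:
                best, best_end = cur[j], i
        prev = cur
    return best, best_end


adapter = "AGATCGGAAGAGC"  # example adapter sequence, chosen for illustration
score, end = best_local_alignment("ACGTACGTAC" + adapter, adapter)
print(score, end, score >= THRESHOLD)
```

A perfect 13-base match scores 13, and a single mismatch still leaves 12 - 2 = 10, which just reaches the default threshold; this shows how the penalties let the search tolerate occasional sequencing errors.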
By default, the clc_adapter_trim tool trims bases towards the 3´ end of reads using the following approach:
- For any read with only one region scoring 10 or higher, that region is considered to be an adapter. The adapter region and all bases towards the 3´ end are removed.
- For any read with no alignment to a known adapter sequence scoring 10 or higher, the 3´ end of the read is checked for any possible sign of adapter, and any such bases are removed. For example, if the single nucleotide at the end of a read is identical to the first nucleotide of the adapter, it is removed, since it may have come from an adapter. The end match is defined as the longest match at the end of the read that has a non-negative score when aligned to the adapter.
- For any read with more than one region aligning with a score equal to or higher than 10, the region closest to the 3´ end is considered to be an adapter and removed along with any bases towards the 3´ end.
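The three rules above can be sketched in Python as a single decision function. This is a minimal sketch under stated assumptions: the adapter regions are supplied as hypothetical (start, end, score) tuples in read coordinates, and the end match is computed as an ungapped comparison between a read suffix and the first bases of the adapter, which is a simplification of the alignment described in the text. The names end_match_length and trim_3prime are illustrative, not part of the tool.

```python
MATCH, MISMATCH = 1, -2


def end_match_length(read: str, adapter: str) -> int:
    """Length of the longest read suffix whose ungapped alignment to the
    adapter's first bases has a non-negative score (assumption: no gaps)."""
    best = 0
    for k in range(1, min(len(read), len(adapter)) + 1):
        pairs = zip(read[len(read) - k:], adapter[:k])
        score = sum(MATCH if a == b else MISMATCH for a, b in pairs)
        if score >= 0:
            best = k
    return best


def trim_3prime(read: str, regions: list[tuple[int, int, int]],
                adapter: str, threshold: int = 10) -> str:
    """Apply the 3' trimming rules to one read.

    regions: hypothetical (start, end, score) adapter alignments, end exclusive.
    """
    hits = [r for r in regions if r[2] >= threshold]
    if hits:
        # One or several qualifying regions: take the one reaching closest to
        # the 3' end and drop it together with every base after it.
        start, _end, _score = max(hits, key=lambda r: r[1])
        return read[:start]
    # No qualifying region: remove a possible partial adapter at the 3' end.
    k = end_match_length(read, adapter)
    return read[:len(read) - k]


adapter = "AGATCGGAAGAGC"
read = "ACGTACGTACGT" + adapter + "TTTT"
print(trim_3prime(read, [(2, 8, 11), (12, 25, 13)], adapter))
print(trim_3prime("CCCA", [], adapter))
```

The single-region case needs no separate branch: with one qualifying region, taking the region closest to the 3´ end simply selects that region. The second call reproduces the example from the text, where a final A matching the adapter's first base is removed.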
With the -e option, the behavior can be changed so that reads are trimmed towards the 5´ end rather than the 3´ end. In this case, the conditions described above are the same, but the directionality of the actions is reversed.
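The reversed directionality can be sketched by mirroring the region choice and the slice. In this illustrative Python sketch, the hypothetical flag toward_5prime stands in for the -e option, and hits are (start, end) adapter regions (end exclusive) that are assumed to have already passed the score threshold; the end-match fallback is omitted.

```python
def trim_read(read: str, hits: list[tuple[int, int]], toward_5prime: bool = False) -> str:
    """Remove the chosen adapter region and everything beyond it (sketch).

    Default (3' trimming): keep the sequence before the adapter closest to the 3' end.
    toward_5prime=True (mirrors -e): keep the sequence after the adapter closest
    to the 5' end.
    """
    if not hits:
        return read
    if toward_5prime:
        _start, end = min(hits, key=lambda h: h[0])
        return read[end:]
    start, _end = max(hits, key=lambda h: h[1])
    return read[:start]


print(trim_read("TTTTAGATCGCCCC", [(4, 10)]))                      # 3' trimming
print(trim_read("TTTTAGATCGCCCC", [(4, 10)], toward_5prime=True))  # 5' trimming
```

With several regions, the 3´ mode removes from the region nearest the 3´ end onwards, and the 5´ mode removes everything up to the end of the region nearest the 5´ end.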
The clc_adapter_trim program gives fine control over its behavior, including:
- Whether the sequence before or after the adapter is kept. The default is to keep the sequence before the adapter, but this can be changed using the -e option.
- Which reads are kept. For example, reads in which an adapter was found are kept by using the -t or -f options, and reads in which no adapter was found are kept by using the -u or -g options.
- Which adapter sequences are searched for. One or several adapter sequences can be given using the -a and -d options, and for paired data, different adapters can be used for the first and second reads in each pair by using the -j and -k options.
For adapter sequences given with the -a, -j or -k options, the reverse complement of the adapter sequences is automatically added to the list of adapters to search for.
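Adding reverse complements to the search list can be sketched as follows. The helper names reverse_complement and expand_adapters are hypothetical, and the sketch models only the behavior stated for the -a, -j and -k options; it does not attempt to describe adapters given with -d. Duplicates are skipped so that a reverse-palindromic adapter is searched for only once.

```python
# Complement table for DNA bases; N stays N, and lowercase is preserved.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def expand_adapters(adapters: list[str]) -> list[str]:
    """Return each adapter followed by its reverse complement, without duplicates."""
    expanded: list[str] = []
    for adapter in adapters:
        for seq in (adapter, reverse_complement(adapter)):
            if seq not in expanded:
                expanded.append(seq)
    return expanded


print(expand_adapters(["AGATCGGAAGAGC"]))
```

For the example adapter AGATCGGAAGAGC, the list gains GCTCTTCCGATCT, so an adapter read on the opposite strand can also be detected.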
Further details can be found in Options for All Programs.