We currently expose four parameters that allow you to tweak the behavior of the algorithm.
As briefly mentioned above, you can use the
-f/--fraction parameter (default value is 30) to control how the distinction between long reads and short reads is made. By default, the longest reads corresponding to 30% of the total read length are considered "long".
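To make the role of the fraction concrete, here is a minimal Python sketch of how such a split could be computed. It is not the tool's actual implementation: the function name split_by_fraction is hypothetical, and how the real program treats ties or the read that crosses the threshold is an assumption.

```python
def split_by_fraction(read_lengths, fraction=30):
    """Split read lengths into (long, short) lists.

    Reads are taken longest-first until they account for `fraction`
    percent of the total read length. Assumption: the read that crosses
    the threshold is still counted as long.
    """
    total = sum(read_lengths)
    target = total * fraction / 100
    long_reads, short_reads = [], []
    accumulated = 0
    for length in sorted(read_lengths, reverse=True):
        if accumulated < target:
            long_reads.append(length)
            accumulated += length
        else:
            short_reads.append(length)
    return long_reads, short_reads


# Five reads totalling 10,000 bases: with the default of 30, the single
# 5,000-base read already covers the 3,000-base target.
long_reads, short_reads = split_by_fraction([5000, 3000, 1000, 500, 500])
print(long_reads, short_reads)
```

Raising the fraction moves more reads into the "long" set and leaves fewer short reads available to correct them.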
Once all the short reads have been mapped to the long reads, you have the option of adjusting three filtering criteria. If short-on-long coverage falls beneath the value provided after the
-m/--min-coverage parameter, those parts of the corrected long read are excised, splitting it into several shorter reads. Those shorter reads are then filtered according to the following two parameters:
The -l/--min-read-length parameter sets a minimum read length (default is 1000). Corrected reads shorter than this are discarded.
The -a/--min-average-coverage parameter sets a lower bound on the average coverage across a corrected read (default is 15). If a read's average coverage is lower, that read is discarded.
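The three filtering criteria can be sketched in Python as follows. This only illustrates the logic described above under stated assumptions: the function filter_corrected_read and its per-base coverage list are hypothetical, and the tool's internal data structures may differ. The min_coverage value has no default here because none is stated in this section.

```python
def filter_corrected_read(coverage, min_coverage,
                          min_read_length=1000, min_average_coverage=15):
    """Return the (start, end) half-open intervals of a corrected read
    that survive filtering.

    `coverage` is a hypothetical per-base list of short-on-long coverage.
    """
    # Step 1: excise every position whose coverage is below min_coverage,
    # which splits the read into contiguous segments.
    segments = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth >= min_coverage:
            if start is None:
                start = pos
        elif start is not None:
            segments.append((start, pos))
            start = None
    if start is not None:
        segments.append((start, len(coverage)))

    # Step 2: discard segments that are too short or too thinly covered.
    kept = []
    for begin, end in segments:
        length = end - begin
        average = sum(coverage[begin:end]) / length
        if length >= min_read_length and average >= min_average_coverage:
            kept.append((begin, end))
    return kept


# Three well-covered stretches separated by two low-coverage gaps.
coverage = [20] * 1500 + [5] * 100 + [12] * 1200 + [3] * 50 + [30] * 400
print(filter_corrected_read(coverage, min_coverage=10))
```

In this example only the first stretch is kept: the second is long enough but averages 12, below the bound of 15, and the third is shorter than 1000 bases.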
Here is a complete example, where we correct the PacBio reads contained in
raw_reads.fa, producing a single file, corrected_reads.fa, containing the corrected long reads:
clc_correct_pacbio_reads -q raw_reads.fa -o corrected_reads.fa -f 30 -m 10 -l 1000 -a 15