Advanced parameters
We currently expose four parameters that allow you to tweak the behavior of the algorithm.
As briefly mentioned above, you can use the -f/--fraction parameter (default value is 30) to control how the distinction between long reads and short reads is made. By default, the longest reads that together account for 30% of the total read length are considered "long"; the remaining reads are treated as short.
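The long/short split described above can be sketched as follows. This is only an illustration of the idea, not the tool's actual implementation: the function name split_long_short is hypothetical, and how the tool handles ties or the read that crosses the threshold is an assumption here.

```python
def split_long_short(read_lengths, fraction=30):
    """Split reads into long and short by share of total read length.

    read_lengths: list of read lengths in bases.
    fraction: percentage of the total read length that the long reads
              should account for (mirrors -f/--fraction).
    Returns (long_indices, short_indices) into read_lengths.
    Assumption: the read that crosses the threshold counts as long.
    """
    total = sum(read_lengths)
    target = total * fraction / 100

    # Visit reads from longest to shortest.
    order = sorted(range(len(read_lengths)),
                   key=lambda i: read_lengths[i], reverse=True)

    long_indices = []
    covered = 0
    for i in order:
        if covered >= target:
            break
        long_indices.append(i)
        covered += read_lengths[i]

    long_set = set(long_indices)
    short_indices = [i for i in range(len(read_lengths)) if i not in long_set]
    return sorted(long_indices), short_indices
```

For example, with read lengths 5000, 3000, 1000, 500 and 500 (10000 bases in total) and a fraction of 30, the single 5000-base read already accounts for more than 30% of the total, so it alone is treated as long in this sketch.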
Once all the short reads have been mapped to the long reads, you have the option of adjusting three filtering criteria. Wherever short-on-long coverage falls below the value given with the -m/--min-coverage parameter, that part of the corrected long read is excised, which can split the read into several shorter reads. Those shorter reads are then filtered according to the following two parameters:
- The -l/--min-read-length parameter allows you to set a minimum read length (default is 1000). Corrected reads shorter than this are discarded.
- The -a/--min-average-coverage parameter specifies a lower bound on the average coverage across a corrected read (default is 15). If a read's average coverage is lower, that read will be discarded.
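To make the interplay of the three filtering criteria concrete, here is a minimal sketch of how they could be applied to the per-base coverage of a single long read. The function name excise_and_filter and the per-base list representation are hypothetical, chosen for illustration; the tool's internal data structures are not described in this section.

```python
def excise_and_filter(coverage, min_coverage,
                      min_read_length=1000, min_average_coverage=15):
    """Apply the three filtering criteria to one corrected long read.

    coverage: per-base short-on-long coverage (list of ints).
    min_coverage: mirrors -m/--min-coverage.
    min_read_length: mirrors -l/--min-read-length (default 1000).
    min_average_coverage: mirrors -a/--min-average-coverage (default 15).
    Returns the kept pieces as half-open (start, end) intervals.
    """
    # Step 1: excise stretches whose coverage falls below min_coverage,
    # leaving zero or more contiguous pieces.
    pieces = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth >= min_coverage:
            if start is None:
                start = pos
        elif start is not None:
            pieces.append((start, pos))
            start = None
    if start is not None:
        pieces.append((start, len(coverage)))

    # Step 2: discard pieces that are too short or too thinly covered.
    kept = []
    for begin, end in pieces:
        length = end - begin
        average = sum(coverage[begin:end]) / length
        if length >= min_read_length and average >= min_average_coverage:
            kept.append((begin, end))
    return kept
```

In this sketch a read can yield no output at all: if every piece left after excision is shorter than the minimum read length or has too low an average coverage, the returned list is empty.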
Here is a complete example, where we correct the PacBio reads contained in raw_reads.fa, producing a single file, corrected_reads.fa, containing the corrected long reads:
clc_correct_pacbio_reads -q raw_reads.fa -o corrected_reads.fa -f 30 -m 10 -l 1000 -a 15
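If you drive the tool from a script, the same call can be assembled programmatically. The sketch below uses only the flags shown in the command above; the helper names build_correct_command and run_correction are hypothetical, and it assumes clc_correct_pacbio_reads is on your PATH.

```python
import subprocess


def build_correct_command(raw_reads, corrected_reads, min_coverage,
                          fraction=30, min_read_length=1000,
                          min_average_coverage=15):
    """Build the argument list for clc_correct_pacbio_reads.

    Only flags that appear in the example command are used:
    -q (input reads), -o (output), -f, -m, -l and -a.
    """
    return [
        "clc_correct_pacbio_reads",
        "-q", raw_reads,
        "-o", corrected_reads,
        "-f", str(fraction),
        "-m", str(min_coverage),
        "-l", str(min_read_length),
        "-a", str(min_average_coverage),
    ]


def run_correction(*args, **kwargs):
    """Run the tool and raise CalledProcessError on a non-zero exit."""
    subprocess.run(build_correct_command(*args, **kwargs), check=True)
```

Calling run_correction("raw_reads.fa", "corrected_reads.fa", 10) would then issue the same command as the example above. Passing the arguments as a list avoids shell quoting problems with unusual file names.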