The clc_sample_reads Program

This tool extracts a subset of reads whose size is approximately a given percentage of the input size. Sampling is pseudo-random, so the extracted subset is not guaranteed to contain exactly that percentage of the input reads. The input reads can be provided in either interleaved or non-interleaved format, and reads marked as paired are kept together.
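To illustrate why pseudo-random sampling yields only an approximate percentage, the following is a minimal sketch that assumes each read (or read pair) is kept independently with a fixed probability. This is an assumption made for illustration, not a description of how clc_sample_reads is implemented; the function name sample_reads and the record layout are hypothetical.

```python
import random


def sample_reads(records, percentage, seed=None):
    """Keep each record independently with probability percentage / 100.

    Each record is a tuple of reads: (read,) for a single read or
    (read1, read2) for a pair, so paired reads are kept or dropped together.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    keep_probability = percentage / 100.0
    return [record for record in records if rng.random() < keep_probability]


# 10,000 made-up read pairs, sampled at 10 percent.
records = [(f"read{i}/1", f"read{i}/2") for i in range(10_000)]
subset = sample_reads(records, percentage=10, seed=42)

# The count lands near 1,000 but is generally not exactly 1,000.
print(len(subset))
```

Under this assumption the subset size varies from run to run around the requested percentage, and sampling whole records rather than individual reads is what keeps mates together.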

Read sampling can be useful for reducing the coverage of datasets with very high coverage (more than 500x) in preparation for a de novo assembly. Lower coverage makes the assembly run faster and reduces the chance of overlapping errors in the reads, which increases assembly quality.
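As a rough guide to choosing a sampling percentage, the sketch below estimates the current coverage from the read count, mean read length, and genome size, then computes the percentage needed to reach a target coverage. The function names, the example dataset, and the 100x target are all assumptions chosen for illustration; they are not recommendations from this manual.

```python
def estimate_coverage(num_reads, mean_read_length, genome_size):
    """Approximate coverage as total sequenced bases divided by genome size."""
    return num_reads * mean_read_length / genome_size


def sampling_percentage(current_coverage, target_coverage):
    """Percentage of reads to keep so that coverage drops to the target."""
    if current_coverage <= 0 or target_coverage <= 0:
        raise ValueError("coverage values must be positive")
    # Sampling cannot increase coverage, so cap the result at 100 percent.
    return min(100.0, 100.0 * target_coverage / current_coverage)


# Hypothetical dataset: 20 million reads of 150 bp on a 5 Mbp genome.
current = estimate_coverage(20_000_000, 150, 5_000_000)  # 600x
percentage = sampling_percentage(current, target_coverage=100)
print(f"current coverage: {current:.0f}x, keep about {percentage:.1f}% of reads")
```

Because the sampling is pseudo-random, the coverage actually obtained will be close to, but not exactly, the target.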

See Options for full usage information.
