The clc_sort_pairs Program

A SOLiD paired data set usually comes in two .csfasta files, but unlike Illumina paired data, the sequences are not necessarily all paired. This means that one cannot assume that the first sequence in file one pairs with the first sequence in file two, the second with the second, and so on. Instead, only the sequence names indicate which sequences form pairs.

The clc_sort_pairs program takes two SOLiD read files as input and outputs one file with unpaired reads and one file with paired reads. These files are then ready for further analysis, e.g. with clc_ref_assemble_short. Note that the output format is fasta, but no information is lost relative to the .csfasta format, as discussed in the color space section.
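
The idea of pairing by name can be illustrated with a minimal Python sketch. This is not the implementation of clc_sort_pairs, only an illustration of the principle. It assumes that the two reads of a pair have names that differ only in a trailing tag such as _F3 or _R3; that naming convention, the helper names pair_key and sort_pairs, and the example reads are all assumptions made for the illustration. The sketch works on in-memory lists rather than reading and writing files.

```python
# Illustrative sketch only: pairs reads by name, not by position in the file.
# The _F3/_R3 suffixes are an assumed naming convention for this example.

def pair_key(name):
    """Return the read name without its trailing _F3 or _R3 tag."""
    for suffix in ("_F3", "_R3"):
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def sort_pairs(reads_a, reads_b):
    """Split two lists of (name, sequence) tuples into paired and unpaired reads."""
    # Index the second file by its pairing key so lookup does not depend on order.
    index_b = {pair_key(name): (name, seq) for name, seq in reads_b}
    paired, unpaired = [], []
    for name, seq in reads_a:
        mate = index_b.pop(pair_key(name), None)
        if mate is None:
            unpaired.append((name, seq))
        else:
            paired.append(((name, seq), mate))
    # Whatever is left in the index had no mate in the first file.
    unpaired.extend(index_b.values())
    return paired, unpaired


# Hypothetical color space reads; note that the two lists are in different orders.
reads_f3 = [("1_10_20_F3", "T0120"), ("1_10_35_F3", "T3321"), ("1_11_02_F3", "T1010")]
reads_r3 = [("1_10_35_R3", "G2201"), ("1_10_20_R3", "G0012"), ("1_12_40_R3", "G3330")]

paired, unpaired = sort_pairs(reads_f3, reads_r3)
```

In this example, two pairs are found even though the mates appear at different positions in the two lists, and one read from each list ends up as unpaired.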

Further details are provided in Options for All Programs.