Measuring the distance between the reads

How the distance between the reads should be measured depends on how the sequencing experiment was performed. If the reads are sequenced in the upstream-to-downstream direction, the distance should be measured from the start of each read. This is indicated by the `ss' code, for start to start. The allowed values are `ss', `se', `es', and `ee', where the first letter indicates which end of the first read is used and the second letter indicates which end of the second read is used (`s' for start and `e' for end). The `ss' option is the most typical.
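To make the four distance modes concrete, here is a minimal Python sketch. The MappedRead class and the pair_distance function are hypothetical names, not part of CLC Assembly Cell. The sketch assumes 0-based inclusive reference coordinates and counts both chosen positions when computing the distance; the exact counting convention used by clc_mapper may differ.

```python
from dataclasses import dataclass


@dataclass
class MappedRead:
    """A read placed on the reference (hypothetical representation).

    left/right are 0-based inclusive coordinates of the read's leftmost and
    rightmost bases; forward is True if the read was sequenced left to right.
    """
    left: int
    right: int
    forward: bool

    def start(self) -> int:
        # The start is the first base that was sequenced.
        return self.left if self.forward else self.right

    def end(self) -> int:
        # The end is the last base that was sequenced.
        return self.right if self.forward else self.left


def pair_distance(first: MappedRead, second: MappedRead, mode: str) -> int:
    """Distance between two reads for mode 'ss', 'se', 'es' or 'ee'.

    The first letter picks the end of the first read, the second letter the
    end of the second read. Counting both chosen positions (an inclusive
    span) is an assumption made for this sketch.
    """
    if mode not in ("ss", "se", "es", "ee"):
        raise ValueError(f"unknown distance mode: {mode!r}")
    pick = {"s": MappedRead.start, "e": MappedRead.end}
    a = pick[mode[0]](first)
    b = pick[mode[1]](second)
    return abs(b - a) + 1


# A 200-base fragment (positions 1000-1199) read from both ends with 100-base reads.
first = MappedRead(left=1000, right=1099, forward=True)
second = MappedRead(left=1100, right=1199, forward=False)
print(pair_distance(first, second, "ss"))  # 200
print(pair_distance(first, second, "ee"))  # 2
```

With this layout, `ss' recovers the full span of the fragment, while `ee' only measures the gap between the inner ends of the two reads.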

So, for a typical paired-end Illumina sequencing protocol, the `fb ss' combination ensures the correct relative directions of the reads. It also makes the distance independent of the read length, since in a typical experiment each read is extended from its starting point toward the other read.
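The read-length independence can be checked with a small Python model of an idealised pair. It assumes that `fb' means the first read is sequenced forward from the left end of the fragment and the second read backward from its right end; the function name fb_pair_distances is hypothetical, and distances are counted as inclusive spans on 0-based coordinates.

```python
def fb_pair_distances(frag_left: int, frag_len: int, read_len: int) -> dict[str, int]:
    """'ss' and 'ee' distances for an idealised fb pair (hypothetical model).

    Assumes the first read is sequenced forward from the fragment's left end
    and the second read backward from its right end, both read_len bases long.
    """
    frag_right = frag_left + frag_len - 1
    first_start = frag_left                 # forward read begins at the left end
    first_end = frag_left + read_len - 1    # ...and extends rightward
    second_start = frag_right               # backward read begins at the right end
    second_end = frag_right - read_len + 1  # ...and extends leftward
    return {
        "ss": abs(second_start - first_start) + 1,
        "ee": abs(second_end - first_end) + 1,
    }


for read_len in (36, 50, 100):
    print(read_len, fb_pair_distances(frag_left=5000, frag_len=200, read_len=read_len))
# 36 {'ss': 200, 'ee': 130}
# 50 {'ss': 200, 'ee': 102}
# 100 {'ss': 200, 'ee': 2}
```

The `ss' distance stays at the fragment length for every read length, whereas the `ee' distance shrinks as the reads get longer, so a single distance interval would not fit runs with different read lengths.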

When the `-p' option is used, it applies to all read files that follow it on the command line. If data from experiments with different pairing properties are combined, the `-p' option can be given several times. To indicate that the following read files are not paired, use `-p no'. This is only necessary if a `-p' option was given earlier on the command line. An example:

clc_mapper -o assembly.cas -d human.gb -q reads1.fasta -p fb ss 180 250 \
    reads2.fasta -p no reads3.fasta

Here, we have three read files: reads1.fasta and reads3.fasta contain unpaired reads, while reads2.fasta contains paired reads.
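The rule that a `-p' setting carries over to every following read file until it is replaced can be sketched as a tiny Python parser. This is a hypothetical illustration, not the real clc_mapper argument parser: it understands only the options that appear in the example command, and reading the two numbers after `fb ss' as a minimum and maximum distance is an assumption reflected only in the field names.

```python
import shlex
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pairing:
    """Settings from one -p option (field names are hypothetical)."""
    orientation: str    # e.g. "fb"
    distance_mode: str  # "ss", "se", "es" or "ee"
    min_dist: int       # assumed meaning of the first number, e.g. 180
    max_dist: int       # assumed meaning of the second number, e.g. 250


def assign_pairing(command: str) -> dict[str, Optional[Pairing]]:
    """Map each read file to the -p setting in effect where it appears.

    Hypothetical sketch: handles only -o, -d, -q and -p as used in the
    example command; everything else after -q is treated as a read file.
    """
    args = shlex.split(command)[1:]    # drop the program name
    current: Optional[Pairing] = None  # no -p seen yet: reads are unpaired
    reading = False                    # becomes True once -q has been given
    result: dict[str, Optional[Pairing]] = {}
    i = 0
    while i < len(args):
        arg = args[i]
        if arg in ("-o", "-d"):
            i += 2  # option plus its single file argument
        elif arg == "-q":
            reading = True
            i += 1
        elif arg == "-p":
            if args[i + 1] == "no":
                current = None  # following files are unpaired again
                i += 2
            else:
                orient, mode, lo, hi = args[i + 1:i + 5]
                current = Pairing(orient, mode, int(lo), int(hi))
                i += 5
        else:
            if reading:
                result[arg] = current  # file inherits the current setting
            i += 1
    return result


cmd = ("clc_mapper -o assembly.cas -d human.gb -q reads1.fasta "
       "-p fb ss 180 250 reads2.fasta -p no reads3.fasta")
for name, pairing in assign_pairing(cmd).items():
    print(name, pairing)
# reads1.fasta None
# reads2.fasta Pairing(orientation='fb', distance_mode='ss', min_dist=180, max_dist=250)
# reads3.fasta None
```

Keeping a single "current setting" that each file picks up is what makes `-p no' necessary only after an earlier `-p': before any `-p', the current setting is already "unpaired".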

Note that the clc_sort_pairs and clc_split_reads programs can be used to convert data from SOLiD and 454 systems, respectively, into a format accepted by the CLC Assembly Cell tools.