The clc_mapping_info Program
Whereas clc_mapping_table outputs detailed information about individual matches, the clc_mapping_info program instead gives an overview:
    General info:
      Program name           clc_mapper
      Program version        1.00.31043
      Program parameters     -o tmp.cas -d data/paired.fasta -q data/paired_reads.fasta -m

    Contig files:
      data/paired.fasta

    Read files:
      data/paired_reads.fasta

    Read info:
      Contigs                1
      Reads                  108420
      Unassembled reads      1506
      Assembled reads        106914
      Multi hit reads        0

    Alignment info:
      Number of inserts      13
      Number of deletes      42
      Number of mismatches   9253

    Coverage info:
      Total sites            100000
      Average coverage       37.29
      Sites covered 0 times  0
      Sites covered 1 time   0
      Sites covered 2 times  3
      Sites covered 3+ times 99997

    Contig info:
      Contig   Sites    Reads    Coverage
      1        100000   106914   37.29
The clc_mapping_info program can also analyze paired distances. This is done with the standard -p option and produces output like this:
    Paired reads info:
      Pairs                        2478655
      Average distance             215.44
      99.9 % of pairs between      175 - 253
      99.0 % of pairs between      191 - 241
      95.0 % of pairs between      197 - 234
      Not pairs                    143727
        Both seqs not matching     21946
        One seq not matching       62938
        Both seqs matching         58843
          Different contigs        0
          Wrong directions         40524
          Too close                663
          Too far                  17656
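The counts in this example are internally consistent, which helps when reading the report. The Python sketch below checks the arithmetic using the numbers copied from the output above. The nesting of the categories is inferred from the fact that the sums match, and treating "Pairs" plus "Not pairs" as the total number of read pairs is likewise an assumption rather than something stated here.

```python
# Counts copied from the example "Paired reads info" output.
pairs = 2478655
not_pairs = 143727
both_not_matching = 21946
one_not_matching = 62938
both_matching = 58843
different_contigs = 0
wrong_directions = 40524
too_close = 663
too_far = 17656

# Assumed nesting: "Not pairs" splits by how many reads of the pair matched.
not_pairs_sum = both_not_matching + one_not_matching + both_matching

# Assumed nesting: "Both seqs matching" splits by why the pair was rejected.
both_matching_sum = different_contigs + wrong_directions + too_close + too_far

# Assumption: "Pairs" + "Not pairs" is the total number of read pairs examined.
total_pairs = pairs + not_pairs
paired_fraction = pairs / total_pairs

print("Not pairs breakdown adds up:", not_pairs_sum == not_pairs)
print("Both seqs matching breakdown adds up:", both_matching_sum == both_matching)
print(f"{100 * paired_fraction:.2f} % of read pairs were accepted as pairs")
```

If either check fails for a real report, the category nesting assumed here probably does not apply to that version of the program.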
Note that for paired analysis clc_mapping_info assumes that read one pairs with read two, read three with read four, etc. Thus, it is crucial that the reads are from a paired experiment and that they are assembled in the right order, possibly using the interleaved option for creating the assembly. If an assembly has a mixture of paired and unpaired data, use clc_submapping to make an assembly with only the paired data before analyzing.
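The ordering convention can be illustrated with a minimal Python sketch. It interleaves two mate lists so that each pair occupies consecutive positions, then groups the result as (1, 2), (3, 4), and so on. The function names and read names are hypothetical and are not part of any CLC program; the sketch only models the ordering, not the file formats.

```python
from itertools import chain


def interleave(first_mates, second_mates):
    """Merge two mate lists so that each pair occupies consecutive positions."""
    if len(first_mates) != len(second_mates):
        raise ValueError("both mate lists must have the same length")
    return list(chain.from_iterable(zip(first_mates, second_mates)))


def consecutive_pairs(reads):
    """Group reads as (1, 2), (3, 4), ... following the convention above."""
    if len(reads) % 2 != 0:
        raise ValueError("an odd number of reads cannot be fully paired")
    return [(reads[i], reads[i + 1]) for i in range(0, len(reads), 2)]


# Hypothetical read names, for illustration only.
forward = ["frag1/1", "frag2/1", "frag3/1"]
reverse = ["frag1/2", "frag2/2", "frag3/2"]

ordered = interleave(forward, reverse)
read_pairs = consecutive_pairs(ordered)
print(read_pairs)
```

A single unpaired read anywhere in the input shifts every later read by one position, which is why mixed paired and unpaired data must be separated before the analysis.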
When a dataset contains paired data of unknown distances, a good approach is to first make an initial reference assembly without using paired information. The clc_mapping_info program can then be used to investigate the paired distance properties of the data, using wide limits for the distances. Finally, a reference assembly can be run with a suitable distance interval based on the estimated paired distances. To get a quicker result, the initial reference assembly may be run on only part of the data, with ungapped alignments, and/or with stricter scoring criteria. These factors will usually not affect the paired distance properties of the results, but a smaller fraction of the reads might match.
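One simple way to turn observed distances into a distance interval is to take the central range that covers a chosen fraction of the pairs. The Python sketch below does this for a made-up list of distances. How clc_mapping_info computes its own percentage ranges is not described here, so the function is an illustrative assumption and not a reimplementation; in practice the ranges printed in the report can be used directly.

```python
def central_interval(distances, fraction):
    """Return (low, high) bounds of the central `fraction` of the distances."""
    if not distances:
        raise ValueError("no distances given")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(distances)
    # Number of values to trim from each end of the sorted list.
    # The small epsilon guards against floating-point round-off (e.g. 0.999...).
    trim = int(len(ordered) * (1.0 - fraction) / 2.0 + 1e-9)
    return ordered[trim], ordered[len(ordered) - 1 - trim]


# Made-up distances for illustration, including two outliers (20 and 900).
observed = [20, 198, 201, 205, 209, 212, 215, 215, 218, 221,
            224, 226, 229, 231, 233, 236, 238, 240, 243, 900]

low, high = central_interval(observed, 0.90)
print(f"central 90 % of distances: {low} - {high}")
```

Trimming both tails keeps a few mis-mapped pairs from stretching the interval, which matters because the initial analysis is run with deliberately wide limits.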
Further details can be found in Options for All Programs.