The clc_mapping_info Program
Whereas clc_mapping_table outputs detailed information about individual matches, the clc_mapping_info program instead gives an overview:
General info:
    Program name          clc_mapper
    Program version       1.00.31043
    Program parameters    -o tmp.cas -d data/paired.fasta -q data/paired_reads.fasta -m
Contig files:
    data/paired.fasta
Read files:
    data/paired_reads.fasta
Read info:
    Contigs               1
    Reads                 108420
    Unassembled reads     1506
    Assembled reads       106914
    Multi hit reads       0
Alignment info:
    Number of inserts     13
    Number of deletes     42
    Number of mismatches  9253
Coverage info:
    Total sites           100000
    Average coverage      37.29
    Sites covered 0 times     0
    Sites covered 1 time      0
    Sites covered 2 times     3
    Sites covered 3+ times    99997
Contig info:
    Contig    Sites     Reads     Coverage
    1         100000    106914    37.29
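The read counts in the overview are related by simple arithmetic: assembled plus unassembled reads should equal the total number of reads. The following Python sketch is not part of the CLC tools. It parses three lines copied from the report above and computes the fraction of reads that were assembled. The helper name parse_counts is hypothetical, and the sketch assumes that each line of interest ends in a single integer value.

```python
def parse_counts(report_text, labels):
    """Return {label: int} for report lines of the form '<label> <integer>'.

    Assumption: the value is the last whitespace-separated token on the line.
    """
    counts = {}
    for line in report_text.splitlines():
        line = line.strip()  # tolerate indented report lines
        for label in labels:
            if line.startswith(label + " "):
                counts[label] = int(line.split()[-1])
    return counts


# Lines copied from the "Read info" part of the sample report.
REPORT = """\
Reads 108420
Unassembled reads 1506
Assembled reads 106914
"""

counts = parse_counts(REPORT, ["Reads", "Unassembled reads", "Assembled reads"])

# Consistency check: every read is either assembled or unassembled.
assert counts["Assembled reads"] + counts["Unassembled reads"] == counts["Reads"]

assembled_fraction = counts["Assembled reads"] / counts["Reads"]
print(f"Assembled: {100 * assembled_fraction:.2f} % of reads")
```

For the sample report, slightly under 99 % of the reads are assembled.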
The clc_mapping_info program can also analyze paired distances. This is done with the standard -p option and produces output like this:
Paired reads info:
    Pairs                         2478655
    Average distance              215.44
    99.9 % of pairs between       175 - 253
    99.0 % of pairs between       191 - 241
    95.0 % of pairs between       197 - 234
    Not pairs                     143727
      Both seqs not matching      21946
      One seq not matching        62938
      Both seqs matching          58843
        Different contigs         0
        Wrong directions          40524
        Too close                 663
        Too far                   17656
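The categories in this sample output add up: the "Not pairs" count equals the sum of the three matching categories listed after it, and "Both seqs matching" equals the sum of the four reasons listed after it. This reading of the hierarchy is inferred from the numbers in the example, not taken from a specification of the output format. The Python sketch below checks the arithmetic and, on the further assumption that "Pairs" plus "Not pairs" accounts for all read pairs, computes the share of proper pairs.

```python
# Values copied from the sample "Paired reads info" output above.
pairs = 2478655
not_pairs = 143727

both_not_matching = 21946
one_not_matching = 62938
both_matching = 58843

different_contigs = 0
wrong_directions = 40524
too_close = 663
too_far = 17656

# "Not pairs" splits by how many of the two reads matched.
assert both_not_matching + one_not_matching + both_matching == not_pairs

# "Both seqs matching" splits by the reason the pair was rejected.
assert different_contigs + wrong_directions + too_close + too_far == both_matching

# Assumption: Pairs + Not pairs covers every read pair in the input.
total_pairs = pairs + not_pairs
proper_fraction = pairs / total_pairs
print(f"Proper pairs: {100 * proper_fraction:.2f} % of {total_pairs} read pairs")
```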
Note that for paired analysis, clc_mapping_info assumes that read one pairs with read two, read three with read four, and so on. It is therefore crucial that the reads come from a paired experiment and that they were assembled in the right order, possibly using the interleaved option when creating the assembly. If an assembly contains a mixture of paired and unpaired data, use clc_submapping to make an assembly with only the paired data before analyzing.
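The pairing convention described above (read one with read two, read three with read four) can be illustrated with a short Python sketch. It does not call any CLC program. The function names and the read names such as frag1/1 are hypothetical, and reads are represented simply as strings.

```python
from itertools import chain


def interleave(forward_reads, reverse_reads):
    """Merge two equally long read lists into the order r1, r2, r1, r2, ..."""
    if len(forward_reads) != len(reverse_reads):
        raise ValueError("both lists must contain the same number of reads")
    return list(chain.from_iterable(zip(forward_reads, reverse_reads)))


def consecutive_pairs(reads):
    """Pair reads the way the text describes: 1 with 2, 3 with 4, and so on."""
    if len(reads) % 2 != 0:
        raise ValueError("an odd number of reads cannot be fully paired")
    return [(reads[i], reads[i + 1]) for i in range(0, len(reads), 2)]


forward = ["frag1/1", "frag2/1"]
reverse = ["frag1/2", "frag2/2"]

ordered = interleave(forward, reverse)
print(ordered)
print(consecutive_pairs(ordered))
```

If the reads are not in interleaved order, for example when all forward reads precede all reverse reads, consecutive pairing would join reads from different fragments, which is why the ordering requirement matters.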
When a dataset contains paired data of unknown distances, a good approach is to make an initial reference assembly without using paired information. The clc_mapping_info program can then be used to investigate the paired distance properties of the data, using wide limits for the distances. Finally, a reference assembly run can be performed using a suitable distance interval chosen from the estimated paired distances. To get a quicker result, the initial reference assembly run may be done on only a part of the data, using ungapped alignments, and/or using stricter scoring criteria. These factors will usually not affect the paired distance properties of the results, but a smaller fraction of the reads might match.
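As a rough illustration of the estimation step, the Python sketch below extracts the distance intervals from the "of pairs between" lines of a paired report, so that one of them can be chosen as the interval for the final run. It is only a sketch: the function name distance_intervals is hypothetical, the line format is assumed to match the sample output in this section, and which percentage to use is a judgment call that the text leaves open.

```python
import re

# Matches lines such as "99.0 % of pairs between 191 - 241".
_INTERVAL = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*% of pairs between\s+(\d+)\s*-\s*(\d+)\s*$",
    re.MULTILINE,
)


def distance_intervals(report_text):
    """Return {percentage: (min_distance, max_distance)} found in the report."""
    return {
        float(percent): (int(low), int(high))
        for percent, low, high in _INTERVAL.findall(report_text)
    }


# Lines copied from the sample "Paired reads info" output.
REPORT = """\
Pairs 2478655
Average distance 215.44
99.9 % of pairs between 175 - 253
99.0 % of pairs between 191 - 241
95.0 % of pairs between 197 - 234
"""

intervals = distance_intervals(REPORT)
min_distance, max_distance = intervals[99.0]
print(f"Candidate distance interval: {min_distance} - {max_distance}")
```

Choosing a wider interval (here the 99.9 % line) keeps more pairs at the cost of a looser constraint; a narrower one does the opposite.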
Further details can be found in Options for All Programs.
