The clc_mapping_info Program

Whereas clc_mapping_table outputs detailed information about individual matches, the clc_mapping_info program gives an overview of the whole mapping:

General info:
  Program name         clc_mapper
  Program version      1.00.31043
  Program parameters   -o tmp.cas -d data/paired.fasta -q data/paired_reads.fasta -m
  Contig files:
    data/paired.fasta
  Read files:
    data/paired_reads.fasta
Read info:
  Contigs                           1
  Reads                        108420
    Unassembled reads            1506
    Assembled reads            106914
      Multi hit reads               0
Alignment info:
  Number of inserts                13
  Number of deletes                42
  Number of mismatches           9253
Coverage info:
  Total sites                  100000
  Average coverage                 37.29
  Sites covered 0 times             0
  Sites covered 1 time              0
  Sites covered 2 times             3
  Sites covered 3+ times        99997
Contig info:
  Contig     Sites     Reads   Coverage
       1    100000    106914      37.29

The clc_mapping_info program can also analyze paired distances. This is enabled with the standard '-p' option and produces output like this:

Paired reads info:
  Pairs                       2478655
    Average distance              215.44
    99.9 % of pairs between       175 - 253
    99.0 % of pairs between       191 - 241
    95.0 % of pairs between       197 - 234
  Not pairs                    143727
    Both seqs not matching      21946
    One seq not matching        62938
    Both seqs matching          58843
      Different contigs             0
      Wrong directions          40524
      Too close                   663
      Too far                   17656
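
The interval lines in the output above can be read as the range of distances that contains the central share of all pairs. The following Python sketch illustrates one way such figures could be computed from a list of observed pair distances. It is an illustration only: the function names are hypothetical, and trimming an equal share from each tail is an assumption, not necessarily the method clc_mapping_info uses.

```python
def mean_distance(distances):
    """Return the arithmetic mean of the observed pair distances."""
    if not distances:
        raise ValueError("no distances given")
    return sum(distances) / len(distances)


def central_interval(distances, percent):
    """Return (low, high) bounding the central `percent` % of the distances.

    Assumption: an equal number of observations is trimmed from each tail.
    """
    if not distances:
        raise ValueError("no distances given")
    if not 0.0 < percent <= 100.0:
        raise ValueError("percent must be in (0, 100]")
    ordered = sorted(distances)
    n = len(ordered)
    # Number of observations to drop from each end of the sorted list.
    trim = int(n * (100.0 - percent) / 200.0)
    return ordered[trim], ordered[n - 1 - trim]


if __name__ == "__main__":
    # Hypothetical distances, not taken from the data set shown above.
    observed = [198, 205, 210, 214, 215, 216, 219, 224, 230, 251]
    print("Average distance:", mean_distance(observed))
    for percent in (99.9, 99.0, 95.0):
        low, high = central_interval(observed, percent)
        print(f"{percent} % of pairs between {low} - {high}")
```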

Note that for paired analysis, clc_mapping_info assumes that read one pairs with read two, read three with read four, etc. Thus, it is crucial that the reads come from a paired experiment and that they were assembled in the right order, for example by using the interleaved option when creating the assembly. If an assembly contains a mixture of paired and unpaired data, use clc_submapping to make an assembly with only the paired data before analyzing.
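
The ordering that this assumption requires can be illustrated with a small Python sketch that interleaves two lists of mates so that record one pairs with record two, record three with record four, and so on. This is not part of the CLC tools. The function names and the minimal FASTA reader are hypothetical, and the sketch assumes that the two inputs list the mates in the same order.

```python
from itertools import zip_longest


def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def interleave(first_mates, second_mates):
    """Yield records alternately: mate 1, mate 2, mate 1, mate 2, ..."""
    missing = object()
    for a, b in zip_longest(first_mates, second_mates, fillvalue=missing):
        # A length mismatch would silently shift every later pair.
        if a is missing or b is missing:
            raise ValueError("inputs contain different numbers of reads")
        yield a
        yield b


if __name__ == "__main__":
    # Hypothetical reads, given as lists of lines for brevity.
    forward = [">pair1/1\n", "ACGT\n", ">pair2/1\n", "GGCC\n"]
    reverse = [">pair1/2\n", "TTAA\n", ">pair2/2\n", "CCGG\n"]
    for header, seq in interleave(read_fasta(forward), read_fasta(reverse)):
        print(f">{header}\n{seq}")
```

Failing on a length mismatch is a deliberate choice here, since a single missing read would misalign every pair after it.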

When a dataset contains paired data of unknown distances, a good approach is to make an initial reference assembly without using paired information. Then the clc_mapping_info program can be used to investigate the paired distance properties of the data using wide limits for the distances. Finally, the reference assembly can be rerun with a suitable distance interval chosen from the estimated paired distances. To get a quicker result, the initial reference assembly run may be done on only a part of the data, using ungapped alignments, and/or using stricter scoring criteria. These factors will usually not affect the paired distance properties of the results, but a smaller fraction of the reads might match.
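
As an illustration of the middle step, the following Python sketch extracts one of the reported distance intervals from saved clc_mapping_info output, so that it can serve as a starting point for the final assembly run. The function name is hypothetical, and the line format is assumed to match the "Paired reads info" example shown earlier. Which interval is suitable remains a judgment call.

```python
import re

# Matches lines such as "    99.0 % of pairs between       191 - 241".
PAIR_RANGE = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*%\s+of pairs between\s+(\d+)\s*-\s*(\d+)\s*$"
)


def distance_limits(report_text, percent=99.0):
    """Return (min, max) from the interval line for `percent`, or None."""
    for line in report_text.splitlines():
        match = PAIR_RANGE.match(line)
        if match and abs(float(match.group(1)) - percent) < 1e-9:
            return int(match.group(2)), int(match.group(3))
    return None


if __name__ == "__main__":
    # Excerpt of the example output shown earlier in this section.
    report = """Paired reads info:
  Pairs                       2478655
    Average distance              215.44
    99.9 % of pairs between       175 - 253
    99.0 % of pairs between       191 - 241
    95.0 % of pairs between       197 - 234
"""
    print(distance_limits(report, percent=99.0))
```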

Further details can be found in Options for All Programs.