Read shifting
Because the ChIP-seq experimental protocol selects for sequencing input fragments that are centered around a DNA-protein binding site, true peaks are expected to exhibit a signature distribution: forward reads are found upstream of the binding site and reverse reads are found downstream of it, which reduces coverage at the exact binding site. For this reason, the algorithm allows shifting forward reads towards the 3' end and reverse reads towards the 5' end, in order to generate a more discernible peak around the putative binding site prior to the peak detection step. This is done by checking the Shift reads based on fragment length box. To shift the reads, you also need to provide the expected length of the sequencing input fragments by setting the Fragment length parameter. This is the size of the fragment isolated from the gel (L in the illustration below).
The illustration below shows a peak where the forward reads fall in one window and the reverse reads fall in another (windows 1 and 3).
---------------------------------------------------------  reference
       ----------------------|------------------           (actual sequenced fragment length = L bp)
       ---->                                               reads
        ---->                                              reads
                                            <---           reads
                                           <---            reads
|--------------------|--------------------|-------------   window size W
          1                    2                 3

If the reads are not shifted, the algorithm will count 2 reads in each of windows 1 and 3. But if the forward reads are shifted to the right and the reverse reads are shifted to the left, the algorithm will find 4 reads in window 2, as shown below:
---------------------------------------------------------  reference
       ----------------------|------------------           (actual sequenced fragment length = L bp)
                       ---->                               reads
                        ---->                              reads
                                <---                       reads
                               <---                        reads
|--------------------|--------------------|-------------   window size W
          1                    2                 3
After shifting, more reads fall within the peak region and the reads are concentrated into fewer windows, which improves the accuracy of the peak detection. As a result, the reported number of reads for a peak region will be higher than in the original read mapping.
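The effect of shifting on the window counts can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm's actual implementation: the text above does not say how far each read is moved, so the sketch assumes that the 5' end of every read is shifted by half the fragment length towards the fragment centre. The names Read, shifted_position and window_counts, and the example coordinates, are hypothetical.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Read(NamedTuple):
    start: int    # leftmost mapped position on the reference (0-based)
    length: int   # read length in bp
    strand: str   # "+" for forward reads, "-" for reverse reads


def shifted_position(read: Read, fragment_length: int) -> int:
    """Return a shifted position for a read.

    Assumption (not stated in the text): the 5' end of the read is moved
    by half the fragment length towards the centre of the fragment.
    """
    half = fragment_length // 2
    if read.strand == "+":
        # Forward read: the 5' end is the leftmost base, so move right.
        return read.start + half
    # Reverse read: the 5' end is the rightmost base, so move left.
    return read.start + read.length - 1 - half


def window_counts(reads, window_size: int,
                  fragment_length: Optional[int] = None) -> dict:
    """Count reads per window (1-based window numbers).

    Without fragment_length, reads are counted at their leftmost mapped
    position; with it, they are counted at their shifted position.
    The input reads are never modified.
    """
    counts = Counter()
    for read in reads:
        if fragment_length is None:
            pos = read.start
        else:
            pos = shifted_position(read, fragment_length)
        counts[pos // window_size + 1] += 1
    return dict(counts)


# Hypothetical example: W = 100 bp, L = 200 bp, 36 bp reads.
reads = [
    Read(50, 36, "+"),
    Read(55, 36, "+"),
    Read(214, 36, "-"),
    Read(219, 36, "-"),
]

print(window_counts(reads, 100))                       # {1: 2, 3: 2}
print(window_counts(reads, 100, fragment_length=200))  # {2: 4}
```

With these example coordinates, the unshifted reads give 2 reads in each of windows 1 and 3, while the shifted reads give 4 reads in window 2, matching the illustrations. Because the shifted positions are computed on the fly, the original read coordinates stay untouched.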
The subsequent peak refinement step, the reporting of the peak, and the visualization all use the original positions of the reads, so the shift is only a virtual one performed as part of the peak detection.