Over-representation analysis
The 5mer analysis examines the enrichment of penta-nucleotides. The enrichment of a 5mer is calculated as the ratio of its observed frequency to its expected frequency. The expected frequency is calculated as the product of the empirical probabilities of the nucleotides that make up the 5mer. (Example: given the 5mer CCCCC, and cytosines making up 20% of the bases in the examined sequences, the 5mer expectation is $0.2^5 = 0.00032$.) Note that 5mers containing ambiguous bases (anything other than A/T/C/G) are ignored.
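The enrichment calculation described above can be illustrated with a minimal Python sketch. It is not the tool's actual implementation: the function name `kmer_enrichment` is hypothetical, and it assumes that nucleotide probabilities are estimated from the unambiguous bases only and that the observed frequency is a 5mer's count divided by the total number of unambiguous 5mers. It requires Python 3.8 or later for `math.prod`.

```python
from collections import Counter
from math import prod

VALID = set("ACGT")

def kmer_enrichment(seqs, k=5):
    """Return {kmer: observed_freq / expected_freq} for unambiguous k-mers."""
    base_counts = Counter()
    kmer_counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        # Assumption: only A/C/G/T contribute to the background probabilities.
        base_counts.update(b for b in seq if b in VALID)
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= VALID:  # skip k-mers containing ambiguous bases
                kmer_counts[kmer] += 1

    total_bases = sum(base_counts.values())
    total_kmers = sum(kmer_counts.values())
    base_prob = {b: n / total_bases for b, n in base_counts.items()}

    enrichment = {}
    for kmer, n in kmer_counts.items():
        observed = n / total_kmers
        # Expected frequency: product of the per-base probabilities.
        expected = prod(base_prob[b] for b in kmer)
        enrichment[kmer] = observed / expected
    return enrichment
```

A call such as `kmer_enrichment(["ACGTACGTAC", "CCCCCNACGT"])` returns a dictionary keyed by 5mer. Values above 1 indicate that a 5mer occurs more often than the base composition alone would predict.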
- Individual 5mer distribution
- Calculates the absolute coverage and the enrichment (observed/expected, based on the background distribution of nucleotides) of each 5mer at each base position, and plots enrichment against position for the five most enriched 5mers (or fewer if fewer than five enriched 5mers are present). This analysis reveals whether there is a pattern of bias at particular positions along the read length. Such a bias might originate from untrimmed adapter sequences, poly-A tails or other sources.
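A per-position version of the calculation could look like the following Python sketch. This is an illustration under stated assumptions, not the tool's method: the function name `positional_enrichment` is hypothetical, the observed frequency at a position is taken as the 5mer's count at that position divided by the number of unambiguous 5mers starting there, and 5mers are ranked by their maximum enrichment over all positions (the text does not specify how the top five are selected). Plotting is left out.

```python
from collections import Counter, defaultdict
from math import prod

VALID_BASES = set("ACGT")

def positional_enrichment(reads, k=5, top=5):
    """Return {kmer: {position: enrichment}} for the `top` most enriched k-mers."""
    base_counts = Counter()
    pos_counts = defaultdict(Counter)  # start position -> k-mer -> count
    for read in reads:
        read = read.upper()
        base_counts.update(b for b in read if b in VALID_BASES)
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= VALID_BASES:  # ignore k-mers with ambiguous bases
                pos_counts[i][kmer] += 1

    total_bases = sum(base_counts.values())
    base_prob = {b: n / total_bases for b, n in base_counts.items()}

    profiles = defaultdict(dict)  # k-mer -> position -> enrichment
    for pos, counts in pos_counts.items():
        n_pos = sum(counts.values())
        for kmer, n in counts.items():
            expected = prod(base_prob[b] for b in kmer)
            profiles[kmer][pos] = (n / n_pos) / expected

    # Assumption: rank k-mers by their highest enrichment at any position.
    ranked = sorted(profiles, key=lambda km: max(profiles[km].values()),
                    reverse=True)
    return {km: profiles[km] for km in ranked[:top]}
```

Each returned profile maps start positions to enrichment values, so it can be passed directly to a plotting library to draw one position vs enrichment curve per 5mer. A sharp peak near the read start or end would be consistent with adapter remnants or poly-A tails.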