Paired data in RNA-Seq

The CLC Genomics Workbench supports the use of paired data for RNA-Seq. A combination of single reads and paired reads can also be used. There are three major advantages of using paired data:

At the bottom of the dialog, you can specify how the mapping of paired reads should be handled. You can read more about how paired data are imported and handled in General notes on handling paired data. If the sequence list used as input for the mapping contains paired reads, this option is shown automatically; if it contains single reads, the option is not shown.

Paired read lists have a field that describes the expected minimum and maximum distances between the reads in a pair. These are the values shown in the 'minimum distance' and 'maximum distance' fields. The RNA-Seq read mapper relies on these distances to determine whether reads are mapped as an intact or a broken pair. You may override the values stored on the read lists by entering your own values in these fields. Note that for the RNA-Seq read mapper, the distance between the reads in a pair is measured at the transcript level, not the genomic level; that is, intron regions are ignored.
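To illustrate what measuring the distance "at the transcript level" means, the sketch below converts genomic mate positions to transcript coordinates before comparing them with the minimum and maximum distances. It is a minimal sketch, not the Workbench's implementation: the exon-list representation, the function names (`genomic_to_transcript`, `transcript_distance`, `classify_pair`), and the definition of the distance as the outer span from the start of the left mate to the end of the right mate are all assumptions. Both mate ends are also assumed to fall inside exons of the same transcript.

```python
from typing import List, Tuple

# Hypothetical sketch, not the CLC Genomics Workbench implementation.
# A transcript is modeled as a sorted list of (start, end) exons in
# 1-based, inclusive genomic coordinates.
Exon = Tuple[int, int]


def genomic_to_transcript(pos: int, exons: List[Exon]) -> int:
    """Convert a genomic position inside an exon to a 1-based transcript position."""
    offset = 0  # number of exonic bases in the exons before the current one
    for start, end in exons:
        if start <= pos <= end:
            return offset + (pos - start) + 1
        offset += end - start + 1
    raise ValueError(f"position {pos} is not inside any exon")


def transcript_distance(left_start: int, right_end: int, exons: List[Exon]) -> int:
    """Assumed distance: outer span of the pair on the transcript (introns skipped)."""
    return genomic_to_transcript(right_end, exons) - genomic_to_transcript(left_start, exons) + 1


def classify_pair(left_start: int, right_end: int, exons: List[Exon],
                  min_dist: int, max_dist: int) -> str:
    """Return 'intact' if the transcript-level distance is within [min_dist, max_dist]."""
    d = transcript_distance(left_start, right_end, exons)
    return "intact" if min_dist <= d <= max_dist else "broken"


exons = [(1, 100), (1001, 1100)]
print(transcript_distance(51, 1050, exons))     # 100 (the genomic span would be 1000)
print(classify_pair(51, 1050, exons, 50, 300))  # intact
```

With exons at 1-100 and 1001-1100, mates spanning genomic positions 51 to 1050 are 1000 bases apart on the genome but only 100 bases apart on the transcript, so with a 50-300 range the pair is classified as intact. A user-supplied minimum or maximum would simply be passed as `min_dist` or `max_dist` in place of the value stored on the read list.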

When counting the mapped reads to generate expression values, the CLC Genomics Workbench needs to decide how to handle paired reads. The standard behavior is as follows: if two reads map as a pair, the pair is counted as one. If the pair is broken, neither read is counted. The reasoning is that a broken pair indicates that something is wrong: the transcripts may not be represented correctly on the reference, or there may be errors in the data. In general, more confidence is placed in an intact pair. If a combination of paired and single reads is used, "true" single reads also count as one (single reads that come from broken pairs are not counted).
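The standard counting rule can be expressed as a weight per kind of mapped event. The sketch below is a minimal illustration under assumed input: the event labels (`"intact_pair"`, `"broken_mate"`, `"single"`) and the function `standard_count` are hypothetical and do not correspond to any Workbench API. An intact pair is recorded once, each mapped read of a broken pair is recorded once, and each true single read is recorded once.

```python
from typing import Iterable

# Hypothetical event labels (not Workbench terminology) for what mapped to one transcript:
#   "intact_pair" - one entry per pair whose mates mapped as an intact pair
#   "broken_mate" - one entry per mapped read from a broken pair
#   "single"      - one entry per read that was a single read in the input
STANDARD_WEIGHTS = {"intact_pair": 1, "broken_mate": 0, "single": 1}


def standard_count(events: Iterable[str]) -> int:
    """Sum the contributions of mapped events under the standard counting scheme."""
    return sum(STANDARD_WEIGHTS[event] for event in events)


events = ["intact_pair", "intact_pair", "broken_mate", "broken_mate", "single"]
print(standard_count(events))  # 3: two intact pairs + one single read; the broken pair adds nothing
```

Keeping the rule in a weight table makes the policy explicit: changing how a category is counted means changing one number rather than the counting logic.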

In some situations it may be too strict to disregard broken pairs, for example when there is a high degree of variation compared to the reference, or when the reference lacks comprehensive transcript annotations. When the Use 'include broken pairs' counting scheme option is checked, both intact and broken pairs are counted as two. For broken pairs, this means that each read is counted as one. Reads that were single reads in the input are still counted as one.
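The 'include broken pairs' scheme changes only the weights: an intact pair contributes two, each read of a broken pair contributes one, and a true single read still contributes one. The sketch below is a minimal illustration; the event labels and the function `include_broken_count` are hypothetical and not part of the Workbench. For the same five events, the standard scheme described above would give 3 (one per intact pair plus one for the single read).

```python
from typing import Iterable

# Hypothetical event labels (not Workbench terminology):
#   "intact_pair" - one entry per pair whose mates mapped as an intact pair
#   "broken_mate" - one entry per mapped read from a broken pair
#   "single"      - one entry per read that was a single read in the input
INCLUDE_BROKEN_WEIGHTS = {"intact_pair": 2, "broken_mate": 1, "single": 1}


def include_broken_count(events: Iterable[str]) -> int:
    """Sum contributions under the 'include broken pairs' counting scheme."""
    return sum(INCLUDE_BROKEN_WEIGHTS[event] for event in events)


events = ["intact_pair", "intact_pair", "broken_mate", "broken_mate", "single"]
print(include_broken_count(events))  # 7 = 2 + 2 + 1 + 1 + 1
```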

When looking at the mappings, reads from broken pairs are shown in a darker color than reads from intact pairs or reads that were originally single reads.



Footnotes

... variant.27.1
Note that the CLC Genomics Workbench only calculates the expression of the transcripts already annotated on the reference.