The Single Cell RNA-Seq Analysis algorithm

Single Cell RNA-Seq Analysis uses the same algorithm as the RNA-Seq Analysis tool of the CLC Genomics Workbench. Briefly, the tool extracts the sequence of all transcripts from the provided mRNA track. Reads are then simultaneously aligned to both this transcriptome and the full genome (and spike-in sequences if these have been provided).

Each read may have multiple equally high-scoring alignments, some to transcripts and others to the genome. These alignments are translated back into genomic coordinates. In many cases, all the alignments refer to the same genomic coordinates and the read is considered 'uniquely mapped'. If there are more than 10 distinct alignments in genomic coordinates, then the read is discarded.
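The classification described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the Workbench's implementation: the function name `classify_read`, the `(chromosome, start, end, strand)` tuple representation and the label strings are hypothetical, and spliced alignments are reduced to a single interval.

```python
MAX_DISTINCT_ALIGNMENTS = 10  # threshold stated in the text


def classify_read(genomic_alignments):
    """Classify one read from its alignments in genomic coordinates.

    genomic_alignments: iterable of (chromosome, start, end, strand) tuples.
    Alignments to transcripts are assumed to have been translated to
    genomic coordinates already (hypothetical representation).
    """
    # Alignments that land on the same genomic coordinates collapse to one.
    distinct = set(genomic_alignments)
    if not distinct:
        return "unmapped"
    if len(distinct) > MAX_DISTINCT_ALIGNMENTS:
        return "discarded"
    if len(distinct) == 1:
        return "uniquely mapped"
    return "multi-mapped"
```

For example, a read that aligns to two transcripts of the same gene and to the genome at the same position yields three identical tuples, which collapse to one entry, so the read counts as uniquely mapped.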

When a read aligns equally well to multiple transcripts or multiple genes, it is counted towards only one of them, with the 'lucky' transcript being chosen by an Expectation Maximization (EM) method similar to those used by RSEM and eXpress. This works as follows:
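As a rough illustration of how an EM scheme of this general kind can estimate transcript abundances from ambiguous reads, consider the generic sketch below. It is not the Workbench's actual procedure: the function `em_abundances`, its parameters and the input representation are hypothetical, transcript lengths are ignored, and the final step of picking a single transcript for each ambiguous read is not shown.

```python
def em_abundances(compatible, n_iter=200, tol=1e-9):
    """Estimate the fraction of reads originating from each transcript.

    compatible: list with one entry per read; each entry is a non-empty
    tuple of the transcript ids the read aligns to equally well.
    Returns a dict mapping transcript id to estimated read fraction.
    """
    transcripts = sorted({t for hits in compatible for t in hits})
    if not transcripts:
        return {}
    # Start from equal abundances for all transcripts.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(n_iter):
        counts = dict.fromkeys(transcripts, 0.0)
        # E-step: split each read over its compatible transcripts
        # in proportion to the current abundance estimates.
        for hits in compatible:
            total = sum(theta[t] for t in hits)
            if total == 0.0:
                continue
            for t in hits:
                counts[t] += theta[t] / total
        # M-step: re-estimate abundances from the expected counts.
        n = sum(counts.values())
        if n == 0.0:
            break
        new_theta = {t: counts[t] / n for t in transcripts}
        delta = max(abs(new_theta[t] - theta[t]) for t in transcripts)
        theta = new_theta
        if delta < tol:
            break
    return theta


# 6 reads unique to A, 2 unique to B, 4 compatible with both.
reads = [("A",)] * 6 + [("B",)] * 2 + [("A", "B")] * 4
estimates = em_abundances(reads)
```

In this toy input the unique reads favour transcript A three to one, so the ambiguous reads are attributed mostly to A as well, and the estimates settle near 0.75 for A and 0.25 for B.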

The final gene expression is the sum of the expressions of the transcripts for that gene. When the option Count intronic reads is enabled, expression from introns and UTRs is also included.
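The summation can be sketched as follows. The function `gene_expression`, the transcript-to-gene mapping and the optional `extra_counts` argument (standing in for reads counted when Count intronic reads is enabled) are hypothetical names used for illustration only.

```python
from collections import defaultdict


def gene_expression(transcript_expr, transcript_to_gene, extra_counts=None):
    """Sum transcript-level expression into gene-level expression.

    transcript_expr: dict of transcript id -> expression value.
    transcript_to_gene: dict of transcript id -> gene id.
    extra_counts: optional dict of gene id -> additional expression,
    for example from intronic reads (hypothetical input).
    """
    totals = defaultdict(float)
    for transcript, value in transcript_expr.items():
        totals[transcript_to_gene[transcript]] += value
    if extra_counts:
        for gene, value in extra_counts.items():
            totals[gene] += value
    return dict(totals)
```

With two transcripts of one gene expressed at 3 and 2, the gene total is 5; passing an extra count of 1 for that gene raises it to 6.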