RNA-Seq Analysis for Long Reads
The RNA-Seq Analysis for Long Reads tool analyzes RNA-Seq data by mapping long sequencing reads to an annotated reference genome with minimap2 [Li, 2018] and then distributing and counting the reads across genes and transcripts. The results can subsequently be used for expression analysis.
RNA-Seq analysis with long reads is done in several steps. First, all annotated transcripts or genes are extracted; if a gene has several annotated splice variants, all of them are extracted. Next, the reads are mapped with minimap2 both to the extracted transcripts and to the whole genome. For more information about the read mapper, see Map Long Reads to Reference.
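To illustrate what "extracting annotated transcripts" means, the sketch below splices exon coordinates from an annotation into transcript sequences. It is a minimal illustration, not the Workbench's implementation: the data model (a genome as a `{chromosome: sequence}` dictionary, transcripts as `(chromosome, strand, exons)` tuples with 0-based, half-open exon coordinates) and the function names are hypothetical.

```python
# Hypothetical data model, for illustration only:
#   genome:      {chrom: sequence}
#   transcripts: {transcript_id: (chrom, strand, [(start, end), ...])}
# Exon coordinates are 0-based and half-open.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_transcripts(genome: dict, transcripts: dict) -> dict:
    """Return {transcript_id: spliced sequence} for every annotated transcript.

    Each splice variant is extracted separately, so a gene with several
    variants yields several sequences.
    """
    result = {}
    for tx_id, (chrom, strand, exons) in transcripts.items():
        # Concatenate exons in genomic order; intronic sequence is dropped.
        seq = "".join(genome[chrom][start:end] for start, end in sorted(exons))
        # Minus-strand transcripts are read from the opposite strand.
        result[tx_id] = reverse_complement(seq) if strand == "-" else seq
    return result


if __name__ == "__main__":
    genome = {"chr1": "AAAACCCCGGGGTTTT"}
    transcripts = {
        "tx1": ("chr1", "+", [(0, 4), (8, 12)]),          # skips the CCCC exon
        "tx2": ("chr1", "+", [(0, 4), (4, 8), (8, 12)]),  # includes all three exons
        "tx3": ("chr1", "-", [(4, 8), (12, 16)]),         # minus strand
    }
    print(extract_transcripts(genome, transcripts))
```

Here `tx1` and `tx2` behave like two splice variants of the same gene that differ by one exon. In the actual tool, the reads are then mapped with minimap2 to sequences such as these as well as to the whole genome.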
From the minimap2 mappings, the reads are categorized and assigned to transcripts using the EM estimation algorithm. Expression values for each gene are then obtained by summing the counts of the transcripts belonging to that gene.
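The general idea behind EM-based assignment of reads that are compatible with several transcripts, followed by summing transcript counts per gene, can be sketched as follows. This is a textbook expectation-maximization scheme under simplifying assumptions: it ignores transcript length, alignment quality and the read categories used by the Workbench. The input format (one set of compatible transcript IDs per read) and the function names are hypothetical. For the exact algorithm, refer to the EM estimation documentation linked below.

```python
from collections import defaultdict


def em_transcript_counts(read_compat, n_iter=100, tol=1e-8):
    """Estimate expected read counts per transcript with a simple EM.

    read_compat: list of sets, one per read, holding the IDs of the
    transcripts that read is compatible with (hypothetical input format).
    """
    # Ignore reads that are compatible with no transcript.
    read_compat = [c for c in read_compat if c]
    if not read_compat:
        return {}
    n_reads = len(read_compat)
    transcripts = sorted(set().union(*read_compat))
    # Start from equal relative abundances.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(n_iter):
        counts = defaultdict(float)
        # E-step: split each read across its compatible transcripts
        # in proportion to the current abundance estimates.
        for compat in read_compat:
            total = sum(theta[t] for t in compat)
            for t in compat:
                counts[t] += theta[t] / total
        # M-step: new abundances are the expected fractions of all reads.
        new_theta = {t: counts[t] / n_reads for t in transcripts}
        converged = max(abs(new_theta[t] - theta[t]) for t in transcripts) < tol
        theta = new_theta
        if converged:
            break
    # Convert relative abundances back to expected read counts.
    return {t: theta[t] * n_reads for t in transcripts}


def gene_counts(tx_counts, tx_to_gene):
    """Gene expression = sum of the counts of the gene's transcripts."""
    genes = defaultdict(float)
    for t, c in tx_counts.items():
        genes[tx_to_gene[t]] += c
    return dict(genes)


if __name__ == "__main__":
    # 3 reads unique to A, 1 unique to B, 4 ambiguous between A and B,
    # and 2 unique to C.
    reads = [{"A"}] * 3 + [{"B"}] + [{"A", "B"}] * 4 + [{"C"}] * 2
    tx = em_transcript_counts(reads)
    print(tx)
    print(gene_counts(tx, {"A": "g1", "B": "g1", "C": "g2"}))
```

In this example, the unique reads suggest that A is three times as abundant as B, so EM distributes the four ambiguous reads 3:1. The estimates converge to about 6 reads for A and 2 for B. Summing per gene then gives about 8 reads for `g1` and 2 for `g2`.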
Detailed information on RNA-Seq analysis, including the EM algorithm, can be found at http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=_EM_estimation_algorithm.html.
The results can be used as input for expression analysis and other downstream RNA-Seq analysis tools in CLC Genomics Workbench; see http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=RNA_Seq_Tools.html.