Whole Transcriptome Sequencing (WTS)

The technologies originally developed for next-generation DNA sequencing can also be applied to deep sequencing of the transcriptome. This is done through cDNA sequencing and is called RNA sequencing or simply RNA-seq.

One of the key advantages of RNA-seq is that the method is independent of prior knowledge of the corresponding genomic sequences and can therefore be used to identify transcripts from unannotated genes, novel splicing isoforms, and gene-fusion transcripts [Wang et al., 2009, Martin and Wang, 2011]. Another strength is that it enables studies of transcriptomic complexity, such as deciphering allele-specific transcription by using SNPs present in the transcribed regions [Heap et al., 2010].

RNA-seq-based transcriptomic studies have the potential to increase the overall understanding of the transcriptome. However, gaining access to the hidden information and making a meaningful interpretation of the sequencing data relies heavily on the downstream bioinformatic analysis.

In this chapter we will first discuss the initial steps of the data analysis that lie upstream of the ready-to-use workflows. Next, we will look at what the individual ready-to-use workflows can be used for and go through, step by step, how to run them.

The Biomedical Genomics Workbench offers a range of different tools for RNA-seq analysis. Currently, 5 different ready-to-use workflows for 3 different species (human, mouse and rat) are available for analysis of RNA-seq data.

The ready-to-use workflows can be found in the toolbox under Whole Transcriptome Sequencing as shown in figure 15.1.

Image rnaseq_ready_to_use_workflows
Figure 15.1: The RNA-seq ready-to-use workflows.

Note! You will often have to prepare the data with one of the two Preparing Raw Data workflows, described in Preparing Raw Data, before you proceed to the analysis of the RNA-seq data.

Note! Make sure that you have selected the references corresponding to the species you will be working with. To check, and if necessary change, which Reference Data Set is currently in use, click on the Data Management button in the top right corner of the Workbench, and click Apply for the appropriate data set (Hg38, Hg19, Mouse or Rat). If you get an error message about a missing reference data element when starting a workflow, you can delete and re-download the missing reference element or set.

Also note that for workflows that annotate variants using databases available for more than one population, you can select the population that best matches the population your samples are derived from. For populations from the 1000 Genomes Project this is done in the wizard, while HapMap populations can be specified with the Data Management function before starting the workflows (see Download and configure reference data).