The protein coding part of the human genome accounts for around 1 % of the genome and consists of around 180,000 exons covering an area of ~30 megabases (Mb) [Ng et al., 2009].
By targeting sequencing to only the protein coding parts of the genome, exome sequencing is a cost efficient way of generating sequencing data that is believed to harbor the vast majority of the disease-causing mutations [Choi et al., 2009].
Whole exome sequencing (WES)
A number of template workflows are available for analysis of whole genome sequencing data (figure 21.1). The concept of the pre-installed template workflows is that read data are used as input in one end of the workflow and in the other end of the workflow you get a track list and a table with all the identified variants, which may or may not have been subjected to different kinds of filtering and/or annotation.
Figure 21.1: The workflows available for analyzing whole exome sequencing data.
In this chapter we will discuss what the individual template workflows can be used for and go through step by step how to run the workflows.
Remember you will have to prepare data with the Prepare Raw Data workflow described in Preparing Raw Data before you proceed to running any of these workflows.