Defining reference genome and mapping settings

You are now presented with the dialog shown in figure 27.4.

Figure 27.4: Defining a reference genome for RNA-Seq.

At the top, there are two options concerning how the reference sequences are annotated:

Just below these two options, you click to select the reference sequences.

Next, you can extend the region around each gene to include more of the genomic sequence by changing the value in Flanking upstream/downstream residues. Extending the region also makes it possible to look for new exons before or after the known exons (see Exon discovery).
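As a rough illustration of what a flanking value does to a gene region, the following Python sketch extends a region by the same number of residues on both sides and clamps the result to the reference sequence. The function name, the 1-based inclusive coordinates and the clamping behavior are assumptions made for this example; they are not taken from the software.

```python
def extend_gene_region(gene_start, gene_end, flank, seq_length):
    """Extend a gene region by `flank` residues on each side.

    Hypothetical helper for illustration only. Coordinates are assumed
    to be 1-based and inclusive, and the extended region is assumed to
    be clamped to the ends of the reference sequence.
    """
    if flank < 0:
        raise ValueError("flank must be zero or positive")
    if not 1 <= gene_start <= gene_end <= seq_length:
        raise ValueError("gene coordinates must lie within the sequence")
    start = max(1, gene_start - flank)        # do not run past the first residue
    end = min(seq_length, gene_end + flank)   # do not run past the last residue
    return start, end


if __name__ == "__main__":
    # A gene at 1000..2000 on a 10,000 bp reference, with 500 flanking residues.
    print(extend_gene_region(1000, 2000, 500, 10_000))
```

Under these assumptions, a gene close to either end of the reference simply receives a shorter flank on that side.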

When the reference has been defined, click Next and you are presented with the dialog shown in figure 27.5.

Figure 27.5: Defining mapping parameters for RNA-Seq.

The mapping algorithm is chosen per sequence list, based on read length. A sequence list containing only short reads (under 56 bp) is mapped with one algorithm; a sequence list containing one or more reads of 56 bp or longer is mapped with another. The chosen algorithm is applied to every read in that sequence list; different algorithms are never used for individual reads within the same list.

Accordingly, the mapping parameters that can be edited in the wizard depend on the read lengths in the sequence lists you entered. If at least one sequence list contains only short reads (under 56 bp), the "Maximum number of mismatches" setting is available. If at least one sequence list contains a read of 56 bp or longer, the "Minimum length fraction" and "Minimum similarity fraction" settings are available. If you entered both kinds of sequence list, all of these settings are available. In that case, "Maximum number of mismatches" is used only for the lists containing only short reads, while "Minimum length fraction" and "Minimum similarity fraction" are used only for the lists in which one or more reads are 56 bp or longer, where they apply to every read in the list.
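The rule above can be summarized in a short Python sketch. It is not the software's implementation: the function names and the representation of a sequence list as a plain list of read lengths are assumptions made for illustration. Only the 56 bp threshold and the setting names come from the text.

```python
SHORT_READ_LIMIT = 56  # reads under 56 bp count as short reads (from the text)


def uses_long_read_mapping(read_lengths):
    """True if the list contains one or more reads of 56 bp or longer.

    `read_lengths` is a hypothetical stand-in for a sequence list:
    a plain list with the length in bp of each read.
    """
    return any(length >= SHORT_READ_LIMIT for length in read_lengths)


def editable_parameters(sequence_lists):
    """Return the mapping settings that would be editable for these lists."""
    has_short_only_list = any(
        not uses_long_read_mapping(lengths) for lengths in sequence_lists
    )
    has_long_list = any(
        uses_long_read_mapping(lengths) for lengths in sequence_lists
    )
    settings = []
    if has_short_only_list:
        settings.append("Maximum number of mismatches")
    if has_long_list:
        settings.append("Minimum length fraction")
        settings.append("Minimum similarity fraction")
    return settings


if __name__ == "__main__":
    short_list = [35, 36, 50]   # all reads under 56 bp
    mixed_list = [36, 75, 100]  # one or more reads of 56 bp or longer
    print(editable_parameters([short_list]))
    print(editable_parameters([mixed_list]))
    print(editable_parameters([short_list, mixed_list]))
```

Note that the classification is made per list, not per read: in this sketch the 36 bp read in `mixed_list` is handled with the long-read settings because it shares a list with longer reads.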

The mapping parameters are:

There is also a checkbox to Use color space, which is enabled if you have imported a data set from a SOLiD platform containing color space information. Note that color space data are always treated as long reads, regardless of the read length.


