Large Gap Read Mapping

The Large Gap Read Mapping tool maps reads to a reference while allowing for large gaps in the mapping. It was developed to support transcript discovery using RNA-Seq data, since it can map RNA-Seq reads that span introns without requiring prior transcript annotations.

The Large Gap Read Mapping tool works by iteratively applying the standard read mapper of CLC Genomics Workbench to each read as follows:

  1. Find the best match for the read.
  2. If the match is good enough (according to the settings, see below), the read is mapped to this position.
  3. If there is an unaligned end which is long enough for the mapper to handle (15 bp for standard mapping), this part of the read is used as input to step 1.
  4. This continues until the read has no unaligned end that is long enough to be mapped (15 bp, as in step 3). The number of rounds required scales with the length of the reads. For short reads of 100 bp, at most three rounds of mapping are usually needed (corresponding to spanning two introns). For full-length transcripts, more than ten rounds may be required.

The matched region of the read identified in the first round of the mapping is called the seed segment (or just 'seed'). Matched regions found in later rounds are called non-seed segments.
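The iterative procedure described above can be illustrated with a minimal Python sketch. This is not the implementation used by CLC Genomics Workbench: the function names `best_match` and `large_gap_map` are hypothetical, the standard read mapper is replaced by a toy longest-exact-substring search from Python's `difflib`, and "good enough" is reduced to a 15 bp minimum length. The example sequences are made up for illustration.

```python
from collections import deque
from difflib import SequenceMatcher

MIN_SEGMENT = 15  # shortest stretch the mapper handles (15 bp, from the text)


def best_match(fragment, reference):
    """Toy stand-in for the standard read mapper (assumption):
    returns the longest exact common substring of fragment and reference."""
    matcher = SequenceMatcher(None, fragment, reference, autojunk=False)
    return matcher.find_longest_match(0, len(fragment), 0, len(reference))


def large_gap_map(read, reference):
    """Map a read in rounds, re-mapping unaligned ends of at least 15 bp."""
    segments = []
    pending = deque([(0, len(read), 1)])  # (read start, read end, round)
    while pending:
        start, end, rnd = pending.popleft()
        hit = best_match(read[start:end], reference)
        if hit.size < MIN_SEGMENT:
            continue  # match not good enough: this part stays unaligned
        seg_start = start + hit.a
        seg_end = seg_start + hit.size
        segments.append({
            "kind": "seed" if rnd == 1 else "non-seed",
            "round": rnd,
            "read_start": seg_start,
            "read_end": seg_end,
            "ref_start": hit.b,
        })
        # Unaligned ends that are long enough go back to step 1.
        for lo, hi in ((start, seg_start), (seg_end, end)):
            if hi - lo >= MIN_SEGMENT:
                pending.append((lo, hi, rnd + 1))
    return segments


# Made-up reference: two 30 bp exons separated by a 49 bp intron.
exon1 = "ACGTTGCAAGCTTAGGCTAACGGATCCGTA"
exon2 = "CCATGGTTAACCGGTATCGATTGCACGTAG"
intron = "GT" + "T" * 45 + "AG"
reference = exon1 + intron + exon2

# A 40 bp read spanning the exon-exon junction.
read = exon1[10:] + exon2[:20]

segments = large_gap_map(read, reference)
for seg in segments:
    print(seg)
```

In this toy case the first round places the first 20 bp of the read in the first exon (the seed), and the remaining 20 bp are placed in the second exon in a second round (a non-seed segment), leaving a gap in the mapping that corresponds to the intron.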

The Large Gap Read Mapping tool is started from the Toolbox:

        Toolbox | RNA-Seq Analysis | Transcript Discovery | Large Gap Read Mapping

First specify the RNA-Seq reads that should be analyzed (figure 2.1):

Figure 2.1: Selecting input reads for the Large Gap Read Mapping tool.

In the next dialog, specify the reference (a sequence track or a sequence list) to which the reads should be mapped (figure 2.2):

Figure 2.2: Selecting references for the Large Gap Read Mapping tool.

In the Mapping options dialog, specify the following parameters (figure 2.3):

Figure 2.3: Specify the parameters for the Large Gap Read Mapping tool.

Note that in addition to these mapping settings, the Large Gap Read Mapping tool requires that each mapped segment be at least 15 bp long, and that at each mapping step the mapped segment comprise at least 10% of the sequence being mapped. In the first round this sequence is the full read, but in later rounds it is the remaining unaligned part of the read.
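The two additional requirements can be expressed as a small check. The sketch below is illustrative only: the function name `segment_is_acceptable` and the constant names are hypothetical, and the check covers only the 15 bp and 10% rules stated above, not the other mapping settings.

```python
MIN_SEGMENT_BP = 15   # minimum length of a mapped segment (from the text)
MIN_PERCENT = 10      # minimum share of the sequence being mapped (from the text)


def segment_is_acceptable(segment_len, input_len):
    """Return True if a mapped segment meets both extra requirements.

    segment_len: length in bp of the mapped segment
    input_len:   length in bp of the sequence given to this mapping round
                 (the full read in round 1, an unaligned end afterwards)
    """
    long_enough = segment_len >= MIN_SEGMENT_BP
    # Integer arithmetic avoids floating-point rounding in the 10% test.
    large_enough_share = segment_len * 100 >= MIN_PERCENT * input_len
    return long_enough and large_enough_share


print(segment_is_acceptable(15, 100))    # 15 bp of a 100 bp read
print(segment_is_acceptable(15, 1000))   # 15 bp of a 1000 bp read
print(segment_is_acceptable(20, 200))    # 20 bp of a 200 bp unaligned end
```

For a 100 bp read the 15 bp minimum is the stricter requirement, while for a 1000 bp read the 10% rule dominates: a segment must then cover at least 100 bp in the first round, and the threshold drops in later rounds as the unaligned part gets shorter.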

In addition to a reads track (figure 2.4), the tool can generate the following items:

Figure 2.4: The Large Gap Read Mapping track.