Extract parts of a mapping

Sometimes it is useful to extract part of a mapping for in-depth analysis. This could be the case if you have performed an assembly of several genes and you want to look at a particular gene or region in isolation.

This is possible through the right-click menu of the reference or consensus sequence:

        Select on the reference or consensus sequence the part of the contig to extract | Right-click | Extract from Selection

This will present the dialog shown in figure 25.20.

Figure 25.20: Selecting the reads to include.

The purpose of this dialog is to let you specify which kinds of reads you want to include. By default, all reads are included. The options are:

Paired status
  Include intact paired reads
    When paired reads are placed within the paired distance specified, they will fall into this category. By default, these reads are colored blue.
  Include paired reads from broken pairs
    When a pair is broken, either because only one read in the pair matches, or because the distance or relative orientation is wrong, the reads are placed and colored as single reads, but you can still extract them by checking this box.
  Include single reads
    This will include reads that are marked as single reads (as opposed to paired reads). Note that paired reads that have been broken during assembly are not included in this category. Single reads that come from trimming paired sequence lists are included in this category.

Match specificity
  Include specific matches
    Reads that are mapped to only one position.
  Include non-specific matches
    Reads that have multiple equally good alignments to the reference. By default, these reads are colored yellow.

Alignment quality
  Include perfectly aligned reads
    Reads where the full read is perfectly aligned to the reference sequence (or consensus sequence for de novo assemblies). Note that at the end of the contig, reads may extend beyond the contig (this is not visible unless you make a selection on the read and observe the position numbering in the status bar). Such reads are not considered perfectly aligned reads because they do not align along their entire length.
  Include reads with less than perfect alignment
    Reads with mismatches, insertions or deletions, or with unaligned nucleotides at the ends (the faded part of a read).

Note that only reads that are completely covered by the selection will be part of the new contig.
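
The way these options combine can be summarized in a short sketch. The Python code below is purely illustrative and is not part of the Workbench or any scripting interface to it: the Read and Options classes, their field names, and the extract function are hypothetical, and coordinates are assumed to be 0-based with an exclusive end. It assumes that a read is kept only if it lies completely within the selection and its category is checked in each of the three option groups.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    """Hypothetical, simplified model of a mapped read."""
    name: str
    start: int        # first reference position covered (0-based, inclusive)
    end: int          # position after the last one covered (exclusive)
    pair_status: str  # "intact", "broken" or "single"
    specific: bool    # True if the read maps to only one position
    perfect: bool     # True if the full read aligns without differences


@dataclass(frozen=True)
class Options:
    """One flag per checkbox in the dialog; everything is included by default."""
    intact_pairs: bool = True
    broken_pairs: bool = True
    single_reads: bool = True
    specific: bool = True
    non_specific: bool = True
    perfect: bool = True
    imperfect: bool = True


def extract(reads, sel_start, sel_end, opts=Options()):
    """Return the reads that lie inside the selection and pass all three filters."""
    pair_allowed = {
        "intact": opts.intact_pairs,
        "broken": opts.broken_pairs,
        "single": opts.single_reads,
    }
    kept = []
    for read in reads:
        # Only reads completely covered by the selection are candidates.
        if not (sel_start <= read.start and read.end <= sel_end):
            continue
        if not pair_allowed[read.pair_status]:
            continue
        if not (opts.specific if read.specific else opts.non_specific):
            continue
        if not (opts.perfect if read.perfect else opts.imperfect):
            continue
        kept.append(read)
    return kept
```

In this model, unchecking an option corresponds to passing, for example, Options(single_reads=False), which drops single reads while leaving the other two option groups unchanged.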

One benefit of this is that you can use the tool to extract a subset of reads from a contig. An example workflow could look like this:

  1. Select the whole reference sequence
  2. Right-click and Extract from Selection
  3. Choose to include only paired matches
  4. Extract the reads from the new file (see Extract sequences)
You will now have all paired reads from the original mapping in a list.