Note that the functionalities described in this page are valid for read mappings. For similar functionalities on tracks, see Extract sequences from tracks.
Sometimes it is useful to extract part of a mapping for in-depth analysis. This could be the case if you have performed an analysis of a whole genome data set and have found a region that you are particularly interested in analyzing further. Rather than running all further analysis on your full data, you may prefer to run only on a subset of the data. You can extract a subset of your mapping data by running the Extract from Selection tool on a selected region in your mapping. The result of running this tool is a new mapping which contains only the reads (and optionally only those that are of a particular type) in your selected region.
To select a region, use the Selection mode () (see Section 2.2.3 for a detailed description of the different modes) and select you region of interest in your mapping, then right-click on the reference sequence or on the consensus sequence of the mapping (figure 19.15).
When you choose the Extract from Selection option you are presented by the dialog shown in figure 19.16.
The purpose of this dialog is to let you specify what kind of reads you want to include. Per default all reads are included. The options are:
- Paired status
- Include intact paired reads
- When paired reads are placed within the paired distance specified, they will fall into this category. Per default, these reads are colored in blue.
- Include paired reads from broken pairs
- When a pair is broken, either because only one read in the pair matches, or because the distance or relative orientation is wrong, the reads are placed and colored as single reads, but you can still extract them by checking this box.
- Include single reads
- This will include reads that are marked as single reads (as opposed to paired reads). Note that paired reads that have been broken during assembly are not included in this category. Single reads that come from trimming paired sequence lists are included in this category.
- Match specificity
- Include specific matches
- Reads that only are mapped to one position.
- Include non-specific matches
- Reads that have multiple equally good alignments to the reference. These reads are colored yellow per default.
- Alignment quality
- Include perfectly aligned reads
- Reads where the full read is perfectly aligned to the reference sequence (or consensus sequence for de novo assemblies). Note that at the end of the contig, reads may extend beyond the contig (this is not visible unless you make a selection on the read and observe the position numbering in the status bar). Such reads are not considered perfectly aligned reads because they don't align in their entire length.
- Include reads with less than perfect alignment
- Reads with mismatches, insertions or deletions, or with unaligned nucleotides at the ends (the faded part of a read).
- Spliced status
- Include spliced reads
- Reads that are across an intron.
- Include non spliced reads
- Reads that are not across an intron.
One of the benefits of this is that you can actually use this tool to extract subset of reads from a contig. An example work flow could look like this:
- Select the whole reference sequence
- Right-click and Extract from Selection
- Choose to include only paired matches
- Extract the reads from the new file (see Extract sequences)
When right-clicking on the sequences (as opposed to the consensus sequence), the menu to the right of figure 19.17 is available, and allows you to Extract Sequences from the mapping as explained in Extract sequences. As opposed to the Extract from Selection tool, the Extract sequences will include all reads, not only the ones covered by the selection.