Extract sequences from tracks
The functionality described on this page applies to sequence tracks and reads tracks. For similar functionality on read mappings, see Extract reads from a mapping.
Extracting the sequence from a sequence track
A DNA sequence can be extracted from a sequence track included in a Track List by using the Extract Sequence... option in the right-click menu on the sequence (the menu on the right in figure 25.12). A pop-up dialog then offers the option Extract annotations, which extracts all the annotations present on the other tracks of the Track List and adds them to the extracted sequence.
A sequence can also be extracted from a standalone sequence track, i.e. one that is not included in a Track List. In that case, or when the sequence track is in a Track List that does not contain annotation tracks, the Extract annotations option is unavailable.
Extracting a single read/sequence from a reads track
Right-click on the sequence of interest and choose the Selected read... option to Copy, Open in a new view or Blast the selected sequence.
Extracting all reads/sequences from a track
Use the Extract Sequences tool from the Toolbox to extract the sequences from the reads track, as described in Extract sequences.
Extracting only selected reads/sequences from a track
Select the sequences of interest by dragging the mouse over the region of interest, then right-click on the reads (or on the sequences, in the case of a sequence track) and choose Create Reads Track from Selection (the menu on the left in figure 25.12).
Figure 25.12: Extract sequences from a read mapping track. This screenshot shows the menus available when right-clicking on the reference sequence and the sequences/reads in the tracks below.
An Extract from Selection pop-up dialog lets you specify which kinds of reads to include in the subset of the original reads track (figure 25.13).
Figure 25.13: Selecting the reads to include.
By default, all reads are included. The options are:
- Selected region
  - Type of overlap. Specifies how the reads must overlap the selected region in order to be extracted.
    - Any overlap. Extracts any reads that overlap the selected region.
    - Within region. Only includes reads that are fully within the selected region. That is, reads overlapping the boundaries of the region are not included.
    - Span region. Only extracts reads that span the selected region, i.e. have aligned residues on both sides of the region. For paired reads, the default is to extract fragments that span the region. The option Only include matching read(s) of read pairs can be enabled to extract only the individual reads that span the region.
    - No overlap. Extracts all reads in the track, except those overlapping the selected region.
- Match specificity
  - Include specific matches. Reads that are mapped to only one position.
  - Include non-specific matches. Reads that have multiple equally good alignments to the reference. These reads are colored yellow by default.
- Alignment quality
  - Include perfectly aligned reads. Reads where the full read is perfectly aligned to the reference sequence (or consensus sequence for de novo assemblies). Note that at the end of the contig, reads may extend beyond the contig (this is not visible unless you make a selection on the read and observe the position numbering in the status bar). Such reads are not considered perfectly aligned because they do not align along their entire length.
  - Include reads with less than perfect alignment. Reads with mismatches, insertions or deletions, or with unaligned nucleotides at the ends (the faded part of a read).
- Spliced status
  - Include spliced reads. Reads that are mapped across an intron.
  - Include non spliced reads. Reads that are not mapped across an intron.
- Paired status
  - Include intact paired reads. Paired reads that are placed within the specified paired distance fall into this category. By default, these reads are colored blue.
  - Include paired reads from broken pairs. When a pair is broken, either because only one read in the pair matches, or because the distance or relative orientation is wrong, the reads are placed and colored as single reads, but you can still extract them by checking this box.
  - Include single reads. Includes reads that are marked as single reads (as opposed to paired reads). Note that paired reads that have been broken during assembly are not included in this category. Single reads that come from trimming paired sequence lists are included in this category.
  - Only include matching read(s) of read pairs. If only the forward or the reverse read of a read pair matches the criteria, then only the matching read is included, as a broken pair. For example, if the forward read is inside the overlap region but the reverse read is not, then this option only includes the forward read, as a broken read. When both the forward and the reverse read are inside the overlap region, the full paired read is included. Note that some tools ignore broken reads by default.
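The overlap rules and the Only include matching read(s) of read pairs option can be illustrated with a small Python sketch. This is not the Workbench's implementation and does not use any Workbench API. The names OverlapType, read_matches and select_from_pair are hypothetical, reads and regions are modelled as simple (start, end) tuples, and the use of half-open coordinates (start inclusive, end exclusive) is an assumption made for the example.

```python
from enum import Enum


class OverlapType(Enum):
    """Hypothetical labels mirroring the four 'Type of overlap' choices."""
    ANY_OVERLAP = "Any overlap"
    WITHIN_REGION = "Within region"
    SPAN_REGION = "Span region"
    NO_OVERLAP = "No overlap"


def read_matches(read, region, overlap_type):
    """Return True if a read meets the overlap rule for the selected region.

    Both arguments are (start, end) tuples. Assumption: half-open
    coordinates, i.e. start is inclusive and end is exclusive.
    """
    read_start, read_end = read
    region_start, region_end = region
    overlaps = read_start < region_end and region_start < read_end

    if overlap_type is OverlapType.ANY_OVERLAP:
        return overlaps
    if overlap_type is OverlapType.WITHIN_REGION:
        # The read must not cross either boundary of the region.
        return region_start <= read_start and read_end <= region_end
    if overlap_type is OverlapType.SPAN_REGION:
        # The read must have aligned residues on both sides of the region.
        return read_start < region_start and read_end > region_end
    if overlap_type is OverlapType.NO_OVERLAP:
        return not overlaps
    raise ValueError(f"Unknown overlap type: {overlap_type!r}")


def select_from_pair(forward, reverse, region, overlap_type):
    """Model 'Only include matching read(s) of read pairs' for one pair.

    Returns (kept_reads, status): both reads as an intact pair when both
    match, the single matching read as a broken pair when only one
    matches, and an empty list with status None when neither matches.
    """
    kept = [r for r in (forward, reverse) if read_matches(r, region, overlap_type)]
    if len(kept) == 2:
        status = "intact pair"
    elif len(kept) == 1:
        status = "broken pair"
    else:
        status = None
    return kept, status


# Example: a selected region and a pair where only the forward read lies inside it.
region = (100, 200)
print(read_matches((90, 210), region, OverlapType.SPAN_REGION))
print(select_from_pair((120, 170), (260, 310), region, OverlapType.WITHIN_REGION))
```

In this sketch, the second call keeps only the forward read and labels it as coming from a broken pair, which corresponds to the example given for the option above. How the Workbench treats pairs when the option is disabled is not modelled here.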