Extract annotations from track

The Extract Annotations tool extracts parts of one or more sequences based on their annotations. In a few steps you can select annotations by type or by a search term, optionally add flanking sequence, and choose how the extracted sequences are named.

The output is a sequence list that contains sequences carrying the annotation specified (including the flanking regions, if this option was selected).

To extract annotations from a sequence:

        Toolbox | Classical Sequence Analysis | General Sequence Analysis | Extract Annotations

This opens the dialog shown in figure 24.20, where you select the sequence or sequences to extract annotations from.

Figure 24.20: Select one or more sequences to extract annotations from.

Click Next. At the top of the dialog shown in figure 24.21 you can specify which annotations to use:

Figure 24.21: Adjusting parameters for extract annotations.

Select a reference sequence track. When extracting annotations from tracks, you must select the matching reference sequence.
Refine extracted annotations. Specify which annotation type should be extracted from the sequence.
  • Search string. The name of each annotation and all information attached to it are searched for the entered term. Use it for general searches, such as "Gene" or "Exon", or for more specific ones: for example, if you have gene annotations called "MLH1" and "MLH3", entering "MLH" as the search term extracts both.
Flanking. The sequence of interest can be extracted with flanking sequences.
  • Flanking upstream residues. The output will include this number of extra residues at the 5' end of the annotation.
  • Flanking downstream residues. The output will include this number of extra residues at the 3' end of the annotation.
Naming of subsequences. Different parameters can be included in the name of the extracted sequence. Naming options can be combined by selecting more than one.
  • Include annotation name. This will use the name of the annotation in the name of the extracted sequence.
  • Include annotation type. This adds the annotation type chosen above to the names of the resulting sequences. It is useful if you have chosen to extract "All" annotation types.
  • Include annotation region. The region covered by the annotation on the original sequence (i.e. not including flanking regions) will be included in the name.
  • Include annotation chromosome. The chromosome name will be included in the name.
  • Include sequence/track name. If you selected more than one sequence as input, this option adds the name of the original sequence to the names of the resulting sequences, so you can tell where each sequence in the list came from.
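The type filter, search string, flanking, and naming options above can be illustrated with a small conceptual sketch. This is not the Workbench's implementation: the `Annotation` class, the `extract_annotations` function, and all of its parameters are hypothetical. The sketch assumes 1-based inclusive annotation coordinates, forward-strand annotations only, a case-insensitive substring match for the search string, flanks that are clipped at the sequence ends, and name parts joined by spaces in the order the naming options are listed.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Hypothetical annotation record; coordinates are 1-based and inclusive."""
    name: str
    type: str
    start: int
    end: int
    qualifiers: dict[str, str] = field(default_factory=dict)


def matches_search(ann: Annotation, term: str) -> bool:
    """Assumed behavior: case-insensitive substring search over name, type and qualifier values."""
    if not term:
        return True
    term = term.lower()
    texts = [ann.name, ann.type, *ann.qualifiers.values()]
    return any(term in text.lower() for text in texts)


def extract_annotations(
    seq_name: str,
    seq: str,
    chromosome: str,
    annotations: list[Annotation],
    ann_type: str = "All",
    search: str = "",
    upstream: int = 0,
    downstream: int = 0,
    include_name: bool = True,
    include_type: bool = False,
    include_region: bool = False,
    include_chromosome: bool = False,
    include_seq_name: bool = False,
) -> list[tuple[str, str]]:
    """Return (name, subsequence) pairs; assumes forward-strand annotations only."""
    results = []
    for ann in annotations:
        # Filter by annotation type ("All" keeps every type), then by search string.
        if ann_type != "All" and ann.type != ann_type:
            continue
        if not matches_search(ann, search):
            continue
        # Convert 1-based inclusive coordinates to a 0-based slice, add the
        # upstream/downstream flanks, and clip them at the sequence ends.
        lo = max(0, ann.start - 1 - upstream)
        hi = min(len(seq), ann.end + downstream)
        # Build the name from the selected options; the region excludes flanks.
        parts = []
        if include_name:
            parts.append(ann.name)
        if include_type:
            parts.append(ann.type)
        if include_region:
            parts.append(f"{ann.start}..{ann.end}")
        if include_chromosome:
            parts.append(chromosome)
        if include_seq_name:
            parts.append(seq_name)
        results.append((" ".join(parts), seq[lo:hi]))
    return results


if __name__ == "__main__":
    seq = "AAAACCCCGGGGTTTT"
    anns = [
        Annotation("MLH1", "Gene", 5, 8),
        Annotation("MLH3", "Gene", 9, 12),
        Annotation("exon1", "Exon", 13, 16),
    ]
    # Searching for "MLH" picks up both MLH1 and MLH3, as in the search string example.
    for name, sub in extract_annotations(
        "demo", seq, "chr3", anns, search="MLH", upstream=2, downstream=1, include_region=True
    ):
        print(name, sub)
```

Because the region in the name is taken from the annotation's own start and end, it reports the annotated region without the flanks, in line with the description of Include annotation region.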