Create Whole Genome Alignment

The Create Whole Genome Alignment tool works by identifying seeds, i.e., short stretches of nucleotide sequence that are shared between multiple genomes but are not present multiple times on the same genome. These seeds are then extended using a HOXD scoring matrix [Chiaromonte et al., 2002] until the local alignment score drops below a fixed threshold. From the initial extended seed matches, a distance matrix between the input genomes is calculated. This distance matrix determines the order of the subsequent pairwise processing, in which the most similar genomes are processed first.

Proceeding iteratively from the most similar pair of genomes, the tool then extends and merges seed matches to create longer alignment blocks. These blocks may be present on two or more genomes and may align to either strand of a genome, which allows inversions to be identified. Similar to progressiveMauve [Darling AE, 2010], we combine the HOXD substitution score with an adjustment term based on k-mer frequency. This is done to avoid spurious matches to repetitive regions in the genome.
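
The seed identification and extension steps described above can be illustrated with a minimal Python sketch. It is not the tool's implementation: the function names, the seed length k, the simple match/mismatch scores (used here in place of the HOXD matrix), and the stopping rule (stop once the score has fallen more than max_drop below the best score seen) are all assumptions chosen for illustration, and the extension is ungapped and runs to the right only.

```python
from collections import Counter


def unique_kmers(genome: str, k: int) -> dict[str, int]:
    """Map each k-mer occurring exactly once in `genome` to its start position."""
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)
    return {kmer: i for i, kmer in enumerate(kmers) if counts[kmer] == 1}


def find_seeds(a: str, b: str, k: int) -> list[tuple[int, int]]:
    """Return (pos_in_a, pos_in_b) for k-mers shared by both genomes
    and not repeated within either genome."""
    ua, ub = unique_kmers(a, k), unique_kmers(b, k)
    return sorted((ua[kmer], ub[kmer]) for kmer in ua.keys() & ub.keys())


def extend_right(a: str, b: str, i: int, j: int, k: int,
                 match: int = 1, mismatch: int = -2, max_drop: int = 3) -> int:
    """Ungapped rightward extension of the seed at a[i:i+k] / b[j:j+k].

    Hypothetical scoring: +1 per match, -2 per mismatch (a stand-in for
    a real substitution matrix). Extension stops when the running score
    falls more than `max_drop` below the best score seen so far.
    Returns the length of the best-scoring extension, seed included.
    """
    score = best = k * match
    length = best_len = k
    while i + length < len(a) and j + length < len(b):
        score += match if a[i + length] == b[j + length] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score > max_drop:
            break
    return best_len


if __name__ == "__main__":
    genome_a = "GGACGTCATT"
    genome_b = "CCACGTCAGG"
    seeds = find_seeds(genome_a, genome_b, k=4)
    print(seeds)
    i, j = seeds[0]
    print(extend_right(genome_a, genome_b, i, j, k=4))
```

A k-mer that is repeated within one genome (for example ACGT in ACGTACGT) is dropped before the genomes are compared, which mirrors the requirement that seeds are not present multiple times on the same genome.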

To run the Create Whole Genome Alignment tool:

        Toolbox | Whole Genome Alignment | Create Whole Genome Alignment

Once the tool wizard has opened (figure 3.6), choose two nucleotide sequences or nucleotide sequence lists. If the input objects are nucleotide sequence lists (chromosomes or contigs), each sequence in a list is considered to be part of the same genome.

Image wgacreatealignment
Figure 3.6: Select input for the Create Whole Genome Alignment tool.

The tool has the following alignment options (figure 3.7):

Image wgacreatealignment1
Figure 3.7: Configurable parameters for the Create Whole Genome Alignment tool.

The tool also has options for working with a reference genome (figure 3.7):

The tool outputs a Whole Genome Alignment showing the aligned regions between the genomes (figure 3.8).

The output option "Output genomes after alignment" makes it possible to output the genomes, including any modifications such as contig rearrangements and added annotations. This option can also be used in workflows for automated annotation of genomes against a reference genome.

Note that the tree shown by default in the Whole Genome Alignment view is a Neighbor-Joining tree based on the distance matrix (see the beginning of this section).
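
To illustrate how a Neighbor-Joining tree is derived from a distance matrix, the sketch below performs only the first joining step of the standard Neighbor-Joining algorithm: it selects the pair of genomes that minimizes the Q criterion and computes the branch lengths from each of them to the new internal node. The function name and the example distances are hypothetical; they are not values produced by the tool.

```python
def nj_first_join(dist: list[list[float]]) -> tuple[int, int, float, float]:
    """First step of Neighbor-Joining on a symmetric distance matrix.

    Returns (i, j, len_i, len_j): the indices of the pair to join and
    the branch lengths from each member to the new internal node.
    """
    n = len(dist)
    if n < 3:
        raise ValueError("Neighbor-Joining needs at least three taxa")
    row_sums = [sum(row) for row in dist]

    # Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if best is None or q < best[0]:
                best = (q, i, j)

    _, i, j = best
    len_i = dist[i][j] / 2 + (row_sums[i] - row_sums[j]) / (2 * (n - 2))
    len_j = dist[i][j] - len_i
    return i, j, len_i, len_j


if __name__ == "__main__":
    # Hypothetical distances between five genomes (illustration only).
    example = [
        [0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0],
    ]
    print(nj_first_join(example))
```

A full tree is obtained by replacing the joined pair with the new node, recomputing the distances to it, and repeating until only two nodes remain.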

Image wgacreatealignment2
Figure 3.8: Whole Genome Alignment view. The star next to the top genome name indicates that this genome was chosen as a reference.

An alignment block (shown as a colored box) corresponds to a region of the genome that is aligned to a region on at least one other genome. The position of the box relative to the sequence indicates the strand on which the alignment was identified: above the sequence for the plus strand, below the sequence for the minus strand. When hovering the mouse over a block, the corresponding alignment blocks on the aligned genomes will be highlighted. The connected alignment blocks (which will share the same color) can be thought of as an ordinary linear multiple sequence alignment: they will not contain any internal rearrangements.

When clicking on a position on a genome, the view automatically adjusts so that the aligned positions are centered on top of each other. When double-clicking an alignment block, the regions covered by the connected alignment blocks are selected.

The Whole Genome Alignment view shares most of the functionality of the ordinary sequence viewer: this includes the ability to show any annotations on the genomes (such as CDS or Gene annotations), searching for gene names (using the "Find" panel), and zooming down to the nucleotide level.

The Whole Genome Alignment view has a few special options:

Extracting multiple sequence alignments: When selecting part of a sequence in an alignment block, it is possible to use the context menu to extract the selection into an ordinary multiple sequence alignment (figure 3.9):

Image wgacreatealignment3
Figure 3.9: Whole genome alignment viewer.

Open as Sequence List in New View: When using the context menu on a genome (figure 3.10), it is possible to open the genome as a new sequence list, including any re-ordering, shifting, and reverse complementing done as part of the alignment.

Image wgaopenassequencelist
Figure 3.10: Whole genome alignment viewer.