Structure elements and their energy contribution

In this section, we classify the structure elements defining a secondary structure and describe their energy contribution.

Image RNA_structure_prediction_web
Figure 26.29: The different structure elements of RNA secondary structures predicted with the free energy minimization algorithm in CLC Genomics Workbench. See text for a detailed description.

Nested structure elements

The structure elements involving nested base pairs can be classified by a given base pair and the other base pairs that are nested and accessible from this pair. For a more elaborate description we refer the reader to [Sankoff et al., 1983] and [Zuker and Sankoff, 1984].

If the nucleotides at positions $ i$ and $ j$ form a base pair $ (i, j)$ and $ i < k < l < j$, then we say that the base pair $ (k, l)$ is accessible from $ (i, j)$ if there is no intermediate base pair $ (i', j')$ such that $ i < i' < k$ and $ l < j' < j$. This means that $ (k, l)$ is nested within the pair $ (i, j)$ and that no other base pair lies in between.
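The definition of accessibility can be illustrated with a minimal Python sketch. It assumes the structure is given in dot-bracket notation (an input format chosen here for illustration, not one prescribed by the text), and the function names `parse_pairs` and `accessible_pairs` are hypothetical helpers, not part of CLC Genomics Workbench.

```python
def parse_pairs(structure):
    """Return the sorted base pairs (i, j), 1-based, of a dot-bracket string."""
    stack, pairs = [], []
    for pos, char in enumerate(structure, start=1):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            pairs.append((stack.pop(), pos))
    return sorted(pairs)


def accessible_pairs(pairs, i, j):
    """Return the base pairs (k, l) that are accessible from the pair (i, j)."""
    # Pairs nested inside (i, j): i < k and l < j.
    nested = [(k, l) for (k, l) in pairs if i < k and l < j]
    # (k, l) is accessible if no other nested pair (i2, j2) encloses it,
    # i.e. there is no pair with i < i2 < k and l < j2 < j.
    return [
        (k, l)
        for (k, l) in nested
        if not any(i2 < k and l < j2 for (i2, j2) in nested)
    ]


# Illustrative structure: two stacked outer pairs enclosing two hairpins.
example = "((..(...)..(...)..))"
example_pairs = parse_pairs(example)
print(accessible_pairs(example_pairs, 1, 20))  # [(2, 19)]
print(accessible_pairs(example_pairs, 2, 19))  # [(5, 9), (12, 16)]
```

In this example the pairs $ (5, 9)$ and $ (12, 16)$ are nested within $ (1, 20)$ but are not accessible from it, because $ (2, 19)$ lies in between.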

Using the number of accessible base pairs, we can define the following distinct structure elements:

  1. Hairpin loop. A base pair with 0 other accessible base pairs forms a hairpin loop. The energy contribution of a hairpin is determined by the length of the unpaired (loop) region and by the two bases adjacent to the closing base pair, which are termed a terminal mismatch (see figure 26.29A).
  2. A base pair $ (i, j)$ with 1 accessible base pair $ (i', j')$ can give rise to three distinct structure elements:
    • Stacking of base pairs. A stacking of two consecutive pairs occurs if $ i'-i = 1 = j - j'$. Only canonical base pairs ($ A-U$, $ G-C$, or $ G-U$) are allowed (see figure 26.29B). The energy contribution is determined by the type and order of the two base pairs.
    • Bulge. A bulge loop occurs if $ i'-i > 1$ or $ j - j' > 1$, but not both. This means that the two base pairs enclose an unpaired region of length 0 on one side and an unpaired region of length $ \geq1$ on the other side (see figure 26.29C). The energy contribution of a bulge is determined by the length of the unpaired (loop) region and the two closing base pairs.
    • Interior loop. An interior loop occurs if both $ i'-i > 1$ and $ j - j' > 1$. This means that the two base pairs enclose an unpaired region of length $ \geq1$ on both sides (see figure 26.29D). The energy contribution of an interior loop is determined by the length of the unpaired (loop) region and the four unpaired bases adjacent to the opening and the closing base pair.
  3. Multi loop. A base pair with two or more accessible base pairs gives rise to a multi loop, a loop from which three or more stems are opened, counting the stem of the closing base pair itself (see figure 26.29E). The energy contribution of a multi loop depends on the number of stems opened in the multi loop, that is, the stems that protrude from the loop.
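The classification above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the algorithm used by CLC Genomics Workbench: the structure is assumed to be given in dot-bracket notation, the function name `classify_loops` is hypothetical, two or more accessible base pairs are treated as a multi loop (three or more stems when the closing pair is counted), and no energies are computed.

```python
def classify_loops(structure):
    """Map each base pair (i, j), 1-based, to the structure element it closes."""
    # Parse the dot-bracket string into base pairs.
    stack, pairs = [], []
    for pos, char in enumerate(structure, start=1):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            pairs.append((stack.pop(), pos))

    elements = {}
    for (i, j) in pairs:
        # Pairs nested in (i, j), then those with no pair in between.
        nested = [(k, l) for (k, l) in pairs if i < k and l < j]
        accessible = [
            (k, l)
            for (k, l) in nested
            if not any(a < k and l < b for (a, b) in nested)
        ]
        if len(accessible) == 0:
            elements[(i, j)] = "hairpin loop"
        elif len(accessible) == 1:
            (k, l) = accessible[0]
            left = k - i - 1   # unpaired bases between i and k
            right = j - l - 1  # unpaired bases between l and j
            if left == 0 and right == 0:
                elements[(i, j)] = "stacking"
            elif left == 0 or right == 0:
                elements[(i, j)] = "bulge"
            else:
                elements[(i, j)] = "interior loop"
        else:
            elements[(i, j)] = "multi loop"
    return elements


# Illustrative structure containing every element type.
for pair, element in sorted(classify_loops("(((.(...)).(.(...).)))").items()):
    print(pair, element)
```

For the example string, the pair $ (1, 22)$ stacks on $ (2, 21)$, which closes a multi loop with two branches: one containing a bulge closed by $ (3, 10)$ and one containing an interior loop closed by $ (12, 20)$, each ending in a hairpin loop.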

Other structure elements

Experimental constraints

A number of techniques are available for probing RNA structures. These techniques can determine individual components of an existing structure such as the existence of a given base pair. It is possible to add such experimental constraints to the secondary structure prediction based on free energy minimization (see figure 26.30) and it has been shown that this can dramatically increase the fidelity of the secondary structure prediction [Mathews and Turner, 2006].
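As a rough illustration of what such a constraint means, the following Python sketch checks whether a candidate structure is compatible with a set of experimentally supported features. It is not how the constraints are implemented in CLC Genomics Workbench, where they enter the energy minimization itself. The dot-bracket input, the function name `satisfies_constraints`, and the parameters `required_pairs` and `unpaired_positions` are all assumptions made for this example.

```python
def satisfies_constraints(structure, required_pairs=(), unpaired_positions=()):
    """Check a dot-bracket structure against simple experimental constraints.

    required_pairs: base pairs (i, j), 1-based, that must be present.
    unpaired_positions: positions, 1-based, that must not be base paired.
    Both parameters are hypothetical and only illustrate the idea.
    """
    # Parse the dot-bracket string into a set of base pairs.
    stack, pairs = [], set()
    for pos, char in enumerate(structure, start=1):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            pairs.add((stack.pop(), pos))

    paired_positions = {pos for pair in pairs for pos in pair}
    has_required = set(required_pairs) <= pairs
    respects_unpaired = paired_positions.isdisjoint(unpaired_positions)
    return has_required and respects_unpaired


# A hairpin with pairs (1, 7) and (2, 6).
print(satisfies_constraints("((...))", required_pairs=[(2, 6)]))  # True
print(satisfies_constraints("((...))", unpaired_positions=[2]))   # False
```

A filter of this kind only tests structures after the fact; building the constraints into the minimization restricts the search itself.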

Image structureconstraints
Figure 26.30: Known structural features can be added as constraints to the secondary structure prediction algorithm in CLC Genomics Workbench.