Bioinformatics explained: Protein hydrophobicity

Calculating hydrophobicity is important for identifying various protein features, such as membrane-spanning regions, antigenic sites, exposed loops, and buried residues. These calculations are usually shown as a plot along the protein sequence, which makes it easy to locate potential protein features.

Image Q6H1U7_hydrophobicity_gray
Figure 16.9: Plot of hydrophobicity along the amino acid sequence. Hydrophobic regions have higher values in the graph below the sequence, and they are also colored on the sequence itself: red indicates regions with high hydrophobicity and blue indicates regions with low hydrophobicity.

The hydrophobicity is calculated by sliding a fixed-size window, spanning an odd number of residues, over the protein sequence. The average hydrophobicity of all residues in the window is plotted at the window's central position (see figure 16.9).
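The sliding-window calculation can be sketched in Python as follows. This is a minimal illustration, not the implementation behind the plot: the function name hydrophobicity_profile and its parameters are hypothetical, the Kyte-Doolittle values are those listed in the table in this section, and the sketch assumes that positions too close to either end of the sequence to hold a full window simply get no value.

```python
# Kyte-Doolittle values as listed in the table in this section.
KYTE_DOOLITTLE = {
    "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8,
    "G": -0.4, "H": -3.2, "I": 4.5, "K": -3.9, "L": 3.8,
    "M": 1.9, "N": -3.5, "P": -1.6, "Q": -3.5, "R": -4.5,
    "S": -0.8, "T": -0.7, "V": 4.2, "W": -0.9, "Y": -1.3,
}


def hydrophobicity_profile(sequence, scale=KYTE_DOOLITTLE, window=7):
    """Return (position, average) pairs; positions are 1-based window centers.

    Hypothetical helper for illustration. Assumes every residue in
    `sequence` is one of the 20 standard amino acids in `scale`.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    values = [scale[residue] for residue in sequence.upper()]
    half = window // 2
    profile = []
    # Assumption: only centers with a full window on both sides are reported.
    for center in range(half, len(values) - half):
        segment = values[center - half:center + half + 1]
        profile.append((center + 1, sum(segment) / window))
    return profile
```

For example, hydrophobicity_profile("AAAAA", window=3) yields values for positions 2, 3 and 4 only, because positions 1 and 5 cannot be the center of a complete window.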

Hydrophobicity scales

Several hydrophobicity scales have been published for various uses. Many of the commonly used hydrophobicity scales are described below.

Kyte-Doolittle scale. The Kyte-Doolittle scale is widely used for detecting hydrophobic regions in proteins. Regions with a positive value are hydrophobic. Depending on the window size, the scale can be used to identify either surface-exposed regions or transmembrane regions. Short windows of 5-7 residues generally work well for predicting putative surface-exposed regions. Large windows of 19-21 residues are well suited for finding transmembrane domains, which are indicated by calculated values above 1.6 [Kyte and Doolittle, 1982]. These values are rules of thumb, and deviations may occur.
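As a rough illustration of this rule of thumb, the following Python sketch reports stretches of window centers whose 19-residue Kyte-Doolittle average exceeds 1.6. The function name transmembrane_candidates, the merging of consecutive centers into one span, and the default parameters are assumptions made for this example; it is a heuristic sketch and not a validated transmembrane predictor.

```python
# Kyte-Doolittle values as listed in the table in this section.
KYTE_DOOLITTLE = {
    "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8,
    "G": -0.4, "H": -3.2, "I": 4.5, "K": -3.9, "L": 3.8,
    "M": 1.9, "N": -3.5, "P": -1.6, "Q": -3.5, "R": -4.5,
    "S": -0.8, "T": -0.7, "V": 4.2, "W": -0.9, "Y": -1.3,
}


def transmembrane_candidates(sequence, window=19, threshold=1.6):
    """Return 1-based (first_center, last_center) spans above the threshold.

    Hypothetical helper: consecutive window centers whose average
    Kyte-Doolittle value exceeds `threshold` are merged into one span.
    """
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    half = window // 2
    spans = []
    start = None  # 0-based index of the first center in the current span
    for center in range(half, len(values) - half):
        average = sum(values[center - half:center + half + 1]) / window
        if average > threshold:
            if start is None:
                start = center
        elif start is not None:
            # The previous center (0-based center - 1) was the last hit.
            spans.append((start + 1, center))
            start = None
    if start is not None:
        # The span runs up to the last center that has a full window.
        spans.append((start + 1, len(values) - half))
    return spans
```

The reported positions are window centers, so the hydrophobic stretch itself typically extends several residues beyond each end of a span.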

Engelman scale. The Engelman hydrophobicity scale, also known as the GES scale, is another scale that can be used for predicting protein hydrophobicity [Engelman et al., 1986]. Like the Kyte-Doolittle scale, it is useful for predicting transmembrane regions in proteins.

Eisenberg scale. The Eisenberg scale is a normalized consensus hydrophobicity scale which shares many features with the other hydrophobicity scales [Eisenberg et al., 1984].

Hopp-Woods scale. Hopp and Woods developed their hydrophobicity scale for identifying potentially antigenic sites in proteins. The scale is essentially a hydrophilicity index in which apolar residues are assigned negative values. A window size of 7 is well suited for predicting antigenic sites [Hopp and Woods, 1983].
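The following Python sketch shows one way such a scan could look: it slides a 7-residue window over the sequence and returns the window with the highest average Hopp-Woods value as a candidate antigenic site. The function name most_hydrophilic_window and the choice to report only the single best window are assumptions made for this illustration; the values are those listed in the Hopp-Woods column of the table in this section.

```python
# Hopp-Woods values as listed in the table in this section.
HOPP_WOODS = {
    "A": -0.5, "C": -1.0, "D": 3.0, "E": 3.0, "F": -2.5,
    "G": 0.0, "H": -0.5, "I": -1.8, "K": 3.0, "L": -1.8,
    "M": -1.3, "N": 0.2, "P": 0.0, "Q": 0.2, "R": 3.0,
    "S": 0.3, "T": -0.4, "V": -1.5, "W": -3.4, "Y": -2.3,
}


def most_hydrophilic_window(sequence, window=7):
    """Return (start, end, average) for the most hydrophilic window.

    Hypothetical helper: positions are 1-based and inclusive. If several
    windows tie, the first one is returned.
    """
    values = [HOPP_WOODS[residue] for residue in sequence.upper()]
    if len(values) < window:
        raise ValueError("sequence is shorter than the window")
    starts = range(len(values) - window + 1)
    # max() returns the first start index with the highest window sum.
    best = max(starts, key=lambda s: sum(values[s:s + window]))
    average = sum(values[best:best + window]) / window
    return best + 1, best + window, average
```

A single best window is only a starting point; in practice the whole profile is usually inspected for several high-scoring regions.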

Cornette scale. Cornette et al. computed an optimal hydrophobicity scale based on 28 published scales [Cornette et al., 1987]. This optimized scale is also suitable for prediction of alpha-helices in proteins.

Rose scale. The hydrophobicity scale by Rose et al. is correlated with the average area of buried amino acids in globular proteins [Rose et al., 1985]. As a result, the scale reflects surface accessibility rather than the helices of a protein.

Janin scale. This scale also provides information about the accessible and buried amino acid residues of globular proteins [Janin, 1979].

Welling scale. Welling et al. used information on the relative occurrence of amino acids in antigenic regions to construct a scale that is useful for predicting antigenic regions. This method is reported to perform better than the Hopp-Woods hydrophobicity scale, which is also used to identify antigenic regions.

Kolaskar-Tongaonkar. A semi-empirical method for predicting antigenic regions has been developed [Kolaskar and Tongaonkar, 1990]. The method also includes information on surface accessibility and flexibility. At the time of publication, it was able to predict antigenic determinants with an accuracy of 75%.

Surface Probability. Display of surface probability based on the algorithm by [Emini et al., 1985]. This algorithm has been used to identify antigenic determinants on the surface of proteins.

Chain Flexibility. Display of backbone chain flexibility based on the algorithm by [Karplus and Schulz, 1985]. Chain flexibility is regarded as an indication of a putative antigenic determinant.

Many more scales have been published throughout the last three decades. Even though more advanced methods have been developed for predicting membrane-spanning regions, these simple and very fast calculations are still widely used.

aa  Amino acid     Kyte-Doolittle  Hopp-Woods  Cornette  Eisenberg  Rose  Janin  Engelman (GES)
A   Alanine                  1.80       -0.50      0.20       0.62  0.74   0.30            1.60
C   Cysteine                 2.50       -1.00      4.10       0.29  0.91   0.90            2.00
D   Aspartic acid           -3.50        3.00     -3.10      -0.90  0.62  -0.60           -9.20
E   Glutamic acid           -3.50        3.00     -1.80      -0.74  0.62  -0.70           -8.20
F   Phenylalanine            2.80       -2.50      4.40       1.19  0.88   0.50            3.70
G   Glycine                 -0.40        0.00      0.00       0.48  0.72   0.30            1.00
H   Histidine               -3.20       -0.50      0.50      -0.40  0.78  -0.10           -3.00
I   Isoleucine               4.50       -1.80      4.80       1.38  0.88   0.70            3.10
K   Lysine                  -3.90        3.00     -3.10      -1.50  0.52  -1.80           -8.80
L   Leucine                  3.80       -1.80      5.70       1.06  0.85   0.50            2.80
M   Methionine               1.90       -1.30      4.20       0.64  0.85   0.40            3.40
N   Asparagine              -3.50        0.20     -0.50      -0.78  0.63  -0.50           -4.80
P   Proline                 -1.60        0.00     -2.20       0.12  0.64  -0.30           -0.20
Q   Glutamine               -3.50        0.20     -2.80      -0.85  0.62  -0.70           -4.10
R   Arginine                -4.50        3.00      1.40      -2.53  0.64  -1.40           -12.3
S   Serine                  -0.80        0.30     -0.50      -0.18  0.66  -0.10            0.60
T   Threonine               -0.70       -0.40     -1.90      -0.05  0.70  -0.20            1.20
V   Valine                   4.20       -1.50      4.70       1.08  0.86   0.60            2.60
W   Tryptophan              -0.90       -3.40      1.00       0.81  0.85   0.30            1.90
Y   Tyrosine                -1.30       -2.30      3.20       0.26  0.76  -0.40           -0.70

Other useful resources

AAindex: Amino acid index database
http://www.genome.ad.jp/dbget/aaindex.html