Bibliography

Allison et al., 2006
Allison, D., Cui, X., Page, G., and Sabripour, M. (2006).
Microarray data analysis: from disarray to consolidation and consensus.
Nature Reviews Genetics, 7(1):55.

Altschul and Gish, 1996
Altschul, S. F. and Gish, W. (1996).
Local alignment statistics.
Methods Enzymol, 266:460-480.

Altschul et al., 1990
Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990).
Basic local alignment search tool.
J Mol Biol, 215(3):403-410.

Andrade et al., 1998
Andrade, M. A., O'Donoghue, S. I., and Rost, B. (1998).
Adaptation of protein surfaces to subcellular location.
J Mol Biol, 276(2):517-525.

Bachmair et al., 1986
Bachmair, A., Finley, D., and Varshavsky, A. (1986).
In vivo half-life of a protein is a function of its amino-terminal residue.
Science, 234(4773):179-186.

Baggerly et al., 2003
Baggerly, K., Deng, L., Morris, J., and Aldaz, C. (2003).
Differential expression in SAGE: accounting for normal between-library variation.
Bioinformatics, 19(12):1477-1483.

Bateman et al., 2004
Bateman, A., Coin, L., Durbin, R., Finn, R. D., Hollich, V., Griffiths-Jones, S., Khanna, A., Marshall, M., Moxon, S., Sonnhammer, E. L. L., Studholme, D. J., Yeats, C., and Eddy, S. R. (2004).
The Pfam protein families database.
Nucleic Acids Res, 32(Database issue):D138-D141.

Bendtsen et al., 2004a
Bendtsen, J. D., Jensen, L. J., Blom, N., von Heijne, G., and Brunak, S. (2004a).
Feature-based prediction of non-classical and leaderless protein secretion.
Protein Eng Des Sel, 17(4):349-356.

Bendtsen et al., 2005
Bendtsen, J. D., Kiemer, L., Fausbøll, A., and Brunak, S. (2005).
Non-classical protein secretion in bacteria.
BMC Microbiol, 5:58.

Bendtsen et al., 2004b
Bendtsen, J. D., Nielsen, H., von Heijne, G., and Brunak, S. (2004b).
Improved prediction of signal peptides: SignalP 3.0.
J Mol Biol, 340(4):783-795.

Benjamini and Hochberg, 1995
Benjamini, Y. and Hochberg, Y. (1995).
Controlling the false discovery rate: a practical and powerful approach to multiple testing.
Journal of the Royal Statistical Society, Series B, 57(1):289-300.

Bishop and Friday, 1985
Bishop, M. J. and Friday, A. E. (1985).
Evolutionary trees from nucleic acid and protein sequences.
Proceedings of the Royal Society of London B, 226:271-302.

Blaisdell, 1989
Blaisdell, B. E. (1989).
Average values of a dissimilarity measure not requiring sequence alignment are twice the averages of conventional mismatch counts requiring sequence alignment for a computer-generated model system.
J Mol Evol, 29(6):538-547.

Blobel, 2000
Blobel, G. (2000).
Protein targeting (Nobel lecture).
Chembiochem, 1:86-102.

Bolstad et al., 2003
Bolstad, B., Irizarry, R., Astrand, M., and Speed, T. (2003).
A comparison of normalization methods for high density oligonucleotide array data based on variance and bias.
Bioinformatics, 19(2):185-193.

Bommarito et al., 2000
Bommarito, S., Peyret, N., and SantaLucia, J. (2000).
Thermodynamic parameters for DNA sequences with dangling ends.
Nucleic Acids Res, 28(9):1929-1934.

Chen et al., 2004
Chen, G., Znosko, B. M., Jiao, X., and Turner, D. H. (2004).
Factors affecting thermodynamic stabilities of RNA 3 x 3 internal loops.
Biochemistry, 43(40):12865-12876.

Clote et al., 2005
Clote, P., Ferré, F., Kranakis, E., and Krizanc, D. (2005).
Structural RNA has lower folding energy than random RNA of the same dinucleotide frequency.
RNA, 11(5):578-591.

Cornette et al., 1987
Cornette, J. L., Cease, K. B., Margalit, H., Spouge, J. L., Berzofsky, J. A., and DeLisi, C. (1987).
Hydrophobicity scales and computational techniques for detecting amphipathic structures in proteins.
J Mol Biol, 195(3):659-685.

Costa, 2007
Costa, F. F. (2007).
Non-coding RNAs: lost in translation?
Gene, 386(1-2):1-10.

Cronn et al., 2008
Cronn, R., Liston, A., Parks, M., Gernandt, D. S., Shen, R., and Mockler, T. (2008).
Multiplex sequencing of plant chloroplast genomes using Solexa sequencing-by-synthesis technology.
Nucleic Acids Res, 36(19):e122.

Crooks et al., 2004
Crooks, G. E., Hon, G., Chandonia, J.-M., and Brenner, S. E. (2004).
WebLogo: a sequence logo generator.
Genome Res, 14(6):1188-1190.

Dayhoff and Schwartz, 1978
Dayhoff, M. O. and Schwartz, R. M. (1978).
Atlas of Protein Sequence and Structure, volume 3 of 5 suppl., pages 353-358.
Nat. Biomed. Res. Found., Washington D.C.

Dayhoff et al., 1978
Dayhoff, M. O., Schwartz, R. M., and Orcutt, B. C. (1978).
A model of evolutionary change in protein.
Atlas of Protein Sequence and Structure, 5(3):345-352.

Dempster et al., 1977
Dempster, A., Laird, N., and Rubin, D. (1977).
Maximum likelihood from incomplete data via the EM algorithm.
Journal of the Royal Statistical Society, 39(1):1-38.

Dudoit et al., 2003
Dudoit, S., Shaffer, J., and Boldrick, J. (2003).
Multiple Hypothesis Testing in Microarray Experiments.
Statistical Science, 18(1):71-103.

Eddy, 2004
Eddy, S. R. (2004).
Where did the BLOSUM62 alignment score matrix come from?
Nat Biotechnol, 22(8):1035-1036.

Edgar, 2004
Edgar, R. C. (2004).
MUSCLE: a multiple sequence alignment method with reduced time and space complexity.
BMC Bioinformatics, 5:113.

Efron, 1982
Efron, B. (1982).
The jackknife, the bootstrap and other resampling plans, volume 38.
SIAM.

Eisen et al., 1998
Eisen, M., Spellman, P., Brown, P., and Botstein, D. (1998).
Cluster analysis and display of genome-wide expression patterns.
Proceedings of the National Academy of Sciences, 95(25):14863-14868.

Eisenberg et al., 1984
Eisenberg, D., Schwarz, E., Komaromy, M., and Wall, R. (1984).
Analysis of membrane and surface protein sequences with the hydrophobic moment plot.
J Mol Biol, 179(1):125-142.

Emini et al., 1985
Emini, E. A., Hughes, J. V., Perlow, D. S., and Boger, J. (1985).
Induction of hepatitis A virus-neutralizing antibody by a virus-specific synthetic peptide.
J Virol, 55(3):836-839.

Engelman et al., 1986
Engelman, D. M., Steitz, T. A., and Goldman, A. (1986).
Identifying nonpolar transbilayer helices in amino acid sequences of membrane proteins.
Annu Rev Biophys Biophys Chem, 15:321-353.

Falcon and Gentleman, 2007
Falcon, S. and Gentleman, R. (2007).
Using GOstats to test gene lists for GO term association.
Bioinformatics, 23(2):257.

Felsenstein, 1981
Felsenstein, J. (1981).
Evolutionary trees from DNA sequences: a maximum likelihood approach.
J Mol Evol, 17(6):368-376.

Felsenstein, 1985
Felsenstein, J. (1985).
Confidence limits on phylogenies: An approach using the bootstrap.
Evolution, 39(4):783-791.

Feng and Doolittle, 1987
Feng, D. F. and Doolittle, R. F. (1987).
Progressive sequence alignment as a prerequisite to correct phylogenetic trees.
J Mol Evol, 25(4):351-360.

Forsberg et al., 2001
Forsberg, R., Oleksiewicz, M. B., Petersen, A. M., Hein, J., Bøtner, A., and Storgaard, T. (2001).
A molecular clock dates the common ancestor of European-type porcine reproductive and respiratory syndrome virus at more than 10 years before the emergence of disease.
Virology, 289(2):174-179.

Galperin and Koonin, 1998
Galperin, M. Y. and Koonin, E. V. (1998).
Sources of systematic error in functional annotation of genomes: domain rearrangement, non-orthologous gene displacement and operon disruption.
In Silico Biol, 1(1):55-67.

Gentleman and Mullin, 1989
Gentleman, J. F. and Mullin, R. (1989).
The distribution of the frequency of occurrence of nucleotide subsequences, based on their overlap capability.
Biometrics, 45(1):35-52.

Gill and von Hippel, 1989
Gill, S. C. and von Hippel, P. H. (1989).
Calculation of protein extinction coefficients from amino acid sequence data.
Anal Biochem, 182(2):319-326.

Gonda et al., 1989
Gonda, D. K., Bachmair, A., Wünning, I., Tobias, J. W., Lane, W. S., and Varshavsky, A. (1989).
Universality and structure of the N-end rule.
J Biol Chem, 264(28):16700-16712.

Guindon and Gascuel, 2003
Guindon, S. and Gascuel, O. (2003).
A Simple, Fast, and Accurate Algorithm to Estimate Large Phylogenies by Maximum Likelihood.
Systematic Biology, 52(5):696-704.

Guo et al., 2006
Guo, L., Lobenhofer, E. K., Wang, C., Shippy, R., Harris, S. C., Zhang, L., Mei, N., Chen, T., Herman, D., Goodsaid, F. M., Hurban, P., Phillips, K. L., Xu, J., Deng, X., Sun, Y. A., Tong, W., Dragan, Y. P., and Shi, L. (2006).
Rat toxicogenomic study reveals analytical consistency across microarray platforms.
Nat Biotechnol, 24(9):1162-1169.

Han et al., 1999
Han, K., Kim, D., and Kim, H. (1999).
A vector-based method for drawing RNA secondary structure.
Bioinformatics, 15(4):286-297.

Hasegawa et al., 1985
Hasegawa, M., Kishino, H., and Yano, T. (1985).
Dating of the human-ape splitting by a molecular clock of mitochondrial DNA.
Journal of Molecular Evolution, 22(2):160-174.

Hein, 2001
Hein, J. (2001).
An algorithm for statistical alignment of sequences related by a binary tree.
In Pacific Symposium on Biocomputing, page 179.

Hein et al., 2000
Hein, J., Wiuf, C., Knudsen, B., Møller, M. B., and Wibling, G. (2000).
Statistical alignment: computational properties, homology testing and goodness-of-fit.
J Mol Biol, 302(1):265-279.

Henikoff and Henikoff, 1992
Henikoff, S. and Henikoff, J. G. (1992).
Amino acid substitution matrices from protein blocks.
Proc Natl Acad Sci U S A, 89(22):10915-10919.

Höhl et al., 2007
Höhl, M., Rigoutsos, I., and Ragan, M. A. (2007).
Pattern-based phylogenetic distance estimation and tree reconstruction.
Evolutionary Bioinformatics, 2:0-0.

Hopp and Woods, 1983
Hopp, T. P. and Woods, K. R. (1983).
A computer program for predicting protein antigenic determinants.
Mol Immunol, 20(4):483-489.

Ikai, 1980
Ikai, A. (1980).
Thermostability and aliphatic index of globular proteins.
J Biochem (Tokyo), 88(6):1895-1898.

Janin, 1979
Janin, J. (1979).
Surface and inside volumes in globular proteins.
Nature, 277(5696):491-492.

Jones et al., 1992
Jones, D., Taylor, W., and Thornton, J. (1992).
The rapid generation of mutation data matrices from protein sequences.
Computer Applications in the Biosciences (CABIOS), 8:275-282.

Jukes and Cantor, 1969
Jukes, T. and Cantor, C. (1969).
Mammalian Protein Metabolism, chapter Evolution of protein molecules, pages 21-32.
New York: Academic Press.

Kal et al., 1999
Kal, A. J., van Zonneveld, A. J., Benes, V., van den Berg, M., Koerkamp, M. G., Albermann, K., Strack, N., Ruijter, J. M., Richter, A., Dujon, B., Ansorge, W., and Tabak, H. F. (1999).
Dynamics of gene expression revealed by comparison of serial analysis of gene expression transcript profiles from yeast grown on two different carbon sources.
Mol Biol Cell, 10(6):1859-1872.

Karplus and Schulz, 1985
Karplus, P. A. and Schulz, G. E. (1985).
Prediction of chain flexibility in proteins.
Naturwissenschaften, 72:212-213.

Kaufman and Rousseeuw, 1990
Kaufman, L. and Rousseeuw, P. (1990).
Finding Groups in Data: An Introduction to Cluster Analysis.
Wiley Series in Probability and Mathematical Statistics. Applied Probability and Statistics, New York: Wiley, 1990.

Kierzek et al., 1999
Kierzek, R., Burkard, M. E., and Turner, D. H. (1999).
Thermodynamics of single mismatches in RNA duplexes.
Biochemistry, 38(43):14214-14223.

Kimura, 1980
Kimura, M. (1980).
A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences.
J Mol Evol, 16(2):111-120.

Klee and Ellis, 2005
Klee, E. W. and Ellis, L. B. M. (2005).
Evaluating eukaryotic secreted protein prediction.
BMC Bioinformatics, 6:256.

Knudsen and Miyamoto, 2001
Knudsen, B. and Miyamoto, M. M. (2001).
A likelihood ratio test for evolutionary rate shifts and functional divergence among proteins.
Proc Natl Acad Sci U S A, 98(25):14512-14517.

Kolaskar and Tongaonkar, 1990
Kolaskar, A. S. and Tongaonkar, P. C. (1990).
A semi-empirical method for prediction of antigenic determinants on protein antigens.
FEBS Lett, 276(1-2):172-174.

Krogh et al., 2001
Krogh, A., Larsson, B., von Heijne, G., and Sonnhammer, E. L. (2001).
Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes.
J Mol Biol, 305(3):567-580.

Kyte and Doolittle, 1982
Kyte, J. and Doolittle, R. F. (1982).
A simple method for displaying the hydropathic character of a protein.
J Mol Biol, 157(1):105-132.

Leitner and Albert, 1999
Leitner, T. and Albert, J. (1999).
The molecular clock of HIV-1 unveiled through analysis of a known transmission history.
Proc Natl Acad Sci U S A, 96(19):10752-10757.

Lloyd, 1982
Lloyd, S. (1982).
Least squares quantization in PCM.
IEEE Transactions on Information Theory, 28(2):129-137.

Longfellow et al., 1990
Longfellow, C. E., Kierzek, R., and Turner, D. H. (1990).
Thermodynamic and spectroscopic study of bulge loops in oligoribonucleotides.
Biochemistry, 29(1):278-285.

Maizel and Lenk, 1981
Maizel, J. V. and Lenk, R. P. (1981).
Enhanced graphic matrix analysis of nucleic acid and protein sequences.
Proc Natl Acad Sci U S A, 78(12):7665-7669.

Mathews et al., 2004
Mathews, D. H., Disney, M. D., Childs, J. L., Schroeder, S. J., Zuker, M., and Turner, D. H. (2004).
Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure.
Proc Natl Acad Sci U S A, 101(19):7287-7292.

Mathews et al., 1999
Mathews, D. H., Sabina, J., Zuker, M., and Turner, D. H. (1999).
Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure.
J Mol Biol, 288(5):911-940.

Mathews and Turner, 2002
Mathews, D. H. and Turner, D. H. (2002).
Experimentally derived nearest-neighbor parameters for the stability of RNA three- and four-way multibranch loops.
Biochemistry, 41(3):869-880.

Mathews and Turner, 2006
Mathews, D. H. and Turner, D. H. (2006).
Prediction of RNA secondary structure by free energy minimization.
Curr Opin Struct Biol, 16(3):270-278.

McCaskill, 1990
McCaskill, J. S. (1990).
The equilibrium partition function and base pair binding probabilities for RNA secondary structure.
Biopolymers, 29(6-7):1105-1119.

McGinnis and Madden, 2004
McGinnis, S. and Madden, T. L. (2004).
BLAST: at the core of a powerful and diverse set of sequence analysis tools.
Nucleic Acids Res, 32(Web Server issue):W20-W25.

Menne et al., 2000
Menne, K. M., Hermjakob, H., and Apweiler, R. (2000).
A comparison of signal sequence prediction methods using a test set of signal peptides.
Bioinformatics, 16(8):741-742.

Meyer et al., 2007
Meyer, M., Stenzel, U., Myles, S., Prüfer, K., and Hofreiter, M. (2007).
Targeted high-throughput sequencing of tagged nucleic acid samples.
Nucleic Acids Res, 35(15):e97.

Michener and Sokal, 1957
Michener, C. and Sokal, R. (1957).
A quantitative approach to a problem in classification.
Evolution, 11:130-162.

Nielsen et al., 1997
Nielsen, H., Engelbrecht, J., Brunak, S., and von Heijne, G. (1997).
Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites.
Protein Eng, 10(1):1-6.

Purvis, 1995
Purvis, A. (1995).
A composite estimate of primate phylogeny.
Philos Trans R Soc Lond B Biol Sci, 348(1326):405-421.

Reinhardt and Hubbard, 1998
Reinhardt, A. and Hubbard, T. (1998).
Using neural networks for prediction of the subcellular location of proteins.
Nucleic Acids Res, 26(9):2230-2236.

Rivas and Eddy, 2000
Rivas, E. and Eddy, S. R. (2000).
Secondary structure alone is generally not statistically significant for the detection of noncoding RNAs.
Bioinformatics, 16(7):583-605.

Robinson et al., 2010
Robinson, M. D., McCarthy, D. J., and Smyth, G. K. (2010).
edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.
Bioinformatics, 26(1):139-140.

Robinson and Smyth, 2007
Robinson, M. D. and Smyth, G. K. (2007).
Moderated statistical tests for assessing differences in tag abundance.
Bioinformatics, 23(21):2881-2887.

Robinson and Smyth, 2008
Robinson, M. D. and Smyth, G. K. (2008).
Small-sample estimation of negative binomial dispersion, with applications to SAGE data.
Biostatistics, 9(2):321-332.

Rose et al., 1985
Rose, G. D., Geselowitz, A. R., Lesser, G. J., Lee, R. H., and Zehfus, M. H. (1985).
Hydrophobicity of amino acid residues in globular proteins.
Science, 229(4716):834-838.

Rost, 2001
Rost, B. (2001).
Review: protein secondary structure prediction continues to rise.
J Struct Biol, 134(2-3):204-218.

Saitou and Nei, 1987
Saitou, N. and Nei, M. (1987).
The neighbor-joining method: a new method for reconstructing phylogenetic trees.
Mol Biol Evol, 4(4):406-425.

Sankoff et al., 1983
Sankoff, D., Kruskal, J., Mainville, S., and Cedergren, R. (1983).
Time Warps, String Edits, and Macromolecules: the Theory and Practice of Sequence Comparison, chapter Fast algorithms to determine RNA secondary structures containing multiple loops, pages 93-120.
Addison-Wesley, Reading, MA.

SantaLucia, 1998
SantaLucia, J. (1998).
A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics.
Proc Natl Acad Sci U S A, 95(4):1460-1465.

Schechter and Berger, 1967
Schechter, I. and Berger, A. (1967).
On the size of the active site in proteases. I. Papain.
Biochem Biophys Res Commun, 27(2):157-162.

Schechter and Berger, 1968
Schechter, I. and Berger, A. (1968).
On the active site of proteases. 3. Mapping the active site of papain; specific peptide inhibitors of papain.
Biochem Biophys Res Commun, 32(5):898-902.

Schneider and Stephens, 1990
Schneider, T. D. and Stephens, R. M. (1990).
Sequence logos: a new way to display consensus sequences.
Nucleic Acids Res, 18(20):6097-6100.

Schroeder et al., 1999
Schroeder, S. J., Burkard, M. E., and Turner, D. H. (1999).
The energetics of small internal loops in RNA.
Biopolymers, 52(4):157-167.

Shapiro et al., 2007
Shapiro, B. A., Yingling, Y. G., Kasprzak, W., and Bindewald, E. (2007).
Bridging the gap in RNA structure prediction.
Curr Opin Struct Biol, 17(2):157-165.

Siepel and Haussler, 2004
Siepel, A. and Haussler, D. (2004).
Combining phylogenetic and hidden Markov models in biosequence analysis.
J Comput Biol, 11(2-3):413-428.

Smith and Waterman, 1981
Smith, T. F. and Waterman, M. S. (1981).
Identification of common molecular subsequences.
J Mol Biol, 147(1):195-197.

Sturges, 1926
Sturges, H. A. (1926).
The choice of a class interval.
Journal of the American Statistical Association, 21:65-66.

Tian et al., 2005
Tian, L., Greenberg, S., Kong, S., Altschuler, J., Kohane, I., and Park, P. (2005).
Discovering statistically significant pathways in expression profiling studies.
Proceedings of the National Academy of Sciences, 102(38):13544-13549.

Tobias et al., 1991
Tobias, J. W., Shrader, T. E., Rocap, G., and Varshavsky, A. (1991).
The N-end rule in bacteria.
Science, 254(5036):1374-1377.

Tusher et al., 2001
Tusher, V. G., Tibshirani, R., and Chu, G. (2001).
Significance analysis of microarrays applied to the ionizing radiation response.
Proc Natl Acad Sci U S A, 98(9):5116-5121.

von Ahsen et al., 2001
von Ahsen, N., Wittwer, C. T., and Schütz, E. (2001).
Oligonucleotide melting temperatures under PCR conditions: nearest-neighbor corrections for Mg(2+), deoxynucleotide triphosphate, and dimethyl sulfoxide concentrations with comparison to alternative empirical formulas.
Clin Chem, 47(11):1956-1961.

von Heijne, 1986
von Heijne, G. (1986).
A new method for predicting signal sequence cleavage sites.
Nucleic Acids Res, 14:4683-4690.

Welling et al., 1985
Welling, G. W., Weijer, W. J., van der Zee, R., and Welling-Wester, S. (1985).
Prediction of sequential antigenic regions in proteins.
FEBS Lett, 188(2):215-218.

Whelan and Goldman, 2001
Whelan, S. and Goldman, N. (2001).
A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach.
Molecular Biology and Evolution, 18:691-699.

Wootton and Federhen, 1993
Wootton, J. C. and Federhen, S. (1993).
Statistics of local complexity in amino acid sequences and sequence databases.
Computers in Chemistry, 17:149-163.

Workman and Krogh, 1999
Workman, C. and Krogh, A. (1999).
No evidence that mRNAs have lower folding free energies than random sequences with the same dinucleotide distribution.
Nucleic Acids Res, 27(24):4816-4822.

Yang, 1994a
Yang, Z. (1994a).
Estimating the pattern of nucleotide substitution.
Journal of Molecular Evolution, 39(1):105-111.

Yang, 1994b
Yang, Z. (1994b).
Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: Approximate methods.
Journal of Molecular Evolution, 39(3):306-314.

Zuker, 1989a
Zuker, M. (1989a).
On finding all suboptimal foldings of an RNA molecule.
Science, 244(4900):48-52.

Zuker, 1989b
Zuker, M. (1989b).
The use of dynamic programming algorithms in RNA secondary structure prediction.
Mathematical Methods for DNA Sequences, pages 159-184.

Zuker and Sankoff, 1984
Zuker, M. and Sankoff, D. (1984).
RNA secondary structures and their prediction.
Bulletin of Mathematical Biology, 46:591-621.

Zuker and Stiegler, 1981
Zuker, M. and Stiegler, P. (1981).
Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information.
Nucleic Acids Res, 9(1):133-148.
