Bibliography

Allison et al., 2006
Allison, D., Cui, X., Page, G., and Sabripour, M. (2006).
Microarray data analysis: from disarray to consolidation and consensus.
Nature Reviews Genetics, 7(1):55.

Altschul and Gish, 1996
Altschul, S. F. and Gish, W. (1996).
Local alignment statistics.
Methods Enzymol, 266:460-480.

Altschul et al., 1990
Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990).
Basic local alignment search tool.
J Mol Biol, 215(3):403-410.

Andrade et al., 1998
Andrade, M. A., O'Donoghue, S. I., and Rost, B. (1998).
Adaptation of protein surfaces to subcellular location.
J Mol Biol, 276(2):517-525.

Ashburner et al., 2000
Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., Davis, A. P., Dolinski, K., Dwight, S. S., Eppig, J. T., Harris, M. A., Hill, D. P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J. C., Richardson, J. E., Ringwald, M., Rubin, G. M., and Sherlock, G. (2000).
Gene ontology: tool for the unification of biology.
Nat Genet, 25(1):25-29.

Auer and Doerge, 2010
Auer, P. L. and Doerge, R. (2010).
Statistical design and analysis of RNA sequencing data.
Genetics, 185(2):405-416.

Bachmair et al., 1986
Bachmair, A., Finley, D., and Varshavsky, A. (1986).
In vivo half-life of a protein is a function of its amino-terminal residue.
Science, 234(4773):179-186.

Baggerly et al., 2003
Baggerly, K., Deng, L., Morris, J., and Aldaz, C. (2003).
Differential expression in SAGE: accounting for normal between-library variation.
Bioinformatics, 19(12):1477-1483.

Bateman et al., 2004
Bateman, A., Coin, L., Durbin, R., Finn, R. D., Hollich, V., Griffiths-Jones, S., Khanna, A., Marshall, M., Moxon, S., Sonnhammer, E. L., Studholme, D. J., Yeats, C., and Eddy, S. R. (2004).
The Pfam protein families database.
Nucleic Acids Res., 32(Database issue):D138-D141.

Benjamini and Hochberg, 1995
Benjamini, Y. and Hochberg, Y. (1995).
Controlling the false discovery rate: a practical and powerful approach to multiple testing.
Journal of the Royal Statistical Society, Series B, 57(1):289-300.

Berman et al., 2003
Berman, H., Henrick, K., and Nakamura, H. (2003).
Announcing the worldwide protein data bank.
Nat Struct Biol, 10(12):980.

Bishop and Friday, 1985
Bishop, M. J. and Friday, A. E. (1985).
Evolutionary trees from nucleic acid and protein sequences.
Proceedings of the Royal Society of London B, 226:271-302.

Blaisdell, 1989
Blaisdell, B. E. (1989).
Average values of a dissimilarity measure not requiring sequence alignment are twice the averages of conventional mismatch counts requiring sequence alignment for a computer-generated model system.
J Mol Evol, 29(6):538-47.

Bolstad et al., 2003
Bolstad, B., Irizarry, R., Astrand, M., and Speed, T. (2003).
A comparison of normalization methods for high density oligonucleotide array data based on variance and bias.
Bioinformatics, 19(2):185-193.

Bommarito et al., 2000
Bommarito, S., Peyret, N., and SantaLucia, J. (2000).
Thermodynamic parameters for DNA sequences with dangling ends.
Nucleic Acids Res, 28(9):1929-1934.

Chen et al., 2004
Chen, G., Znosko, B. M., Jiao, X., and Turner, D. H. (2004).
Factors affecting thermodynamic stabilities of RNA 3 x 3 internal loops.
Biochemistry, 43(40):12865-12876.

Clote et al., 2005
Clote, P., Ferré, F., Kranakis, E., and Krizanc, D. (2005).
Structural RNA has lower folding energy than random RNA of the same dinucleotide frequency.
RNA, 11(5):578-591.

Cornette et al., 1987
Cornette, J. L., Cease, K. B., Margalit, H., Spouge, J. L., Berzofsky, J. A., and DeLisi, C. (1987).
Hydrophobicity scales and computational techniques for detecting amphipathic structures in proteins.
J Mol Biol, 195(3):659-685.

Costa, 2007
Costa, F. F. (2007).
Non-coding RNAs: lost in translation?
Gene, 386(1-2):1-10.

Creighton et al., 2009
Creighton, C. J., Reid, J. G., and Gunaratne, P. H. (2009).
Expression profiling of microRNAs by deep sequencing.
Brief Bioinform, 10(5):490-497.

Cronn et al., 2008
Cronn, R., Liston, A., Parks, M., Gernandt, D. S., Shen, R., and Mockler, T. (2008).
Multiplex sequencing of plant chloroplast genomes using Solexa sequencing-by-synthesis technology.
Nucleic Acids Res, 36(19):e122.

Crooks et al., 2004
Crooks, G. E., Hon, G., Chandonia, J.-M., and Brenner, S. E. (2004).
WebLogo: a sequence logo generator.
Genome Res, 14(6):1188-1190.

Dayhoff and Schwartz, 1978
Dayhoff, M. O. and Schwartz, R. M. (1978).
Atlas of Protein Sequence and Structure, volume 3 of 5 suppl., pages 353-358.
Nat. Biomed. Res. Found., Washington D.C.

Dayhoff et al., 1978
Dayhoff, M. O., Schwartz, R. M., and Orcutt, B. C. (1978).
A model of evolutionary change in protein.
Atlas of Protein Sequence and Structure, 5(3):345-352.

Dempster et al., 1977
Dempster, A., Laird, N., and Rubin, D. (1977).
Maximum likelihood from incomplete data via the EM algorithm.
Journal of the Royal Statistical Society, 39(1):1-38.

Dudoit et al., 2003
Dudoit, S., Shaffer, J., and Boldrick, J. (2003).
Multiple Hypothesis Testing in Microarray Experiments.
Statistical Science, 18(1):71-103.

Eddy, 2004
Eddy, S. R. (2004).
Where did the BLOSUM62 alignment score matrix come from?
Nat Biotechnol, 22(8):1035-1036.

Edgar, 2004
Edgar, R. C. (2004).
MUSCLE: a multiple sequence alignment method with reduced time and space complexity.
BMC Bioinformatics, 5:113.

Efron, 1982
Efron, B. (1982).
The jackknife, the bootstrap and other resampling plans, volume 38.
SIAM.

Eisen et al., 1998
Eisen, M., Spellman, P., Brown, P., and Botstein, D. (1998).
Cluster analysis and display of genome-wide expression patterns.
Proceedings of the National Academy of Sciences, 95(25):14863-14868.

Eisenberg et al., 1984
Eisenberg, D., Schwarz, E., Komaromy, M., and Wall, R. (1984).
Analysis of membrane and surface protein sequences with the hydrophobic moment plot.
J Mol Biol, 179(1):125-142.

Emini et al., 1985
Emini, E. A., Hughes, J. V., Perlow, D. S., and Boger, J. (1985).
Induction of hepatitis A virus-neutralizing antibody by a virus-specific synthetic peptide.
J Virol, 55(3):836-839.

Engelman et al., 1986
Engelman, D. M., Steitz, T. A., and Goldman, A. (1986).
Identifying nonpolar transbilayer helices in amino acid sequences of membrane proteins.
Annu Rev Biophys Biophys Chem, 15:321-353.

Falcon and Gentleman, 2007
Falcon, S. and Gentleman, R. (2007).
Using GOstats to test gene lists for GO term association.
Bioinformatics, 23(2):257.

Felsenstein, 1981
Felsenstein, J. (1981).
Evolutionary trees from DNA sequences: a maximum likelihood approach.
J Mol Evol, 17(6):368-376.

Felsenstein, 1985
Felsenstein, J. (1985).
Confidence limits on phylogenies: An approach using the bootstrap.
Evolution, 39:783-791.

Feng and Doolittle, 1987
Feng, D. F. and Doolittle, R. F. (1987).
Progressive sequence alignment as a prerequisite to correct phylogenetic trees.
J Mol Evol, 25(4):351-360.

Forsberg et al., 2001
Forsberg, R., Oleksiewicz, M. B., Petersen, A. M., Hein, J., Bøtner, A., and Storgaard, T. (2001).
A molecular clock dates the common ancestor of European-type porcine reproductive and respiratory syndrome virus at more than 10 years before the emergence of disease.
Virology, 289(2):174-179.

Galperin and Koonin, 1998
Galperin, M. Y. and Koonin, E. V. (1998).
Sources of systematic error in functional annotation of genomes: domain rearrangement, non-orthologous gene displacement and operon disruption.
In Silico Biol, 1(1):55-67.

Gentleman and Mullin, 1989
Gentleman, J. F. and Mullin, R. (1989).
The distribution of the frequency of occurrence of nucleotide subsequences, based on their overlap capability.
Biometrics, 45(1):35-52.

Gill and von Hippel, 1989
Gill, S. C. and von Hippel, P. H. (1989).
Calculation of protein extinction coefficients from amino acid sequence data.
Anal Biochem, 182(2):319-326.

Gnerre et al., 2011
Gnerre, S., Maccallum, I., Przybylski, D., Ribeiro, F. J., Burton, J. N., Walker, B. J., Sharpe, T., Hall, G., Shea, T. P., Sykes, S., Berlin, A. M., Aird, D., Costello, M., Daza, R., Williams, L., Nicol, R., Gnirke, A., Nusbaum, C., Lander, E. S., and Jaffe, D. B. (2011).
High-quality draft assemblies of mammalian genomes from massively parallel sequence data.
Proceedings of the National Academy of Sciences of the United States of America, 108(4):1513-8.

Gonda et al., 1989
Gonda, D. K., Bachmair, A., Wünning, I., Tobias, J. W., Lane, W. S., and Varshavsky, A. (1989).
Universality and structure of the N-end rule.
J Biol Chem, 264(28):16700-16712.

Guindon and Gascuel, 2003
Guindon, S. and Gascuel, O. (2003).
A Simple, Fast, and Accurate Algorithm to Estimate Large Phylogenies by Maximum Likelihood.
Systematic Biology, 52(5):696-704.

Guo et al., 2006
Guo, L., Lobenhofer, E. K., Wang, C., Shippy, R., Harris, S. C., Zhang, L., Mei, N., Chen, T., Herman, D., Goodsaid, F. M., Hurban, P., Phillips, K. L., Xu, J., Deng, X., Sun, Y. A., Tong, W., Dragan, Y. P., and Shi, L. (2006).
Rat toxicogenomic study reveals analytical consistency across microarray platforms.
Nat Biotechnol, 24(9):1162-1169.

Han et al., 1999
Han, K., Kim, D., and Kim, H. (1999).
A vector-based method for drawing RNA secondary structure.
Bioinformatics, 15(4):286-297.

Hasegawa et al., 1985
Hasegawa, M., Kishino, H., and Yano, T. (1985).
Dating of the human-ape splitting by a molecular clock of mitochondrial DNA.
Journal of Molecular Evolution, 22(2):160-174.

Heinz et al., 2010
Heinz, S., Benner, C., Spann, N., Bertolino, E., Lin, Y. C., Laslo, P., Cheng, J. X., Murre, C., Singh, H., and Glass, C. K. (2010).
Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities.
Mol Cell, 38(4):576-589.

Henikoff and Henikoff, 1992
Henikoff, S. and Henikoff, J. G. (1992).
Amino acid substitution matrices from protein blocks.
Proc Natl Acad Sci U S A, 89(22):10915-10919.

Heydarian et al., 2014
Heydarian, M., Romeo Luperchio, T., Cutler, J., Mitchell, C., Kim, M.-S., Pandey, A., Sollner-Webb, B., and Reddy, K. (2014).
Prediction of gene activity in early B cell development based on an integrative multi-omics analysis.
J Proteomics Bioinform, 7(2):050-063.

Höhl et al., 2007
Höhl, M., Rigoutsos, I., and Ragan, M. A. (2007).
Pattern-based phylogenetic distance estimation and tree reconstruction.
Evolutionary Bioinformatics, 2:0-0.

Homer and Nelson, 2010
Homer, N. and Nelson, S. F. (2010).
Improved variant discovery through local re-alignment of short-read next-generation sequencing data using SRMA.
Genome Biol., 11(10):R99.

Hopp and Woods, 1983
Hopp, T. P. and Woods, K. R. (1983).
A computer program for predicting protein antigenic determinants.
Mol Immunol, 20(4):483-489.

Ikai, 1980
Ikai, A. (1980).
Thermostability and aliphatic index of globular proteins.
J Biochem (Tokyo), 88(6):1895-1898.

Janin, 1979
Janin, J. (1979).
Surface and inside volumes in globular proteins.
Nature, 277(5696):491-492.

Jones et al., 1992
Jones, D., Taylor, W., and Thornton, J. (1992).
The rapid generation of mutation data matrices from protein sequences.
Computer Applications in the Biosciences (CABIOS), 8:275-282.

Jukes and Cantor, 1969
Jukes, T. and Cantor, C. (1969).
Mammalian Protein Metabolism, chapter Evolution of protein molecules, pages 21-32.
New York: Academic Press.

Kal et al., 1999
Kal, A. J., van Zonneveld, A. J., Benes, V., van den Berg, M., Koerkamp, M. G., Albermann, K., Strack, N., Ruijter, J. M., Richter, A., Dujon, B., Ansorge, W., and Tabak, H. F. (1999).
Dynamics of gene expression revealed by comparison of serial analysis of gene expression transcript profiles from yeast grown on two different carbon sources.
Mol Biol Cell, 10(6):1859-1872.

Karplus and Schulz, 1985
Karplus, P. A. and Schulz, G. E. (1985).
Prediction of chain flexibility in proteins.
Naturwissenschaften, 72:212-213.

Kaufman and Rousseeuw, 1990
Kaufman, L. and Rousseeuw, P. (1990).
Finding Groups in Data: An Introduction to Cluster Analysis.
Wiley Series in Probability and Mathematical Statistics. Applied Probability and Statistics, New York: Wiley.

Kelly et al., 2012
Kelly, T. K., Liu, Y., Lay, F. D., Liang, G., Berman, B. P., and Jones, P. A. (2012).
Genome-wide mapping of nucleosome positioning and DNA methylation within individual DNA molecules.
Genome Res., 22(12):2497-2506.

Kierzek et al., 1999
Kierzek, R., Burkard, M. E., and Turner, D. H. (1999).
Thermodynamics of single mismatches in RNA duplexes.
Biochemistry, 38(43):14214-14223.

Kimura, 1980
Kimura, M. (1980).
A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences.
J Mol Evol, 16(2):111-120.

Knudsen and Miyamoto, 2001
Knudsen, B. and Miyamoto, M. M. (2001).
A likelihood ratio test for evolutionary rate shifts and functional divergence among proteins.
Proc Natl Acad Sci U S A, 98(25):14512-14517.

Knudsen and Miyamoto, 2003
Knudsen, B. and Miyamoto, M. M. (2003).
Sequence alignments and pair hidden Markov models using evolutionary history.
Journal of Molecular Biology, 333(2):453-460.

Kolaskar and Tongaonkar, 1990
Kolaskar, A. S. and Tongaonkar, P. C. (1990).
A semi-empirical method for prediction of antigenic determinants on protein antigens.
FEBS Lett, 276(1-2):172-174.

Kumar et al., 2013
Kumar, V., Muratani, M., Rayan, N. A., Kraus, P., Lufkin, T., Ng, H. H., and Prabhakar, S. (2013).
Uniform, optimal signal processing of mapped deep-sequencing data.
Nat Biotechnol, 31(7):615-22.

Kyte and Doolittle, 1982
Kyte, J. and Doolittle, R. F. (1982).
A simple method for displaying the hydropathic character of a protein.
J Mol Biol, 157(1):105-132.

Landt et al., 2012
Landt, S. G., Marinov, G. K., Kundaje, A., Kheradpour, P., Pauli, F., Batzoglou, S., Bernstein, B. E., Bickel, P., Brown, J. B., Cayting, P., Chen, Y., DeSalvo, G., Epstein, C., Fisher-Aylor, K. I., Euskirchen, G., Gerstein, M., Gertz, J., Hartemink, A. J., Hoffman, M. M., Iyer, V. R., Jung, Y. L., Karmakar, S., Kellis, M., Kharchenko, P. V., Li, Q., Liu, T., Liu, X. S., Ma, L., Milosavljevic, A., Myers, R. M., Park, P. J., Pazin, M. J., Perry, M. D., Raha, D., Reddy, T. E., Rozowsky, J., Shoresh, N., Sidow, A., Slattery, M., Stamatoyannopoulos, J. A., Tolstorukov, M. Y., White, K. P., Xi, S., Farnham, P. J., Lieb, J. D., Wold, B. J., and Snyder, M. (2012).
ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia.
Genome Res, 22(9):1813-31.

Law et al., 2014
Law, V., Knox, C., Djoumbou, Y., Jewison, T., Guo, A., Liu, Y., Maciejewski, A., Arndt, D., Wilson, M., Neveu, V., Tang, A., Gabriel, G., Ly, C., Adamjee, S., Dame, Z., Han, B., Zhou, Y., and Wishart, D. (2014).
DrugBank 4.0: shedding new light on drug metabolism.
Nucleic Acids Res., 42:D1091-7.

Leitner and Albert, 1999
Leitner, T. and Albert, J. (1999).
The molecular clock of HIV-1 unveiled through analysis of a known transmission history.
Proc Natl Acad Sci U S A, 96(19):10752-10757.

Li et al., 2007
Li, B., Carey, M., and Workman, J. L. (2007).
The role of chromatin during transcription.
Cell, 128(4):707-719.

Li et al., 2012
Li, J., Lupat, R., Amarasinghe, K. C., Thompson, E. R., Doyle, M. A., Ryland, G. L., Tothill, R. W., Halgamuge, S. K., Campbell, I. G., and Gorringe, K. L. (2012).
CONTRA: copy number analysis for targeted resequencing.
Bioinformatics, 28(10):1307-1313.

Li et al., 2010
Li, R., Zhu, H., Ruan, J., Qian, W., Fang, X., Shi, Z., Li, Y., Li, S., Shan, G., Kristiansen, K., Li, S., Yang, H., Wang, J., and Wang, J. (2010).
De novo assembly of human genomes with massively parallel short read sequencing.
Genome Res, 20(2):265-72.

Lloyd, 1982
Lloyd, S. (1982).
Least squares quantization in PCM.
Information Theory, IEEE Transactions on, 28(2):129-137.

Longfellow et al., 1990
Longfellow, C. E., Kierzek, R., and Turner, D. H. (1990).
Thermodynamic and spectroscopic study of bulge loops in oligoribonucleotides.
Biochemistry, 29(1):278-285.

Lu et al., 2008
Lu, M., Dousis, A. D., and Ma, J. (2008).
OPUS-Rota: A fast and accurate method for side-chain modeling.
Protein Science, 17(9):1576-1585.

Maizel and Lenk, 1981
Maizel, J. V. and Lenk, R. P. (1981).
Enhanced graphic matrix analysis of nucleic acid and protein sequences.
Proc Natl Acad Sci U S A, 78(12):7665-7669.

Marinov et al., 2014
Marinov, G. K., Kundaje, A., Park, P. J., and Wold, B. J. (2014).
Large-scale quality analysis of published ChIP-seq data.
G3 (Bethesda), 4(2):209-23.

Mathews et al., 2004
Mathews, D. H., Disney, M. D., Childs, J. L., Schroeder, S. J., Zuker, M., and Turner, D. H. (2004).
Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure.
Proc Natl Acad Sci U S A, 101(19):7287-7292.

Mathews et al., 1999
Mathews, D. H., Sabina, J., Zuker, M., and Turner, D. H. (1999).
Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure.
J Mol Biol, 288(5):911-940.

Mathews and Turner, 2002
Mathews, D. H. and Turner, D. H. (2002).
Experimentally derived nearest-neighbor parameters for the stability of RNA three- and four-way multibranch loops.
Biochemistry, 41(3):869-880.

Mathews and Turner, 2006
Mathews, D. H. and Turner, D. H. (2006).
Prediction of RNA secondary structure by free energy minimization.
Curr Opin Struct Biol, 16(3):270-278.

McCarthy et al., 2012
McCarthy, D. J., Chen, Y., and Smyth, G. K. (2012).
Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation.
Nucleic Acids Research, 40(10):4288-4297.

McCaskill, 1990
McCaskill, J. S. (1990).
The equilibrium partition function and base pair binding probabilities for RNA secondary structure.
Biopolymers, 29(6-7):1105-1119.

McGinnis and Madden, 2004
McGinnis, S. and Madden, T. L. (2004).
BLAST: at the core of a powerful and diverse set of sequence analysis tools.
Nucleic Acids Res, 32(Web Server issue):W20-W25.

Meyer et al., 2007
Meyer, M., Stenzel, U., Myles, S., Pruefer, K., and Hofreiter, M. (2007).
Targeted high-throughput sequencing of tagged nucleic acid samples.
Nucleic Acids Res, 35(15):e97.

Miao et al., 2011
Miao, Z., Cao, Y., and Jiang, T. (2011).
RASP: rapid modeling of protein side chain conformations.
Bioinformatics, 27(22):3117-3122.

Michener and Sokal, 1957
Michener, C. and Sokal, R. (1957).
A quantitative approach to a problem in classification.
Evolution, 11:130-162.

Morin et al., 2008
Morin, R. D., O'Connor, M. D., Griffith, M., Kuchenbauer, F., Delaney, A., Prabhu, A.-L., Zhao, Y., McDonald, H., Zeng, T., Hirst, M., Eaves, C. J., and Marra, M. A. (2008).
Application of massively parallel sequencing to microRNA profiling and discovery in human embryonic stem cells.
Genome Res, 18(4):610-621.

Morrison, 1968
Morrison, D. R. (1968).
PATRICIA - practical algorithm to retrieve information coded in alphanumeric.
J. ACM, 15(4):514-534.

Mortazavi et al., 2008
Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L., and Wold, B. (2008).
Mapping and quantifying mammalian transcriptomes by RNA-Seq.
Nat Methods, 5(7):621-628.

Mukherjee and Zhang, 2009
Mukherjee, S. and Zhang, Y. (2009).
MM-align: A quick algorithm for aligning multiple-chain protein complex structures using iterative dynamic programming.
Nucleic Acids Res., 37.

Niu and Zhang, 2012
Niu, Y. S. and Zhang, H. (2012).
The screening and ranking algorithm to detect DNA copy number variations.
Ann Appl Stat, 6(3):1306-1326.

Parkhomchuk et al., 2009
Parkhomchuk, D., Borodina, T., Amstislavskiy, V., Banaru, M., Hallen, L., Krobitsch, S., Lehrach, H., and Soldatov, A. (2009).
Transcriptome analysis by strand-specific sequencing of complementary DNA.
Nucleic Acids Res, 37(18):e123.

Purvis, 1995
Purvis, A. (1995).
A composite estimate of primate phylogeny.
Philos Trans R Soc Lond B Biol Sci, 348(1326):405-421.

Rivas and Eddy, 2000
Rivas, E. and Eddy, S. R. (2000).
Secondary structure alone is generally not statistically significant for the detection of noncoding RNAs.
Bioinformatics, 16(7):583-605.

Robinson et al., 2010
Robinson, M. D., McCarthy, D. J., and Smyth, G. K. (2010).
edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.
Bioinformatics, 26(1):139-140.

Robinson and Oshlack, 2010
Robinson, M. D. and Oshlack, A. (2010).
A scaling normalization method for differential expression analysis of RNA-seq data.
Genome Biol., 11(3):R25.

Robinson and Smyth, 2007
Robinson, M. D. and Smyth, G. K. (2007).
Moderated statistical tests for assessing differences in tag abundance.
Bioinformatics, 23(21):2881-2887.

Robinson and Smyth, 2008
Robinson, M. D. and Smyth, G. K. (2008).
Small-sample estimation of negative binomial dispersion, with applications to SAGE data.
Biostatistics, 9(2):321-332.

Rose et al., 1985
Rose, G. D., Geselowitz, A. R., Lesser, G. J., Lee, R. H., and Zehfus, M. H. (1985).
Hydrophobicity of amino acid residues in globular proteins.
Science, 229(4716):834-838.

Rost, 2001
Rost, B. (2001).
Review: protein secondary structure prediction continues to rise.
J Struct Biol, 134(2-3):204-218.

Rye et al., 2011
Rye, M. B., Saetrom, P., and Drablos, F. (2011).
A manually curated ChIP-seq benchmark demonstrates room for improvement in current peak-finder programs.
Nucleic Acids Res, 39(4):e25.

Saitou and Nei, 1987
Saitou, N. and Nei, M. (1987).
The neighbor-joining method: a new method for reconstructing phylogenetic trees.
Mol Biol Evol, 4(4):406-425.

Sankoff et al., 1983
Sankoff, D., Kruskal, J., Mainville, S., and Cedergren, R. (1983).
Time Warps, String Edits, and Macromolecules: the Theory and Practice of Sequence Comparison, chapter Fast algorithms to determine RNA secondary structures containing multiple loops, pages 93-120.
Addison-Wesley, Reading, Ma.

SantaLucia, 1998
SantaLucia, J. (1998).
A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics.
Proc Natl Acad Sci U S A, 95(4):1460-1465.

Schechter and Berger, 1967
Schechter, I. and Berger, A. (1967).
On the size of the active site in proteases. I. Papain.
Biochem Biophys Res Commun, 27(2):157-162.

Schechter and Berger, 1968
Schechter, I. and Berger, A. (1968).
On the active site of proteases. 3. Mapping the active site of papain; specific peptide inhibitors of papain.
Biochem Biophys Res Commun, 32(5):898-902.

Schneider and Stephens, 1990
Schneider, T. D. and Stephens, R. M. (1990).
Sequence logos: a new way to display consensus sequences.
Nucleic Acids Res, 18(20):6097-6100.

Schroeder et al., 1999
Schroeder, S. J., Burkard, M. E., and Turner, D. H. (1999).
The energetics of small internal loops in RNA.
Biopolymers, 52(4):157-167.

Shapiro et al., 2007
Shapiro, B. A., Yingling, Y. G., Kasprzak, W., and Bindewald, E. (2007).
Bridging the gap in RNA structure prediction.
Curr Opin Struct Biol, 17(2):157-165.

Siepel and Haussler, 2004
Siepel, A. and Haussler, D. (2004).
Combining phylogenetic and hidden Markov models in biosequence analysis.
J Comput Biol, 11(2-3):413-428.

Smith and Waterman, 1981
Smith, T. F. and Waterman, M. S. (1981).
Identification of common molecular subsequences.
J Mol Biol, 147(1):195-197.

Stanton et al., 2013
Stanton, K. P., Parisi, F., Strino, F., Rabin, N., Asp, P., and Kluger, Y. (2013).
Arpeggio: harmonic compression of ChIP-seq data reveals protein-chromatin interaction signatures.
Nucleic Acids Res, 41(16):e161.

Stark et al., 2010
Stark, M. S., Tyagi, S., Nancarrow, D. J., Boyle, G. M., Cook, A. L., Whiteman, D. C., Parsons, P. G., Schmidt, C., Sturm, R. A., and Hayward, N. K. (2010).
Characterization of the melanoma miRNAome by deep sequencing.
PLoS One, 5(3):e9685.

Sturges, 1926
Sturges, H. A. (1926).
The choice of a class interval.
Journal of the American Statistical Association, 21:65-66.

The Gene Ontology Consortium, 2019
The Gene Ontology Consortium (2019).
Gene ontology resource: 20 years and still going strong.
Nucleic Acids Research, 47(D1):D330-D338.

Tian et al., 2005
Tian, L., Greenberg, S., Kong, S., Altschuler, J., Kohane, I., and Park, P. (2005).
Discovering statistically significant pathways in expression profiling studies.
Proceedings of the National Academy of Sciences, 102(38):13544-13549.

Tobias et al., 1991
Tobias, J. W., Shrader, T. E., Rocap, G., and Varshavsky, A. (1991).
The N-end rule in bacteria.
Science, 254(5036):1374-1377.

Tusher et al., 2001
Tusher, V. G., Tibshirani, R., and Chu, G. (2001).
Significance analysis of microarrays applied to the ionizing radiation response.
Proc Natl Acad Sci U S A, 98(9):5116-5121.

Vandesompele et al., 2002
Vandesompele, J., Preter, K. D., Pattyn, F., Poppe, B., Roy, N. V., Paepe, A. D., and Speleman, F. (2002).
Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes.
Genome Biol.

von Ahsen et al., 2001
von Ahsen, N., Wittwer, C. T., and Schütz, E. (2001).
Oligonucleotide melting temperatures under PCR conditions: nearest-neighbor corrections for Mg(2+), deoxynucleotide triphosphate, and dimethyl sulfoxide concentrations with comparison to alternative empirical formulas.
Clin Chem, 47(11):1956-1961.

Welling et al., 1985
Welling, G. W., Weijer, W. J., van der Zee, R., and Welling-Wester, S. (1985).
Prediction of sequential antigenic regions in proteins.
FEBS Lett, 188(2):215-218.

Whelan and Goldman, 2001
Whelan, S. and Goldman, N. (2001).
A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach.
Molecular Biology and Evolution, 18:691-699.

Wishart et al., 2006
Wishart, D., Knox, C., Guo, A., Shrivastava, S., Hassanali, M., Stothard, P., Chang, Z., and Woolsey, J. (2006).
DrugBank: a comprehensive resource for in silico drug discovery and exploration.
Nucleic Acids Res., 34:D668-72.

Wootton and Federhen, 1993
Wootton, J. C. and Federhen, S. (1993).
Statistics of local complexity in amino acid sequences and sequence databases.
Computers in Chemistry, 17:149-163.

Workman and Krogh, 1999
Workman, C. and Krogh, A. (1999).
No evidence that mRNAs have lower folding free energies than random sequences with the same dinucleotide distribution.
Nucleic Acids Res, 27(24):4816-4822.

Wyman et al., 2009
Wyman, S. K., Parkin, R. K., Mitchell, P. S., Fritz, B. R., O'Briant, K., Godwin, A. K., Urban, N., Drescher, C. W., Knudsen, B. S., and Tewari, M. (2009).
Repertoire of microRNAs in epithelial ovarian cancer as determined by next generation sequencing of small RNA cDNA libraries.
PLoS One, 4(4):e5311.

Xu and Zhang, 2010
Xu, J. and Zhang, Y. (2010).
How significant is a protein structure similarity with TM-score = 0.5?
Bioinformatics, 26(7):889-95.

Yang, 1994a
Yang, Z. (1994a).
Estimating the pattern of nucleotide substitution.
Journal of Molecular Evolution, 39(1):105-111.

Yang, 1994b
Yang, Z. (1994b).
Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: Approximate methods.
Journal of Molecular Evolution, 39(3):306-314.

Zerbino and Birney, 2008
Zerbino, D. R. and Birney, E. (2008).
Velvet: algorithms for de novo short read assembly using de Bruijn graphs.
Genome Res, 18(5):821-829.

Zerbino et al., 2009
Zerbino, D. R., McEwen, G. K., Margulies, E. H., and Birney, E. (2009).
Pebble and Rock Band: heuristic resolution of repeats and scaffolding in the Velvet short-read de novo assembler.
PLoS One, 4(12):e8407.

Zhang and Skolnick, 2004
Zhang, Y. and Skolnick, J. (2004).
Scoring function for automated assessment of protein structure template quality.
Proteins, 57(4):702-10.

Zuker, 1989a
Zuker, M. (1989a).
On finding all suboptimal foldings of an RNA molecule.
Science, 244(4900):48-52.

Zuker, 1989b
Zuker, M. (1989b).
The use of dynamic programming algorithms in RNA secondary structure prediction.
Mathematical Methods for DNA Sequences, pages 159-184.

Zuker and Sankoff, 1984
Zuker, M. and Sankoff, D. (1984).
RNA secondary structures and their prediction.
Bulletin of Mathematical Biology, 46:591-621.

Zuker and Stiegler, 1981
Zuker, M. and Stiegler, P. (1981).
Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information.
Nucleic Acids Res, 9(1):133-148.