1000 Genomes Project Consortium et al., 2012
1000 Genomes Project Consortium, Abecasis, G. R., Auton, A., Brooks, L. D., DePristo, M. A., Durbin, R. M., Handsaker, R. E., Kang, H. M., Marth, G. T., and McVean, G. A. (2012).
An integrated map of genetic variation from 1,092 human genomes.
Nature, 491(7422):56-65.

Allison et al., 2006
Allison, D., Cui, X., Page, G., and Sabripour, M. (2006).
Microarray data analysis: from disarray to consolidation and consensus.

Altshuler et al., 2000
Altshuler, D., Pollara, V. J., Cowles, C. R., Etten, W. J. V., Baldwin, J., Linton, L., and Lander, E. S. (2000).
An SNP map of the human genome generated by reduced representation shotgun sequencing.
Nature, 407(6803):513-516.

Baggerly et al., 2003
Baggerly, K., Deng, L., Morris, J., and Aldaz, C. (2003).
Differential expression in SAGE: accounting for normal between-library variation.
Bioinformatics, 19(12):1477-1483.

Bamford et al., 2004
Bamford, S., Dawson, E., Forbes, S., Clements, J., Pettett, R., Dogan, A., Flanagan, A., Teague, J., Futreal, P. A., Stratton, M. R., and Wooster, R. (2004).
The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website.
Br J Cancer, 91(2):355-358.

Benjamini and Hochberg, 1995
Benjamini, Y. and Hochberg, Y. (1995).
Controlling the false discovery rate: a practical and powerful approach to multiple testing.

Berman et al., 2003
Berman, H., Henrick, K., and Nakamura, H. (2003).
Announcing the worldwide protein data bank.
Nat Struct Biol, 10(12):980.

Bolstad et al., 2003
Bolstad, B., Irizarry, R., Astrand, M., and Speed, T. (2003).
A comparison of normalization methods for high density oligonucleotide array data based on variance and bias.
Bioinformatics, 19(2):185-193.

Bommarito et al., 2000
Bommarito, S., Peyret, N., and SantaLucia, J. (2000).
Thermodynamic parameters for DNA sequences with dangling ends.
Nucleic Acids Res, 28(9):1929-1934.

Brockman et al., 2008
Brockman, W., Alvarez, P., Young, S., Garber, M., Giannoukos, G., Lee, W. L., Russ, C., Lander, E. S., Nusbaum, C., and Jaffe, D. B. (2008).
Quality scores and SNP detection in sequencing-by-synthesis systems.
Genome Res, 18(5):763-770.

Choi et al., 2009
Choi, M., Scholl, U. I., Ji, W., Liu, T., Tikhonova, I. R., Zumbo, P., Nayir, A., Bakkaloglu, A., Özen, S., Sanjad, S., Nelson-Williams, C., Farhi, A., Mane, S., and Lifton, R. P. (2009).
Genetic diagnosis by whole exome capture and massively parallel DNA sequencing.
Proc Natl Acad Sci U S A, 106(45):19096-19101.

Cornette et al., 1987
Cornette, J. L., Cease, K. B., Margalit, H., Spouge, J. L., Berzofsky, J. A., and DeLisi, C. (1987).
Hydrophobicity scales and computational techniques for detecting amphipathic structures in proteins.
J Mol Biol, 195(3):659-685.

Creighton et al., 2009
Creighton, C. J., Reid, J. G., and Gunaratne, P. H. (2009).
Expression profiling of microRNAs by deep sequencing.
Brief Bioinform, 10(5):490-497.

Cronn et al., 2008
Cronn, R., Liston, A., Parks, M., Gernandt, D. S., Shen, R., and Mockler, T. (2008).
Multiplex sequencing of plant chloroplast genomes using Solexa sequencing-by-synthesis technology.
Nucleic Acids Res, 36(19):e122.

Dudoit et al., 2003
Dudoit, S., Shaffer, J., and Boldrick, J. (2003).
Multiple Hypothesis Testing in Microarray Experiments.

Eisen et al., 1998
Eisen, M., Spellman, P., Brown, P., and Botstein, D. (1998).
Cluster analysis and display of genome-wide expression patterns.
Proceedings of the National Academy of Sciences, 95(25):14863-14868.

Eisenberg et al., 1984
Eisenberg, D., Schwarz, E., Komaromy, M., and Wall, R. (1984).
Analysis of membrane and surface protein sequences with the hydrophobic moment plot.
J Mol Biol, 179(1):125-142.

Emini et al., 1985
Emini, E. A., Hughes, J. V., Perlow, D. S., and Boger, J. (1985).
Induction of hepatitis A virus-neutralizing antibody by a virus-specific synthetic peptide.
J Virol, 55(3):836-839.

Engelman et al., 1986
Engelman, D. M., Steitz, T. A., and Goldman, A. (1986).
Identifying nonpolar transbilayer helices in amino acid sequences of membrane proteins.
Annu Rev Biophys Biophys Chem, 15:321-353.

Falcon and Gentleman, 2007
Falcon, S. and Gentleman, R. (2007).
Using GOstats to test gene lists for GO term association.
Bioinformatics, 23(2):257.

Guo et al., 2006
Guo, L., Lobenhofer, E. K., Wang, C., Shippy, R., Harris, S. C., Zhang, L., Mei, N., Chen, T., Herman, D., Goodsaid, F. M., Hurban, P., Phillips, K. L., Xu, J., Deng, X., Sun, Y. A., Tong, W., Dragan, Y. P., and Shi, L. (2006).
Rat toxicogenomic study reveals analytical consistency across microarray platforms.
Nat Biotechnol, 24(9):1162-1169.

Heap et al., 2010
Heap, G. A., Yang, J. H. M., Downes, K., Healy, B. C., Hunt, K. A., Bockett, N., Franke, L., Dubois, P. C., Mein, C. A., Dobson, R. J., Albert, T. J., Rodesch, M. J., Clayton, D. G., Todd, J. A., van Heel, D. A., and Plagnol, V. (2010).
Genome-wide analysis of allelic expression imbalance in human primary cells by high-throughput transcriptome resequencing.
Hum Mol Genet, 19(1):122-134.

Heydarian et al., 2014
Heydarian, M., Romeo Luperchio, T., Cutler, J., Mitchell, C., Kim, M.-S., Pandey, A., Sollner-Webb, B., and Reddy, K. (2014).
Prediction of gene activity in early B cell development based on an integrative multi-omics analysis.
J Proteomics Bioinform, 7(2):050-063.

Homer and Nelson, 2010
Homer, N. and Nelson, S. F. (2010).
Improved variant discovery through local re-alignment of short-read next-generation sequencing data using SRMA.
Genome Biol, 11(10):R99.

Hopp and Woods, 1983
Hopp, T. P. and Woods, K. R. (1983).
A computer program for predicting protein antigenic determinants.
Mol Immunol, 20(4):483-489.

Janin, 1979
Janin, J. (1979).
Surface and inside volumes in globular proteins.
Nature, 277(5696):491-492.

Kal et al., 1999
Kal, A. J., van Zonneveld, A. J., Benes, V., van den Berg, M., Koerkamp, M. G., Albermann, K., Strack, N., Ruijter, J. M., Richter, A., Dujon, B., Ansorge, W., and Tabak, H. F. (1999).
Dynamics of gene expression revealed by comparison of serial analysis of gene expression transcript profiles from yeast grown on two different carbon sources.
Mol Biol Cell, 10(6):1859-1872.

Karplus and Schulz, 1985
Karplus, P. A. and Schulz, G. E. (1985).
Prediction of chain flexibility in proteins.
Naturwissenschaften, 72:212-213.

Kaufman and Rousseeuw, 1990
Kaufman, L. and Rousseeuw, P. (1990).
Finding Groups in Data: An Introduction to Cluster Analysis.
Wiley Series in Probability and Mathematical Statistics. Applied Probability and Statistics, New York: Wiley, 1990.

Knudsen and Miyamoto, 2003
Knudsen, B. and Miyamoto, M. M. (2003).
Sequence alignments and pair hidden Markov models using evolutionary history.
Journal of Molecular Biology, 333(2):453-460.

Kolaskar and Tongaonkar, 1990
Kolaskar, A. S. and Tongaonkar, P. C. (1990).
A semi-empirical method for prediction of antigenic determinants on protein antigens.
FEBS Lett, 276(1-2):172-174.

Kumar et al., 2013
Kumar, V., Muratani, M., Rayan, N. A., Kraus, P., Lufkin, T., Ng, H. H., and Prabhakar, S. (2013).
Uniform, optimal signal processing of mapped deep-sequencing data.
Nat Biotechnol, 31(7):615-22.

Kyte and Doolittle, 1982
Kyte, J. and Doolittle, R. F. (1982).
A simple method for displaying the hydropathic character of a protein.
J Mol Biol, 157(1):105-132.

Landt et al., 2012
Landt, S. G., Marinov, G. K., Kundaje, A., Kheradpour, P., Pauli, F., Batzoglou, S., Bernstein, B. E., Bickel, P., Brown, J. B., Cayting, P., Chen, Y., DeSalvo, G., Epstein, C., Fisher-Aylor, K. I., Euskirchen, G., Gerstein, M., Gertz, J., Hartemink, A. J., Hoffman, M. M., Iyer, V. R., Jung, Y. L., Karmakar, S., Kellis, M., Kharchenko, P. V., Li, Q., Liu, T., Liu, X. S., Ma, L., Milosavljevic, A., Myers, R. M., Park, P. J., Pazin, M. J., Perry, M. D., Raha, D., Reddy, T. E., Rozowsky, J., Shoresh, N., Sidow, A., Slattery, M., Stamatoyannopoulos, J. A., Tolstorukov, M. Y., White, K. P., Xi, S., Farnham, P. J., Lieb, J. D., Wold, B. J., and Snyder, M. (2012).
ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia.
Genome Res, 22(9):1813-31.

Lloyd, 1982
Lloyd, S. (1982).
Least squares quantization in PCM.
IEEE Transactions on Information Theory, 28(2):129-137.

Lu et al., 2008
Lu, M., Dousis, A. D., and Ma, J. (2008).
OPUS-Rota: A fast and accurate method for side-chain modeling.
Protein Science, 17(9):1576-1585.

Marinov et al., 2014
Marinov, G. K., Kundaje, A., Park, P. J., and Wold, B. J. (2014).
Large-scale quality analysis of published ChIP-seq data.
G3 (Bethesda), 4(2):209-23.

Martin and Wang, 2011
Martin, J. A. and Wang, Z. (2011).
Next-generation transcriptome assembly.
Nat Rev Genet, 12(10):671-682.

Meyer et al., 2007
Meyer, M., Stenzel, U., Myles, S., Prüfer, K., and Hofreiter, M. (2007).
Targeted high-throughput sequencing of tagged nucleic acid samples.
Nucleic Acids Res, 35(15):e97.

Miao et al., 2011
Miao, Z., Cao, Y., and Jiang, T. (2011).
RASP: rapid modeling of protein side chain conformations.
Bioinformatics, 27(22):3117-3122.

Morin et al., 2008
Morin, R. D., O'Connor, M. D., Griffith, M., Kuchenbauer, F., Delaney, A., Prabhu, A.-L., Zhao, Y., McDonald, H., Zeng, T., Hirst, M., Eaves, C. J., and Marra, M. A. (2008).
Application of massively parallel sequencing to microRNA profiling and discovery in human embryonic stem cells.
Genome Res, 18(4):610-621.

Morrison, 1968
Morrison, D. R. (1968).
PATRICIA - Practical Algorithm To Retrieve Information Coded in Alphanumeric.
J. ACM, 15(4):514-534.

Mortazavi et al., 2008
Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L., and Wold, B. (2008).
Mapping and quantifying mammalian transcriptomes by RNA-Seq.
Nat Methods, 5(7):621-628.

Mukherjee and Zhang, 2009
Mukherjee, S. and Zhang, Y. (2009).
MM-align: A quick algorithm for aligning multiple-chain protein complex structures using iterative dynamic programming.
Nucleic Acids Res., 37.

Ng et al., 2009
Ng, S. B., Turner, E. H., Robertson, P. D., Flygare, S. D., Bigham, A. W., Lee, C., Shaffer, T., Wong, M., Bhattacharjee, A., Eichler, E. E., Bamshad, M., Nickerson, D. A., and Shendure, J. (2009).
Targeted capture and massively parallel sequencing of 12 human exomes.
Nature, 461(7261):272-276.

Nguyen et al., 2011
Nguyen, P., Ma, J., Pei, D., Obert, C., Cheng, C., and Geiger, T. (2011).
Identification of errors introduced during high throughput sequencing of the T cell receptor repertoire.
BMC genomics, 12(1):106.

Parkhomchuk et al., 2009
Parkhomchuk, D., Borodina, T., Amstislavskiy, V., Banaru, M., Hallen, L., Krobitsch, S., Lehrach, H., and Soldatov, A. (2009).
Transcriptome analysis by strand-specific sequencing of complementary DNA.
Nucleic Acids Res, 37(18):e123.

Robinson et al., 2010
Robinson, M. D., McCarthy, D. J., and Smyth, G. K. (2010).
edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.
Bioinformatics, 26(1):139-140.

Robinson and Smyth, 2007
Robinson, M. D. and Smyth, G. K. (2007).
Moderated statistical tests for assessing differences in tag abundance.
Bioinformatics, 23(21):2881-2887.

Robinson and Smyth, 2008
Robinson, M. D. and Smyth, G. K. (2008).
Small-sample estimation of negative binomial dispersion, with applications to SAGE data.
Biostatistics, 9(2):321-332.

Rose et al., 1985
Rose, G. D., Geselowitz, A. R., Lesser, G. J., Lee, R. H., and Zehfus, M. H. (1985).
Hydrophobicity of amino acid residues in globular proteins.
Science, 229(4716):834-838.

Rye et al., 2011
Rye, M. B., Sætrom, P., and Drabløs, F. (2011).
A manually curated ChIP-seq benchmark demonstrates room for improvement in current peak-finder programs.
Nucleic Acids Res, 39(4):e25.

SantaLucia, 1998
SantaLucia, J. (1998).
A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics.
Proc Natl Acad Sci U S A, 95(4):1460-1465.

Smith and Waterman, 1981
Smith, T. F. and Waterman, M. S. (1981).
Identification of common molecular subsequences.
J Mol Biol, 147(1):195-197.

Stanton et al., 2013
Stanton, K. P., Parisi, F., Strino, F., Rabin, N., Asp, P., and Kluger, Y. (2013).
Arpeggio: harmonic compression of ChIP-seq data reveals protein-chromatin interaction signatures.
Nucleic Acids Res, 41(16):e161.

Stark et al., 2010
Stark, M. S., Tyagi, S., Nancarrow, D. J., Boyle, G. M., Cook, A. L., Whiteman, D. C., Parsons, P. G., Schmidt, C., Sturm, R. A., and Hayward, N. K. (2010).
Characterization of the melanoma miRNAome by deep sequencing.
PLoS One, 5(3):e9685.

Sturges, 1926
Sturges, H. A. (1926).
The choice of a class interval.
Journal of the American Statistical Association, 21:65-66.

Tian et al., 2005
Tian, L., Greenberg, S., Kong, S., Altschuler, J., Kohane, I., and Park, P. (2005).
Discovering statistically significant pathways in expression profiling studies.
Proceedings of the National Academy of Sciences, 102(38):13544-13549.

Tusher et al., 2001
Tusher, V. G., Tibshirani, R., and Chu, G. (2001).
Significance analysis of microarrays applied to the ionizing radiation response.
Proc Natl Acad Sci U S A, 98(9):5116-5121.

von Ahsen et al., 2001
von Ahsen, N., Wittwer, C. T., and Schütz, E. (2001).
Oligonucleotide melting temperatures under PCR conditions: nearest-neighbor corrections for Mg(2+), deoxynucleotide triphosphate, and dimethyl sulfoxide concentrations with comparison to alternative empirical formulas.
Clin Chem, 47(11):1956-1961.

Wang et al., 2009
Wang, Z., Gerstein, M., and Snyder, M. (2009).
RNA-Seq: a revolutionary tool for transcriptomics.
Nat Rev Genet, 10(1):57-63.

Welling et al., 1985
Welling, G. W., Weijer, W. J., van der Zee, R., and Welling-Wester, S. (1985).
Prediction of sequential antigenic regions in proteins.
FEBS Lett, 188(2):215-218.

Wyman et al., 2009
Wyman, S. K., Parkin, R. K., Mitchell, P. S., Fritz, B. R., O'Briant, K., Godwin, A. K., Urban, N., Drescher, C. W., Knudsen, B. S., and Tewari, M. (2009).
Repertoire of microRNAs in epithelial ovarian cancer as determined by next generation sequencing of small RNA cDNA libraries.
PLoS One, 4(4):e5311.

Xu and Zhang, 2010
Xu, J. and Zhang, Y. (2010).
How significant is a protein structure similarity with TM-score = 0.5?
Bioinformatics, 26(7):889-95.

Zhang and Skolnick, 2004
Zhang, Y. and Skolnick, J. (2004).
Scoring function for automated assessment of protein structure template quality.
Proteins, 57(4):702-10.