Nucleotide sequence databases
- nr. All GenBank + EMBL + DDBJ + PDB sequences (but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences). No longer "non-redundant" due to computational cost.
- Human G+T. Human genomic and transcript sequences.
- Mouse G+T. Mouse genomic and transcript sequences.
- refseq_rna. mRNA sequences from the NCBI Reference Sequence Project.
- refseq_genomic. Genomic sequences from the NCBI Reference Sequence Project.
- refseq_representative_genomes. Representative sequences from the NCBI Reference Sequence Project.
- est. Database of GenBank + EMBL + DDBJ sequences from the EST division.
- est_human. Human subset of est.
- est_mouse. Mouse subset of est.
- est_others. Subset of est other than human or mouse.
- gss. Genome Survey Sequences, including single-pass genomic data, exon-trapped sequences, and Alu PCR sequences.
- htgs. Unfinished High Throughput Genomic Sequences: phases 0, 1 and 2. Finished, phase 3 HTG sequences are in nr.
- pat. Nucleotides from the Patent division of GenBank.
- pdb. Sequences derived from the 3-dimensional structure records of the Protein Data Bank. They are NOT the coding sequences for the corresponding proteins found in the same PDB record.
- alu. Select Alu repeats from REPBASE, suitable for masking Alu repeats from query sequences. See "Alu alert" by Claverie and Makalowski, Nature 371: 752 (1994).
- dbsts. Database of Sequence Tagged Site entries from the STS division of GenBank + EMBL + DDBJ.
- chromosome. Complete genomes and complete chromosomes from the NCBI Reference Sequence project. It overlaps with refseq_genomic.
- env_nt. Sequences from environmental samples, such as uncultured bacterial samples isolated from soil or marine samples. The largest single source is the Sargasso Sea project. This database overlaps with nucleotide nr.
- tsa_nt. Transcriptome Shotgun Assembly database.
- prokaryotic_16S_ribosomal_RNA. 16S ribosomal RNA sequences.
- Betacoronavirus. NCBI database of betacoronavirus sequences.
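
The list above describes est_human, est_mouse, and est_others as organism-based subsets of est. As a minimal sketch of how a script might pick the narrowest EST database for a query, the helper below maps an organism label to one of those three names. The function name est_database_for and the accepted labels (including the scientific names) are hypothetical choices made for illustration; they are not part of any NCBI tool.

```python
def est_database_for(organism: str) -> str:
    """Return the name of the est subset that covers the given organism.

    Hypothetical helper: the accepted labels below are assumptions made
    for this sketch, not an official NCBI vocabulary.
    """
    key = organism.strip().lower()
    if key in ("human", "homo sapiens"):
        return "est_human"
    if key in ("mouse", "mus musculus"):
        return "est_mouse"
    # Per the list above, everything other than human or mouse is in est_others.
    return "est_others"


if __name__ == "__main__":
    for name in ("Human", "mouse", "zebrafish"):
        print(name, "->", est_database_for(name))
```

Searching a subset rather than the full est database narrows the search to the organism of interest; when the organism is unknown, est itself remains the safe choice.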