Download Curated Microbial Reference Database
The Download Curated Microbial Reference Database tool downloads selected reference databases, each as a single sequence list and/or a taxonomic profiling index, with the annotations required by the tools in the Typing and Epidemiology and Metagenomics sections of the Microbial Genomics Module.
To run the tool, go to:
Toolbox | Microbial Genomics Module | Metagenomics | Databases | Taxonomic Analyses | Download Curated Microbial Reference Database
In the first window (figure 17.1), select the database you wish to download.
Figure 17.1: Select the database and output format
You can choose between several databases:
- QMI-PTDB - Approx. 22GB memory required: QIAGEN Microbial Insights - Prokaryotic taxonomy database is a microbial reference database for taxonomic profiling of bacteria and archaea. This database contains additional references not present in the smaller version. It is not suitable for running on a standard laptop.
- QMI-PTDB - Approx. 16GB memory required: QIAGEN Microbial Insights - Prokaryotic taxonomy database is a microbial reference database for taxonomic profiling of bacteria and archaea. This is a subset of the larger database suitable for running on a standard laptop.
- Unified Human Gastrointestinal Genome (UHGG): A database for taxonomic and functional profiling of human gut samples, curated and hosted by EMBL-EBI [Almeida et al., 2021]. The database includes metagenome-assembled genomes from human gut samples.
- Unclustered Reference Viral DataBase (U-RVDB): Unclustered Reference Viral Database for virus detection [Goodacre et al., 2018]. The database includes curated viral, virus-related and virus-like nucleotide sequences, excluding bacterial viruses.
- Clustered Reference Viral DataBase (C-RVDB): Clustered Reference Viral Database for virus detection. Viral entries are clustered at 98% by CD-HIT-EST.
- ViraCura™ HPV REF: A curated database of Human Papillomavirus reference strains. It contains unmodified viral reference genomes and associated record information from NCBI databases.
- ViraCura™ HPV VAR: A curated database of Human Papillomavirus variants of reference strains. It contains unmodified viral reference genomes and associated record information from NCBI databases.
- ViraCura™ ANIMAL PV: A curated database of Animal Papillomavirus. It contains unmodified viral reference genomes and associated record information from NCBI databases.
- MPXV: A curated database of Monkeypox virus reference strains. It contains unmodified viral reference genomes and associated record information from NCBI databases, as well as metadata and customized taxonomic nomenclature.
- MOCOVA: A curated database of Monkeypox outgroup reference strains (Molluscum contagiosum, Cowpox, Variola, and Vaccinia). It contains unmodified viral reference genomes and associated record information from NCBI databases, as well as metadata and customized taxonomic nomenclature.
You can then choose to download the database as an annotated sequence list and/or as a taxonomic profiling index.
The Curated Microbial Reference Databases are optimized for balanced representation across the taxonomy, i.e. oversampling of some branches of the taxonomy is removed by using representative sequences. As a consequence, some of the included assemblies may not be particularly good, yet they are kept because they are currently the best representative of their branch in the taxonomy. For this optimized database you can choose to download the 22GB version, or a version optimized for running the Taxonomic Profiling tool on a laptop computer with 16GB of main memory. The 16GB version of the curated database contains fewer assemblies so that it can run on a system with 16GB of main memory.
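As a rough illustration of choosing between the two QMI-PTDB versions, the following Python sketch picks a version from the amount of main memory on the machine. The helper pick_qmi_ptdb_variant and its simple comparison against total memory are assumptions made for illustration, based only on the approximate memory figures listed above; the helper is not part of the Microbial Genomics Module.

```python
from typing import Optional

# Approximate memory requirements, taken from the database list above.
# Ordered from largest to smallest so the first match is the largest fit.
QMI_PTDB_VARIANTS = [
    ("QMI-PTDB (approx. 22GB memory required)", 22.0),
    ("QMI-PTDB (approx. 16GB memory required)", 16.0),
]


def pick_qmi_ptdb_variant(main_memory_gb: float) -> Optional[str]:
    """Return the largest QMI-PTDB version whose stated requirement fits.

    Hypothetical helper: comparing against total main memory is an
    assumption, not a rule taken from the Microbial Genomics Module.
    """
    for name, required_gb in QMI_PTDB_VARIANTS:
        if main_memory_gb >= required_gb:
            return name
    return None  # neither version fits under this simple rule


if __name__ == "__main__":
    for memory_gb in (8, 16, 64):
        print(memory_gb, "GB ->", pick_qmi_ptdb_variant(memory_gb))
```

In practice, other running processes also consume memory, so a machine whose total memory only just matches a stated requirement may still be better served by the smaller version.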
Note: Some of the databases offered are derived works, licensed under a Creative Commons Attribution-ShareAlike (CC BY-SA) license. We offer free access to these databases without requiring a CLC product license. They can be downloaded using the CLC Genomics Workbench in viewing mode, with the Microbial Genomics Module installed. The downloaded files can then be exported to non-proprietary formats using the freely available viewing mode of the CLC Genomics Workbench.