Depending on your internet connection, it may take a few seconds to download the content and open the Database Builder (figure 15.5).
Assemblies that match the criteria from the Download Custom Microbial Reference Database tool will be pre-selected, indicated by a "Yes" in the Included column.
The Database Builder table contains additional columns with metadata based on information from GenBank, http://www.ncbi.nlm.nih.gov/genbank/. Use the Database Builder functionality described below to customize and define the reference set to be downloaded.
Use the filtering options located at the top right to filter the table. For information on how to use the simple and advanced table filters, see http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Filtering_tables.html.
From the Side Panel on the right, the following option is available:
- Aggregate rows on taxonomy. Aggregates results by the selected taxonomic level, e.g. Order.
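As a rough illustration of what aggregating on a taxonomic level means, the following minimal sketch groups table rows by the taxon they belong to at a chosen level. It is not the Workbench implementation; the row layout (a `name` plus a `taxonomy` dictionary) and the function name `aggregate_rows` are assumptions made for this example only.

```python
from collections import defaultdict

def aggregate_rows(rows, level):
    """Group assembly names by their taxon at the given level.

    `rows` is a list of dicts with hypothetical keys "name" and
    "taxonomy" (a dict mapping level name to taxon name).
    Rows lacking the requested level are grouped under "Unknown".
    """
    groups = defaultdict(list)
    for row in rows:
        taxon = row["taxonomy"].get(level, "Unknown")
        groups[taxon].append(row["name"])
    return dict(groups)
```

Calling `aggregate_rows(rows, "Order")` would then return one entry per order, each listing the assemblies that fall under it.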
Below the table are buttons for quick selection, for including or excluding rows, and for downloading the selected reference subset:
- Quick selection. For selection of one of the following predefined subsets, based on information in the Assembly Status, Chromosomal scaffolds, and In RefSeq columns:
  - Single scaffold complete genomes in RefSeq. Complete genomes with Chromosomal scaffolds = 1; In RefSeq = Yes.
  - Complete genomes in RefSeq. Complete genomes with In RefSeq = Yes.
  - All complete genomes. Any Complete genome.
- Include and Exclude. Includes or excludes the selected rows from the subset selection.
- Reset selection. Resets the selection to match the criteria specified in the Download Custom Microbial Reference Database wizard.
- Download selection. Downloads the selected reference subset. Opens a dialog with the following options (figure 15.6):
  - Include all annotation tracks. Includes annotations such as CDS and gene in the downloaded database. The annotations are not needed for taxonomic profiling, but may be required for other applications such as creating MLST schemes.
  - Minimum contig length. The minimum length of sequences to be included in the database.
The dialog provides an estimate of the memory and disk space required to later run the Taxonomic Profiling tool with the database you are about to download.
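The three Quick selection presets described above can be read as increasingly permissive filters on the Assembly Status, Chromosomal scaffolds, and In RefSeq columns. The sketch below expresses them as predicates. It is only an illustration of the selection logic, not the tool's own code: the `Assembly` class, its field names, and the status string "Complete Genome" are assumptions introduced for this example.

```python
from dataclasses import dataclass

@dataclass
class Assembly:
    # Hypothetical stand-ins for the table columns named in the text.
    name: str
    assembly_status: str        # assumed value for complete genomes: "Complete Genome"
    chromosomal_scaffolds: int
    in_refseq: bool

def is_complete(a: Assembly) -> bool:
    """True if the assembly status marks a complete genome (assumed label)."""
    return a.assembly_status.strip().lower() == "complete genome"

# Preset names follow the manual; the predicates mirror the stated criteria.
QUICK_SELECTIONS = {
    "Single scaffold complete genomes in RefSeq":
        lambda a: is_complete(a) and a.chromosomal_scaffolds == 1 and a.in_refseq,
    "Complete genomes in RefSeq":
        lambda a: is_complete(a) and a.in_refseq,
    "All complete genomes":
        is_complete,
}

def quick_select(rows, preset):
    """Return the names of the rows matching the chosen preset."""
    matches = QUICK_SELECTIONS[preset]
    return [a.name for a in rows if matches(a)]
```

Each preset is a superset of the one listed before it, so switching from the first to the last can only add rows to the selection, never remove them.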