Database Builder

If Skip Database Builder was not enabled, the required metadata for building a selection table will be downloaded. No assembly data is downloaded at this point. This process will take a brief moment.

The tool then opens a table called the Database Builder (figure 18.5), in which you can customize your own database. Several functions help you filter and sort the table to extract the information relevant to your project.

Image database_builder
Figure 18.5: Search, filter and select assemblies to download

  1. Use the "Quick selection" button to quickly select predefined subsets for download:
    • Single scaffold complete genomes in RefSeq
    • Complete genomes in RefSeq
    • All complete genomes
    Each reference in the table is labeled with an assembly status (for example, Complete genome or Chromosome). In addition, some references are marked as representative genomes for a clade (repr) or as reference genomes (refr). These subsets include references that are marked as Complete genome, Chromosome, representative genome and/or reference genome.
  2. Aggregate the table to a specified taxonomic group using the drop-down menu in the "Data" palette of the side panel. Use the category "Name" to de-aggregate the table.
  3. Apply one or more filters to keep only the rows you are interested in, then select rows by dragging or by pressing Ctrl+A. Click the Include button to stage the selected references for download. Staged references are indicated by a checkmark in the "Included" column.

  4. Alternatively, press Ctrl+A and click Include to include all rows first. Then set one or several filters, and click Exclude on the rows that remain visible. Clear the filter(s) by clicking the red buttons next to each filter set. The rows that were hidden by the filter(s), and therefore not excluded, should still be checked.

  5. Use the Reset button to reset the selection. This restores the builder to its initial state, with only the pre-selected references included. If no references were pre-selected, this excludes all rows.
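
The Include, Exclude and Reset behavior in steps 3 to 5 can be pictured with a small sketch. This is not the tool's API: the Reference class, its fields and the include, exclude and reset functions are hypothetical names chosen only to illustrate how the "Included" checkmark changes as rows are staged.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    # Hypothetical stand-in for one row of the Database Builder table.
    name: str
    status: str            # e.g. "Complete genome" or "Chromosome"
    in_refseq: bool
    preselected: bool = False
    included: bool = False  # mirrors the checkmark in the "Included" column


def include(rows, visible=lambda r: True):
    """Check every row that the current filter leaves visible."""
    for r in rows:
        if visible(r):
            r.included = True


def exclude(rows, visible=lambda r: True):
    """Uncheck every row that the current filter leaves visible."""
    for r in rows:
        if visible(r):
            r.included = False


def reset(rows):
    """Return to the initial state: only pre-selected rows stay checked."""
    for r in rows:
        r.included = r.preselected


table = [
    Reference("assembly_A", "Complete genome", in_refseq=True),
    Reference("assembly_B", "Chromosome", in_refseq=False),
    Reference("assembly_C", "Complete genome", in_refseq=False, preselected=True),
]

# Step 4: include everything, then exclude the rows a filter leaves visible.
include(table)
exclude(table, visible=lambda r: not r.in_refseq)
staged = [r.name for r in table if r.included]

# Step 5: reset keeps only the pre-selected reference.
reset(table)
after_reset = [r.name for r in table if r.included]
```

In this sketch the filter shows only the non-RefSeq rows, so excluding them leaves the hidden RefSeq row checked, matching the outcome described in step 4.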

Once all the desired references are included, as indicated by a checkmark in the "Included" column, click Download selection. Close to the button, you can check how many references are selected and see an estimate of the total size of the selection.

The dialog shown in figure 18.6 allows you to include all annotation tracks. Annotation tracks are not needed for taxonomic profiling applications, but are required when creating MLST schemes. An additional filter, Minimum contig length, may also be specified; this option is not available when downloading the curated database. The dialog also warns about the memory and disk space needed to later run the Taxonomic Profiling tool with the database you are about to download.
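
As a rough illustration of what a minimum contig length filter does, the sketch below drops contigs shorter than a threshold. It assumes that the filter discards individual contigs below the given length; the function name, the data layout and the threshold value are hypothetical and do not describe how the tool implements the option.

```python
def filter_contigs(contigs, min_length):
    """Keep (name, sequence) pairs whose sequence has at least min_length bases."""
    return [(name, seq) for name, seq in contigs if len(seq) >= min_length]


# Two toy contigs: one of 1200 bases and one of 40 bases.
contigs = [("contig_1", "ACGT" * 300), ("contig_2", "ACGT" * 10)]

# Assumed threshold of 1000 bases, chosen only for the example.
kept = filter_contigs(contigs, min_length=1000)
kept_names = [name for name, _ in kept]
```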

Image database_builder_download
Figure 18.6: Filter options for download of the selected references

Once a database has been downloaded, a subset can be extracted by following the instructions in section 18.1.1.