Download and configure reference data
The first time you open Biomedical Genomics Workbench you will be presented with the dialog box shown in figure 12.2, which informs you that data are available for download to either the local or the server CLC_References repository. If you check "Never show this dialog again", you will subsequently be presented with the dialog box only when updated versions of the reference data are available.
Figure 12.2: Notification that new versions of the reference data are available.
Click on the button labeled Yes. This will take you to the wizard shown in figure 12.3.
Figure 12.3: The Manage Reference Data wizard gives access to the reference data that are required to be able to run the ready-to-use workflows.
This wizard can also be accessed from the upper right corner of the Biomedical Genomics Workbench by clicking on Data Management (figure 12.4).
Figure 12.4: Click on the button labeled "Data Management" to open the "Manage Reference Data" dialog where you can download and configure the reference data that are necessary to run the ready-to-use workflows.
The "Manage Reference Data" wizard gives access to all the reference data that are used in the ready-to-use workflows and in the tutorials. From the wizard you can download and configure the reference data.
In the upper part of the wizard you can find two tiles called "QIAGEN Reference Data Library" and "Custom Reference Data Sets".
On the left-hand side, you can use the drop-down menu to choose where you want to manage the reference data. If you choose "Locally", the Download, Delete and Apply buttons will work on the local reference data. If you choose "On Server" (only available if you are connected to the server), the buttons will work on the reference data on the server you are connected to (figure 12.5).
Figure 12.5: Reference data can be available locally or on the server.
You can also check how much free space is available for the Reference folder on your local disk or on the server. The drop-down menu also allows you to check which datasets have been downloaded locally or on the server. You can see this in the left panel of the reference data manager.
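Free space can also be checked outside the Workbench before starting a large download. The following is a minimal Python sketch using only the standard library. The folder location and the required size are hypothetical placeholders, not values taken from the Workbench.

```python
import shutil
from pathlib import Path


def free_space_gb(folder: Path) -> float:
    """Return the free space, in GB (10**9 bytes), on the disk holding `folder`."""
    usage = shutil.disk_usage(folder)
    return usage.free / 1e9


def has_room_for(folder: Path, required_gb: float) -> bool:
    """Return True if the disk holding `folder` has at least `required_gb` free."""
    return free_space_gb(folder) >= required_gb


if __name__ == "__main__":
    # Hypothetical location: replace with the path of your own reference folder.
    reference_folder = Path.home()
    print(f"Free space: {free_space_gb(reference_folder):.1f} GB")
    # Hypothetical size: read the real size of a set from the right-hand panel.
    print("Enough room for 10 GB:", has_room_for(reference_folder, 10.0))
```

The size to compare against is the one shown in the right-hand panel of the wizard for the selected set or element.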
When on the "QIAGEN Reference Data Library" tile, you can see the list of all available reference data under six headers: Reference Data Sets and Reference Data Elements, Tutorial Reference Data Sets and Tutorial Reference Data Elements, and Previous Reference Data Sets and Previous Reference Data Elements. Two icons indicate whether you have already downloaded the data to your Reference folder (check icon) or not (plus icon).
When you select a reference set or an element, the window on the right shows the size of the folder as well as some complementary information about the reference database. For Reference Data Sets, a table lists the elements included in the set with their version numbers and respective sizes, together with a list of the workflows affected by the set.
The Reference Data Sets available include hg19, hg38 (both an Ensembl and a RefSeq version), RefSeq, Mouse and Rat, a data set designed for QIAGEN Gene Reads Panels hg19, and two data sets for use with the GATK plugin.
We also offer access to Tutorial Reference Data Sets that are chromosome-specific and ready to use with some of our tutorials (http://www.clcbio.com/support/tutorials/).
The Previous Reference Data Sets folder contains older versions of the Reference Data Sets that have been replaced with newer ones in the Reference Data Sets folder.
Each Reference Data Set is made of a compilation of Reference Data Elements. Downloading sets will automatically download the elements the set is made of, but you can also download elements individually under the Reference Data Elements folder.
Data that has not been downloaded yet is represented by a plus icon. Select the set or element you would like to download, and click on the Download button. While the data is downloading, the Download button fades out and you can check the progress of the download in the Processes tab below the toolbox (figure 12.6).
Figure 12.6: Click on the info button to see the legal notice and license information.
Once the reference data has been downloaded, the set or element is marked with a check icon.
Once you have finished downloading the appropriate Reference Data Set, click on the button labeled Apply and the workflows will automatically be configured with all the relevant reference data available. The information in the "Applied" column in the right panel of the reference data manager describes whether the dataset has been applied to the location specified in the drop-down menu. For example, a "Yes" in the "Applied" column when the drop-down menu is set to "On Server" means that the given data will be used from the server when the affected workflows are run. This will be the case even if you choose to execute the workflow locally (i.e. in the workbench). If the "Applied" column contains "Yes" when the drop-down menu is set to "Locally", the given data will be used from the local reference folder when the affected workflows are run. In that case you will not be able to execute these workflows on the server (figure 12.7).
Figure 12.7: Check where your reference data is applied by looking at the column "Applied" in the data set description.
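The rule described above can be summarized as a small decision function. This is only an illustrative Python model of the behavior stated in this section. The names `Location` and `can_run` are hypothetical and are not part of any Workbench API, and the model assumes the workbench is connected to the server whenever server data is used.

```python
from enum import Enum


class Location(Enum):
    """The two choices offered by the drop-down menu in the wizard."""
    LOCAL = "Locally"
    SERVER = "On Server"


def can_run(applied: Location, execute_on: Location) -> bool:
    """Model of the rule in this section (hypothetical, not a Workbench API).

    Data applied on the server is used from the server wherever the
    workflow is executed. Data applied locally can only be used by
    workflows executed locally, i.e. in the workbench.
    """
    if applied is Location.SERVER:
        return True
    return execute_on is Location.LOCAL


if __name__ == "__main__":
    # Print the outcome for all four combinations of the two settings.
    for applied in Location:
        for execute_on in Location:
            print(f"applied {applied.value!r}, executed {execute_on.value!r}: "
                  f"{can_run(applied, execute_on)}")
```

The only combination this model rejects is data applied "Locally" with a workflow executed on the server, which matches the restriction described above.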
The Reference Data Sets also contain a Create Custom Set ... button that allows you to create your own set of reference data starting from an existing data set (see Create a custom Reference Data Set).
The Delete button allows users to delete locally installed reference data, whereas only administrators can delete reference data installed on the server. Deleting is useful if you suspect that a downloaded reference is corrupt and needs to be re-downloaded, or if you need to free up disk space, for example on your local disk.
At the bottom of the wizard you can find:
- A button labeled "Help" that links to the section in the Biomedical Genomics Workbench reference manual that describes the "Manage Reference Data" button.
- A button labeled "Close". Click on this to close the wizard.