The ready-to-use workflows rely on the presence of particular reference datasets. This reference data must be downloaded and configured before these workflows can be used. The Data Management tool (figure 13.1) in the workbench makes it easy to download the necessary data so that the workflows can find and use it.
This section covers the download and configuration needed to make available the reference data relevant to the Biomedical Genomics Workbench, including the human, mouse and rat genomes, and annotations and variants made available by a variety of databases. The total download size varies between reference data sets and is indicated in the top right corner of the data set window (see the red highlight in figure 13.1). The size of the individual files in the data set is indicated in the table below. The amount of time it will take to download this data depends on your network connection, but it can take several hours on slower connections.
Reference data is provided by QIAGEN and the workbench is configured to download from QIAGEN by default. The location to download the data from can be seen in Edit | Preferences | Advanced as shown in figure 13.2.
Figure 13.2: The location where reference data is downloaded from can be seen in the Workbench Preferences. Generally this should not be altered except in the special case that the data from QIAGEN is being mirrored locally.
You should not change this setting unless your system administrator has decided to mirror this data locally and wishes you to use that mirror.
The reference data that is downloaded will be stored in a folder called CLC_References. When the Biomedical Genomics Workbench is installed, such a folder is created on your file system under your home area. This folder is specified within the workbench as a reference location.
You can specify a different location to download reference data to. This is recommended if you do not have enough space in the area the workbench designates as the reference data location by default. To change the reference data location from within the Navigation Area:
Right-click on the folder "CLC_References" | Choose "Location" | Choose "Specify Reference Location"
The new folder will also be called CLC_References, but will be located where you specify.
In more detail, this action results in the following:
- A folder called CLC_References is created in the location you specified, if a folder of this name did not already exist.
- The workbench sets this new location as the place to download reference data to and the place the ready-to-use workflows should look for reference data.
This action does not:
- Remove the old CLC_References folder.
- Remove the contents of the old CLC_References folder, such as previously downloaded data.
If you have previously downloaded data into the CLC_References folder at the old location, you will need to use standard system tools to delete this folder and/or its contents. If you would like to keep the reference data from the old location, you can move it, using standard system tools, into the new CLC_References folder that you just specified. This saves you from having to download it again.
Note! If you run out of space and realize that the CLC_References folder should be stored somewhere else, you can choose a new location, manually move the already downloaded files to that new location, and then restart the workbench. The "downloaded references" file will then be updated with all the new references.
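As an illustration of the "standard system tools" step, the move can also be scripted. The following is a minimal Python sketch, not part of the workbench: the function name `move_reference_data` and the example paths are hypothetical, and the sketch assumes the workbench is closed while files are being moved. Entries that already exist in the new folder are left untouched rather than overwritten.

```python
import shutil
from pathlib import Path


def move_reference_data(old_dir, new_dir):
    """Move every entry in old_dir into new_dir and return the names moved.

    Hypothetical helper for illustration only; it is not a workbench API.
    """
    old_path = Path(old_dir)
    new_path = Path(new_dir)
    # Create the destination folder if it does not exist yet.
    new_path.mkdir(parents=True, exist_ok=True)

    moved = []
    for entry in sorted(old_path.iterdir()):
        target = new_path / entry.name
        if target.exists():
            # Never overwrite data already present at the new location.
            continue
        shutil.move(str(entry), str(target))
        moved.append(entry.name)
    return moved
```

For example, `move_reference_data("/home/user/CLC_References", "/data/CLC_References")` (both paths are placeholders) would move the previously downloaded files into the new folder and return the list of names that were moved. The old folder itself is not deleted.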
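To reduce the chance of running out of space partway through a download, the free space at a candidate location can be checked beforehand. The sketch below is an illustrative Python helper, not a workbench feature: the name `has_enough_space` is hypothetical, and the required size is assumed to be the value you read from the data set window and convert to bytes yourself.

```python
import shutil


def has_enough_space(folder, required_bytes):
    """Return True if the volume holding `folder` has at least
    `required_bytes` of free space. Hypothetical helper for illustration."""
    free_bytes = shutil.disk_usage(folder).free
    return free_bytes >= required_bytes
```

For instance, `has_enough_space("/data", 50 * 1024**3)` would report whether a placeholder location `/data` has at least 50 GiB free. The folder passed in must already exist.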
- Download and configure reference data
- Create a custom Reference Data Set
- Exporting reference data for use in external applications
- Troubleshooting reference data downloads