Storing, managing and moving reference data

Reference data downloaded using the Reference Data Manager is stored in a CLC File Location called CLC_References. Data can also be added to this location via launch wizards for workflows with input elements configured to use workflow roles (see Reference Data Sets and defining Custom Sets). Data cannot be added by moving files within the Navigation Area.

Data in CLC_References locations can only be deleted using functionality in the Reference Data Manager.

This section describes general functionality for managing data downloaded using the Reference Data Manager, including how to work with a CLC Server and how to get reference data onto non-networked machines.

Downloading reference data locally

Data downloaded locally using the Reference Data Manager (figure 11.20) is stored in a CLC File Location called CLC_References. By default, this location is configured to refer to a folder of the same name in your home area. If that folder does not already exist when the CLC Genomics Workbench is first started up, it is created and added as a CLC File Location.

Figure 11.20: When reference data is stored locally, "Locally" is shown at the top right of the Reference Data Manager, along with information about how much space is available.

You can see the underlying folder that this location is mapped to by hovering the mouse cursor over the location in the Navigation Area (figure 11.21).

Figure 11.21: Hover the mouse cursor over a CLC_References File Location to see the folder it is mapped to on the file system. By default, this is a folder in your home area (top). When connected to a CLC Server with a CLC_References Location, the tooltip states that the location is on the server (bottom).

Specifying a different folder for reference data

The folder on the file system where reference data should be stored, i.e. the folder that your local CLC_References File Location is mapped to, can be configured by right-clicking on the CLC_References location and choosing Location | Specify Reference Location (figure 11.22). This can be useful when larger amounts of space are needed or when sharing the reference data folder with others.

Figure 11.22: To map the local CLC_References location to a different folder on the file system, right-click on CLC_References in the Navigation Area and select Location | Specify Reference Location.

Updating where the CLC_References File Location is mapped to does not remove the old CLC_References folder on the file system or its contents. Standard system tools should be used to delete these items if they are no longer needed.
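Before deleting an old reference data folder, it can help to check how much space it occupies. The following is a minimal sketch in Python, not part of the CLC software. It assumes the old folder is at the default location, a folder called CLC_References in your home area; adjust the path if your location was mapped elsewhere. The function name folder_size_bytes is made up for this illustration.

```python
from pathlib import Path


def folder_size_bytes(folder: Path) -> int:
    """Return the total size, in bytes, of all regular files below `folder`."""
    return sum(p.stat().st_size for p in folder.rglob("*") if p.is_file())


if __name__ == "__main__":
    # Assumption: the old folder is at the default location in the home area.
    old_folder = Path.home() / "CLC_References"
    if old_folder.is_dir():
        size_gb = folder_size_bytes(old_folder) / 1e9
        print(f"{old_folder}: {size_gb:.2f} GB")
    else:
        print(f"{old_folder} does not exist")
```

This script only reads the folder. Deleting it is still done with standard system tools.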

Working with reference data on a CLC Server

When the CLC Genomics Workbench is connected to a CLC Genomics Server configured with a File System Location called CLC_References, the option "On Server" can be selected in the Manage Reference Data drop-down list (figure 11.23).

When the "On Server" option is selected, the information shown in the Reference Data Manager refers to data stored in the CLC_References File System Location on the CLC Server, and data downloaded is downloaded to that location. By default, data is downloaded directly to the CLC Server, but downloads can be configured to go via the CLC Genomics Workbench instead using a setting in the Workbench Preferences. This can be useful if the CLC Server does not have access to the external network but the CLC Genomics Workbench does. See Advanced preferences.

Figure 11.23: When reference data is on the CLC Server the Workbench is connected to, the "On Server" option can be selected in the Manage Reference Data drop-down list at the top right of the Reference Data Manager.

Copying reference data

Reference data can be easily copied from a CLC_References location in a CLC Workbench to a CLC_References location on a CLC Server or vice versa.

A button labeled Copy from WB will be visible when the selected data is available in the CLC_References area of your Workbench and you have selected the "On Server" option in the Reference Data Manager. Clicking on this button copies the data from the Workbench to the CLC Server CLC_References location.

Conversely, if you are working with a CLC_References location on your Workbench (i.e. the "Locally" option is selected in the Reference Data Manager) and you are connected to a CLC Server with a CLC_References location configured, a button labeled Copy from server will be present. Clicking on this copies the data to your Workbench CLC_References location from the CLC Server CLC_References location.

Copying data from other locations into a CLC_References location is described in Imported Data. Copying data from a CLC_References location to elsewhere on the file system is described in Exporting reference data outside of the Reference Data Manager framework.

Reference data on non-networked systems

If the CLC Genomics Workbench is installed on a system without access to the external network, reference data can be transferred to that non-networked Workbench as follows:

  1. Install CLC Genomics Workbench on a machine with access to the external network.
  2. Download an evaluation license via the Workbench License Manager. If you have problems obtaining an evaluation license this way, please write to us at ts-bioinformatics@qiagen.com.
  3. Use the Reference Data Manager on the networked Workbench to download the reference data of interest. By default, this would be downloaded to a folder called CLC_References.
  4. When the download is complete, copy the CLC_References folder and all its contents to a location that the non-networked machines with the CLC software installed can access.
  5. Get the software to refer to that folder for reference data: in the Navigation Area of the non-networked Workbench, right-click on CLC_References and choose Location | Specify Reference Location. Choose the folder you copied from the networked Workbench and click Select.

You can then access reference data using the Reference Data Manager.
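The folder copy in step 4 can be done with any standard file copy tool. As one illustration, the following Python sketch copies a folder and then reports any files that are missing or differ in size at the destination. It is not part of the CLC software, and the function name copy_reference_folder and the paths in the example call are hypothetical placeholders. The size comparison is only a basic sanity check, not a checksum verification.

```python
import shutil
from pathlib import Path


def copy_reference_folder(source: Path, destination: Path) -> list:
    """Copy `source` to `destination` and return the relative paths of files
    that are missing or have a different size at the destination."""
    # dirs_exist_ok requires Python 3.8 or later.
    shutil.copytree(source, destination, dirs_exist_ok=True)

    problems = []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src_file.relative_to(source)
        copied = destination / rel
        if not copied.is_file() or copied.stat().st_size != src_file.stat().st_size:
            problems.append(rel.as_posix())
    return problems


# Example call (both paths are hypothetical placeholders):
# problems = copy_reference_folder(
#     Path.home() / "CLC_References",
#     Path("/path/to/transfer/CLC_References"),
# )
# print("Copy looks complete" if not problems else f"Check these files: {problems}")
```

An empty list means every file in the source folder has a same-sized counterpart in the copy.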
