References management

The References management tool (figure 8.1) offers an easy way of retrieving popular reference data sources such as genes, variant annotations and genome sequences as tracks.

Figure 8.1: Click on the References button and choose the QIAGEN Sets tab to find and download reference data, the Custom Sets tab to customize reference data sets, or the Imported Data tab to import your own reference data.

The total size of the reference data varies and is shown when you select the elements to download. The download time depends on your network connection and can be several hours on slower connections.
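As a rough guide to how long a download may take, the arithmetic can be sketched as below. This is an illustrative estimate only: the function name, the 50 GB data size and the 20 Mbit/s connection speed are assumptions made for the example, not values taken from the Workbench.

```python
def estimate_download_hours(size_gb: float, speed_mbit_per_s: float) -> float:
    """Return the approximate download time in hours.

    size_gb: size of the selected reference data in gigabytes (10**9 bytes).
    speed_mbit_per_s: sustained network throughput in megabits per second.
    """
    if size_gb < 0 or speed_mbit_per_s <= 0:
        raise ValueError("size must be >= 0 and speed must be > 0")
    size_megabits = size_gb * 8 * 1000  # 1 GB = 8 * 1000 megabits
    seconds = size_megabits / speed_mbit_per_s
    return seconds / 3600


# Hypothetical example: 50 GB of reference data over a 20 Mbit/s connection.
print(f"{estimate_download_hours(50, 20):.1f} hours")  # prints "5.6 hours"
```

Real throughput is usually lower than the nominal connection speed, so the result is best treated as a lower bound.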

Where reference data is downloaded from

Data downloaded using the Download Genomes tab comes from public repositories such as Ensembl, NCBI and UCSC. This type of data is not provided or hosted by QIAGEN. The Workbench only provides an easy way to retrieve data that you would otherwise have to download and import yourself. The list of organisms is updated dynamically by QIAGEN, independently of Workbench versions, so you will always see the most recent list. If you do not find the organism you are looking for, you can download the data yourself and import it using the Import tools.

The QIAGEN Sets tab allows you to download curated Reference Data Sets directly from a QIAGEN reference data repository. The location of the repository can be changed under Edit | Preferences | Advanced, as explained in Advanced preferences. You should not change this setting unless your system administrator has mirrored this data locally and wishes you to use that mirror.

Downloading data to a CLC Genomics Workbench

By default, data downloaded using the Reference Data Manager is stored in a folder in your home area called CLC_References. If such a folder does not already exist, it will be created and added as a Workbench data location automatically when you first start up the Workbench.

In the top right corner, the option "Locally" next to "Manage Reference Data" tells you that data will be downloaded to the Workbench. The amount of free space available is reported just below this (figure 8.2).

Figure 8.2: Reference data is downloaded to the CLC_References area of the Workbench when the "Manage Reference Data" option is set to "Locally".
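The free space can also be checked outside the Workbench before starting a large download. The following is a minimal sketch, assuming the default CLC_References folder sits directly in your home area; the function name and the 50 GB requirement are hypothetical and chosen only for illustration.

```python
import shutil
from pathlib import Path


def has_enough_space(location: Path, required_gb: float) -> bool:
    """Return True if the file system holding `location` has at least
    `required_gb` gigabytes (10**9 bytes) free."""
    usage = shutil.disk_usage(location)
    return usage.free >= required_gb * 10**9


if __name__ == "__main__":
    # Assumption: default reference data folder in the home area.
    references_dir = Path.home() / "CLC_References"
    # disk_usage needs an existing path, so fall back to the home area itself.
    target = references_dir if references_dir.exists() else Path.home()
    print(has_enough_space(target, required_gb=50))
```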

Downloading data to a CLC Genomics Server

If you are logged into a CLC Genomics Server that has been configured with a file system location called CLC_References, then the "Manage Reference Data" drop-down menu in the top right corner of the Reference Data Manager will allow you to choose whether to download to the Workbench ("Locally"), or to CLC Genomics Server ("On Server").

If you have chosen "On Server" and your CLC Genomics Server is set up to send jobs to grid nodes, you will be able to choose which grid preset to use for downloading data under the Download Genomes tab via a drop-down menu to the left of the Download button (figure 8.3).

Figure 8.3: Reference data will be downloaded to the CLC_References area on the CLC Genomics Server when the "On Server" option is chosen. Depending on your setup, you may be able to choose the grid preset to use from the drop-down menu next to the Download button.

By default, data will be downloaded directly to the CLC_References location on the server. Downloads can be configured to go via the Workbench, as described in Advanced preferences. This can be useful if the CLC Genomics Server does not have access to the external network but the Workbench does.

Changing the reference data location

You can specify a different location to download reference data to. This is recommended if there is not enough space in the area the Workbench designates as the reference data location by default. To change the reference data location from within the Navigation Area:

        Right-click on the folder "CLC_References" | Choose "Location" | Choose "Specify Reference Location"

The new folder will also be called CLC_References, but will be located where you specify.

In more detail, this action results in the following:

This action does not:

If you have previously downloaded data into the CLC_References folder at the old location, you will need to use standard system tools to delete this folder and/or its contents. If you would like to keep the reference data from the old location, you can use standard system tools to move it into the new CLC_References folder that you just specified. This saves you from having to download it again.

Note! If you run out of space and realize that the CLC_References folder should be stored somewhere else, choose a new location, manually move the already downloaded files to that new location, and restart the Workbench. The "downloaded references" file will then be updated with all the new references.
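Moving already downloaded data with "standard system tools" can be done in a file manager or scripted. The sketch below is one possible approach, not a Workbench feature: the function name and the example paths are hypothetical, and items that already exist at the new location are skipped rather than overwritten.

```python
import shutil
from pathlib import Path


def move_reference_data(old_dir: Path, new_dir: Path) -> list[str]:
    """Move every item in old_dir into new_dir and return the names moved.

    Items that already exist in new_dir are left in old_dir, so nothing
    at the new location is overwritten.
    """
    new_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    # sorted() reads the directory listing up front, before anything is moved.
    for item in sorted(old_dir.iterdir()):
        destination = new_dir / item.name
        if destination.exists():
            continue  # skip rather than overwrite
        shutil.move(str(item), str(destination))
        moved.append(item.name)
    return moved


# Hypothetical usage; substitute your own old and new locations:
# move_reference_data(Path.home() / "CLC_References",
#                     Path("/data/CLC_References"))
```

After the move, restart the Workbench as described in the note above.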

Reference data for non-networked systems

CLC Genomics Workbench may be installed on computers that have no access to the external network. In that case, follow these steps to make reference data available to the non-networked Workbench:

  1. Install CLC Genomics Workbench on a machine with access to the external network.
  2. Download an evaluation license via the Workbench License Manager. If you have problems obtaining an evaluation license this way, please write to us at ts-bioinformatics@qiagen.com.
  3. Use the Reference Data Manager on the networked Workbench to download the reference data of interest. By default, this would be downloaded to a folder called CLC_References.
  4. When the download is completed, copy the CLC_References folder and all its contents to a location that the non-networked machines with the CLC software installed can access.
  5. Get the software to refer to that folder for reference data: in the Navigation Area of the non-networked Workbench, right-click on CLC_References and choose the option "Specify Reference Location...". Choose the folder you copied from the networked Workbench and click Select.

You can then access reference data using the Reference Data Manager.
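The copy in step 4 can be made with any file manager or copy tool. If you prefer to script it, the following minimal sketch copies the folder and then compares file counts and total sizes as a basic sanity check. The function names and the example paths are hypothetical, and the comparison does not verify file contents.

```python
import shutil
from pathlib import Path


def folder_summary(folder: Path) -> tuple[int, int]:
    """Return (number of files, total size in bytes) under folder."""
    files = [p for p in folder.rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


def copy_reference_folder(source: Path, destination: Path) -> bool:
    """Copy source to destination and report whether the summaries match.

    shutil.copytree raises FileExistsError if destination already exists,
    so an existing folder is never silently merged or overwritten.
    """
    shutil.copytree(source, destination)
    return folder_summary(source) == folder_summary(destination)


# Hypothetical usage; substitute your own source and shared location:
# ok = copy_reference_folder(Path.home() / "CLC_References",
#                            Path("/mnt/shared/CLC_References"))
```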


