QIAGEN Sets
QIAGEN provides access to commonly used reference data through the QIAGEN Sets tab of the Reference Data Manager. Data is distributed as Reference Data Elements, which can be downloaded individually or as part of a Reference Data Set. Many template workflows are configured to use QIAGEN Reference Sets, which makes them simple to launch and helps ensure that the same reference data is used consistently. Because of the way these workflows are configured, the relevant reference data can also be downloaded via the workflow launch wizard (figure 2.2).
Figure 2.2: When launching template workflows requiring reference data inputs, the relevant reference data can be downloaded via the workflow launch wizard. If you are logged into a CLC Server with a CLC_References location defined, you can choose whether to download the data to the Workbench or Server.
Using the Reference Data Manager for QIAGEN reference data
To access QIAGEN Sets, open the Reference Data Manager by clicking on the References button in the top Toolbar, or go to the Utilities menu and select Manage Reference Data. Then click on the QIAGEN Sets tab at the top left. Under this tab, there are subsections for Reference Data Sets and Reference Data Elements (figure 2.3).
Figure 2.3: Subheadings under the QIAGEN Sets tab provide access to Reference Data Sets and Reference Data Elements
When a Reference Data Set is selected, information about it is displayed in the right-hand pane. This includes the size of the whole data set, and a table listing the workflow roles defined in the set, with information about the data element specified for each role. Further details about the element assigned to a role can be found by clicking on the link in the Version column. An icon to the left of each set indicates whether data for this set has already been downloaded or not. The same icons are used to indicate the status of each element in a Reference Data Set (figure 2.4).
If you have permission to delete downloaded data, the Delete button will be enabled. When reference data is stored on a CLC Server, you need to be logged in from the Workbench as an administrative user to delete reference data.
Figure 2.4: The elements in a Reference Data Set are being downloaded. The full size of the data set is shown at the top right-hand side. The size of each element is reported in the "On Disk Size" column. Below the row of tabs at the top is a search field that can be used to search for data sets or elements.
Searching for data available under the QIAGEN Sets tab
Use the search field under the top toolbar to search for terms in element and set names, workflow role names, and versions. To search for an exact term, enclose the term in quotes.
The results include the name of the element or set the term was found in, followed in brackets by the tab it is listed under, e.g. (Reference Data Elements), (Tutorial Reference Data Sets), etc. Hover the cursor over a hit to see what aspect of the result matched the search term (figure 2.5). Double-click on a search result to open it.
Figure 2.5: Terms entered in the search field when the QIAGEN Sets tab is selected are searched for in element and set names, workflow role names, and versions of the resources available under that tab. Hovering the cursor over a hit reveals a tooltip with information about the match.
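The search behavior described above can be illustrated with a small sketch. This is not the Workbench implementation: the Resource class, the example catalog, and the matching rules are all hypothetical. In particular, it assumes that an unquoted term matches as a case-insensitive substring and that a quoted term must match as a whole word; the software's actual matching rules may differ.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Resource:
    """Hypothetical stand-in for a Reference Data Set or Element."""
    name: str
    tab: str  # the tab the resource is listed under
    workflow_roles: List[str] = field(default_factory=list)
    versions: List[str] = field(default_factory=list)


def parse_query(query: str) -> Tuple[str, bool]:
    """Return (term, exact); exact is True when the query is wrapped in quotes."""
    query = query.strip()
    if len(query) >= 2 and query[0] == '"' and query[-1] == '"':
        return query[1:-1], True
    return query, False


def matches(term: str, exact: bool, text: str) -> bool:
    if exact:
        # Assumption: an exact term must appear as a whole word.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        return re.search(pattern, text, re.IGNORECASE) is not None
    # Assumption: an unquoted term matches as a case-insensitive substring.
    return term.lower() in text.lower()


def search(resources: List[Resource], query: str) -> List[Tuple[str, str]]:
    """Return (label, matched aspect) pairs, with the tab name in brackets."""
    term, exact = parse_query(query)
    hits = []
    if not term:
        return hits
    for res in resources:
        aspects = [("name", res.name)]
        aspects += [("workflow role", role) for role in res.workflow_roles]
        aspects += [("version", version) for version in res.versions]
        for aspect, text in aspects:
            if matches(term, exact, text):
                hits.append((f"{res.name} ({res.tab})", aspect))
                break  # report each resource once, with the first matching aspect
    return hits


# Hypothetical catalog, for illustration only.
catalog = [
    Resource("Example Human Set", "Reference Data Sets",
             workflow_roles=["sequence", "genes"], versions=["v1"]),
    Resource("Example Sequence", "Reference Data Elements",
             versions=["v1", "v10"]),
]
```

Under these assumptions, search(catalog, "gene") finds the set through its "genes" workflow role, while search(catalog, '"gene"') finds nothing, because the quoted term is not present as a whole word. The second item of each hit plays the part of the tooltip that reports which aspect matched.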
Downloading resources
To download a Reference Data Element or a Reference Data Set (i.e. all elements in that set), select it and click on the Download button.
The progress of the download is indicated and you have the option to Cancel, Pause or Resume the download (figure 2.4).
When the "Manage Reference Data" option at the top of the Reference Data Manager is set to "Locally", data is downloaded to the CLC_References location in the CLC Workbench. When set to "On Server", the data is downloaded to the CLC_References location in the CLC Server.
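As a compact summary of the rule above, the mapping from the "Manage Reference Data" setting to the download destination can be sketched as follows. The enum and function names are hypothetical and are not part of any CLC API; only the two option labels and the destinations come from the text.

```python
from enum import Enum


class ManageReferenceData(Enum):
    """The two values of the "Manage Reference Data" option."""
    LOCALLY = "Locally"
    ON_SERVER = "On Server"


def download_destination(setting: ManageReferenceData) -> str:
    """Describe where downloaded reference data is stored for a given setting."""
    if setting is ManageReferenceData.LOCALLY:
        return "CLC_References location in the CLC Workbench"
    return "CLC_References location in the CLC Server"


print(download_destination(ManageReferenceData("On Server")))
```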
Additional information
The HapMap (https://www.sanger.ac.uk/data/hapmap-3/) databases contain more than one file. QIAGEN Reference Data Sets that include HapMap are initially configured with all the available populations. You can select specific populations to use when launching a workflow, or you can create a custom reference set that contains only the populations of interest.
General information about Reference Data Sets, and creating Custom Sets, can be found at https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Reference_Data_Sets_defining_Custom_Sets.html.
General information about the Reference Data Manager is at https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=References_management.html.