Reference Data Sets and defining Custom Sets
General information about Reference Data Sets
A Reference Data Set is a collection of related reference data elements. By configuring a workflow to make use of a Reference Data Set, a set of connected reference data can be specified in a single step when launching the workflow, instead of selecting each data element individually.
The connection between the data available in a Reference Data Set and the data needed for a workflow run is made by matching up roles:
- Each reference data element in a Reference Data Set has a role.
- Each workflow input that expects reference data can be configured with a workflow role.
By matching up the roles in a Reference Data Set with the workflow roles configured in a workflow, relevant Reference Data Sets can be offered when the workflow is launched.
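The role matching described above can be pictured with a small Python sketch. It is a conceptual model only, not how QIAGEN CLC software is implemented: the `ReferenceDataSet` class, the `matching_sets` function, and all set and element names are hypothetical. It assumes a set is offered when it contains every workflow role configured in the workflow.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ReferenceDataSet:
    """Hypothetical model: a named set mapping each role to its data elements."""
    name: str
    elements_by_role: Dict[str, List[str]] = field(default_factory=dict)


def matching_sets(workflow_roles, available_sets):
    """Return the sets that provide every role the workflow asks for."""
    required = set(workflow_roles)
    return [s for s in available_sets if required <= set(s.elements_by_role)]


# Illustrative data only; these names are made up for the example.
sets = [
    ReferenceDataSet(
        "example_full_set",
        {"sequence": ["genome_sequence"], "gene": ["gene_track"], "mrna": ["mrna_track"]},
    ),
    ReferenceDataSet("example_sequence_only_set", {"sequence": ["my_genome"]}),
]

# A workflow whose inputs are configured with the roles "sequence" and "gene".
offered = matching_sets(["sequence", "gene"], sets)
print([s.name for s in offered])  # prints ['example_full_set']
```

In this sketch the second set is not offered because it lacks the "gene" role, even though it does provide "sequence".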
Many of the template workflows delivered with QIAGEN CLC software are configured with workflow roles. QIAGEN Reference Data Sets and any Custom Sets that contain the corresponding roles are thus available for selection when launching these template workflows.
Any single word can be used as a role/workflow role. QIAGEN Reference Data Sets usually use a term describing the data type, such as "sequence", "gene", "mrna", "cds", etc.
For information about configuring reference data inputs in workflows, see Reference data and workflows.
Managing Reference Data Sets
Reference Data Sets are downloaded, managed and created using the Reference Data Manager. Reference Data Sets containing some commonly used reference data are available for download under the QIAGEN Sets tab of the Reference Data Manager (see QIAGEN Sets). It is also easy to create new sets, known as Custom Sets, that refer to the data of your choice, using roles defined by you.
Figure 11.10: Reference Data Sets containing all the workflow roles specified in a workflow are available for selection in the launch wizard, including Custom Sets.
Creating Custom Sets
To create a Custom Set, you can:
- Base it on an existing Reference Data Set
  - To do this, select a Reference Data Set and click on the Create Custom Set... button above the listing of data elements, on the right. This opens the "Create Custom Data Set" dialog, populated with the roles defined in the selected Reference Data Set, and any specified data elements (figure 11.11). This new set can then be customized.
- Build it from scratch
  - To start from scratch, click on the Custom Sets tab at the top of the Reference Data Manager and then click on the Create button on the right. This opens the "Create Custom Data Set" dialog without any roles or elements predefined (figure 11.12).
- Base it on reference data used in a specific workflow
  - To do this, open the "Create Custom Data Set" dialog using one of the methods described above, and then click on the Add to Match Workflow... button. You can specify an installed workflow from a drop-down list, or select a workflow from the Navigation Area using the "Workflow design" field (figure 11.13). If buttons are disabled, it usually means the selected workflow does not contain inputs defined with workflow roles.
Figure 11.11: After selecting a QIAGEN Set, click on the Create Custom Set button on the right hand side to open the Create Custom Data Set dialog populated with the roles and elements of that reference set.
Figure 11.12: Under the Custom Sets tab, click on the Create button, on the right, to open the Create Custom Data Set dialog without any roles or elements predefined.
Figure 11.13: Click on the Add to Match Workflow button in the Create Custom Data Set dialog to populate the dialog with the roles and elements defined in a workflow.
When basing a Custom Set on an existing Reference Data Set or on the references defined in a workflow, any predefined data elements will be listed in the Item(s) column of the relevant roles. Data elements can be selected or updated by double-clicking on the cells in that column.
You can define new roles in Custom Sets, or assign roles already in use in existing Reference Data Sets (figure 11.14). Note that workflow role names cannot contain spaces.
Figure 11.14: The Create Custom Data Set dialog showing a newly created role, and the drop-down menu of already existing roles.
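As an illustration of the naming rule, a check for a candidate role name might look like the following Python sketch. The function name is hypothetical, and the only rule it encodes is the one stated here (a single word without spaces); the software may apply further restrictions that this sketch does not cover.

```python
def is_valid_role_name(name: str) -> bool:
    """Hypothetical check: the name is non-empty and contains no whitespace.

    Assumption: any whitespace character (not only a plain space) is rejected.
    """
    return bool(name) and not any(ch.isspace() for ch in name)


print(is_valid_role_name("target_regions"))  # prints True
print(is_valid_role_name("target regions"))  # prints False
```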
Specifying multiple data elements with a single role
For certain types of reference data, it may be relevant to associate multiple data elements with a single role. For example, if a workflow will be used with the same reference data except for the target regions, it could be efficient to create a single Custom Set with the target_regions role linked to multiple elements. A Custom Set with multiple elements assigned to a single role is shown in figure 11.16. Clicking on the arrow at the right hand side of the target_regions row reveals a list of the elements assigned to this role.
Working with Custom Sets
Once saved, the new Custom Set will be listed under the Custom Sets tab of the Reference Data Manager. These sets will also be available to select in launch wizards when:
- Launching workflows where a workflow role is specified. A dedicated wizard step is presented for specifying the Reference Data Set to use (figure 11.16).
- Launching tools or launching workflows where a workflow role is not specified for the input. In launch wizard data selection steps, click on the Reference Data tab to see reference data organized in their Reference Data Sets (figure 11.15).
Figure 11.15: Data elements in Custom Sets can be found under the Reference Data tab in launch wizard steps where reference data elements need to be specified.
Figure 11.16: When launching a workflow where workflow roles have been defined for one or more inputs, a wizard step will prompt for the relevant Reference Data Set to be selected. Information about the contents of that set is shown in the right hand pane. The small arrow in the target_regions row indicates that more than one element has been assigned to that role.
When a Reference Data Set contains a role that has multiple data elements assigned to it, there will be an arrow beside that role name (figure 11.16). The list of those elements can be revealed by clicking on that arrow (figure 11.17). A subsequent launch wizard step allows the selection of the particular element that should be used for the workflow run (figure 11.18).
Figure 11.17: Information about the elements assigned to workflow roles in a selected Reference Data Set is shown in the right hand pane in the "Specify reference data handling" wizard step. For roles with multiple elements assigned, clicking on the arrow at the right side of the row reveals the list of these elements.
Figure 11.18: In the previous wizard step, a Reference Data Set was selected where multiple elements were assigned to the role target_regions. In this wizard step, a drop-down list of those elements is presented, from which the relevant target region track can be selected.
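The handling of roles with multiple elements can be sketched in Python as follows. This is a conceptual illustration, not the software's implementation: the dictionary layout, the function names `roles_needing_choice` and `resolve_elements`, and the element names are all hypothetical. It assumes a role with one element is used directly, while a role with several elements needs an explicit choice, as made in the wizard step described above.

```python
# Hypothetical Custom Set: each role maps to the elements assigned to it.
custom_set = {
    "sequence": ["genome_sequence"],
    "target_regions": ["panel_A_regions", "panel_B_regions"],
}


def roles_needing_choice(set_roles):
    """Roles with more than one element, i.e. those shown with an arrow."""
    return sorted(role for role, elems in set_roles.items() if len(elems) > 1)


def resolve_elements(set_roles, choices):
    """Pick exactly one element per role for a single workflow run."""
    resolved = {}
    for role, elems in set_roles.items():
        if len(elems) == 1:
            # Only one candidate, so no user choice is needed.
            resolved[role] = elems[0]
        else:
            chosen = choices.get(role)
            if chosen not in elems:
                raise ValueError(f"Select one element for role '{role}' from {elems}")
            resolved[role] = chosen
    return resolved


print(roles_needing_choice(custom_set))  # prints ['target_regions']
resolved = resolve_elements(custom_set, {"target_regions": "panel_B_regions"})
print(resolved["target_regions"])  # prints panel_B_regions
```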
Searching for data available in Custom Sets
Use the search field under the top toolbar in the Reference Data Manager to search for terms in Custom Sets. To search for just an exact term, put the term in quotes.
Hover the cursor over a hit to see what aspect of the result matched the search term (figure 11.19). Double-click on a search result to open it.
Figure 11.19: Terms entered in the search field when the Custom Sets tab is selected are searched for in the sets available under that tab. Hovering the cursor over a hit opens a tooltip with information about the match.