Configuring input and output elements
Configuring Workflow Input elements
Workflow Input elements are the main element type for bringing data into a workflow. At least one such element must be present in a workflow. By default, when a workflow is launched, the workflow wizard will prompt for data to be selected from the Navigation Area, or for data files to be imported on-the-fly using any compatible importer.
They support the input of CLC format data, as well as supported raw NGS data formats, such as fastq and fasta format files. When launching the workflow, data outside a CLC location is selected by choosing the "Select files for import" option. Doing this is referred to as on-the-fly import.
Dedicated NGS import elements are also available.
Examples using each of these input element types are shown in figure 12.28. How these translate when launching the workflow is shown in figure 12.29. The relative merits of each option are outlined in table 12.1. For most uses, on-the-fly import will be the most versatile option.
Figure 12.28: Raw data can be imported as part of a workflow run in 2 ways. Left: Include an Input element and use on-the-fly import. Right: Use a specific Import element. Here, the Illumina import element was included.
Figure 12.29: Top: Launching a workflow with an Input element and choosing to select files to import on-the-fly. Bottom: Launching a workflow with a dedicated import element, in this case, an Illumina import element.
Notes:
- Modified copies of imported data elements can be saved, no matter which of the import routes is chosen. For example, an Output element attached to a downstream Trim Reads element would result in Sequence Lists containing trimmed reads being saved.
- The use of Iterate elements to run all or part of a workflow in batches is described in Running part of a workflow multiple times.
- Configuration options for NGS importers are described in Import high-throughput sequencing data (see footnote 12.1).
If desired, Workflow Input elements can be configured to restrict the data selection options available when launching the workflow. To do this, double-click on the element, or right-click on the element name and select the Configure... option. This opens a dialog like that in figure 12.27.
Common configurations for workflow Input elements include:
- Configuring import options for primary inputs Enable or disable selection of input data from the Navigation Area (already imported data) or import of raw data using on-the-fly import.
When on-the-fly import is enabled, you can choose whether to restrict the available importers and import settings:
- Allow any compatible importer When selected, all compatible importers are displayed as options when launching the workflow. All parameters of these importers will be unlocked, and thus will be available to configure when launching the workflow.
- Allow selected importers When selected, one or more importers can be specified. Click on the Configure Parameters button to open the "Configure parameters" dialog, where the available import options can be configured for each selected importer. Select the importer to configure from the drop-down list at the top of this dialog. Each option can be locked if desired.
- Configuring a parameter input with a reference data element When a Workflow Input element is connected to an input channel expecting reference data, a particular data element can be specified. If that parameter is locked, there will be no opportunity to choose a different element when launching the workflow. If the parameter is left unlocked, the specified element will be used by default, but a different element can be selected when launching.
Such inputs can also be defined directly in the input channel, rather than connecting an Input element to that channel.
Configuring a parameter input with a workflow role Workflow Input elements, as well as input channels of other elements where reference data is expected, can have a data element explicitly selected, as described above, or a "workflow role" can be specified (figure 12.30).
Specifying a workflow role instead of a specific data element is useful when you wish to run the same workflow with different sets of reference data. If all the reference data elements were explicitly configured in the workflow, you would have to select each one individually. By contrast, using workflow roles, it is a single click to specify a different reference set to use.
Workflow roles are only useful in combination with Reference Data Sets, which are managed using the Reference Data Manager (see Reference Data Manager).
A workflow role is defined for each element in a QIAGEN Set (see QIAGEN Sets). You can assign a workflow role to each element of your own data imported to the Reference Data Manager (see Custom Sets).
Configuring a parameter input with both a data element and a workflow role
You can specify both a reference data element and a role for a given input:
- Doing this for a single element means that the Reference Data Set that the data element is a member of will be selected as the default Reference Data Set when launching the workflow.
- Doing this for all reference data inputs allows you to choose between using the specified "default" data elements or using a Reference Set, with the workflow roles defining the data to use (figure 12.31).
- Doing this for some, but not all inputs, where inputs are locked, means that the selected data elements only serve to indicate a default Reference Set. You will not have the option to launch the workflow using the default data elements.
Figure 12.30: A workflow role has been configured in this Workflow Input element. When launching this workflow, a Reference Data Set would be prompted for by the wizard. The data element with the specified role in that Reference Data Set would then be used as input.
Figure 12.31: When one or more workflow elements have been configured with a workflow role, you are prompted to select a Reference Set. The elements from that set with the relevant roles are used in the analysis. Here, the option to use default reference data, i.e. the specified elements, is also available. This reflects the fact that this workflow has at least one workflow element configured with both a workflow role and a data element, and there are no locked inputs relying only on a workflow role.
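The rules above for combining data elements and workflow roles can be summarized in a short Python sketch. This is an illustration only, not part of the software: the `ReferenceInput` class and both functions are hypothetical names, and a Reference Data Set is modeled simply as a dictionary mapping workflow roles to data element names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReferenceInput:
    """Hypothetical model of one reference data input in a workflow design."""
    element: Optional[str] = None  # explicitly selected data element, if any
    role: Optional[str] = None     # workflow role, if any
    locked: bool = False           # True if the parameter is locked


def default_data_option_available(inputs):
    """Mirror the condition stated in the caption of figure 12.31.

    The option to use the default data elements is offered when at least
    one input has both a role and a data element, and no locked input
    relies on a workflow role alone.
    """
    has_both = any(i.element is not None and i.role is not None for i in inputs)
    locked_role_only = any(
        i.locked and i.role is not None and i.element is None for i in inputs
    )
    return has_both and not locked_role_only


def resolve_with_reference_set(inputs, reference_set):
    """Pick the element for each input when a Reference Set is chosen.

    Assumption: an input without a role, or whose role is absent from the
    set, keeps its explicitly configured element.
    """
    return [reference_set.get(i.role, i.element) if i.role else i.element
            for i in inputs]
```

For example, a workflow with one input carrying both an element and a role, plus a second, locked input carrying only a role, would not offer the default data option under this model, because the second input can only be satisfied by a Reference Set.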
Configuring Workflow Output and Export elements
Results generated by a workflow are only saved if the relevant output channel of a workflow element is connected to a Workflow Output element or an Export element. Data sent to output channels without an Output or Export element attached are not saved.
Terminal workflow elements with output channels must have at least one Workflow Output element or Export element connected.
The naming pattern for workflow outputs and exports can be specified by configuring Workflow Output elements and Export elements respectively. To do this, double-click on a Workflow Output or Export element, or right-click and select the option Configure.... Naming patterns can be configured in the Custom output name field in the configuration dialog.
The rest of this section is about configuring the Custom output name field, with a focus on the use of placeholders. This information applies to both Workflow Output elements and Export elements. Other configuration settings for Export elements are the same as for export tools, described in Export tool parameters. The placeholders available for export tools that are run directly (not via a workflow) are different, and are described in the export tools section of the manual.
Configuring custom output names
By default, a placeholder is used to specify the name of an output or exported file, as seen in figure 12.32. Placeholders specify a type of information to include in the output name, and are a convenient way to apply a consistent naming pattern. They are replaced by the relevant information when the output is created.
The placeholders available are listed below. Hover the mouse cursor over the Custom output name field in the configuration dialog to see a tooltip containing this list. Text-based forms of the placeholders are not case sensitive.
- {name} or {1} - default name for the tool's output
- {input} or {2} - the name of the first workflow input (and not the input to a particular tool within a workflow).
For workflows containing control flow elements, the more specific form of placeholder, described in the point below, is highly recommended.
- {input:N} or {2:N} - the name of the Nth input to the workflow. E.g. {2:1} specifies the first input to the workflow, while {2:2} specifies the second input.
Multiple input names can be specified. For example, {2:1}-{2:2} would provide a concatenation of the names of the first two inputs, separated by a hyphen.
See Ordering inputs for information about workflow input order, and Batching part of a workflow for information about control flow elements.
- {metadata} or {3} - the batch unit identifier for workflows executed in batch mode. Depending on how the workflow was configured at launch, this value may be obtained from metadata. For workflows not executed in batch mode or without Iterate elements, the value will be identical to that substituted using {input} or {2}.
For workflows containing control flow elements, the more specific form of placeholder, described in the point below, is highly recommended.
- {metadata:columnname} or {3:columnname} - the value for the batch unit in the column named "columnname" of the metadata selected when launching the workflow. Pertinent for workflows executed in batch mode or workflows that contain Iterate elements. If a column of this name is not found, or a metadata table was not provided when launching the workflow, then the value will be identical to that substituted using {input} or {2}.
- {user} - name of the user who launched the job
- {host} - name of the machine the job is run on
- {year}, {month}, {day}, {hour}, {minute}, and {second} - timestamp information based on the time an output is created. Using these placeholders, items generated by a workflow at different times can have different filenames.
You can choose any combination of the placeholders and text, including punctuation, when configuring output or export names. For example, {input}({day}-{month}-{year}), or {2} variant track as shown in figure 12.33. In the latter case, if the first workflow input was named Sample 1, the name of the output generated would be "Sample 1 variant track".
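The substitution rules listed above can be illustrated with a small Python sketch. It is not the software's implementation: the function `expand_output_name` and all of its parameters are hypothetical, and the zero-padded timestamp format is an assumption, since the text does not state how timestamp values are formatted.

```python
import re
from datetime import datetime

# Numeric synonyms listed in the text: {1} = {name}, {2} = {input}, {3} = {metadata}.
SYNONYMS = {"1": "name", "2": "input", "3": "metadata"}


def expand_output_name(pattern, default_name, inputs, batch_unit=None,
                       metadata_row=None, user="", host="", created=None):
    """Hypothetical helper mimicking the documented placeholder rules.

    inputs is the ordered list of workflow input names; metadata_row maps
    metadata column names to the values for the current batch unit.
    """
    created = created or datetime.now()
    metadata_row = metadata_row or {}
    # Assumption: timestamp parts are zero-padded.
    stamp = {
        "year": f"{created.year:04d}", "month": f"{created.month:02d}",
        "day": f"{created.day:02d}", "hour": f"{created.hour:02d}",
        "minute": f"{created.minute:02d}", "second": f"{created.second:02d}",
    }

    def substitute(match):
        key = match.group(1).lower()  # text forms are not case sensitive
        key = SYNONYMS.get(key, key)
        arg = match.group(2)
        if key == "name":
            return default_name
        if key == "input":
            # {input:N} is 1-based; plain {input} means the first workflow input.
            return inputs[int(arg) - 1 if arg else 0]
        if key == "metadata":
            if arg:
                # Missing column or no metadata: same value as {input}.
                return metadata_row.get(arg, inputs[0])
            return batch_unit if batch_unit is not None else inputs[0]
        if key == "user":
            return user
        if key == "host":
            return host
        if key in stamp:
            return stamp[key]
        return match.group(0)  # leave unrecognized placeholders untouched

    return re.sub(r"\{(\w+)(?::([^{}]+))?\}", substitute, pattern)
```

Under these assumptions, `expand_output_name("{2} variant track", "Variants", ["Sample 1"])` yields the name from the example above, "Sample 1 variant track".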
Figure 12.32: The names that outputs are given can be configured. The default naming uses the placeholder {1}, which is a synonym for the placeholder {name}.
Figure 12.33: Providing a custom name for the result.
It is also possible to save workflow outputs and exports into subfolders by using a forward slash character / at the start of the output name definition. For example, the custom output name /variants/{name} refers to a folder "variants" that would lie under the location selected for storing the workflow outputs. When defining subfolders for outputs or exports, all later forward slash characters in the configuration, except the last one, will be interpreted as further levels of subfolders. For example, a name like /variants/level2/level3/myoutput would put the data item called myoutput into a folder called level3 within a folder called level2, which itself is inside a folder called variants. The variants folder would be placed under the location selected for storing the workflow outputs. If the folders specified in the configuration do not already exist, they are created.
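The way a custom output name is split into subfolders and an item name can be sketched in Python as follows. The function `split_output_name` is a hypothetical illustration, not part of the software. Treating a name without a leading forward slash as having no subfolders is an assumption based on the description above.

```python
def split_output_name(custom_name):
    """Return (subfolders, item_name) for a custom output name.

    Subfolders are only recognized when the name starts with "/".
    Every later "/" separates folder levels, and the text after the
    last "/" is the name of the data item itself.
    """
    if not custom_name.startswith("/"):
        # Assumption: without a leading slash, the whole string is the item name.
        return [], custom_name
    *folders, item_name = custom_name[1:].split("/")
    return folders, item_name
```

With this sketch, `/variants/level2/level3/myoutput` gives the folder levels `variants`, `level2` and `level3`, and the item name `myoutput`. Placeholders in the final part, as in `/variants/{name}`, are left as-is here and would be substituted separately.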
Note: In some circumstances, outputs from workflow element output channels without a Workflow Output element or an Export element connected may be generated during a workflow run. Such intermediate results are normally deleted automatically after the workflow run completes. If a problem arises such that the workflow does not complete normally, intermediate results may not be deleted and will be in a folder named after the workflow with the word "intermediate" in its name.
Footnotes
- 12.1: Paired read handling for workflows launched in batch mode, or workflows with Iterate elements, is the same as for the importer tools themselves: If the Paired option is checked, files are handled as described in the manual section on NGS importers. In CLC Genomics Workbench 21.x, this was also the case in most circumstances. However, if batch units were based on data organization and all files were in the same folder, each file was treated as a separate batch unit irrespective of whether the Paired option was checked.