Configuring input and output elements


Configuring Workflow Input elements

Workflow Input elements are the main element type for bringing data into a workflow. At least one such element must be present in a workflow. By default, when a workflow is launched, the workflow wizard will prompt for data to be selected from the Navigation Area, or for data files to be imported on-the-fly using any compatible importer.

Workflow Input elements support the input of CLC format data, raw NGS data formats, such as fastq and fasta format files, and some other formats. When launching the workflow, data outside CLC locations is selected by choosing the "Select files for import" option. Doing this is referred to as on-the-fly import.

Like other workflow elements, Input elements can be configured to restrict the options available for configuration when launching the workflow. See Basic configuration of workflow elements for more on locking and unlocking element options.

Configuring import options

Selection of input data from the Navigation Area (already imported data) or import of raw data using on-the-fly import can be enabled or disabled in Input elements (figure 14.39).

When on-the-fly import is enabled, you can choose whether to limit the importers available when the workflow is launched, and you can configure settings for importers that are selected. On-the-fly import options are:

Image configure-workflow-input-element-genomics
Figure 14.39: Workflow Input elements can be configured to limit where data can be selected from and what importers can be used for on-the-fly import.

Where reference data is needed as input to a tool, it can be configured directly in the relevant input channel, or an Input element can be connected to that input channel. Reference data can be preconfigured in a workflow element, so that when launching the workflow, that data is used by default.

Further details about reference data and workflows

Input channels where reference data is expected can have a data element explicitly selected or a "workflow role" can be specified (figure 14.40).

Specifying a workflow role can be useful in workflows that require several reference data elements (e.g. a reference sequence, annotation tracks, variant tracks, etc.) and that will be run using different sets of reference data. Workflow roles remove the need to explicitly specify each reference data element when launching the workflow with different reference data from the previous run. Workflow roles are used in combination with Reference Data Sets, which are managed using the Reference Data Manager (see Reference Data Manager).

In a Reference Data Set, a workflow role is defined for each element in that Set (QIAGEN Sets). A workflow role can be assigned to each element of your own data imported to the Reference Data Manager (Custom Sets).

You can specify both a reference data element and a role for a given input:

Image inputroleconfig
Figure 14.40: A workflow role has been configured in this workflow Input element. When launching this workflow, the wizard would prompt for a Reference Data Set. The data element with the specified role in that Reference Data Set would then be used as input.

Image launchingwithwfroles
Figure 14.41: When one or more workflow elements have been configured with a workflow role, you are prompted to select a Reference Set. The elements from that set with the relevant roles are used in the analysis. Here, the option to use default reference data, i.e. the specified elements, is also available. This reflects the fact that this workflow has at least one workflow element configured with both a workflow role and a data element, and there are no locked inputs relying only on a workflow role.
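
The relationship between workflow roles, explicitly specified data elements and Reference Data Sets can be sketched in code. The following is only an illustration of the behavior described above, not how the software is implemented. The names InputConfig and resolve_reference, the role names and the element names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class InputConfig:
    """Hypothetical model of a reference data input in a workflow."""
    role: Optional[str] = None      # workflow role, e.g. "sequence"
    element: Optional[str] = None   # explicitly specified data element


def resolve_reference(config: InputConfig,
                      reference_set: Optional[Dict[str, str]] = None) -> str:
    """Pick the data element to use for one input.

    reference_set maps workflow roles to data elements. Passing None
    models choosing the default reference data when launching.
    """
    if reference_set is not None and config.role is not None:
        if config.role not in reference_set:
            raise KeyError(f"No element with role '{config.role}' in the set")
        return reference_set[config.role]
    if config.element is not None:
        # Default reference data: the explicitly specified element.
        return config.element
    raise ValueError("A Reference Data Set is required for a role-only input")


# Hypothetical Reference Data Set and input configuration.
custom_set = {"sequence": "Set A sequence", "genes": "Set A genes"}
both = InputConfig(role="sequence", element="Default sequence")

print(resolve_reference(both, custom_set))  # element taken from the set
print(resolve_reference(both))              # default element is used
```

In this sketch, an input configured with only a role cannot be resolved without a Reference Data Set, which mirrors why the default reference data option is offered only when no locked input relies on a workflow role alone.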

On-the-fly import versus using workflow elements for specific importers

For importing data as the first step of a workflow, on-the-fly import, as described above, is the most flexible and commonly used option. However, workflow input elements for specific NGS importers are also available.

Examples using each of these options are shown in figure 14.42. How these translate when launching the workflow is shown in figure 14.43. The relative merits of each option are outlined in table 14.1.

Image workflow_input_elements_and_import
Figure 14.42: Raw data can be imported as part of a workflow run in 2 ways. Left: Include an Input element and use on-the-fly import. Right: Use a specific Import element. Here, the Illumina import element was included.

Image workflow_import_on_launch
Figure 14.43: Top: Launching a workflow with an Input element and choosing to select files to import on-the-fly. Bottom: Launching a workflow with a dedicated import element, in this case, an Illumina import element.


Table 14.1: Workflow import methods compared

| Functionality | Input element | Dedicated import element |
| Running in batch mode | Supported. Check the Batch option in the launch wizard. | Not supported. (The Batch option is not visible in the launch wizard.) |
| Iterate elements | Supported. | Supported. |
| Choosing an importer when launching | Any available importer can be selected when launching. Use of already-imported data is also supported. Workflow authors can specify the importers available when launching. | Only data formats relevant for the specific importer can be selected for use. |
| Configuring import options | Options for all importers allowed by the workflow author can be configured, and set to be unlocked or locked. | Import options for the specific importer can be configured, and set to be unlocked or locked. |
| Saving imported elements | Not supported. The elements created during import are not saved. | Supported. If an Output element is attached to the Import element, the elements created during import can be saved. |


Notes:


Configuring Workflow Output and Export elements

Results generated by a workflow are only saved if the relevant output channel of a workflow element is connected to a Workflow Output element or an Export element. Data sent to output channels without an Output or Export element attached are not saved.

Terminal workflow elements with output channels must have at least one Workflow Output element or Export element connected.

Configuring custom names for workflow results

The names to assign to outputs and exported files from workflows can be configured to include specific text as well as information taken from a workflow run, for example, the names of inputs to the analysis, dates and times the results were generated, etc.

To configure the naming pattern for an Output or Export workflow element, double-click on it, or right-click on it and then select the option Configure... from the menu. The naming pattern in Output elements is defined in the Custom output name field (figure 14.44). In Export elements, it is defined in the Custom file name field.

Hover the mouse cursor over the configuration field to reveal a tooltip containing a list of available placeholders (figure 14.45). Placeholders are terms within curly brackets used to indicate that particular information from a workflow run should be included in the output name or exported file name. Terms in placeholders are not case sensitive.

Note: Placeholders used by export tools run directly (not via a workflow) are described in Specifying export file names using export tools. Other settings relating to export, relevant both for exports run directly or in a workflow context, are described in Export tool parameters.

Image workflow_output_name
Figure 14.44: Defining the name to assign to an output from a workflow. The default naming pattern for Output elements uses the placeholder {1}, which is a synonym for the placeholder {name}.

Image workflowgenericoutput
Figure 14.45: Hover the mouse cursor over the field where a custom name can be configured to reveal a tooltip with a list of available placeholders.

Placeholders available for Output and Export workflow elements are:

In addition to the placeholders above, the placeholder {extension} is available for exported file names. This is replaced by the default file extension for the exported file's format, e.g. .pdf, .txt.
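
To illustrate how a naming pattern with placeholders translates into a final name, here is a minimal sketch. It is not the software's implementation. The function name expand_name and the example values are hypothetical, and only the placeholders mentioned in this section ({name}, its synonym {1}, and {extension}) are used. Leaving unrecognized placeholders unchanged is an assumption made for the sketch.

```python
import re


def expand_name(pattern: str, values: dict) -> str:
    """Replace {placeholder} terms in a naming pattern.

    Placeholder terms are matched without regard to case.
    """
    lookup = {key.lower(): value for key, value in values.items()}

    def substitute(match: re.Match) -> str:
        term = match.group(1).lower()
        # Assumption: unrecognized placeholders are left as they are.
        return lookup.get(term, match.group(0))

    return re.sub(r"\{([^{}]+)\}", substitute, pattern)


# Hypothetical values for one exported result.
values = {"name": "Variants", "1": "Variants", "extension": ".txt"}

print(expand_name("{name}-filtered{extension}", values))
print(expand_name("{NAME}-filtered{Extension}", values))  # same result
```

Both calls produce the same name because the placeholder terms are compared in lower case.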

Saving results to subfolders

Workflow outputs and exported files can be saved into subfolders by adding a forward slash / at the start of the custom name definition.

For example, with an Output element configured with /variants/{name}, the resulting output would be saved to a subfolder called variants, placed within the folder selected for outputs when the workflow is launched. If a specified subfolder does not already exist, it is created when the outputs are saved.

When defining subfolders for outputs or exported files, terms between all forward slash characters are interpreted as subfolders. For example, a name like /variants/level2/level3/myoutput would put the data item called myoutput into a folder called level3 within a folder called level2, which itself is inside a folder called variants. The variants folder would be placed under the location selected for storing the workflow outputs.
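
The way a custom name with forward slashes maps to subfolders can be sketched as follows. This only illustrates the rule described above and is not the software's implementation. The function name split_custom_name and the folder name results are hypothetical, and treating a name without a leading slash as a plain name is an assumption.

```python
from pathlib import PurePosixPath
from typing import Tuple


def split_custom_name(output_folder: str, custom_name: str) -> Tuple[str, str]:
    """Return (folder the item is saved in, name of the item)."""
    if not custom_name.startswith("/"):
        # Assumption: without a leading slash, no subfolders are used.
        return output_folder, custom_name

    # Terms between forward slashes are subfolders; the last term is the name.
    parts = [part for part in custom_name.split("/") if part]
    if not parts:
        raise ValueError("The custom name must contain a name for the item")
    *subfolders, item_name = parts
    folder = PurePosixPath(output_folder).joinpath(*subfolders)
    return str(folder), item_name


# "results" stands in for the location selected when launching the workflow.
folder, item = split_custom_name("results", "/variants/level2/level3/myoutput")
print(folder)  # results/variants/level2/level3
print(item)    # myoutput
```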

Temporary, intermediate workflow results

During a workflow run, temporary, intermediate results may be generated, including for output channels that aren't connected to an Output or Export element.

Such intermediate results are normally deleted automatically after the workflow run completes. If a problem arises such that the workflow does not complete normally, intermediate results may not be deleted and will be in a folder named after the workflow with the word "intermediate" in its name.



Footnotes

14.1
Paired read handling for workflows launched in batch mode, or workflows with Iterate elements, is the same as for the importer tools themselves: If the Paired option is checked, files are handled as described in the manual section on NGS importers. In CLC Genomics Workbench 21.x, this was also the case in most circumstances. However, if batch units were based on data organization and all files were in the same folder, each file was treated as a separate batch unit irrespective of whether the Paired option was checked.