Configuring input and output elements


Configuring Workflow Input elements

Workflow Input elements are the main element type for bringing data into a workflow. At least one such element must be present in a workflow. By default, when a workflow is launched, the workflow wizard prompts for data to be selected from the Navigation Area, or for data files to be imported on-the-fly using any compatible importer.

They support the input of data in CLC format, as well as supported raw NGS data formats, such as FASTQ and FASTA files. When launching the workflow, data stored outside CLC locations is selected by choosing the "Select files for import" option. This is referred to as on-the-fly import.

Dedicated NGS import elements are also available.

Examples using each of these input element types are shown in figure 12.28. How these translate when launching the workflow is shown in figure 12.29. The relative merits of each option are outlined in table 12.1. For most uses, on-the-fly import will be the most versatile option.

Figure 12.28: Raw data can be imported as part of a workflow run in two ways. Left: Include an Input element and use on-the-fly import. Right: Use a specific Import element. Here, the Illumina import element was included.

Figure 12.29: Top: Launching a workflow with an Input element and choosing to select files to import on-the-fly. Bottom: Launching a workflow with a dedicated import element, in this case, an Illumina import element.


Table 12.1: Workflow import methods compared

Running in batch mode
    Input element: Supported. Check the Batch option in the launch wizard.
    Dedicated import element: Not supported. (The Batch option is not visible in the launch wizard.)

Iterate elements
    Input element: Supported.
    Dedicated import element: Supported.

Choosing an importer when launching
    Input element: Any available importer can be selected when launching. Use of already-imported data is also supported. Workflow authors can specify the importers available when launching.
    Dedicated import element: Only data formats relevant for the specific importer can be selected for use.

Configuring import options
    Input element: Options for all importers allowed by the workflow author can be configured, and set to be unlocked or locked.
    Dedicated import element: Import options for the specific importer can be configured, and set to be unlocked or locked.

Saving imported elements
    Input element: Not supported. The elements created during import are not saved.
    Dedicated import element: Supported. If an Output element is attached to the Import element, the elements created during import can be saved.


Notes:

If desired, Workflow Input elements can be configured to restrict the data selection options available when launching the workflow. To do this, double-click on the element, or right-click on the element name and select the Configure... option. This opens a dialog like that in figure 12.27.

Common configurations for workflow Input elements include:


Configuring Workflow Output and Export elements

Results generated by a workflow are only saved if the relevant output channel of a workflow element is connected to a Workflow Output element or an Export element. Data sent to output channels without an Output or Export element attached are not saved.

Terminal workflow elements with output channels must have at least one Workflow Output element or Export element connected.

The naming pattern for workflow outputs and exports can be specified by configuring Workflow Output elements and Export elements, respectively. To do this, double-click on a Workflow Output or Export element, or right-click on it and select the Configure... option. Naming patterns are configured in the Custom output name field of the configuration dialog.

The rest of this section is about configuring the Custom output name field, with a focus on the use of placeholders. This information applies to both Workflow Output elements and Export elements. Other configuration settings for Export elements are the same as for export tools, described in Export tool parameters. The placeholders available for export tools run directly (not via a workflow) are different, and are described in the export tools section of the manual.

Configuring custom output names

By default, a placeholder is used to specify the name of an output or exported file, as seen in figure 12.32. Placeholders specify a type of information to include in the output name, and are a convenient way to apply a consistent naming pattern. They are replaced by the relevant information when the output is created.

To see the available placeholders, hover the mouse cursor over the Custom output name field in the configuration dialog. A tooltip containing the list is shown. Text-based forms of the placeholders are not case sensitive.

You can choose any combination of the placeholders and text, including punctuation, when configuring output or export names. For example, {input}({day}-{month}-{year}), or {2} variant track as shown in figure 12.33. In the latter case, if the first workflow input was named Sample 1, the name of the output generated would be "Sample 1 variant track".
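To make the substitution step concrete, here is a minimal Python sketch of how a custom output name could be expanded. It is an illustration only, not the Workbench's own implementation. The functions expand_output_name and placeholder_values, the dictionary of values, the two-digit day and month formatting, leaving unknown placeholders unchanged, and treating {input} as equivalent to {2} are all assumptions. The only mappings taken from this section are {1} as a synonym for {name}, and {2} giving the name of the first workflow input.

```python
import re
from datetime import date

# Hypothetical synonym table for illustration. {1} = {name} comes from the
# figure 12.32 caption; {2} = {input} is an ASSUMPTION (the text only shows
# that {2} yields the name of the first workflow input).
SYNONYMS = {"1": "name", "2": "input"}

PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def expand_output_name(pattern: str, values: dict[str, str]) -> str:
    """Replace {placeholder} tokens in a custom output name.

    Text placeholders are matched case-insensitively, numeric forms are
    mapped to their text synonyms, and unknown placeholders are left as-is
    (an assumption made for this sketch).
    """
    lookup = {key.lower(): value for key, value in values.items()}

    def substitute(match: re.Match) -> str:
        key = match.group(1).strip().lower()
        key = SYNONYMS.get(key, key)
        return lookup.get(key, match.group(0))

    return PLACEHOLDER.sub(substitute, pattern)


def placeholder_values(output_name: str, first_input: str, when: date) -> dict[str, str]:
    """Build hypothetical placeholder values for one workflow output.

    The zero-padded day/month format is an assumption, not the Workbench's.
    """
    return {
        "name": output_name,
        "input": first_input,
        "day": f"{when.day:02d}",
        "month": f"{when.month:02d}",
        "year": str(when.year),
    }


if __name__ == "__main__":
    vals = placeholder_values("Variants", "Sample 1", date(2024, 3, 5))
    print(expand_output_name("{2} variant track", vals))  # Sample 1 variant track
    print(expand_output_name("{Input}({day}-{month}-{year})", vals))  # Sample 1(05-03-2024)
```

Mapping the numeric forms onto text keys keeps a single source of values, so {1} and {name} can never disagree.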

Figure 12.32: The names that outputs are given can be configured. The default naming uses the placeholder {1}, which is a synonym for the placeholder {name}.

Figure 12.33: Providing a custom name for the result.

It is also possible to save workflow outputs and exports into subfolders by starting the output name definition with a forward slash character (/). For example, the custom output name /variants/{name} refers to a folder called "variants" under the location selected for storing the workflow outputs. Each further forward slash introduces another level of subfolder, and the text after the last forward slash is the name of the output itself. For example, /variants/level2/level3/myoutput puts the data item called myoutput into a folder called level3, within a folder called level2, which is itself inside a folder called variants. The variants folder is placed under the location selected for storing the workflow outputs. Any folders specified in the configuration that do not already exist are created.
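The subfolder rule can be sketched in Python as a simple path split. This is a hypothetical illustration, not how the Workbench stores data. The function resolve_output_location, the plain-string base location, and the choice to place names without a leading slash directly in the base location are assumptions made for the example.

```python
from pathlib import PurePosixPath


def resolve_output_location(base: str, custom_name: str) -> tuple[PurePosixPath, str]:
    """Split a custom output name into a target folder and an item name.

    A leading "/" means "relative to the selected output location". Each
    later "/" separates folder levels, and the text after the last "/" is
    the item name. Names without a leading "/" go straight into the base
    location (an ASSUMPTION made for this sketch).
    """
    if not custom_name.startswith("/"):
        return PurePosixPath(base), custom_name
    *folders, item = custom_name.lstrip("/").split("/")
    return PurePosixPath(base, *folders), item


if __name__ == "__main__":
    folder, item = resolve_output_location("Results", "/variants/level2/level3/myoutput")
    print(folder, item)  # Results/variants/level2/level3 myoutput
```

The sketch only computes where the item would be placed; creating any missing folders, which the Workbench does automatically, is not modelled.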

Note: In some circumstances, outputs from workflow element output channels without a Workflow Output element or an Export element connected may be generated during a workflow run. Such intermediate results are normally deleted automatically after the workflow run completes. If a problem arises such that the workflow does not complete normally, intermediate results may not be deleted and will be in a folder named after the workflow with the word "intermediate" in its name.



Footnotes

12.1
Paired read handling for workflows launched in batch mode, or workflows with Iterate elements, is the same as for the importer tools themselves. If the Paired option is checked, files are handled as described in the manual section on NGS importers. In CLC Genomics Workbench 21.x, this was also the case in most circumstances. However, if batch units were based on data organization and all files were in the same folder, each file was treated as a separate batch unit, irrespective of whether the Paired option was checked.