Control flow elements
Control flow elements control the flow of data through a workflow. They can be found in the Control Flow folder of the Add Element wizard, as shown in figure 11.29.
Figure 11.29: Control flow elements are found under the Control Flow folder in the Add Element wizard.
The control flow elements described in this section are:
- Iterate Used to define a branch of a workflow that should be run multiple times, by splitting its inputs into groups (iteration units, sometimes referred to as batch units).
- Collect and Distribute Used downstream of an Iterate element to collect all results of an iteration, and group them for collective analysis by further tools.
Further details about each element are provided below.
Note: Like other elements, control flow elements in a workflow can be renamed (see Basic configuration of workflow elements).
Iterate
Elements downstream of an Iterate element are run once for each input, or group of inputs, provided. The inputs to be included in a given iteration are referred to as "batch units" or "iteration units". Multiple Iterate elements can be included in a single workflow.
For workflows containing a single Input element (green box) and a single Iterate element, batch units can be defined based on the location of the input data or based on information in a metadata table. For any workflow containing multiple Iterate elements, or where there is a single Iterate element and the Batch button is checked when starting the workflow, batch units must be defined using information in a metadata table (see Running workflows in batch mode).
Consider a workflow with a single Iterate element at the top, no downstream Collect and Distribute element (described in the next section), and a single Input element. Running it is equivalent to running, in Batch mode, a similar workflow design without the Iterate element. In this case, the simpler workflow design, without the Iterate element, is usually preferable. In both cases, batch units are set up as described in Batch processing.
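The idea of defining batch units from a metadata table can be illustrated with a small sketch in plain Python. This is not the software's API: the function name, the row layout and the column name "ID" are hypothetical, chosen only to show how inputs that share a value in one metadata column end up in the same batch unit.

```python
from collections import defaultdict


def group_into_batch_units(metadata_rows, unit_column):
    """Group input names by the value found in one metadata column."""
    units = defaultdict(list)
    for row in metadata_rows:
        units[row[unit_column]].append(row["input"])
    return dict(units)


# Hypothetical metadata table: one row per input data element.
metadata_rows = [
    {"input": "reads_A_lane1", "ID": "A"},
    {"input": "reads_A_lane2", "ID": "A"},
    {"input": "reads_B_lane1", "ID": "B"},
]

batch_units = group_into_batch_units(metadata_rows, "ID")

# Downstream elements would then be run once per batch unit.
for unit_id, inputs in batch_units.items():
    print(unit_id, inputs)
```

In this sketch, the two inputs with ID "A" form one batch unit and the single input with ID "B" forms another, so the downstream steps would run twice.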
Configuring an Iterate element
The configuration options available for an Iterate element are shown in figure 11.30. They are:
- Number of coupled inputs The number of separate inputs for each given iteration. These inputs are "coupled" in the sense that, for a given iteration, particular inputs are used together. For example, this is useful when sets of sample reads should be mapped in the same way, but each set should be mapped to a particular reference (figure 11.31).
- Error handling Specify what should happen if an error is encountered. The default is that the workflow should stop on any error. The alternative is to continue running the workflow if possible, potentially allowing later batches to be analyzed even if an earlier one fails.
- Metadata table columns If the workflow is always run with metadata tables that have the same column structure, then it can be useful to set the column titles here, so that the workflow wizard preselects them. The column titles must be specified in the same order as shown in the workflow wizard when running the workflow. Locking this parameter to a fixed value (i.e. not blank) will require the definition of batch units to be based on metadata. Locking this parameter to a blank value requires the definition of batch units to be based on the organization of input data (and not metadata).
- Primary input If the number of coupled inputs is two or more, then the primary input (used to define the batch units) can be configured using this parameter.
Figure 11.30: The number of coupled inputs in this simple example is 2, allowing each set of sample reads to be mapped to a particular reference, rather than using the same reference for all iterations.
Figure 11.31: Reads can be mapped to specified contigs due to the 2 input channels of the Iterate element. Using this design, a single sequence list containing all the unmapped reads from all the initial inputs is generated. That would not be possible without the inclusion of the Iterate and Collect and Distribute elements.
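As a rough conceptual model of coupled inputs and the two error handling choices, consider the following Python sketch. It is not how the software is implemented: `run_iterations`, `map_reads`, the `stop_on_error` flag and the sample names are all hypothetical. Each batch unit carries two coupled inputs (reads and a reference), and the flag switches between stopping on the first error and continuing with later batch units.

```python
def run_iterations(batch_units, step, stop_on_error=True):
    """Run `step` once per batch unit, passing its coupled inputs together."""
    results, failures = {}, {}
    for unit_id, coupled_inputs in batch_units.items():
        try:
            results[unit_id] = step(*coupled_inputs)
        except Exception as err:
            if stop_on_error:
                raise  # default behavior: stop the whole run on any error
            failures[unit_id] = str(err)  # alternative: record and continue
    return results, failures


def map_reads(reads, reference):
    """Stand-in for a read mapping step (hypothetical)."""
    if reference is None:
        raise ValueError(f"no reference available for {reads}")
    return f"{reads} mapped to {reference}"


# Two coupled inputs per batch unit: (sample reads, reference).
units = {
    "s1": ("reads_s1", "ref_s1"),
    "s2": ("reads_s2", None),  # this unit will fail
    "s3": ("reads_s3", "ref_s3"),
}

results, failures = run_iterations(units, map_reads, stop_on_error=False)
```

With `stop_on_error=False`, the failure for "s2" is recorded and "s3" is still processed. With the default `stop_on_error=True`, the same call would raise at "s2" and "s3" would never run.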
Collect and Distribute
Collect and Distribute elements are relevant in workflows with upstream Iterate elements. The steps between an Iterate element and a Collect and Distribute element are referred to as an "iteration block". When a Collect and Distribute element is encountered, intermediate results within that iteration block are collected. That data is passed to downstream elements according to configuration carried out when launching the workflow.
Configuring a Collect and Distribute element
By default, a Collect and Distribute element has one output channel. In this case, all results from the iteration block are collected and then passed on to downstream steps of the workflow.
More than one output channel can be configured by entering terms in a comma-separated list in the Outputs field (figure 11.32). The number of terms determines the number of output channels. Connections between these output channels and input channels of downstream elements determine how data should be distributed in the following stage of the workflow.
If the Collect and Distribute element has more than one output channel, the path taken by a given element is determined by the value in the metadata column specified when launching the workflow. This column can be preconfigured in the Group by metadata column setting.
Figure 11.32: A comma separated list of terms in the Outputs field of the Collect and Distribute element defines the number of output channels and their names.
For example, when launching the workflow in figure 11.33, a metadata column called "Type" was specified for defining which samples were cases and which were controls. The iteration units were defined by the contents of the "ID" column (figure 11.34).
Figure 11.33: In this workflow, each case sample is analyzed against all of the control samples.
Figure 11.34: Contents of the metadata column "Type" define which samples are cases and which are controls. Iteration units are defined by the contents of the "ID" column.
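The case/control example can be sketched conceptually in Python. This is only an illustration under stated assumptions, not the software's API: the function name, the sample IDs, the result names, and the assumption that the terms in the Outputs field ("case, control") match the values in the "Type" column are all hypothetical.

```python
def collect_and_distribute(results, metadata, group_column, outputs):
    """Collect per-iteration results and split them across named output channels."""
    channels = {name: [] for name in outputs}
    for unit_id, result in results.items():
        value = metadata[unit_id][group_column]
        if value not in channels:
            raise KeyError(f"no output channel named {value!r}")
        channels[value].append(result)
    return channels


# Hypothetical metadata: keys play the role of the "ID" column.
metadata = {
    "S1": {"Type": "case"},
    "S2": {"Type": "control"},
    "S3": {"Type": "case"},
    "S4": {"Type": "control"},
}

# One result per iteration unit, as produced by the iteration block.
results = {unit_id: f"variants_{unit_id}" for unit_id in metadata}

# Output channel names parsed from a comma-separated list of terms.
outputs = [term.strip() for term in "case, control".split(",")]

channels = collect_and_distribute(results, metadata, "Type", outputs)

# Downstream: each case result is paired with all of the control results.
comparisons = [(case, channels["control"]) for case in channels["case"]]
```

Here the collected results are split into a "case" channel and a "control" channel, and each case is then compared against the full set of controls, mirroring the design described for figure 11.33.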