Launching workflows with Iterate elements

This section focuses on launching workflows that contain Iterate elements using the CLC Server Command Line Tools. Iterate elements are a type of control flow element, which control the flow of data through an analysis. An Iterate element is placed at the top of a branch of a workflow that should be run multiple times, using different inputs in each run. The sets of data to use in each run are referred to as "batch units".

Collect and Distribute elements can optionally be placed downstream of an Iterate element, where they collect outputs from the upstream iteration block and distribute them as inputs to downstream analyses. Most Collect and Distribute elements have a single input channel and a single output channel and do not require any parameters to be specified on the command line. Writing commands for other situations is described at the end of this section.

General information about control flow elements is provided at https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Control_flow_elements.html.

The steps between an Iterate element and a Collect and Distribute element are referred to as an "iteration block". The workflow in figure 8.3 contains a single iteration block (shaded in turquoise), and the steps within that block are run once per batch unit. The Collect and Distribute element collects all the results from the iteration block and sends them as input to the next stage of the analysis (shaded in purple).

Figure 8.3: The roles of the Iterate and Collect and Distribute control flow elements are highlighted in the context of RNA-Seq and differential expression analyses. RNA-Seq Analysis lies downstream of an Iterate element, within an iteration block (shaded in turquoise). It will thus be run once per batch unit. Differential Expression for RNA-Seq lies immediately downstream of a Collect and Distribute element, and is sent all the expression results from the iteration block as input for a single analysis.
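
The division of work between an iteration block and a Collect and Distribute element can be pictured with a small Python sketch. It is a conceptual model only and does not use any CLC software: the function names, sample names and file names are all hypothetical, and the two "stand-in" functions simply return strings in place of real analyses.

```python
from typing import Callable, Dict, List


def run_iteration_block(
    batch_units: Dict[str, List[str]],
    per_unit_step: Callable[[List[str]], str],
) -> List[str]:
    """Run the per-unit step once for each batch unit and gather the results."""
    collected = []
    for inputs in batch_units.values():
        # Inside the iteration block: one run per batch unit.
        collected.append(per_unit_step(inputs))
    # Role of Collect and Distribute: all results are passed on together.
    return collected


def rna_seq_stand_in(reads: List[str]) -> str:
    """Hypothetical stand-in for a per-batch-unit analysis."""
    return "expression(" + "+".join(reads) + ")"


def differential_expression_stand_in(expression_results: List[str]) -> str:
    """Hypothetical stand-in for a single analysis that uses all results."""
    return "comparison of " + str(len(expression_results)) + " expression results"


# Hypothetical batch units: one set of input files per sample.
batch_units = {
    "sample_A": ["A_R1.fastq", "A_R2.fastq"],
    "sample_B": ["B_R1.fastq", "B_R2.fastq"],
    "sample_C": ["C_R1.fastq", "C_R2.fastq"],
}

collected = run_iteration_block(batch_units, rna_seq_stand_in)
summary = differential_expression_stand_in(collected)
print(summary)
```

In this model the per-unit step is called three times, once per batch unit, while the downstream step is called once with all three results.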

The following are key to launching workflows containing an Iterate element:

For workflows with a single Iterate element that has a single input channel and a single output channel, and where the batch units are based on the organization of the input data, no parameters relating to the Iterate element need to be provided in the command. In other cases, the parameters below need to be specified. Parameter names start with the workflow element name, which in this case is the default name, Iterate.

In cases where the Iterate element has multiple input channels, the first input channel is considered the primary input channel by default. To specify a different primary input channel, use the --iterate-primary-input-channel parameter. An integer value is expected, where the first channel is specified with the value 0, the second channel is specified with the value 1, and so on.
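
Because the value is zero-based, it is easy to be off by one when scripting commands. The following Python sketch shows one way to build this option for a script that assembles a command. The helper name is hypothetical, only the parameter name and its zero-based numbering come from the text above, and the parameter name assumes the element has the default name, Iterate. The rest of the command is not shown.

```python
from typing import List


def primary_input_channel_args(channel_position: int) -> List[str]:
    """Return the option and value selecting the primary input channel.

    channel_position is 1-based (1 = first channel). The value passed to
    the option is zero-based, as described above.
    """
    if channel_position < 1:
        raise ValueError("channel_position must be 1 or greater")
    # Assumes the Iterate element has the default name, "Iterate".
    return ["--iterate-primary-input-channel", str(channel_position - 1)]


# Select the second input channel as the primary input channel.
args = primary_input_channel_args(2)
print(" ".join(args))
```

The returned list can be appended to the other arguments of the command being assembled.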

Further information about defining batch units is provided at https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Running_workflows_in_batch_mode.html.

Launching workflows containing Collect and Distribute elements

The most common situation is to have a Collect and Distribute element with a single input channel and a single output channel, as is the case in the example in figure 8.3. With this design, the results from all the batch units in the upstream iteration block are collected and passed on together as input to the connected downstream step(s). Such Collect and Distribute elements do not require any parameters to be defined on the command line.

Where the Collect and Distribute element has more than one output channel, the parameters below must be specified. Parameter names start with the workflow element name, which in this case is the default name, Collect and Distribute.

Template workflow example using Iterate and Collect and Distribute elements

The RNA-Seq and Differential Gene Expression Analysis template workflow, distributed with the CLC Genomics Workbench, provides an example of using Iterate and Collect and Distribute elements. It is described in detail at https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=RNA_Seq_Differential_Gene_Expression_Analysis_workflow.html.