Batch processing

Batch processing refers to running an analysis multiple times, once per batch unit. For example, if you have 10 sequence lists and wish to run 10 mapping analyses, one per sequence list, you could launch all 10 analyses by setting up one batch job. Here, each sequence list would be a "batch unit".

This section focuses on batch processing when using individual tools. Further details about batch processing of workflows are provided in the sections "Launching workflows individually and in batches" and "Running part of a workflow multiple times".

Batch mode

Batch mode is activated by clicking the Batch checkbox in the dialog where the input data is selected (figure 12.9).

Figure 12.9: When launching an analysis in Batch mode, individual elements and/or folders can be selected. Here, a single folder that contains both elements and subfolders of elements has been selected.

In Batch mode, the analysis is run once per batch unit. A batch unit consists of the data elements to be analyzed together. A batch unit can be a single data element, or can consist of multiple data elements.
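The following sketch illustrates how selected items could be grouped into batch units, based on the description accompanying figure 12.12. It is a hypothetical model, not CLC code. The classes `Element` and `Folder` and the function `batch_units` are illustrative inventions. The sketch also assumes that only elements directly inside a subfolder belong to that subfolder's batch unit.

```python
from dataclasses import dataclass, field
from typing import List, Union


# Hypothetical stand-ins for items in the Navigation Area; these are
# illustrations only and not part of any CLC API.
@dataclass
class Element:
    name: str


@dataclass
class Folder:
    name: str
    children: List[Union[Element, "Folder"]] = field(default_factory=list)


def batch_units(selection: List[Union[Element, Folder]]) -> List[List[Element]]:
    """Return the data elements of each batch unit, one list per unit."""
    units: List[List[Element]] = []
    for item in selection:
        if isinstance(item, Element):
            # An individually selected element is a batch unit of its own.
            units.append([item])
            continue
        # For a selected folder, every element and every subfolder directly
        # under it becomes a batch unit.
        for child in item.children:
            if isinstance(child, Element):
                units.append([child])
            else:
                # Assumption: only elements directly inside the subfolder are
                # collected; deeper subfolders are ignored in this sketch.
                units.append([c for c in child.children if isinstance(c, Element)])
    return units


# Usage: one folder holding a subfolder with two elements, plus a loose element.
sample1 = Folder("sample1", [Element("sample1 reads"), Element("sample1 reads (trimmed)")])
project = Folder("project", [sample1, Element("sample2 reads")])
for unit in batch_units([project]):
    print([e.name for e in unit])
```

Here the subfolder `sample1` becomes a single batch unit whose two elements would be analyzed together, while `sample2 reads` forms a batch unit of one element.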

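Batch units are made up of the items selected in the wizard. Each data element selected individually forms a batch unit on its own. In addition, each element and each subfolder located directly within a selected folder forms a batch unit. The data elements in such a subfolder are analyzed together.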

Batch overview

In the batch overview step, the elements in each batch unit can be reviewed and then refined by name using the fields Only use elements containing and Exclude elements containing.

Figure 12.12 shows the batch units, that is, the elements and folders directly under the folder selected in figure 12.9. For each batch unit, the data elements that could be used in the analysis are listed on the right-hand side. Some batch units contain more than one data element, and those data elements would be analyzed together. To limit the analysis to sequence lists containing trimmed sequences, the term "trim" has been entered into the Only use elements containing field near the bottom.

Figure 12.12: Overview of the batch units (left) and the input elements defined by each batch unit (right). By default, all elements that can be used as inputs are listed on the right (top). Entering terms in the filter fields refines the list of elements in the batch units. Here, only sequence lists containing trimmed sequences are included (bottom).

Folders that do not contain any elements compatible with the analysis are not shown in the batch overview.
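The refinement done in the batch overview can be modeled as below. This is a hypothetical sketch, not the Workbench's implementation. It assumes the filter fields use plain, case-sensitive substring matching, that an empty field imposes no constraint, and that a unit emptied by filtering stays listed with no elements. The function `refine_units` and the `is_compatible` callback, which stands in for the Workbench's own check of whether an element can be used as input, are illustrative names.

```python
from typing import Callable, Dict, List


def refine_units(
    units: Dict[str, List[str]],
    is_compatible: Callable[[str], bool],
    only_use: str = "",
    exclude: str = "",
) -> Dict[str, List[str]]:
    """Mimic the batch overview: hide incompatible units, then filter by name."""
    refined: Dict[str, List[str]] = {}
    for unit, names in units.items():
        compatible = [n for n in names if is_compatible(n)]
        if not compatible:
            # Units without any element the analysis can use are not shown.
            continue
        refined[unit] = [
            n
            for n in compatible
            # Assumption: case-sensitive substring matching; an empty field
            # means "no constraint".
            if (not only_use or only_use in n) and not (exclude and exclude in n)
        ]
    return refined


units = {
    "sample1": ["sample1 reads", "sample1 reads (trimmed)", "sample1 report"],
    "sample2": ["sample2 reads (trimmed)"],
    "notes": ["readme"],
}


def looks_like_reads(name: str) -> bool:
    # Hypothetical stand-in for "is a sequence list".
    return "reads" in name


print(refine_units(units, looks_like_reads, only_use="trim"))
```

With "trim" as the inclusion term, only the trimmed sequence lists remain, and the `notes` unit is hidden because it holds nothing the analysis could use.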

Organization of the results

The options for where to save analysis outputs are shown in figure 12.13.

Figure 12.13: Options for saving results when an analysis is run in Batch mode.

The available options are:

The log file

In the final wizard step, there is an option to Create a log. When it is checked, a log containing information about all the batch units is created, and its name includes the term "combined log". A separate log is also created for each individual batch unit.

Batch unit processes

When the job is running, there is one "master" process representing the overall batch job, and a separate process for each batch unit.

On a CLC Workbench, the batch units are executed sequentially, one batch unit at a time. This avoids overloading the computer.

On a CLC Server, all the processes are placed in the queue, and the queue takes care of distributing the jobs. If there are multiple job nodes or grid nodes, batch units may be processed in parallel.
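The difference between the two execution modes can be sketched with Python's standard `concurrent.futures` module. This is only an analogy, not how the Workbench or Server is implemented. The function `run_batch` plays the role of the master process. The `workers` parameter is a hypothetical stand-in for the number of available job or grid nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

U = TypeVar("U")
R = TypeVar("R")


def run_batch(units: Sequence[U], analyze: Callable[[U], R], workers: int = 1) -> List[R]:
    """Play the role of the master process: run `analyze` once per batch unit."""
    if workers <= 1:
        # Workbench-like behavior: one batch unit at a time, in order.
        return [analyze(unit) for unit in units]
    # Server-like behavior: every unit is queued up front, and up to `workers`
    # units (standing in for job or grid nodes) are processed concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the order of `units`, not completion order.
        return list(pool.map(analyze, units))


# Usage: record the order in which units start when running sequentially.
started: List[str] = []


def map_reads(unit: str) -> str:
    started.append(unit)
    return f"{unit}: mapped"


print(run_batch(["A", "B", "C"], map_reads))  # ['A: mapped', 'B: mapped', 'C: mapped']
print(started)  # ['A', 'B', 'C']
```

Passing `workers=3` returns the same results, but the three units may then run at the same time, which mirrors parallel processing on a server with several nodes.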

Stopping a batch run

To stop the whole batch run, stop the "master" process.

On a CLC Workbench, find the master process in the Processes tab at the bottom left. Click the small triangle on the right-hand side of the master process and choose Stop.
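The effect of stopping the master process during a sequential run can be modeled with a `threading.Event` that the master checks before starting each batch unit. This is a hypothetical sketch, and `run_until_stopped` is an illustrative name. In this model the unit that is already running when Stop is chosen is allowed to finish. Whether the real software interrupts that unit is not represented here.

```python
import threading
from typing import Callable, List, Sequence


def run_until_stopped(
    units: Sequence[str], analyze: Callable[[str], None], stop: threading.Event
) -> List[str]:
    """Sequential master process that honors a stop request between units."""
    finished: List[str] = []
    for unit in units:
        if stop.is_set():
            # The master was stopped: the remaining batch units never start.
            break
        analyze(unit)
        finished.append(unit)
    return finished


# Usage: simulate the user stopping the master process while unit "B" runs.
stop_requested = threading.Event()


def analyze(unit: str) -> None:
    if unit == "B":
        stop_requested.set()


print(run_until_stopped(["A", "B", "C"], analyze, stop_requested))  # ['A', 'B']
```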