QIAGEN Bioinformatics Manuals

External command

The sections under the External command editor tab (figure 16.8) are:

External application name

The name seen in the CLC Workbench Tools menu and given to the corresponding workflow element for this external application. It is also the basis of the name used to launch the external application with the CLC Server Command Line Tools.

Command line

The command run when the external application is launched. The information to provide differs for standard external applications and containerized external applications, and is described in more detail below for each case.

Parameter values that should be substituted at run time are written within {curly brackets}. This includes parameters that should be configurable by the end user. Other parameters and values are written as normal in this field.
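
As an illustration of the substitution described above, the following Python sketch replaces each {curly bracket} parameter in a command line template with a run-time value. It is not the CLC Server implementation: the function name substitute, the template string, and the file paths are assumptions made for this example, with parameter names borrowed from the cp example in figure 16.9.

```python
import re

# Matches a parameter written as {name}; names may contain spaces.
PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def substitute(template: str, values: dict[str, str]) -> str:
    """Return the template with every {name} replaced by values[name]."""

    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"No value configured for parameter {name!r}")
        return values[name]

    return PLACEHOLDER.sub(lookup, template)


# Hypothetical template and run-time values, for illustration only.
template = "cp {Sequences to copy} {Copied sequences}"
values = {
    "Sequences to copy": "/tmp/job/input.fa",
    "Copied sequences": "/tmp/job/output.fa",
}
command = substitute(template, values)
print(command)
```

Text outside the curly brackets, here the cp command itself, is passed through unchanged.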

General configuration

Parameter values specified in {curly brackets} in the Command line field will have a corresponding entry in the General configuration area. There, these values are configured, including specifying their type, and for some types, configuring the values to use or to be offered to end users to select from. The description of a parameter can be configured by clicking the tooltip icon next to the parameter. The description will be displayed in clients, for example as a tooltip when running the external application from the CLC Workbench, or in the help listed for this application via the CLC Server Command Line Tools. The parameter value types are described in detail further below.

External application type

External applications can be "Standard (non-containerized)", or "Containerized: Docker". Standard external applications are executed directly on a server system. Containerized external applications are run from within containers.

To run containerized external applications, the containerized execution environment must be enabled and configured, as described in Configuring the containerized execution environment.

Image extapp-editor-mafft-1
Figure 16.8: The External command tab in the external application editor

Command lines for standard external applications

The Command line field should contain the path to the application and all parameters to be passed to that application. For illustration, an external application using the cp (copy) command with two positional parameters is shown in figure 16.9.

Image extappcopyseqsconfig
Figure 16.9: This simple external application includes the two arguments the cp command expects: the source and the destination.

Command lines for containerized external applications

The Command line field should contain only the parameters to send to docker that are not already configured for the containerized execution environment, described in Configuring the containerized execution environment. These will usually be aspects of the command specific to running the individual external application.

For example, for a container with a command that takes one argument, the information written in this field could take the form:

<image-identifier> <command-to-run-from-image> <parameter>

The full docker command executed when an external application is launched combines the information configured for the containerized execution environment with the information provided in the Command line field. So, for example, if the default configuration settings for the containerized execution environment were used, the full docker command run when this external application is launched would take the form:

 docker run -v <import-export-dir>:<mount-point-in-image> \
    <image-identifier> <command-to-run-from-image> <parameter>

In figure 16.10, the command for a containerized external application running the alignment program MAFFT is shown as an example.

Image extapp-editor-mafft-1
Figure 16.10: The command line for a containerized external application contains a reference to the image followed by the command to run from the container and the parameters to provide to that command. Parameters in curly brackets are substituted at run time.
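
The way the full docker command is assembled from the containerized execution environment settings and the Command line field can be sketched in Python as follows. This is an illustration only, assuming the default form of the docker command shown above. The function name build_docker_command, the image identifier example/aligner:1.0, the tool name and the directory paths are hypothetical.

```python
import shlex


def build_docker_command(command_line_field: str,
                         import_export_dir: str,
                         mount_point: str) -> list[str]:
    """Prepend the environment-level docker arguments to the Command line field."""
    # Part configured once for the containerized execution environment.
    prefix = ["docker", "run", "-v", f"{import_export_dir}:{mount_point}"]
    # Part configured per external application: image, command, parameters.
    return prefix + shlex.split(command_line_field)


# Hypothetical values, for illustration only.
full_command = build_docker_command(
    "example/aligner:1.0 align-tool input.fa",
    import_export_dir="/data/import-export",
    mount_point="/data",
)
print(shlex.join(full_command))
```

Only the last three elements come from the Command line field; everything before them is supplied by the environment configuration, which is why it should not be repeated in that field.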

Parameter value types

Details of parameter value types are provided below. Particularly important types for external application configurations are Data from CLC location, which is the usual choice for parameters specifying CLC data as input to be analyzed by the command line application, and Output from CL, which is the usual choice for specifying results generated by the underlying application.

To see a brief description while working in the web administrative interface, select a value type, and then hover the mouse cursor over it.

Inputs
- Data from CLC Location - The end user will be prompted for one or more CLC data elements on the CLC Server. In the General configuration area, the appropriate exporter should be selected, so that the data is provided to the command line application in the format it expects. Each exporter can be configured further by clicking on the Edit parameters button. A window then appears with a list of configurable parameters (figure 16.11).
- External file - The end user will be prompted for one or more input files stored in an Import/Export area or an AWS S3 location that is accessible via configurations in the CLC Server. Files can be configured so they are pre-selected for the end user, but the end user can deselect pre-configured files when launching the external application.
- Local file - The end user can select a file from their local system. The full path to the file transferred into the temporary job location will be provided as the parameter value at run time. The CLC Server must be configured to allow direct data transfer from client systems for this option to be usable. If it is not, the parameter will not be configurable by the end user and they will see a message saying server upload is disabled when they try to launch the external application.
Outputs
- Output from CL - The type to select when specifying results generated by the underlying application. Further details about this option are provided below.
General parameters
- Text - The end user can provide text that will be substituted into the command at runtime. A default value can be configured.
- Integer - The end user can provide a whole number that will be substituted into the command at runtime. A default value can be configured. If no value is set, then 0 is the default used.
- Double - The end user can provide a number that will be substituted into the command at runtime. A default value can be configured. If no value is set, then 0 is the default used.
- CSV enum - A drop down list is presented to a Workbench end user, from which they can choose a desired option. The corresponding value will be substituted into the command at runtime. To configure this parameter type, enter a comma delimited list of the values to be substituted at runtime into the first box, and a comma delimited list of corresponding labels to display to end users in the second box. Each entry in a given list should be unique and the two lists should be of equal length.
  For an example, please see Example: Velvet integration on setting up Velvet as an external application.
- Boolean text - A checkbox is shown in the Workbench wizard interface. If the user checks the box, the given text will be substituted into the command at runtime. If the box is unchecked, this means that no value will be substituted.
Settings not visible to users
- Context substitute - This will be substituted at runtime by the value selected. The options available are:
  - CPU limit max cores - The core limit defined for the server that executes the command will be substituted.
  - Name of user - The name of the user who launched the external application will be substituted.
- Included script - A script provided as a value for this parameter type becomes accessible to the external process at runtime. This enables integration scripts or extensive parameter files to be included in the External Application and injected into the execution context, rather than being an external dependency. For containerized External Applications, an included script can provide the integration needed to use a publicly available container directly, avoiding the need to create a proprietary derived version of the container containing your own integration script. Refer to the MAFFT containerized external application for an example.
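
The rules given above for the CSV enum and Boolean text types can be sketched in Python as follows. This is a hypothetical model of the described behavior, not the CLC Server implementation. The function names, the trimming of whitespace around entries, and the example values and labels are assumptions.

```python
def parse_csv_enum(values_csv: str, labels_csv: str) -> dict[str, str]:
    """Map each label shown to the end user to the value substituted at runtime."""
    # Assumption: whitespace around each comma-delimited entry is ignored.
    values = [v.strip() for v in values_csv.split(",")]
    labels = [label.strip() for label in labels_csv.split(",")]
    if len(values) != len(labels):
        raise ValueError("The two lists must be of equal length")
    if len(set(values)) != len(values) or len(set(labels)) != len(labels):
        raise ValueError("Each entry in a given list must be unique")
    return dict(zip(labels, values))


def boolean_text(checked: bool, text: str) -> str:
    """Return the configured text if the box is checked, otherwise nothing."""
    return text if checked else ""


# Hypothetical configuration, for illustration only.
options = parse_csv_enum("21,31,41", "Short,Medium,Long")
print(options["Medium"])
print(repr(boolean_text(False, "--verbose")))
```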

A very simple configuration illustrating parameter configuration is shown in figure 16.9 for the cp command. In the General configuration area, the Sequences to copy parameter is set to Data from CLC Location meaning that the end user will specify the data to be copied from a CLC File Location. That data will be exported to a fasta format file. The Copied sequences parameter is set to type Output from CL, indicating that this is the output from the command, and the standard fasta importer was selected for importing the results into the CLC Server.

Output from CL

The Output from CL option is used for parameters that define an output of the command line application. This output can be a file or a folder containing files. When Output from CL is selected, a drop down list appears with options for how the output should be handled:

- Where results should not be imported into the CLC Server, choose the option No standard import or map to high throughput sequencing importer.
- To import results into the CLC Server using a high throughput sequencing (NGS) importer, choose the option No standard import or map to high throughput sequencing importer and then configure the importer to use under the High-throughput sequencing import / Post processing tab, described in High throughput sequencing importers and post processing tools.
- To import the results using a standard importer, choose the importer to use from the drop down list presented. If the import type Automatic is selected, the importer used is determined by the filename suffix in combination with a check of the format of the elements in the file. If the file type is not recognized, it will be imported as an external file. A list of file formats, including the expected filename suffix for each format, can be found in the appendix of the CLC Genomics Workbench manual: https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=List_bioinformatic_data_formats.html.

The third, empty field can be used to enter the name of the file the external process is expected to produce. If left blank, the base name of the file produced by the command line tool will be used as the base name for the data element imported into the CLC Server.

When Output from CL is selected for at least one value, the end user will need to provide a location on the CLC Server, or a cloud destination if in a cloud context, to store results. This will be the case even if the output of the external file will not be imported, as log files will still be written to the location selected.

Setting default values and locking parameters

When configuring exports and imports, default settings can be configured, and you can control which values are editable when launching the external application. Click on the Edit parameters button when available, and then click on the lock symbol beside a parameter to unlock or lock it (figure 16.11). Once unlocked, changes can be made. A parameter with an unlocked symbol beside it will be displayed to the end user and its value will be editable. Locked parameters are not shown when launching the external application and cannot be changed by end users.

Image extappexportparamconfig
Figure 16.11: Clicking on the "Edit parameters" button for the "Sequences to copy" parameter brings up a window with the editable parameters for the selected exporter. Parameters with a locked symbol beside them are not shown to, and are thus not configurable by, the end user.

A tip for exploring how many files an exporter will generate

A simple way to explore how many files an exporter will generate with a given configuration is to set up an external application using the echo command and a single parameter linked to the exporter of interest. Set the "Standard out handling" option to "Plain text", as described in Stream handling. The output from such an external application is a file, which is re-imported into the CLC Server as a text file. This file contains the full paths to the files the exporter created.

If an exporter is configured in a way that will lead to multiple output files, then the full path to each output file will be substituted in the command at runtime. The external application itself must be able to handle the outputs generated.
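
One way for an external application to cope with a variable number of exported files is a small wrapper script that accepts any number of input paths. The Python sketch below is a hypothetical example, assuming the substituted paths arrive as separate command line arguments. The function names, the --output option and the choice to concatenate the inputs are illustrative and not part of the CLC Server.

```python
import argparse


def parse_args(argv: list[str]) -> argparse.Namespace:
    """Accept one or more input file paths and a single output path."""
    parser = argparse.ArgumentParser(
        description="Concatenate one or more exported files into one file."
    )
    # nargs="+" lets the script receive however many files the exporter wrote.
    parser.add_argument("inputs", nargs="+", help="paths of the exported files")
    parser.add_argument("--output", required=True, help="path of the combined file")
    return parser.parse_args(argv)


def concatenate(inputs: list[str], output: str) -> None:
    """Write the contents of each input file, in order, to the output file."""
    with open(output, "w") as out:
        for path in inputs:
            with open(path) as handle:
                out.write(handle.read())


# Hypothetical arguments, as they might look after substitution at runtime.
args = parse_args(["/tmp/job/part1.fa", "/tmp/job/part2.fa", "--output", "/tmp/job/all.fa"])
print(len(args.inputs), "input files ->", args.output)
```

In a real script, concatenate(args.inputs, args.output) would be called with the arguments parsed from sys.argv[1:].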