External command
The sections under the External command editor tab (figure 12.4) are:
- External application name
- The name shown in the CLC Workbench Toolbox menu and given to the corresponding workflow element for this external application. This name is also the basis of the name used to launch the external application via the CLC Server Command Line Tools.
- Command line
- The command run when the external application is launched. The information to provide differs for standard external applications and containerized external applications, and is described in more detail below for each case.
Parameter values that should be substituted at run time are written within {curly brackets}. This includes parameters that should be configurable by the end user. Other parameters and values are written as normal in this field.
- General configuration
- Each parameter value specified in {curly brackets} in the Command line field has a corresponding entry in the General configuration area. There, these values are configured, including their type and, for some types, the values to use or to offer end users to select from. The description of a parameter can be configured by clicking the tooltip icon next to the parameter. The description is displayed in clients, for example as a tooltip when running the external application from the CLC Workbench, or in the help listed for this application via the CLC Server Command Line Tools.
- External application type
- External applications can be "Standard (non-containerized)" or "Containerized: Docker". Standard external applications are executed directly on a server system. Containerized external applications are run from within containers.
To run containerized external applications, the containerized execution environment must be enabled and configured, as described in Configuring the containerized execution environment.
Command lines for standard external applications
The Command line field should contain the path to the application and all parameters to be passed to that application. For illustration, a simple example of the cp (copy) command with 2 positional parameters is shown in figure 12.4.
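To illustrate what happens to the parameters written in curly brackets at run time, the minimal Python sketch below substitutes values into a command template modelled on the cp example. It is not how the CLC Server implements substitution. The template text `cp {Sequences to copy} {Copied sequences}`, the example paths, and the helper name `substitute_parameters` are hypothetical. Quoting each value with `shlex.quote` is also an assumption, used here so that a value containing spaces stays a single argument.

```python
import re
import shlex

# Hypothetical command template modelled on the cp example: text in
# {curly brackets} is a parameter whose value is supplied at run time.
TEMPLATE = "cp {Sequences to copy} {Copied sequences}"

# Matches "{...}" and captures the parameter name between the brackets.
PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def substitute_parameters(template: str, values: dict[str, str]) -> str:
    """Replace every {name} in the template with its shell-quoted value."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for parameter {name!r}")
        return shlex.quote(values[name])

    return PLACEHOLDER.sub(replace, template)


if __name__ == "__main__":
    command = substitute_parameters(
        TEMPLATE,
        {
            "Sequences to copy": "/tmp/export/sequences.fa",
            "Copied sequences": "/tmp/work/copied.fa",
        },
    )
    print(command)  # cp /tmp/export/sequences.fa /tmp/work/copied.fa
```

Text outside the brackets, such as `cp` itself, passes through unchanged, which matches the rule that other parameters and values are written as normal in the Command line field.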
Command lines for containerized external applications
The Command line field should contain only the parameters to send to docker that are not already configured for the containerized execution environment, described in Configuring the containerized execution environment. These will usually be aspects of the command specific to running the individual external application.
For example, for a container with a command that takes one argument, the information written in this field could take the form:
<image-identifier> <command-to-run-from-image> <parameter>
The full docker command executed when an external application is launched combines the information configured for the containerized execution environment with the information provided in the Command line field. So, for example, if the default configuration settings for the containerized execution environment were used, the full docker command run when this external application is launched would take the form:
docker run -v <import-export-dir>:<mount-point-in-image> \
    <image-identifier> <command-to-run-from-image> <parameter>
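The way the two parts combine can be sketched in Python as follows. This sketch assumes the default environment configuration contributes only `docker run -v <import-export-dir>:<mount-point-in-image>`, as in the form above. A real environment may add further options. The function name `build_docker_command`, the directory paths, and the image name `example-repo/aligner:1.0` are hypothetical.

```python
import shlex


def build_docker_command(import_export_dir: str, mount_point: str,
                         command_line_field: str) -> list[str]:
    """Combine the environment's docker options with the Command line field."""
    # Assumed default environment configuration: one bind mount that makes
    # the Import/Export directory visible inside the container.
    environment_part = ["docker", "run", "-v", f"{import_export_dir}:{mount_point}"]
    # The Command line field, after run-time substitution, supplies
    # <image-identifier> <command-to-run-from-image> <parameter> ...
    application_part = shlex.split(command_line_field)
    if not application_part:
        raise ValueError("the Command line field must at least name an image")
    return environment_part + application_part


if __name__ == "__main__":
    # Hypothetical paths and image name, for illustration only.
    command = build_docker_command(
        "/srv/clc/import-export",
        "/data",
        "example-repo/aligner:1.0 aligner /data/input.fa",
    )
    print(shlex.join(command))
    # docker run -v /srv/clc/import-export:/data example-repo/aligner:1.0 aligner /data/input.fa
```

Keeping the command as a list of arguments, rather than one string, avoids a second round of shell quoting when the command is finally executed.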
In figure 12.5, the command for a containerized external application running the alignment program MAFFT is shown as an example. That example is described in more detail in the MAFFT example section.
Figure 12.5: The command line for a containerized external application contains a reference to the image (here the repository and tag have been used, but it could also be the image identifier), followed by the command to run from the container and the parameters to provide to that command. Here there is a single parameter, written in curly brackets, indicating that the value will be substituted at run time.
Parameter value types
Details of parameter value types are outlined below. A brief description is also provided in the web administrative interface when a value type is selected and the mouse cursor is hovered over it. Particularly important types for external application configurations are User-selected input data (CLC data location), which is the usual choice for parameters specifying input data, and Output file from CL, which is the usual choice for specifying results generated by the underlying application.
- Text - The end user can provide text that will be substituted into the command at runtime. A default value can be configured.
- Integer - The end user can provide a whole number that will be substituted into the command at runtime. A default value can be configured. If no value is set, then 0 is the default used.
- Double - The end user can provide a number that will be substituted into the command at runtime. A default value can be configured. If no value is set, then 0 is the default used.
- Boolean text - A checkbox is shown in the Workbench wizard interface. If the user checks the box, the given text will be substituted into the command at runtime. If the box is unchecked, no value is substituted.
- CSV enum - A drop down list is presented to a Workbench end user, from which they can choose a desired option. The corresponding value will be substituted into the command at runtime. To configure this parameter type, enter a comma delimited list of the values to be substituted at runtime into the first box, and a comma delimited list of corresponding labels to display to end users in the second box. Each entry in a given list should be unique and the two lists should be of equal length.
For an example of this, please see Example: Velvet integration on setting up Velvet as an external application.
- User-selected input data (CLC data location) - The end user should specify one or more input files from those stored on the CLC Server. In the General configuration area, the appropriate exporter should be selected, so that the data is exported in the format needed by the command line application. Each exporter can be configured further by clicking on the Edit parameters button, shown in figure 12.4 (see footnote 12.1). A window then appears with a list of configurable parameters, as shown in figure 12.6.
Choices to make when configuring export parameters include:
- Default values to be applied when the external application is run. To edit fields that are locked by default, click the lock symbol to unlock it. Once a field is unlocked, changes can be made.
- Which parameters end users will be able to configure when launching the external application. A parameter with an unlocked symbol beside it will be displayed to the end user and its value will be editable. Locked parameters are not shown and cannot be changed by end users.
Figure 12.6: Clicking on the Edit parameters button for the "Sequences to copy" parameter brings up a window with the editable parameters for the selected exporter. Parameters with a locked symbol beside them are not shown to, and are thus not configurable by, the end user.
- User-selected files (Import/Export directory) - The end user should specify one or more input files stored in an Import/Export area configured on the CLC Server. This option is used to specify files not in a CLC location. Files can be configured so they are pre-selected for the end user, but the end user can deselect pre-configured files when launching the external application.
- Output file from CL - This option should be used for parameters that define an output of the external command line application. Once selected, a drop down list appears with options for how the output should be handled:
- Where results should not be imported into the CLC Server, choose the option No standard import or map to high throughput sequencing importer.
- To import results into the CLC Server using a high-throughput sequencing (NGS) importer, choose the option No standard import or map to high throughput sequencing importer and then configure the importer to use under the High-throughput sequencing import / Post processing tab, described in High throughput sequencing importers and post processing tools.
- To import the results using a standard importer, choose the importer to use from the drop down list presented. If the import type Automatic is selected, the importer used is determined by the filename suffix in combination with a check of the format of the elements in the file. If the file type is not recognized, it will be imported as an external file. A list of file formats, including the expected filename suffix for each format, can be found in the appendix of the CLC Genomics Workbench manual.
The third, empty field can be used to enter the name of the file the external process is expected to produce. If left blank, the base name of the file produced by the command line tool will be used as the base name for the data element imported into the CLC Server. Specifying a default filename in the third field, including the relevant suffix (e.g. .fasta, .xlsx), is recommended.
When Output file from CL is selected for at least one parameter, the end user must provide a location on the CLC Server to store results. This is the case even if the output of the external application will not be imported, as log files are still written to the selected location.
- File - The end user should specify input files from their local machine. These are typically not CLC files. The CLC Server must be configured to allow direct data transfer from client systems for this option to be usable. If it is not, the parameter will not be configurable by the end user and they will see a message saying server upload is disabled when they try to launch the external application.
- Context substitute - The options are:
- CPU limit max cores - The core limit defined for the server that executes the command will be substituted.
- Name of user - The name of the user who launched the external application will be substituted.
- Boolean compound (legacy) - This is a legacy option, which is no longer recommended for use and will be removed in a future release.
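The CSV enum rules described above (two comma-delimited lists, unique entries, equal length) can be checked with a short Python sketch like the one below. It is an illustration only, not the validation the CLC Server performs. The function name `parse_csv_enum`, the option strings `--fast` and `--accurate`, and the stripping of whitespace around each entry are assumptions.

```python
def parse_csv_enum(values_field: str, labels_field: str) -> dict[str, str]:
    """Map each end-user label to the value substituted at run time.

    values_field and labels_field stand for the two comma-delimited boxes
    of a CSV enum parameter. Stripping surrounding whitespace is an
    assumption made for this sketch.
    """
    values = [item.strip() for item in values_field.split(",")]
    labels = [item.strip() for item in labels_field.split(",")]
    if len(values) != len(labels):
        raise ValueError(
            f"{len(values)} values but {len(labels)} labels; "
            "the two lists must be of equal length"
        )
    if len(set(values)) != len(values):
        raise ValueError("each value must be unique")
    if len(set(labels)) != len(labels):
        raise ValueError("each label must be unique")
    # The n-th label shown to the end user selects the n-th value.
    return dict(zip(labels, values))


if __name__ == "__main__":
    # Hypothetical option strings and labels.
    choices = parse_csv_enum("--fast,--accurate", "Fast mode,Accurate mode")
    print(choices["Accurate mode"])  # --accurate
```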
A very simple configuration illustrating parameter configuration is shown in figure 12.4 for the cp command. In the General configuration area, the Sequences to copy parameter is set to User-selected input data (CLC data location), meaning that the end user will specify the data to be copied from a CLC File Location. That data will be exported to a fasta format file. The Copied sequences parameter is set to type Output file from CL, indicating that this is the output from the command, and the standard fasta importer was selected for importing the results into the CLC Server.
A tip for exploring how many files an exporter will generate
A simple way to explore how many files an exporter will generate with a given configuration is to set up an external application using the echo command and a single parameter linked to the exporter of interest. Set the "Standard out handling" option to "Plain text", as described in Stream handling. The output from such an external application is a file, which is re-imported into the CLC Server as a text file. This file contains the full paths to the files the exporter created.
If an exporter is configured in a way that will lead to multiple output files, then the full path to each output file will be substituted in the command at runtime. The external application itself must be able to handle the outputs generated.
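As an example of an application that can cope with a variable number of exported files, the Python sketch below could serve as a wrapper script. It treats the last argument as the output path and concatenates all earlier input files into it. The script name `merge_inputs.py`, its argument order, and the assumption that each exported path arrives as a separate positional argument are hypothetical. The echo approach described above is one way to check how paths are actually substituted for a given exporter.

```python
import sys
from pathlib import Path


def concatenate(inputs: list[str], output: str) -> int:
    """Write the bytes of every input file, in order, to output; return the count."""
    if not inputs:
        raise ValueError("at least one input file is required")
    with open(output, "wb") as out:
        for path in inputs:
            out.write(Path(path).read_bytes())
    return len(inputs)


def main(argv: list[str]) -> int:
    """Treat the last argument as the output path and all earlier ones as inputs."""
    if len(argv) < 3:
        print("usage: merge_inputs.py INPUT [INPUT ...] OUTPUT", file=sys.stderr)
        return 2
    *inputs, output = argv[1:]
    count = concatenate(inputs, output)
    print(f"wrote {count} input file(s) to {output}")
    return 0
```

To run this as a standalone script, end the file with `if __name__ == "__main__": sys.exit(main(sys.argv))`. The Command line field could then take a form such as `python3 /opt/tools/merge_inputs.py {Reads to merge} {Merged reads}`, where the path and both parameter names are hypothetical.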
Footnotes
- 12.1 Configurable export parameters were introduced with CLC Genomics Server 10.0.