Example: MAFFT (containerized external application)

In this section, a containerized external application for MAFFT, an alignment program for amino acid or nucleotide sequences, is described. This example illustrates the use of a publicly available Docker image in combination with an "Included script" and extension of the external application command line beyond a single line.

More information about MAFFT is available from https://mafft.cbrc.jp/alignment/software/.

Using the resulting external application, a CLC software user specifies sequences to align and a location to save the alignment to. They can also optionally configure MAFFT settings. The alignment is run in a Docker container, and the results are then imported back into the CLC software.

Defining the MAFFT command line and configuring the parameters

The command line and general configuration of a MAFFT external application are shown in figure 16.44.

Image external_app_mafft_command_definition
Figure 16.44: Configuring a containerized external application for MAFFT. A publicly available Ubuntu image is pulled, and all subsequent information in the command is run in the container created.

The external application type is set to "Containerized: Docker". Thus, the information in the "Command line" field will be appended to the command specified in the Containerized execution environment settings for the CLC Server.

The parameters in curly brackets are substituted at run time with the values specified.
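As a rough illustration of this substitution, the following Python sketch fills curly-bracket placeholders in a command template. The template, the parameter name "Alignment", and the file names are assumptions made for illustration. They are not the command shown in figure 16.44, and the CLC Server performs the substitution internally rather than through code like this.

```python
import re


def substitute(template: str, values: dict) -> str:
    """Replace each {name} placeholder with its value from `values`."""
    def lookup(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"No value supplied for parameter: {name}")
        return values[name]
    return re.sub(r"\{([^{}]+)\}", lookup, template)


# Hypothetical template and values, for illustration only.
template = "mafft {MAFFT settings} {Sequences to align} > {Alignment}"
values = {
    "MAFFT settings": "--auto",
    "Sequences to align": "input.fa",
    "Alignment": "aligned.fa",
}

command = substitute(template, values)
print(command)
```

A placeholder with no supplied value raises an error in this sketch, which makes a misspelled parameter name easy to spot.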

Getting the MAFFT software into the external application

In this example, the steps to obtain and unpack MAFFT are run in the Docker container. There are other ways this can be done. Choices for how to get the MAFFT software running in the container include:

Extending the command line with /bin/bash -c

In this example, the MAFFT software (mafft.bat) is called using a line in the external application command line field (after the semi-colon), rather than being included within the script. This approach can make writing the script simpler, and may make it easier for external application authors to keep track of the roles of the various parameters being passed to the bioinformatics application.

For comparison, see the Kraken2 external application example for a case where all commands are contained in an included script.

Both approaches are equally valid.
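The pattern described in this section, where a setup script and the MAFFT call are joined with a semi-colon inside a single /bin/bash -c argument, can be sketched as follows. The command line, script name and file names here are hypothetical and are not taken from figure 16.44. The sketch only shows how such a line is tokenized: everything inside the quotes reaches bash as one argument, and bash then runs the semi-colon-separated commands in order.

```python
import shlex

# Hypothetical command line: a setup script followed by the MAFFT call.
command_line = (
    '/bin/bash -c "sh setup_mafft.sh; mafft.bat --auto input.fa > aligned.fa"'
)

# Tokenize the line the way a POSIX shell would.
args = shlex.split(command_line)
program, flag, shell_string = args

# Splitting on ";" is a simplification for display purposes only. It would
# not handle a semi-colon that appears inside quotes.
steps = [step.strip() for step in shell_string.split(";")]

print(args)
print(steps)
```

Here the first step stands in for the included script, and the second step is the call to MAFFT with its parameters visible on the command line.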

Further details about parameters with values substituted at run time:

See External command information for more information about external application parameter types.

Image external_app_mafft_incscript
Figure 16.45: A script is defined that will be run in the Docker container. It includes the steps needed to make MAFFT available to run in the container.

Image external_app_mafft_wb-wizard
Figure 16.46: The wizard presented to Workbench users when they launch the MAFFT external application. They select the sequences they wish to align, and can, if they wish, edit the options being passed to MAFFT.

Settings under the Stream handling tab

Under the Stream handling tab, we define how information sent to standard out and standard error should be handled. The information in these streams can be useful for troubleshooting. The settings in this example are shown in figure 16.47.

The base names of the files that standard out and standard error information are written to are used to name the output channels of the corresponding workflow element (figure 16.48).
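The idea of sending standard out and standard error to separate files can be sketched in Python as below. This is only a conceptual illustration, not how the CLC Server implements stream handling. The file names are hypothetical, a small child Python process stands in for the containerized tool, and "base name" is assumed to mean the file name without its extension.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in for the containerized tool: a child process that writes one
# line to standard out and one line to standard error.
child_code = (
    "import sys; print('alignment done'); "
    "print('warning: demo', file=sys.stderr)"
)

workdir = Path(tempfile.mkdtemp())
stdout_file = workdir / "mafft_stdout.txt"  # hypothetical file name
stderr_file = workdir / "mafft_stderr.txt"  # hypothetical file name

# Redirect each stream to its own text file.
with open(stdout_file, "w") as out, open(stderr_file, "w") as err:
    subprocess.run(
        [sys.executable, "-c", child_code], stdout=out, stderr=err, check=True
    )

# Assumption: the base name is the file name without its extension.
channel_names = [stdout_file.stem, stderr_file.stem]
print(channel_names)
```

Keeping the two streams in separate files means that warnings and errors can be inspected without searching through the regular output.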

A general note about names

External application names and the names of parameters that refer to inputs a user will select are presented to users and administrators in various places. The names of files containing standard out or standard error information are also visible. For example, here, the name "MAFFT" will be used in the External Applications section of the server web administrative interface, in the Tools menu of the CLC Workbench, and in the corresponding workflow element (figure 16.48). "Sequences to align" will be used in the CLC Workbench wizard (figure 16.46) and in the corresponding workflow element.

Image external_app_mafft_stream_handling
Figure 16.47: Defining stream handling for the MAFFT containerized external application.

Image external_app_mafft_draw_workflow
Figure 16.48: A workflow containing the external application. In workflows, the outputs to collect can be specified. Here, all the outputs are configured to be saved.

Making the external application available for use

When this external application is saved, it becomes available to run on the CLC Server unless its status is set to Disabled.

To run external applications on the cloud, the CLC Cloud Module is needed. See https://resources.qiagenbioinformatics.com/manuals/clccloudmodule/current/index.php?manual=Using_external_applications_on_cloud_via_CLC_Workbench.html. The external applications need to be installed on the CLC Workbench that will be used for submitting jobs or for creating workflows that contain the external application. To do this, export the configuration(s) to an AWS S3 bucket accessible from the CLC Workbench. From the CLC Workbench, right-click on the configuration file in AWS S3 under the Remote Files tab, and choose the option Install External Applications. The external application(s) in the configuration file will be installed and made available from the External Applications Cloud folder under the Tools menu.

To export to a cloud location, a valid AWS Connection must be configured on the CLC Server, as described in AWS Connections in the CLC Server. To install external applications from an AWS S3 location, a valid AWS Connection must be configured in the CLC Workbench, as described at https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=AWS_Connections.html.

Reminder: If you plan to run external applications on the cloud via the CLC Server Command Line Tools, they must be included in a workflow, and that workflow must be installed on the CLC Server.

Of note when creating workflows: options are usually locked by default. To unlock parameters in a workflow element, double-click on the central part of the element, or right-click on it and choose the option Configure.... Then check for the lock/unlock icon beside each setting. See figure 16.49.

Image external_app_mafft_configure_workflow
Figure 16.49: By default, parameters are locked in workflow elements, as shown here. To allow users to configure the "MAFFT settings" option when launching this workflow, the parameter must be unlocked.

Results from the MAFFT external application

The alignment and the text files containing the standard out and standard error information are available in CLC format when the application has finished. A log of the job is also available. If the external application was run on the cloud, a file called workflow-result.json will also be present among the results (figure 16.50).

Image external_app_mafft_outputs
Figure 16.50: Outputs from a MAFFT containerized external application after running it on AWS using functionality provided by the CLC Cloud module.

These results will be in the location specified by the user when launching the application. If the job was run on a CLC Server, that will be in a CLC Server location. If the job was run on the cloud, the results will be in an AWS S3 location.

Interacting with files on AWS S3 via a CLC Workbench is described at https://resources.qiagenbioinformatics.com/manuals/clccloudmodule/current/index.php?manual=Working_with_AWS_S3_using_Remote_Files_tab.html. Interacting with files on AWS S3 via the CLC Server web interface is described in Browse AWS S3 locations.