Submitting workflows to the cloud using the CLC Server Command Line Tools

Submitting workflows to the cloud

Workflows installed on a CLC Genomics Server can be submitted to run in the cloud using the CLC Server Command Line Tools. General information about launching workflows using the CLC Server Command Line Tools is at https://resources.qiagenbioinformatics.com/manuals/clcservercommandlinetools/current/index.php?manual=Launching_workflows.html. That information focuses on submitting jobs for execution on a CLC Genomics Server, but most of it also applies to workflows submitted for execution on the cloud. This section provides cloud-specific details.

Specifying a cloud preset

The cloud preset to use must be specified when submitting workflows to run on the cloud. The name of the cloud preset should be supplied as the value for the -L option.
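
As an illustration of where the -L option fits, the following is a minimal dry-run sketch that assembles a clcserver command line and prints it rather than running it. The host, port, user name, password, workflow name and preset name are all hypothetical placeholders, and the workflow's own input and output parameters are deliberately left out because they depend on the workflow being run.

```shell
# Placeholder values (assumptions for illustration only); replace with your own.
server="server.example.com"        # hypothetical CLC Genomics Server host
port="7777"                        # assumed server port
user="alice"                       # hypothetical user name
password="example-password"        # hypothetical password or token
workflow="wf-example-workflow"     # hypothetical name of an installed workflow
cloud_preset="my-cloud-preset"     # hypothetical cloud preset name

# Assemble the command as positional parameters so each value stays one argument.
set -- clcserver -S "$server" -P "$port" -U "$user" -W "$password" \
       -A "$workflow" -L "$cloud_preset"

# Dry run: print the assembled command instead of executing it.
submit_preview=$(printf '%s ' "$@")
echo "$submit_preview"

# To submit for real, append the workflow's input and output parameters
# to the argument list and execute it with: "$@"
```

Printing the command first makes it easy to confirm that the preset name is passed as the value for -L before anything is submitted.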

To see the list of available cloud presets, run the clcserver command with no arguments or with an incomplete set of arguments, as described at https://resources.qiagenbioinformatics.com/manuals/clcservercommandlinetools/current/index.php?manual=Basic_usage.html.

Information about configuring cloud presets on a CLC Server is in Configuring cloud presets.

Specifying input data for analyses

CLC format data in CLC Server File System Locations, or in remote locations accessible via http, https, or S3 URLs, can be provided as inputs to workflows^4.1.

Data in other formats can be supplied as input by using on-the-fly import. For example, with on-the-fly import, FASTQ sequence files are imported as the first step of the workflow, so no separate import command needs to be run before the workflow is launched.

General information about input data for analyses run on the cloud is provided in General information about input data for cloud analyses.

Further details about providing input data to analyses using the CLC Server Command Line Tools are at https://resources.qiagenbioinformatics.com/manuals/clcservercommandlinetools/current/index.php?manual=Providing_input_data_analyses_on_CLC_Server.html.

Specifying where results should be saved

Results generated using workflows run on the cloud are saved to AWS S3. The location to save results to is specified using an S3 URL as the value for the relevant parameter.
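
Because the results location must be an S3 URL, a small check before submission can catch a mistyped value early. The sketch below is only a superficial syntax check: the helper name is_s3_url and the example bucket and prefix are hypothetical, and it does not verify that the bucket exists or that it is writable.

```shell
# Hypothetical helper: succeeds only if the argument starts with s3://
# followed by at least one more character (for example a bucket name).
is_s3_url() {
  case "$1" in
    s3://?*) return 0 ;;
    *)       return 1 ;;
  esac
}

results_url="s3://example-bucket/results/run-01/"   # hypothetical location

if is_s3_url "$results_url"; then
  echo "Looks like an S3 URL: $results_url"
else
  echo "Not an S3 URL: $results_url" >&2
fi
```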

General information about specifying where results should be saved when using the CLC Server Command Line Tools is provided at https://resources.qiagenbioinformatics.com/manuals/clcservercommandlinetools/current/index.php?manual=Saving_workflow_outputs_exporting_results.html.

Accessing AWS CloudWatch logs via the command line

AWS CloudWatch logs for jobs run on a CLC Genomics Cloud can be retrieved using the CLC Server Command Line Tools by supplying cgc_read_aws_logs as the value for the -A option.
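
The following dry-run sketch shows the general shape of such a call. The connection details are hypothetical placeholders, and the parameters that identify the job whose log should be retrieved are not covered in this section, so they are omitted here rather than guessed.

```shell
# Placeholder connection details (assumptions for illustration only).
server="server.example.com"     # hypothetical CLC Genomics Server host
port="7777"                     # assumed server port
user="alice"                    # hypothetical user name
password="example-password"     # hypothetical password or token

# Assemble the log retrieval command; cgc_read_aws_logs is the value for -A.
set -- clcserver -S "$server" -P "$port" -U "$user" -W "$password" \
       -A cgc_read_aws_logs

# Dry run: print the assembled command instead of executing it.
logs_preview=$(printf '%s ' "$@")
echo "$logs_preview"

# The job-specific parameters are intentionally not shown. Append them to the
# argument list before executing it with: "$@"
```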

The messages returned from jobs run on the cloud include the information needed to access the AWS CloudWatch log for that job. The AWS CloudWatch information retrieved is the same as that returned when the "Execution Log" is opened in the CLC Workbench, either via the Processes tab or via options under the Remote Files tab, as described in Accessing results from the Processes tab.

Footnotes

^4.1: Support for http, https and S3 URLs for directly specifying files in remote locations, i.e. without needing to specify the location as a clccloudfile, was introduced in CLC Genomics Server 23.0.3 and Cloud Server Plugin 23.0.1, as was the ability to supply CLC format files in remote locations directly as input, without needing to use on-the-fly import.