We recommend using the cgc-standard.json CloudFormation template, described below, to set up the standard resources needed for a CLC Genomics Cloud setup. Some configuration can be done when using this template. Additional AWS Batch queues can be added afterwards using another CloudFormation template, which provides more configuration options (see Adding more AWS Batch queues for CLC jobs).
The description of the resources established using the cgc-standard.json CloudFormation template is followed by detailed instructions on how to use the template, and by information about the AWS IAM users the template creates.
The cgc-standard.json CloudFormation template defines the resources needed for a CLC Genomics Cloud setup, which include:
- Three AWS Batch queues. The queue names include cgc-large. When a user launches a job to run on the cloud from their CLC Workbench, they select the queue to send the job to from a drop-down list of the available queues. Details of each of these queues are provided in Standard AWS Batch queues for CLC Genomics Cloud.
- An S3 bucket for system files. The name of this bucket begins with "cgc-system-".
This bucket is used by the CLC software for system files, including read mapper indexes. It is not intended for storing sample data or results. It is not listed when browsing using CLC software. Details about the retention policies on this bucket are provided in System file retention policies.
Note: S3 buckets for holding input data and results need to be created directly in AWS. They are not created by CloudFormation templates provided by QIAGEN.
- Two AWS IAM users. One has properties supporting submission of analyses; the other allows access only to AWS S3 buckets. This is described in more detail below.
When working with a CLC Workbench, access to the AWS Batch queues is determined by the access rights of the AWS IAM user configured in the AWS connection. With a CLC Server, access to AWS Batch queues can be fine-tuned by setting group permissions on cloud presets, using the web administrative interface of the CLC Server, as described in Configuring cloud presets.
To set up the standard infrastructure on AWS for handling CLC jobs:
- Log into the AWS console as a user with privileges that allow the infrastructure described above to be created.
- Set the region to the one the AWS resources should be established in.
- Copy the URL below:
- Go to CloudFormation and click on Create stack.
- Review the max vCPUs setting for each queue.
By default, the values allow up to 10 EC2 instances to be launched, so up to 10 jobs can run in parallel. This value can be increased or decreased. We recommend that it is not decreased below the number of cores designated for each job. These values can be found in the setting details provided in Standard AWS Batch queue settings.
- Review the Disk size setting for each queue.
This value refers to the space available for each job when it is running.
- Specify the URL for a custom plugin repository if you have custom CLC plugins providing tools that should run on the cloud.
See Making custom plugins available for cloud analyses for the requirements for a custom plugin location.
This field should be left blank if you require access only to plugins and modules distributed by QIAGEN. These are the plugins and modules listed in the CLC Workbench and CLC Server Plugin Manager, or on the QIAGEN website.
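The relationship between the max vCPUs setting and the number of jobs that can run in parallel, mentioned in the steps above, can be sketched as simple arithmetic. This is an illustration only: it assumes each job occupies a fixed number of vCPUs, and the figures used (160 max vCPUs, 16 vCPUs per job) are made-up example values, not values taken from the standard queue settings.

```python
def max_parallel_jobs(max_vcpus: int, vcpus_per_job: int) -> int:
    """Return how many jobs fit under a queue's max vCPUs cap.

    Assumes every job on the queue needs the same number of vCPUs.
    """
    if max_vcpus < 0:
        raise ValueError("max_vcpus must not be negative")
    if vcpus_per_job <= 0:
        raise ValueError("vcpus_per_job must be positive")
    return max_vcpus // vcpus_per_job


# Example values are assumptions chosen for illustration.
print(max_parallel_jobs(160, 16))  # 10 jobs can run side by side
print(max_parallel_jobs(8, 16))    # 0: the cap is below one job's needs
```

The second call shows why the value should not be set below the number of cores designated for each job: no job could be started on that queue.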
When stack creation is complete, go to the Outputs tab of the main stack to find the credentials for the AWS IAM users created (figure 2.1).
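For those who prefer scripting to the console, stack creation and the lookup of stack outputs can be sketched with boto3 as follows. This is a minimal sketch under stated assumptions, not a procedure documented by QIAGEN: the stack name, the template URL placeholder, and any parameter keys passed in are hypothetical, and the `CAPABILITY_NAMED_IAM` acknowledgement is assumed to be needed because the template creates named IAM users. Other capabilities may be required.

```python
from typing import Dict, Optional


def build_create_stack_args(
    stack_name: str,
    template_url: str,
    parameters: Optional[Dict[str, object]] = None,
) -> dict:
    """Build keyword arguments for CloudFormation's create_stack call."""
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        # Parameter keys must match those defined in the template;
        # any keys supplied by the caller here are hypothetical.
        "Parameters": [
            {"ParameterKey": key, "ParameterValue": str(value)}
            for key, value in (parameters or {}).items()
        ],
        # Assumption: needed because the template creates named IAM users.
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
    }


def outputs_to_dict(stack_description: dict) -> Dict[str, str]:
    """Flatten the Outputs list of a described stack into a dict."""
    return {
        item["OutputKey"]: item["OutputValue"]
        for item in stack_description.get("Outputs", [])
    }


def create_stack_and_get_outputs(region: str, stack_args: dict) -> Dict[str, str]:
    """Create the stack, wait until it is complete, and return its outputs."""
    import boto3  # imported lazily so the helpers above work without boto3

    cfn = boto3.client("cloudformation", region_name=region)
    stack_id = cfn.create_stack(**stack_args)["StackId"]
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_id)
    stack = cfn.describe_stacks(StackName=stack_id)["Stacks"][0]
    return outputs_to_dict(stack)
```

The returned dictionary would contain whatever the Outputs tab shows, including the IAM user credentials, so it should be handled as sensitive data and not written to logs.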
AWS Connections using the "SubmitterUser" (CgcSubmitterUser-<EnvironmentId>) credentials allow CLC analyses to be submitted to an AWS Batch queue for analysis. This user also has full access to AWS S3, read access to CloudWatch logs, and can list CloudFormation resources.
AWS Connections using the "BrowserUser" (CgcBrowserUser-<EnvironmentId>) credentials support listing S3 buckets and accessing bucket contents. Jobs cannot be submitted to run on AWS using these credentials.
AWS IAM user credentials are entered in AWS Connections in CLC Workbenches or the CLC Server, described in Configuring the AWS connection in the Workbench and Configuring the AWS Connection in the CLC Server, respectively.
The full policy for each user can be viewed in the Identity and Access Management (IAM) area of the AWS Console.
One or more AWS S3 buckets must be created for holding input data and results. These buckets must be created in the same AWS account and region that the AWS Batch queues were established in. Please refer to the AWS documentation for details.
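As an illustration, a data bucket could be created in a chosen region with boto3 along the following lines. This is a sketch, not a QIAGEN-provided procedure: the bucket name and region in the usage example are hypothetical, and the caller is responsible for using the same account and region as the AWS Batch queues. The special case for `us-east-1` reflects the S3 API, which does not accept a location constraint for that region.

```python
def build_create_bucket_args(bucket_name: str, region: str) -> dict:
    """Build keyword arguments for S3's create_bucket call."""
    args = {"Bucket": bucket_name}
    # us-east-1 is the S3 default and must not be given as a constraint.
    if region != "us-east-1":
        args["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return args


def create_data_bucket(bucket_name: str, region: str) -> None:
    """Create a bucket for input data and results in the given region."""
    import boto3  # imported lazily so the helper above works without boto3

    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(**build_create_bucket_args(bucket_name, region))


# Hypothetical bucket name and region, for illustration only.
print(build_create_bucket_args("my-clc-data-bucket", "eu-west-1"))
```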