AWS Connections
AWS connections are used when:
- Accessing AWS S3 buckets to import data from them or export data to them.
- Submitting analyses to a CLC Genomics Cloud setup, if available on that AWS account.
Configuring access to your AWS accounts requires AWS IAM credentials. Configuring access to public S3 buckets requires only the name of the bucket.
Working with data stored in AWS S3 buckets via the Workbench is particularly relevant when submitting jobs to run on a CLC Genomics Cloud setup, which is done using functionality provided by the CLC Cloud Module.
When launching workflows to run locally using on-the-fly import and selecting files from AWS S3, the selected files are first downloaded to a temporary folder and then imported.
All traffic to and from AWS is encrypted using a minimum of TLS version 1.2.
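To illustrate what a minimum of TLS version 1.2 means on the client side, the following is a minimal sketch using Python's standard `ssl` module. It is not the Workbench's own implementation, and the function name `make_tls12_context` is a hypothetical name chosen for this example.

```python
import ssl


def make_tls12_context() -> ssl.SSLContext:
    """Return a client context that refuses protocol versions older than TLS 1.2.

    Hypothetical helper for illustration only; not part of the Workbench.
    """
    # create_default_context() turns on certificate and hostname verification.
    context = ssl.create_default_context()
    # TLS 1.0 and 1.1 are rejected; TLS 1.2 and newer can still be negotiated.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_tls12_context()
print(context.minimum_version.name)
```

A connection attempted with such a context fails during the handshake if the server offers only an older protocol version.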
Configuring access to AWS resources
To configure an AWS Connection or to configure access to public AWS S3 buckets, go to:
Connections | AWS Connections
AWS connections that have already been configured are listed with their status, along with any public S3 buckets (figure 6.7). These configurations can be edited or removed from here.
Figure 6.7: The configuration dialog for AWS connections. Here, two valid AWS connections, their status, and a public S3 bucket are listed.
Configuring an AWS Connection
To configure a new AWS Connection, click on the Add AWS Connection button and enter the following information in the dialog (figure 6.8):
- Connection name: A short name of your choice, identifying the AWS account. This name will be shown as the name of the data location when importing data from or exporting data to Amazon S3.
- Description: A description of the AWS account (optional).
- AWS access key ID: The access key ID for programmatic access for your AWS IAM user.
- AWS secret access key: The secret access key for programmatic access for your AWS IAM user.
- AWS region: The AWS region to use. Select from the drop-down list.
- AWS partition: The AWS partition for your account.
The dialog continuously validates the settings entered. When they are valid, the Status box will contain the text "Valid" and a green icon will be shown. Click on OK to save the settings.
Figure 6.8: Configuration of an AWS Connection in a CLC Workbench
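The AWS region and AWS partition fields are related: each region belongs to exactly one partition, and the partition appears as the second field of every Amazon Resource Name (ARN). The sketch below illustrates that relationship for the three public partitions (`aws`, `aws-cn`, `aws-us-gov`) using region name prefixes. The function names and the bucket name `example-bucket` are hypothetical, other partitions are not covered, and this is not how the Workbench itself determines the partition.

```python
def guess_partition(region: str) -> str:
    """Guess the AWS partition from a region name using its prefix.

    Covers only the three public partitions; illustrative, not exhaustive.
    """
    if region.startswith("cn-"):
        return "aws-cn"          # China regions, e.g. cn-north-1
    if region.startswith("us-gov-"):
        return "aws-us-gov"      # AWS GovCloud (US) regions
    return "aws"                 # standard commercial regions


def bucket_arn(partition: str, bucket: str) -> str:
    """Build the ARN of an S3 bucket.

    S3 bucket ARNs leave the region and account ID fields empty.
    """
    return f"arn:{partition}:s3:::{bucket}"


for region in ("eu-west-1", "cn-north-1", "us-gov-west-1"):
    partition = guess_partition(region)
    print(region, partition, bucket_arn(partition, "example-bucket"))
```

Credentials issued in one partition are not valid in another, which is why the partition is configured per connection.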
AWS connection status is indicated using colors. Green indicates the connection is valid and ready for use. Connections to a CLC Genomics Cloud are indicated in the CGC column (figure 6.7). To submit analyses to the CLC Genomics Cloud, the CLC Cloud Module must be installed and a license for that module must be available.
AWS credentials entered are stored, obfuscated, in Workbench user configuration files.
Note: Multiple AWS Connections using credentials for the same AWS account cannot be configured.
Adding a public S3 bucket
To add a public bucket, click on the Add Public S3 button and provide the public bucket name (figure 6.9).
Figure 6.9: Provide a public AWS S3 bucket name to enable access to data in that public bucket.
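Because only the bucket name is entered for a public bucket, it can help to check the name against the general-purpose bucket naming rules published by AWS before adding it. The following is a minimal sketch covering a subset of those rules (length, allowed characters, no adjacent periods, not formatted as an IP address). The function name `looks_like_bucket_name` is hypothetical, and the check says nothing about whether the bucket exists or is actually public.

```python
import ipaddress
import re

# 3 to 63 characters: lowercase letters, digits, dots, and hyphens,
# beginning and ending with a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def looks_like_bucket_name(name: str) -> bool:
    """Check a name against a subset of the S3 bucket naming rules."""
    if not _BUCKET_NAME.fullmatch(name):
        return False
    if ".." in name:
        return False  # two adjacent periods are not allowed
    try:
        ipaddress.IPv4Address(name)
    except ValueError:
        return True   # not formatted as an IP address, so acceptable
    return False      # names such as 192.168.5.4 are rejected


for candidate in ("example-public-data", "Example-Bucket", "192.168.5.4"):
    print(candidate, looks_like_bucket_name(candidate))
```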
Importing data from AWS S3
The S3 buckets for each configured AWS Connection, and each configured public S3 bucket, are available in the workflow launch wizard when using on-the-fly import in workflows, and in relevant import tool wizards (figure 6.10).
Figure 6.10: Files in local or remote locations can be selected for import by the Illumina importer of the CLC Genomics Workbench.
AWS S3 buckets can be browsed using functionality added when the CLC Cloud Module is installed. See https://resources.qiagenbioinformatics.com/manuals/clccloudmodule/current/index.php?manual=Working_with_AWS_S3_using_Remote_Files_tab.html for details.
Exporting data to AWS S3
To export data to an AWS S3 bucket, launch the exporter, and when prompted for an export location, select the relevant option from the drop-down menu (figure 6.11).
Figure 6.11: After an AWS connection is selected when exporting, you can select the S3 bucket and location within that bucket to export to.
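An export location in S3 consists of a bucket and a location (key prefix) within that bucket. Outside the Workbench, such a location is commonly written as an S3 URI of the form `s3://bucket/prefix/`. The sketch below splits such a URI into its two parts using Python's standard library. The function name `split_s3_uri` and the example URI are hypothetical, and the Workbench dialog does not necessarily display locations in this notation.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an S3 URI into (bucket, key prefix).

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The path starts with "/", which is not part of the key prefix.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, prefix = split_s3_uri("s3://example-bucket/exports/run1/")
print(bucket)
print(prefix)
```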