AWS Connections

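AWS connections are used when importing data from or exporting data to AWS S3 buckets, and when submitting analyses to a CLC Genomics Cloud setup.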

Configuring access to your AWS accounts requires AWS IAM credentials. Configuring access to public S3 buckets requires only the name of the bucket.

Working with data stored in AWS S3 buckets via the Workbench is particularly relevant when using the CLC Cloud Module to submit jobs to run on a CLC Genomics Cloud setup.

When workflows are launched to run locally using on-the-fly import with files selected from AWS S3, the selected files are first downloaded to a temporary folder and then imported.
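The download-then-import order described above can be illustrated with a minimal Python sketch. This is not the Workbench's implementation: the function `import_via_temp_folder` and the `download` and `import_file` callables are hypothetical names introduced here only to show the pattern of staging remote files in a temporary folder before importing them.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List


def import_via_temp_folder(
    keys: Iterable[str],
    download: Callable[[str, Path], None],   # hypothetical: fetches one remote object
    import_file: Callable[[Path], str],      # hypothetical: imports one local file
) -> List[str]:
    """Stage every selected remote file locally, then import the local copies."""
    results: List[str] = []
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        local_paths: List[Path] = []
        # Step 1: download all selected files into the temporary folder.
        for index, key in enumerate(keys):
            # Prefix with an index so equal file names from different
            # folders do not overwrite each other.
            local = tmp_dir / f"{index}_{Path(key).name}"
            download(key, local)
            local_paths.append(local)
        # Step 2: import from the local copies only.
        for local in local_paths:
            results.append(import_file(local))
    # The temporary folder and its contents are removed at this point.
    return results
```

One consequence of this pattern is that enough local disk space must be available for the temporary copies of all selected files while the import runs.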

All traffic to and from AWS is encrypted using a minimum of TLS version 1.2.
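For readers who script their own access to AWS outside the Workbench, the same minimum can be enforced on the client side. The sketch below uses Python's standard `ssl` module. The function name `make_tls12_context` is an assumption made for illustration, and nothing here describes how the Workbench itself configures its connections.

```python
import ssl


def make_tls12_context() -> ssl.SSLContext:
    """Return a client-side context that refuses anything older than TLS 1.2."""
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    # Reject TLS 1.0 and 1.1 handshakes; newer versions remain allowed.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

Such a context can be passed to standard library clients, for example via the `context` argument of `urllib.request.urlopen`.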

Configuring access to AWS resources

To configure an AWS connection, go to:

        Connections | AWS Connections

Click on the Add AWS Connection button to configure an AWS connection.

Configured AWS connections, their status, and configured public buckets are listed in this dialog (figure 6.4). These configurations can be edited or removed from here.

Image aws_connection_dialog
Figure 6.4: The configuration dialog for AWS connections. Here, two valid AWS connections, their status, and a public S3 bucket are listed.

To add a public bucket, click on the Add Public S3 button and provide the public bucket name (figure 6.5).

Image aws_connection_configure_public_bucket
Figure 6.5: Provide a public AWS S3 bucket name to enable access to data in that public bucket.
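As background on why a bucket name alone can be enough for public data: objects in a public bucket can be addressed by URL without credentials. The sketch below builds such a URL using the virtual-hosted-style addressing documented by AWS. The function name `public_object_url` and the example bucket and key are hypothetical, and this is not a description of how the Workbench accesses S3.

```python
from typing import Optional
from urllib.parse import quote


def public_object_url(bucket: str, key: str, region: Optional[str] = None) -> str:
    """Build a virtual-hosted-style HTTPS URL for an object in an S3 bucket."""
    # With a region, the regional endpoint is used; without one, the
    # global endpoint form is used.
    host = f"{bucket}.s3.{region}.amazonaws.com" if region else f"{bucket}.s3.amazonaws.com"
    # Percent-encode the key but keep "/" so the folder-like structure is preserved.
    return f"https://{host}/{quote(key)}"


# Hypothetical bucket and key, for illustration only.
print(public_object_url("example-public-bucket", "reads/sample 1.fastq", "eu-west-1"))
```

A request to such a URL succeeds without credentials only if the bucket's policy allows public reads for that object.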

To configure a new AWS Connection, enter the required information in the configuration dialog (figure 6.6).

The dialog continuously validates the settings entered. When they are valid, the Status box will contain the text "Valid" and a green icon will be shown. Click on OK to save the settings.

Image aws_connection_configure
Figure 6.6: Configuration of an AWS Connection in a CLC Workbench.

The AWS credentials entered are stored in obfuscated form in Workbench user configuration files.

AWS connection status is indicated using colors. Green indicates the connection is valid and ready for use. Connections to a CLC Genomics Cloud are indicated in the CGC column (figure 6.4). To submit analyses to the CLC Genomics Cloud, the CLC Cloud Module must be installed and a license for that module must be available.

Importing data from AWS S3

The S3 buckets for each configured AWS Connection, and each configured public S3 bucket, are available in the workflow launch wizard when using on-the-fly import, and in relevant import tool wizards (figure 6.7).

Image import_from_aws_location
Figure 6.7: Files in local or remote locations can be selected for import by the Illumina importer of the CLC Genomics Workbench.

AWS S3 buckets can be browsed using functionality added when the CLC Cloud Module is installed. See https://resources.qiagenbioinformatics.com/manuals/clccloudmodule/current/index.php?manual=Working_with_AWS_S3_using_Remote_Files_tab.html for details.

Exporting data to AWS S3

To export data to an AWS S3 bucket, launch the exporter, and when prompted for an export location, select the relevant option from the drop-down menu (figure 6.8).

Image export_to_aws_location
Figure 6.8: After an AWS connection is selected when exporting, you can select the S3 bucket and location within that bucket to export to.