Using locally held data for analyses on AWS S3

Two considerations when using data stored locally for analysis on AWS S3 are:

  1. Transferring data from a non-cloud location, such as CLC Locations, local directories, or CLC Server import/export directories, takes time. When submitting jobs from the CLC Workbench, any necessary data transfer must be complete before the software can be closed.
  2. Data uploaded to AWS S3 as part of job submission is not saved to AWS S3 for subsequent use. Later analyses using the same input data would involve another transfer of that data to the cloud.
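
To get a rough feel for the two considerations above, the upload time can be estimated from the size of the input data and the available uplink speed. The following is a minimal sketch only: the function names, the example data size, and the uplink speed are illustrative assumptions, not values from the CLC software, and the estimate ignores protocol overhead, compression, and network variability.

```python
def transfer_seconds(size_gb: float, uplink_mbps: float) -> float:
    """Estimate seconds needed to upload size_gb gigabytes at uplink_mbps megabits/s.

    Hypothetical helper for illustration; uses decimal units (1 GB = 8000 megabits).
    """
    if size_gb < 0 or uplink_mbps <= 0:
        raise ValueError("size must be >= 0 and uplink speed must be > 0")
    megabits = size_gb * 8 * 1000
    return megabits / uplink_mbps


def total_upload_seconds(size_gb: float, uplink_mbps: float, analyses: int) -> float:
    """Estimate total upload time when the same local data is submitted repeatedly.

    Data uploaded at job submission is not kept for later use, so each
    analysis is assumed to repeat the full transfer.
    """
    if analyses < 0:
        raise ValueError("number of analyses must be >= 0")
    return analyses * transfer_seconds(size_gb, uplink_mbps)


if __name__ == "__main__":
    # Assumed example values: 50 GB of reads over a 100 Mbit/s uplink.
    once = transfer_seconds(50, 100)
    three_runs = total_upload_seconds(50, 100, 3)
    print(f"One submission: about {once / 60:.0f} minutes")
    print(f"Three submissions: about {three_runs / 3600:.1f} hours")
```

Under these assumed values, a single submission needs a little over an hour of transfer time, and that cost is paid again for each later analysis that reuses the same locally held data.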

Cases where locally held data must be used:

Workflows containing Differential Expression can be run using data already on AWS S3 when that workflow contains an Iterate element upstream. An example illustrating this is the RNA-Seq and Differential Gene Expression Analysis template workflow, delivered with the CLC Genomics Workbench, described at https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=RNA_Seq_Differential_Gene_Expression_Analysis_workflow.html . If the batch units for the RNA-Seq analysis section of that workflow are defined using metadata, that same metadata will be used as input to the Differential Expression step.



Footnotes

6.2 Only tools that can be used within workflows can be submitted to run on a CLC Genomics Cloud. Tools that are not workflow-enabled cannot be run on the cloud.