Using locally held data for analyses on AWS S3
Two considerations when using data stored locally for analysis on AWS S3 are:
- Transferring data from a non-cloud location, such as a CLC Location, a local directory, or a CLC Server import/export directory, takes time. When jobs are submitted from the CLC Workbench, any necessary data transfer must finish before the Workbench can be closed.
- Data uploaded to AWS S3 as part of job submission is not kept on AWS S3 for later use. Each subsequent analysis using the same input data therefore transfers that data to the cloud again.
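Because uploads made during job submission are not kept, the transfer cost scales with the number of analyses run on the same local input. The arithmetic can be sketched as follows. This is a rough estimate only, assuming a constant sustained upload bandwidth, decimal gigabytes, and no protocol overhead; the function name `estimated_upload_hours` is hypothetical and not part of any CLC software.

```python
def estimated_upload_hours(size_gb: float, bandwidth_mbps: float, runs: int = 1) -> float:
    """Rough upload time in hours for `runs` analyses of the same local input.

    Assumes (hypothetically) that each run re-uploads the full input, that the
    link sustains `bandwidth_mbps` megabits per second, and ignores overhead.
    """
    if size_gb < 0 or bandwidth_mbps <= 0 or runs < 0:
        raise ValueError("size_gb must be >= 0, bandwidth_mbps > 0, runs >= 0")
    bits = size_gb * 8e9                        # decimal gigabytes -> bits
    seconds_per_run = bits / (bandwidth_mbps * 1e6)
    return runs * seconds_per_run / 3600        # seconds -> hours


if __name__ == "__main__":
    once = estimated_upload_hours(100, 100)
    three = estimated_upload_hours(100, 100, runs=3)
    print(f"one run: {once:.1f} h, three runs: {three:.1f} h")
```

For example, a 100 GB input over a 100 Mbit/s link takes about 2.2 hours per transfer, so three separate analyses of that input spend about 6.7 hours uploading, whereas data placed on AWS S3 once would only be transferred a single time.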
Cases where locally held data must be used:
- Launching a tool from the CLC Workbench Toolbox to run on a CLC Genomics Cloud (see footnote). The tool is wrapped in a workflow for you before the job is sent to the cloud. When launching analyses this way, you can only select data held in CLC Locations, so the first step after a tool is launched is always to upload the input data to AWS S3. Launching tools directly is useful for small tests, but for analysis of large data sets, or where data needs to be imported, we recommend creating a workflow containing the tool of interest and launching that instead.
- Specifying reference data elements in certain workflow designs, as described in "Reference data for analyses on the cloud".
- Running analyses that require a CLC Metadata Table with associated data, using a workflow that does not include an Iterate element. An example is a workflow with the Differential Expression tool at the top.
Workflows containing Differential Expression can be run using data already on AWS S3 when the workflow contains an Iterate element upstream of that step. An example is the RNA-Seq and Differential Gene Expression Analysis template workflow, delivered with the CLC Genomics Workbench and described at https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=RNA_Seq_Differential_Gene_Expression_Analysis_workflow.html . If the batch units for the RNA-Seq analysis section of that workflow are defined using metadata, the same metadata is used as input to the Differential Expression step.
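The cases above can be summarized as a simple checklist. The sketch below is a hypothetical model, not a CLC API: the `JobSetup` fields and the `local_data_reasons` function are illustrative names, and each check only restates one of the listed cases, including the exception for workflows with an Iterate element upstream.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JobSetup:
    """Hypothetical description of a cloud job submission (not a CLC API)."""
    launched_from_toolbox: bool          # tool launched directly; inputs come from CLC Locations
    reference_elements_need_local: bool  # reference data specified in a design that needs local data
    metadata_table_input: bool           # analysis needs a CLC Metadata Table with associated data
    iterate_upstream: bool               # workflow has an Iterate element upstream of that analysis


def local_data_reasons(job: JobSetup) -> List[str]:
    """Return the listed cases that force locally held data to be uploaded."""
    reasons = []
    if job.launched_from_toolbox:
        reasons.append("tool launched from the Toolbox: inputs must be in CLC Locations")
    if job.reference_elements_need_local:
        reasons.append("reference data elements specified in a design that requires local data")
    # Metadata-table analyses can use data already on AWS S3 only with an upstream Iterate element.
    if job.metadata_table_input and not job.iterate_upstream:
        reasons.append("metadata table input without an upstream Iterate element")
    return reasons


if __name__ == "__main__":
    job = JobSetup(False, False, metadata_table_input=True, iterate_upstream=False)
    print(local_data_reasons(job))
```

An empty list means none of the listed cases applies, so the job could in principle take its inputs from data already on AWS S3.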
Footnotes
- Only tools that can be used within workflows can be submitted to run on a CLC Genomics Cloud. Tools that are not workflow-enabled cannot be run on the cloud.