Importing data on the fly
There are two ways that raw data, i.e. data not already imported into the CLC software, can be imported as part of a workflow run:
- Include an Input element in the workflow design, and when launching the workflow, choose the option "Select files for import". This is referred to as "on-the-fly" import.
- Include a dedicated Import element in the workflow design.
Examples of these 2 design types are shown in figure 12.30. How these translate when launching the workflow is shown in figure 12.31. The relative merits of each option are outlined in table 12.1. For most uses, on-the-fly import will be the most versatile option.
Figure 12.30: Raw data can be imported as part of a workflow run in 2 ways. Left: Include an Input element and use on-the-fly import. Right: Use a specific Import element. Here, the Illumina import element was included.
Figure 12.31: Top: Launching a workflow with an Input element and choosing to select files to import on-the-fly. Bottom: Launching a workflow with a dedicated import element, in this case, an Illumina import element.
Notes:
- Modified copies of imported data elements can be saved, no matter which of the import routes is chosen. For example, an Output element attached to a downstream Trim Reads element would result in Sequence Lists containing trimmed reads being saved.
- The use of Iterate elements to run all or part of a workflow in batches is described in Running part of a workflow multiple times.
- Configuration options for NGS importers are described in Import high-throughput sequencing data (footnote 12.3).
Footnotes
- 12.3: Paired read handling for workflows launched in batch mode, or workflows with Iterate elements, is the same as for the importer tools themselves: if the Paired option is checked, files are handled as described in the manual section on NGS importers. In CLC Genomics Workbench 21.x, this was also the case in most circumstances. However, if batch units were based on data organization and all files were in the same folder, each file was treated as a separate batch unit irrespective of whether the Paired option was checked.