Importing data on the fly
NGS sequence data can be imported on the fly, as an initial action when a workflow is run, avoiding the need to import the data prior to launching the workflow. There are two ways that this can be done:
- Choose the option "Select files for import" when launching a workflow, as shown in figure 11.34.
- Include an NGS import element in the workflow.
The two choices are illustrated in figure 11.35.
Figure 11.34: Files containing NGS sequence data can be selected as workflow inputs. Such data is imported on the fly when the workflow starts running.
Figure 11.35: NGS sequence data can be imported as the initial action in a workflow using either of the workflows pictured. The choice of using on-the-fly import, supported by the workflow on the left, or a dedicated NGS import element, as on the right, should take into account the general differences described in the text.
Considerations when deciding between these two styles include:
- Workflows using NGS import elements cannot be launched in batch mode. Workflows using the on-the-fly import functionality can be run in batch mode, as described in Running workflows in batch mode and Batching workflows with more than one input changing per run.
- If an output element is connected to an NGS import element in the workflow design, the imported data will be saved. The imported data is not saved when using the on-the-fly import functionality.
- NGS import workflow elements can be configured with new defaults, and the workflow author can decide which values should be configurable when launching the workflow. By contrast, all options for a given NGS importer are configurable by workflow users when using the on-the-fly import functionality.
- A particular NGS importer must be selected when adding a dedicated workflow import element. Using the on-the-fly import functionality, the person launching the workflow can choose from any of the NGS importers.
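The considerations above can be summarized as a small decision helper. The following is a minimal sketch in Python, intended only as an aid to reasoning about the trade-offs. It is not part of the software, and all names in it (ImportNeeds, choose_import_style, and the field names) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ImportNeeds:
    """Requirements for a workflow's initial NGS import (hypothetical names)."""
    batch_mode: bool = False           # must the workflow be launched in batch mode?
    save_imported_data: bool = False   # should the imported data itself be saved?
    author_sets_options: bool = False  # should the author set defaults / limit options?
    user_picks_importer: bool = False  # should the launcher choose the NGS importer?


def choose_import_style(needs: ImportNeeds) -> str:
    """Map requirements to an import style, following the list above.

    Returns "on-the-fly", "import element", "either", or "conflict".
    """
    # Batch mode and free choice of importer are only available on the fly.
    wants_on_the_fly = needs.batch_mode or needs.user_picks_importer
    # Saving the imported data and author-controlled options need an import element.
    wants_element = needs.save_imported_data or needs.author_sets_options

    if wants_on_the_fly and wants_element:
        return "conflict"  # the requirements cannot all be met by one style
    if wants_on_the_fly:
        return "on-the-fly"
    if wants_element:
        return "import element"
    return "either"


print(choose_import_style(ImportNeeds(batch_mode=True)))
print(choose_import_style(ImportNeeds(save_imported_data=True)))
```

A "conflict" result indicates that the requirements pull in opposite directions, for example needing both batch mode and saved imported data, so one of them has to be relaxed or handled separately.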
Elements downstream of the initial import behave the same way however the import is done. So, for example, if you include a Trim Reads element in the workflow, you could add an output to its Trimmed Sequences output channel to save the sequence lists containing the trimmed data in both cases.
The configuration options available when importing NGS data are described in the Import chapter, specifically in the section Import high-throughput sequencing data. These options are displayed slightly differently in the workflow wizard than in the import tool wizard, but the settings affect the import in the same way.
In addition to NGS reads, Sanger trace files and CLC format files can also be imported as an initial action when a workflow is run. CLC format import is most likely to be of interest when receiving data from other CLC systems to use for an analysis.
When launching a workflow where the input elements involve data types other than sequence data, e.g. reads tracks, variant tracks, etc., the option to import from disk is still available, but in this case only CLC format files are currently accepted.