Importing data on the fly

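NGS sequence data can be imported on the fly, as an initial action when a workflow is run, avoiding the need to import the data prior to launching the workflow. There are two ways that this can be done: by selecting files containing NGS sequence data as workflow inputs, so that the data is imported on the fly when the workflow starts running, or by including a dedicated NGS import element in the workflow.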

The two choices are illustrated in figure 11.35.

Image workflow_import_on_launch
Figure 11.34: Files containing NGS sequence data can be selected as workflow inputs. Such data is imported on the fly when the workflow starts running.

Image workflow_input_elements_and_import
Figure 11.35: NGS sequence data can be imported as the initial action in a workflow using either of the workflows pictured. The choice of using on-the-fly import, supported by the workflow on the left, or a dedicated NGS import element, as on the right, should take into account the general differences described in the text.

Considerations when deciding between these two styles include:

Elements downstream of the initial import behave the same way however the import is done. So, for example, if you include a Trim Reads element in the workflow, you could add an output to its Trimmed Sequences output channel to save the sequence lists containing the trimmed data in both cases.

The configuration options available when importing NGS data are described in the Import chapter, specifically in the section Import high-throughput sequencing data. These options are displayed slightly differently in the workflow wizard than in the import tool wizard, but the settings affect the import in the same way.

CLC format files can also be imported as an initial action when a workflow is run. This is most likely to be of interest when receiving data from other CLC systems to use for an analysis. When launching a workflow where the input elements involve data types other than sequence data, e.g. reads tracks, variant tracks, etc., the option to import from disk is still available, but in that case only CLC format files are currently accepted.