Choosing between Classic and Single entity options
Choosing the best queuing option involves considering the types of analyses being run and the system the jobs run on. The following considerations can help guide this decision.
Classic
The Classic option would be beneficial in the following situations:
- Workflows containing parallel branches are commonly launched. Elements of such Workflows can be scheduled on multiple nodes. This potential for parallel execution of Workflow steps can yield shorter average running times, depending on other aspects of the setup (e.g. node or grid license availability). Time savings are most noticeable on systems with spare capacity when running Workflows with computationally intensive elements on parallel branches.
- Single tools are predominantly submitted, as opposed to Workflows. There is no need to change the default setting in this case, as the Single entity option only affects the way Workflows are handled.
- Job node setups only: Nodes have been dedicated to certain types of analyses. If a job node has been configured to run only certain tasks (Server commands), as described in section 6.2.3, then with the Classic option this node can be used for those tasks whether or not they are elements of a Workflow. With the Single entity option, by contrast, Workflows cannot be run on job nodes configured this way.
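The trade-off behind the parallel-branch point can be illustrated with a deliberately simple toy model. It is a sketch under stated assumptions, not a description of how the server actually schedules jobs: a Workflow is treated as a set of independent branches (real Workflows are more general graphs), every job pays a fixed scheduling overhead, a node runs one job at a time, and caching is ignored. The names `Workflow`, `classic_runtime` and `single_entity_runtime`, and all runtimes and overheads, are hypothetical.

```python
from dataclasses import dataclass


# Toy model only: not how the server actually schedules jobs.
# Assumptions: a Workflow is a set of independent branches, each a chain of
# steps; each job pays a fixed scheduling overhead; a node runs one job at a time.
@dataclass
class Workflow:
    branches: list[list[float]]  # runtime of each step, per branch (e.g. minutes)


def classic_runtime(wf: Workflow, overhead: float, free_nodes: int) -> float:
    """Classic: every step is a separate job, and branches may run on different nodes."""
    # Time for each branch when every step pays the per-job overhead.
    branch_times = sorted(
        (sum(overhead + step for step in branch) for branch in wf.branches),
        reverse=True,
    )
    # Greedy heuristic: place the longest branches first on the least-loaded node.
    loads = [0.0] * max(1, free_nodes)
    for t in branch_times:
        i = loads.index(min(loads))
        loads[i] += t
    return max(loads)


def single_entity_runtime(wf: Workflow, overhead: float) -> float:
    """Single entity: the whole Workflow is one job; its steps run back to back on one node."""
    return overhead + sum(step for branch in wf.branches for step in branch)


# Two computationally heavy parallel branches on a system with spare capacity.
heavy = Workflow(branches=[[60.0], [60.0]])
print(classic_runtime(heavy, overhead=1.0, free_nodes=2))  # 61.0
print(single_entity_runtime(heavy, overhead=1.0))          # 121.0

# Many small steps: the per-job overhead dominates under Classic.
small = Workflow(branches=[[0.5] * 10])
print(classic_runtime(small, overhead=2.0, free_nodes=20))  # 25.0
print(single_entity_runtime(small, overhead=2.0))           # 7.0
```

Under this model, the heavy Workflow takes 122.0 with `free_nodes=1` (a busy system), slightly longer than under Single entity. This mirrors the point that gains from parallel branches shrink when nodes run at capacity. The many-small-steps case corresponds to the scheduling-overhead situation discussed under Single entity below.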
Single entity
The Single entity option would be beneficial when the execution of Workflows is common, and:
- Workflows being submitted often consist of a large number of steps, each with small computational requirements. This is common when working with data from organisms with small genomes, such as bacterial or viral samples, or when working with enriched data from organisms with large genomes. Where scheduling overhead outweighs the net analysis time of individual Workflow steps, the Single entity setting can yield many-fold improvements in overall sample throughput.
- Nodes frequently run at capacity. By running an entire Workflow on a single node, the Single entity option can leverage caching mechanisms, aiding performance. When all nodes are busy much of the time, the opportunity for gains through concurrent processing of parallel Workflow branches on multiple nodes is much lower than on a system with spare capacity. On such a system, the Single entity option may thus yield overall performance gains, even when running Workflows containing elements with computationally intensive steps.
- The node hardware is homogeneous. Because each Workflow runs in its entirety on a single node, every node should be large enough to handle all tasks in the Workflows being submitted.
- Resource allocation is a focus. On setups where many users share the resources and run Workflows, the Single entity option may help with resource access for different users. For example, consider a grid node setup with 20 nodes, where one user submits 15 Workflows with 10 tasks each. With the Classic option, this would lead to 150 jobs, which can be sent across all 20 nodes. When the next user submits a job, it would be queued behind all 150 jobs from the first user. With the Single entity option, the 15 Workflows would be submitted to 15 nodes, leaving 5 nodes available on which the other user's job could run.
- Large numbers of Workflows are submitted during limited periods of time, each Workflow consisting of several or many tasks. With the Classic option, this can lead to thousands of jobs in the queue. Using the Single entity option is a way to keep the scheduling load on the master node within reasonable limits.
- Grid node setups only: The number of grid worker licenses is limited relative to the number of job submissions. Using the Single entity option, a Workflow is submitted as a single job and thus consumes a single grid worker license. If a Workflow has 10 steps and is submitted using the Classic option, 10 jobs are created. Each of these jobs consumes a license, so license availability, along with node availability, limits when jobs can run.
- Grid node setups only: Resource tracking is a focus. A Workflow involving many steps runs as a single job, consuming a single license. This can simplify tracking the resource use of users running Workflows.
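The job and license arithmetic in the resource-allocation and grid-license examples above can be reproduced with a short back-of-envelope sketch. It rests on simplifying assumptions and is not the server's own accounting. One job runs per node at a time. Each job consumes one grid worker license. Enough jobs are ready to fill every node. The names `queue_load` and `QueueLoad` are hypothetical.

```python
from dataclasses import dataclass


# Hypothetical back-of-envelope model of the examples in the list above.
# Assumptions: one job runs per node at a time, each job consumes one grid
# worker license, and enough jobs are ready to fill every node.
@dataclass
class QueueLoad:
    jobs: int           # jobs placed in the queue
    grid_licenses: int  # grid worker licenses consumed across all jobs
    nodes_free: int     # nodes left idle for the next user's job


def queue_load(option: str, workflows: int, steps_per_workflow: int, nodes: int) -> QueueLoad:
    if option == "classic":
        jobs = workflows * steps_per_workflow  # one job per Workflow step
    elif option == "single_entity":
        jobs = workflows                       # one job per Workflow
    else:
        raise ValueError(f"unknown option: {option!r}")
    nodes_busy = min(jobs, nodes)
    return QueueLoad(jobs=jobs, grid_licenses=jobs, nodes_free=nodes - nodes_busy)


# The resource-allocation example: 20 nodes, 15 Workflows of 10 tasks each.
print(queue_load("classic", workflows=15, steps_per_workflow=10, nodes=20))
# QueueLoad(jobs=150, grid_licenses=150, nodes_free=0)
print(queue_load("single_entity", workflows=15, steps_per_workflow=10, nodes=20))
# QueueLoad(jobs=15, grid_licenses=15, nodes_free=5)
```

Calling `queue_load("classic", workflows=1, steps_per_workflow=10, nodes=20)` gives 10 licenses for the 10-step Workflow, against 1 under Single entity. The model ignores dependencies between steps, which in practice keep some Classic jobs waiting even when nodes are free.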