Controlling the number of cores utilized

Core usage is controlled through the native specification of the grid preset, and the required settings depend on the grid system used. From version 4.01 of the CLC Genomics Server, all cores on an execution node are used by default. Unless you configure a limit on the number of cores used by jobs involving assembly or read mapping phases, you must set up a dedicated queue that schedules only a single job on any given machine at a time. Otherwise, your CLC jobs may conflict with other jobs running on the same execution host.

Configuration of OGE/SGE

1) CPU core usage when not using a parallel environment

By default, the CLC Genomics Server ignores the number of slots assigned to a grid job and uses all cores of the execution host.

As of version 4.01 of the CLC Genomics Server, setting the environment variable CLC_USE_OGE_SLOTS_AS_CORES to 1 specifies that the number of allocated slots should be interpreted as the maximum number of cores a job should run on. To set this environment variable, add the following to the native specification of the grid preset:

-v CLC_USE_OGE_SLOTS_AS_CORES=1

In this case, the number of utilized cores is equal to the number of slots allocated by OGE for the job.
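If grid presets are generated or edited by script, the option can be appended to an existing native specification as in the minimal Python sketch below. The helper `with_slots_as_cores` is hypothetical and is not part of the CLC Genomics Server or OGE; it splits the specification with `shlex`, checks whether the variable is already passed through `-v`, and appends it otherwise.

```python
import shlex

# Hypothetical helper, not part of the CLC Genomics Server or OGE/SGE.
ENV_ASSIGNMENT = "CLC_USE_OGE_SLOTS_AS_CORES=1"


def with_slots_as_cores(native_spec: str) -> str:
    """Return native_spec with '-v CLC_USE_OGE_SLOTS_AS_CORES=1' appended unless already set."""
    tokens = shlex.split(native_spec)
    # '-v' takes a comma-separated list of VAR=value pairs; leave the
    # specification unchanged if the assignment is already among them.
    for flag, value in zip(tokens, tokens[1:]):
        if flag == "-v" and ENV_ASSIGNMENT in value.split(","):
            return native_spec
    return shlex.join(tokens + ["-v", ENV_ASSIGNMENT])


print(with_slots_as_cores("-l cfl=1 -l qname=32bit"))
# -l cfl=1 -l qname=32bit -v CLC_USE_OGE_SLOTS_AS_CORES=1
```

Calling the helper a second time on its own output returns the string unchanged, so the option is never duplicated.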

2) Limiting CPU core usage by using a parallel environment

The parallel environment feature can be used to limit the number of cores used by the CLC Genomics Server when running jobs on the grid. The syntax in the native specification for using a parallel environment is:

-pe $PE_NAME $MIN_CORE-$MAX_CORE

When a parallel environment is used, the number of allocated slots is interpreted as the number of cores to use; that is, the number of utilized cores equals the number of slots.

The parallel environment, selected by its name, must be set up by the grid administrator so that the number of slots corresponds to the number of cores (this is covered in the documentation provided by Oracle). $MIN_CORE and $MAX_CORE specify the range of cores that jobs submitted through this grid preset can run on. Take care not to set $MIN_CORE too high: the job might never run (for example, if no system has that many cores available), and the submitting user will not be warned about this.
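Because the submitting user is not warned when $MIN_CORE is too high, it can help to check the range before saving the grid preset. The Python sketch below assumes you supply a list of core counts (equal to slot counts in the parallel environment) for the execution hosts; both the function `check_pe_range` and that input list are hypothetical and not a feature of OGE or the CLC Genomics Server.

```python
def check_pe_range(min_core: int, max_core: int, host_cores: list[int]) -> list[str]:
    """Return warnings for a '-pe NAME MIN-MAX' request.

    host_cores: core (= slot) count of each execution host in the
    parallel environment, supplied by you (hypothetical input).
    """
    warnings = []
    if min_core < 1:
        warnings.append("MIN_CORE must be at least 1")
    if max_core < min_core:
        warnings.append("MAX_CORE is smaller than MIN_CORE")
    largest = max(host_cores, default=0)
    if min_core > largest:
        # As described above, such a job might never run and the
        # submitting user is not warned, so flag it here instead.
        warnings.append(
            f"MIN_CORE={min_core} exceeds the largest host ({largest} cores); "
            "the job might never run"
        )
    return warnings


print(check_pe_range(1, 3, [8, 16]))  # []
```

An empty list means the range is at least satisfiable by one host; it says nothing about how busy those hosts are.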

An example of a native specification using parallel environments is the following:

-l cfl=1 -l qname=32bit -pe clc 1-3

Here, the clc parallel environment is selected, and 1 to 3 cores are requested.
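To show how this example maps onto $PE_NAME, $MIN_CORE and $MAX_CORE, the following Python sketch extracts the `-pe` request from a native specification. `parse_pe` and `PeRequest` are illustrative names, not part of any CLC or OGE tool, and the parser only handles the MIN-MAX form used in this section (plus a single number, treated as MIN = MAX), not every range syntax the grid engine may accept.

```python
import shlex
from typing import NamedTuple, Optional


class PeRequest(NamedTuple):
    name: str
    min_core: int
    max_core: int


def parse_pe(native_spec: str) -> Optional[PeRequest]:
    """Extract a '-pe NAME MIN-MAX' request from an OGE native specification."""
    tokens = shlex.split(native_spec)
    for i, token in enumerate(tokens):
        # '-pe' must be followed by a name and a core range.
        if token == "-pe" and i + 2 < len(tokens):
            name, core_range = tokens[i + 1], tokens[i + 2]
            low, _, high = core_range.partition("-")
            min_core = int(low)
            # A bare number such as '4' is read as MIN == MAX.
            max_core = int(high) if high else min_core
            return PeRequest(name, min_core, max_core)
    return None


print(parse_pe("-l cfl=1 -l qname=32bit -pe clc 1-3"))
# PeRequest(name='clc', min_core=1, max_core=3)
```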

Older versions of the CLC Genomics Server

CLC Genomics Server version 4.0 and older use a number of CPU cores equal to the number of allocated slots, unless a parallel environment is in use, in which case the behaviour is as described above. In many setups only one slot is allocated per job, so CLC jobs effectively run on a single core.
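The OGE/SGE rules in this section can be summarised as a small decision function. This is a simplified model written for illustration, not code from the CLC Genomics Server; the function and parameter names are assumptions, and the model does not cap the result at the host's actual core count.

```python
def cores_used(slots: int, host_cores: int, *, legacy_server: bool,
               pe_in_use: bool, slots_as_cores: bool) -> int:
    """Model how many cores a CLC job uses on an OGE/SGE execution host.

    legacy_server  -- True for CLC Genomics Server 4.0 and older
    pe_in_use      -- True if the native specification contains '-pe ...'
    slots_as_cores -- True if CLC_USE_OGE_SLOTS_AS_CORES=1 is set (4.01+)
    """
    if pe_in_use or legacy_server:
        # Parallel environment, or version 4.0 and older: cores == slots.
        return slots
    # Version 4.01 and later without a parallel environment: all cores of
    # the host, unless the environment variable opts into slots-as-cores.
    return slots if slots_as_cores else host_cores


# One allocated slot on a 16-core host:
print(cores_used(1, 16, legacy_server=False, pe_in_use=False, slots_as_cores=False))  # 16
print(cores_used(1, 16, legacy_server=True, pe_in_use=False, slots_as_cores=False))   # 1
```

The second call illustrates the single-core situation described for older versions.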

Configuration of PBS Pro

With PBS Pro it is not possible to specify a range of cores (at least not to our knowledge). Instead, you specify exactly how many cores are needed. The request is either granted (the job is scheduled) or denied (the job is not scheduled), so it is very important to choose a realistic number.

The number of cores is requested as a resource:

-l nodes=1:ppn=X

where X is the number of cores. Because this resource is also designed to work with parallel systems, the number of nodes may be larger than 1. For scheduling cores, however, it is vital that the number of nodes is kept at 1.

An example of a native specification is:

-q bit32 -l nodes=1:ppn=2

This requests two cores and queues the job in the bit32 queue.
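As an illustration, the Python sketch below builds such a PBS Pro native specification, keeps the number of nodes at 1, and rejects unrealistic core counts up front. The function `pbs_native_spec` and its `largest_host_cores` parameter are hypothetical; the largest host size has to come from your own knowledge of the cluster.

```python
from typing import Optional


def pbs_native_spec(cores: int, queue: Optional[str] = None,
                    largest_host_cores: Optional[int] = None) -> str:
    """Build '[-q QUEUE] -l nodes=1:ppn=CORES' for a PBS Pro grid preset."""
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    if largest_host_cores is not None and cores > largest_host_cores:
        # No host could satisfy this request, so it would not be scheduled.
        raise ValueError(f"no host has {cores} cores (largest: {largest_host_cores})")
    parts = []
    if queue:
        parts += ["-q", queue]
    # Keep nodes=1 so that ppn counts cores on a single execution host.
    parts += ["-l", f"nodes=1:ppn={cores}"]
    return " ".join(parts)


print(pbs_native_spec(2, queue="bit32"))  # -q bit32 -l nodes=1:ppn=2
```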