Supported grid scheduling systems
CLC officially supports the third-party scheduling systems OGE, PBS Pro and IBM Platform LSF. We have tested the following versions:
- OGE 6.2u6
- PBS Pro 11.0
- LSF 8.3 and 9.1
On a more general level:
- The grid integration in the CLC Science Server is done using DRMAA. Integrating with any submission system that provides a working DRMAA library should in theory be possible.
- The scheduling system must also provide a way to limit the number of CLC jobs launched for execution, so that when the number of submitted jobs exceeds the number of CLC Grid Worker licenses, the excess jobs are held in the queue until licenses are released. In LSF and OGE, for example, the number of CLC jobs running simultaneously on the cluster can be limited in this way by configuring a "Consumable Resource". This is described in more detail in section 6.2.5.
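The license-limiting behaviour described in the second point can be illustrated with a small simulation. This is only a conceptual sketch in Python, not the configuration of any real scheduler: the class name ConsumableResource, its methods and the job names are hypothetical and chosen for illustration. It assumes a counter with a fixed number of slots, one per CLC Grid Worker license, where jobs that cannot obtain a slot are held rather than started.

```python
from collections import deque


class ConsumableResource:
    """Toy model of a scheduler-side counter (hypothetical, for illustration).

    Each running job consumes one unit. Jobs submitted while all units
    are in use are held in a queue instead of being started.
    """

    def __init__(self, total):
        self.total = total      # number of Grid Worker licenses (assumed)
        self.running = []       # jobs currently holding a license
        self.held = deque()     # jobs waiting for a license

    def submit(self, job):
        # Start the job only if a unit of the resource is still free.
        if len(self.running) < self.total:
            self.running.append(job)
        else:
            self.held.append(job)

    def finish(self, job):
        # Release the unit and hand it to the oldest held job, if any.
        self.running.remove(job)
        if self.held:
            self.running.append(self.held.popleft())


licenses = ConsumableResource(total=3)
for name in ["job1", "job2", "job3", "job4"]:
    licenses.submit(name)

print("running:", licenses.running, "held:", list(licenses.held))

# When one job finishes, its license is released and the held job starts.
licenses.finish("job1")
print("running:", licenses.running, "held:", list(licenses.held))
```

With three licenses and four submitted jobs, the fourth job waits in the queue until one of the first three finishes. It is never started without a license.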
An example of a system that works for submitting CLC jobs, but which cannot be officially supported because of the second of the points above, is PBS Torque. As far as we know, it provides no way to limit the number of CLC jobs sent simultaneously to the cluster to match the number of CLC Grid Worker licenses. So, with PBS Torque, if you had three Grid Worker licenses, up to three jobs could run simultaneously. However, if three jobs are already running and you launch a fourth, that fourth job will fail because no license is available for it.
This limitation can be overcome, allowing you to work with systems such as PBS Torque, if you control job submission in some other way so that the number of licenses is not exceeded. One possibility is a one-node-runs-one-job setup: you could set up a queue where jobs are only sent to a certain number of nodes, where that number matches the number of CLC Grid Worker licenses you have.
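The reasoning behind the one-node-runs-one-job workaround can be sketched as follows. This is a simplified Python model, not PBS Torque configuration: the function names pick_queue_nodes and assign_jobs and the node and job names are hypothetical. It assumes each node runs at most one job, so restricting the queue to as many nodes as there are licenses caps the number of simultaneous CLC jobs at the license count.

```python
def pick_queue_nodes(cluster_nodes, license_count):
    """Choose the nodes the CLC queue may use: one node per license."""
    if license_count < 1:
        raise ValueError("at least one Grid Worker license is required")
    return cluster_nodes[:license_count]


def assign_jobs(queue_nodes, jobs):
    """Place one job per node; remaining jobs wait for a free node."""
    assignments = dict(zip(queue_nodes, jobs))   # zip stops at the shorter list
    waiting = jobs[len(queue_nodes):]
    return assignments, waiting


cluster = ["node01", "node02", "node03", "node04", "node05"]
queue_nodes = pick_queue_nodes(cluster, license_count=3)
assignments, waiting = assign_jobs(queue_nodes, ["job1", "job2", "job3", "job4"])

print("queue nodes:", queue_nodes)
print("assignments:", assignments)
print("waiting:", waiting)
```

Because only three nodes belong to the queue and each runs a single job, at most three CLC jobs execute at once. The fourth job waits for a node instead of starting and failing for lack of a license.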