Schedulers

Do you need a scheduler? If so, which one?

 

Aspen highly recommends that you have us install and configure a resource manager / scheduler on your cluster, even if you do not currently use one. Schedulers and resource managers are used within HPC clusters to automate application execution and to allocate cluster resources among different users and groups. The resource manager normally tracks the current state of the cluster and which resources are in use, and asks the scheduler to decide how those resources will be used, based on rules or policies defined by the cluster administrators. Some utilities combine both functions into one software suite, while others separate the resource manager and scheduler into different applications. To add to the confusion, the term “batch queuing system” is also sometimes used to describe these utilities.

 

Users on your cluster submit applications to a queuing system. The scheduler determines whether the resources the application needs (nodes, memory, CPUs, interconnect, or any other property) are currently free on your cluster. If so, the application is run; if not, it is queued for later execution.
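
The run-now-or-queue decision described above can be sketched in a few lines of Python. This is a toy model for illustration only, not how any particular scheduler is implemented: the names Job and ToyScheduler are hypothetical, only node counts are tracked, and queued jobs are started in simple first-in, first-out order. A real scheduler would also check memory, CPUs, interconnect, and site policies.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int  # number of nodes the job requests


class ToyScheduler:
    """Hypothetical toy model: run a job now if nodes are free, else queue it."""

    def __init__(self, free_nodes):
        self.free_nodes = free_nodes
        self.running = []
        self.queue = deque()

    def fits(self, job):
        # Only node counts are checked here; memory, CPUs, and other
        # properties would be tested the same way.
        return job.nodes <= self.free_nodes

    def _start(self, job):
        self.free_nodes -= job.nodes
        self.running.append(job)

    def submit(self, job):
        if self.fits(job):
            self._start(job)
            return "running"
        self.queue.append(job)
        return "queued"

    def finish(self, job):
        # Release the job's nodes, then start queued jobs in FIFO order
        # for as long as the job at the head of the queue fits.
        self.running.remove(job)
        self.free_nodes += job.nodes
        while self.queue and self.fits(self.queue[0]):
            self._start(self.queue.popleft())


sched = ToyScheduler(free_nodes=4)
job_a = Job("a", nodes=3)
job_b = Job("b", nodes=2)
status_a = sched.submit(job_a)   # 3 of 4 nodes are taken
status_b = sched.submit(job_b)   # only 1 node left, so this job waits
sched.finish(job_a)              # nodes are released and job "b" starts
running_names = [job.name for job in sched.running]
```

In this sketch, job "b" starts without anyone being present to launch it, as soon as job "a" releases its nodes.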

 

All schedulers and resource managers provide command line tools to submit, control, and review job status. Some provide web pages, X graphical user interfaces (GUIs), or even remote clients which can be used to perform those same tasks.

 

Application output is directed wherever the user specifies: into a user-specified file in a specific directory, for example, or to standard output, where it is captured by the resource manager. If e-mail is configured and operational on your cluster, an e-mail can be sent to the user notifying them of the status of their job.
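
As one illustration of how output locations and e-mail notification are requested, the Python sketch below assembles a job script using Torque/PBS-style directives (-N, -l, -o, -e, -M, -m). The helper name build_job_script, the file paths, and the e-mail address are hypothetical, other schedulers use different directives, and the options honored on your cluster depend on how it is configured.

```python
def build_job_script(command, job_name, stdout_path, stderr_path,
                     email=None, nodes=1, ppn=1):
    """Return the text of a Torque/PBS-style job script (illustrative helper)."""
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",               # job name
        f"#PBS -l nodes={nodes}:ppn={ppn}",  # nodes and processors per node
        f"#PBS -o {stdout_path}",            # file that receives standard output
        f"#PBS -e {stderr_path}",            # file that receives standard error
    ]
    if email:
        lines.append(f"#PBS -M {email}")     # address to notify
        lines.append("#PBS -m abe")          # mail on abort, begin, and end
    lines.append("cd $PBS_O_WORKDIR")        # directory the job was submitted from
    lines.append(command)
    return "\n".join(lines) + "\n"


# Example values below are made up for illustration.
script = build_job_script(
    command="./my_simulation input.dat",
    job_name="run1",
    stdout_path="/home/alice/results/run1.out",
    stderr_path="/home/alice/results/run1.err",
    email="alice@example.com",
    nodes=2,
    ppn=8,
)
print(script)
```

The resulting text would typically be saved to a file and handed to the scheduler's submission command, such as qsub on a Torque system.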

 

Using a scheduler and resource manager on your cluster can greatly increase your productivity by queuing jobs to be run as soon as resources allow, rather than whenever users get around to starting them. The scheduler lets you utilize your cluster much more efficiently while removing the need for users to be available to start their applications interactively.

 

Some open source projects and applications have been written to use, or even require, a specific scheduler. FMRIB Software Library (FSL), for example, is a brain imaging analysis suite distributed by Oxford. FSL has been written to utilize the Sun Grid Engine (SGE) scheduler, and will not function in a cluster environment without that scheduler installed and operational.

 

Check to see if your particular application requires or interfaces with any particular scheduler, as that may drive your selection.

 

Aspen can install and configure several different resource manager and scheduler combinations on your cluster. Some are open source and available at no charge to you, while others are commercial products that you must purchase. Aspen can procure and install these utilities on your cluster for you, or transfer licenses you already hold.

 

 

Torque/Maui

 

The Torque Resource Manager and Maui scheduler are open source projects maintained by Cluster Resources. This is a very popular scheduling solution which scales well to large clusters and meets all the basic scheduling requirements normally found on a cluster. The Aspen ABC suite connects to Torque to allow your cluster users to submit jobs and review job status via a web GUI if desired, and an X Windows GUI program called “xpbsmon” can be used to gain a graphical view of your current cluster usage. Specific node allocation and other policies can be set in the Maui scheduler.

 

 

Moab

 

Aspen recommends the Moab Cluster Suite© for more complex scheduling needs. Moab is a commercial product supported by Cluster Resources, and runs in conjunction with Torque, replacing the Maui scheduler. Moab provides simple web-based job management, graphical cluster administration, and management reporting tools, as well as remote clients which can be used on Windows, Linux, or other Unix systems. Some clusters must support multiple users who each have differing resource requirements. Moab can be used to reduce the administrative overhead of larger clusters that are oversubscribed or are shared by many different departments within an organization that compete for resources.

 

 

Sun Grid Engine (SGE)

 

Sun Grid Engine is a scheduler project supported by Sun Microsystems that is used by many cluster communities for scheduling and resource management. SGE supports all the basic scheduling and resource management needs just as Torque/Maui do, but also supports usage accounting and reporting and advanced scheduling algorithms much as Moab does. SGE provides an X Windows GUI for scheduler configuration, job submission, and job status, and is available both as an open source version and as a supported commercially licensed product.

 

 

SLURM (Simple Linux Utility for Resource Management)

 

SLURM is an open-source resource manager used on many Linux clusters. SLURM is not a sophisticated batching system, but does provide interfaces to the Maui scheduler and the Moab Cluster Suite©. Some user communities rely on SLURM for their batching needs, and SLURM is used on some larger clusters.

 

 

PBS Pro

 

PBS Pro is a commercial grid and cluster resource manager offered by Altair Engineering. PBS Pro excels at connecting different clusters and workstations across your organization into a cohesive managed application execution environment.

 

 

Platform LSF

 

Platform LSF is a resource management and scheduling suite offered by Platform Computing, Inc. Platform LSF is arguably the most widely deployed commercial batch processing implementation on larger clusters. Platform LSF can also be deployed across your entire infrastructure, including workstations and clusters, and has been integrated with many commercial HPC applications.

 

