HPC Schedulers

Without a scheduler, an HPC cluster would be just a collection of servers whose jobs interfere with one another. On a large cluster with multiple users, no individual user knows which compute nodes and CPU cores to use, or how many resources are available on each node. Cluster batch control systems solve this by managing jobs on the system through HPC schedulers. Schedulers are essential for queueing jobs, assigning priorities, and distributing, parallelizing, suspending, killing, or otherwise controlling jobs cluster-wide. Below are some of the HPC schedulers most commonly requested by Aspen Systems’ customers.

SLURM

The Simple Linux Utility for Resource Management (SLURM) is an open-source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. Slurm requires no kernel modifications for its operation and is relatively self-contained. As a cluster workload manager, Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.

No single point of failure, fault-tolerant options, backup daemons
Highly scalable
Up to 1000 job submissions/second (600 executions/second)
Heterogeneous resources supported
Each job can have custom operating systems booted
Automatic job re-queue (policy configured based on exit value)
Highly configurable (over 100 plugins)
GNU General Public License
Reserve or limit resources for specific users
Real-time accounting down to the task level (identify tasks based on CPU or memory usage)
Account for power consumption per job
Report API use by user, and time consumed
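
The three functions described above (allocating nodes, starting work on them, and arbitrating contention through a queue of pending work) can be illustrated with a small sketch. This is a toy model written for this page, not Slurm's actual algorithm or API: the class `ToyScheduler`, its methods, and the job names are all hypothetical, and it assumes a simple rule of strict priority order with no backfill.

```python
import heapq
from itertools import count


class ToyScheduler:
    """Illustrative only: a priority queue of pending jobs over a fixed pool of nodes."""

    def __init__(self, total_nodes):
        self.free_nodes = total_nodes
        self.pending = []    # heap of (-priority, submit_order, name, nodes)
        self.running = {}    # job name -> number of nodes held
        self._order = count()

    def submit(self, name, nodes, priority=0):
        # Higher priority sorts first; ties fall back to submission order.
        heapq.heappush(self.pending, (-priority, next(self._order), name, nodes))

    def schedule(self):
        """Start pending jobs in priority order until the head job does not fit."""
        started = []
        while self.pending and self.pending[0][3] <= self.free_nodes:
            _, _, name, nodes = heapq.heappop(self.pending)
            self.free_nodes -= nodes
            self.running[name] = nodes
            started.append(name)
        return started

    def finish(self, name):
        # Release the nodes held by a completed job.
        self.free_nodes += self.running.pop(name)


sched = ToyScheduler(total_nodes=4)
sched.submit("sim_a", nodes=3, priority=1)
sched.submit("sim_b", nodes=2, priority=5)
sched.submit("sim_c", nodes=2, priority=1)

first_wave = sched.schedule()   # highest-priority job starts; the next one does not fit
sched.finish("sim_b")
second_wave = sched.schedule()  # freed nodes let the next job in line start
print(first_wave, second_wave)
```

In this sketch a job that does not fit blocks everything behind it. Production schedulers typically add policies such as backfill, reservations, and limits on top of this basic queueing idea.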

Moab HPC Suite

Moab HPC Suite is a workload and resource orchestration platform that automates the scheduling, management, monitoring, and reporting of HPC workloads at massive scale. The patented Moab intelligence engine uses multi-dimensional policies and advanced future modeling to optimize workload start and run times on diverse resources.

Maui Cluster Scheduler

Maui is a highly optimized and configurable advanced job scheduler for use on clusters. It is capable of supporting a large array of scheduling policies, including dynamic priorities, extensive reservations, and fair-share, and also interfaces with numerous resource management systems. Maui improves the manageability and efficiency of machines ranging from servers of a few processors to multi-teraflop clusters.
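
To make the idea of a fair-share policy concrete, here is a minimal sketch of one common approach: compare each user's recent share of the cluster with a target share, and raise the priority of users who are below target. It is an illustration under stated assumptions, not Maui's actual formula or configuration. The function names, the decay factor, and the usage figures are all hypothetical.

```python
def decayed_usage(usage_per_window, decay=0.5):
    """Sum usage over past windows, weighting recent windows more.

    usage_per_window[0] is the most recent window (assumed convention).
    """
    return sum(u * decay ** i for i, u in enumerate(usage_per_window))


def fair_share_priority(usage_by_user, targets, decay=0.5):
    """Return target share minus actual share (in percent) for each user.

    A positive value means the user is under target and should be favored.
    """
    effective = {user: decayed_usage(w, decay) for user, w in usage_by_user.items()}
    total = sum(effective.values()) or 1.0  # avoid dividing by zero on an idle cluster
    return {user: targets[user] - 100.0 * effective[user] / total for user in effective}


# Hypothetical node-hours per window, most recent first.
usage = {"alice": [80.0, 40.0], "bob": [20.0, 0.0]}
targets = {"alice": 50.0, "bob": 50.0}  # each user is entitled to half the cluster

priorities = fair_share_priority(usage, targets)
print(priorities)
```

Here the lighter recent user ends up with the higher fair-share value, so that user's next job would be favored. A real scheduler combines a component like this with other priority factors such as queue time and requested resources.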

TORQUE Resource Manager

TORQUE (Tera-scale Open-source Resource and QUEue manager) is a resource manager providing control over batch jobs and distributed compute nodes. It is a community effort based on the original PBS project and has incorporated significant advancements in the areas of scalability, fault tolerance, and feature extensions contributed by NCSA, OSC, USC, the U.S. Dept of Energy, Sandia, PNNL, University of Buffalo, TeraGrid, and many other leading-edge HPC organizations. TORQUE is fully supported by Moab Workload Manager and Maui Scheduler.

VIEWPOINT

Viewpoint is a rich, easy-to-use portal for end-users and administrators, designed to increase productivity through its visual web-based interface, powerful job management features and other workload functions. The portal provides greater self-sufficiency for end-users while reducing administrator overhead in High Performance Computing (HPC).

GRID ENGINE

Moving from network computing to grid computing can reduce costs, shorten time to market, and improve quality and innovation, making it possible to develop products that were previously out of reach. Grid computing solutions are ideal for compute-intensive fields such as scientific research, EDA, life sciences, MCAE, geosciences, and financial services.

THE PATH TO SCALED-UP DATA CENTERS

Univa Grid Engine is commercially supported, licensed software and a leading distributed resource management system. It optimizes resources in thousands of data centers by transparently selecting the resources best suited to each segment of work. Grid Engine software manages workloads automatically, maximizes shared resources, and accelerates deployment of any container, application, or service in any technology environment, on-premise or in the cloud.

OPEN GRID SCHEDULER

Open Grid Scheduler/Grid Engine is a commercially supported open-source batch-queuing system for distributed resource management. OGS/GE is based on Sun Grid Engine and is maintained by the same group of external (i.e., non-Sun) developers who have been contributing code since 2001.

Son of Grid Engine is a community project that continues Sun’s old Grid Engine free software project, which used to live at http://gridengine.sunsource.net until Oracle shut down the site and stopped contributing code (Univa now owns the copyright). The project maintains copies of as much useful material from the old site as possible.


