A key part of managing an HPC cluster is having the right HPC management software in place. This includes methods to deploy compute nodes, keep operating systems and other software up to date, and monitor the hardware. Full cluster managers are available. Some are free, including Aspen Systems’ Aspen Cluster Management Environment (ACME) software, while others come with commercial support and require a license. Alternatively, you can assemble your own software stack from individual components, many of which are available as open-source software.
The software stack is perhaps the most important part of your high performance computing solution. Starting with your choice of operating system, the software stack determines not only how your system operates, but also its performance.
Unlike most other HPC manufacturers, Aspen Systems offers a full selection of operating systems for you to choose from. Some operating systems are more user-friendly, while others may provide increased performance for your applications. You may also already be familiar with a particular Linux distribution, so sticking with it may be the best choice for you, depending on the hardware selected.
Equally important as the distribution are the HPC compilers and MPIs:
- Compiling your source code using commercial compilers such as Intel’s will most likely lead to significant performance increases.
- If you have a high speed interconnect such as InfiniBand, then compiling your cluster’s MPIs with a commercial compiler and the performance communications libraries will be of great benefit.
- If your cluster contains GPUs or FPGAs, then using a custom compiler is essential for achieving optimal performance.
Aspen offers a full selection of performance software options such as compiling your choice of MPIs and other software with as many compilers as you wish before your system ships. Aspen requires all customers to fill out our online Statement of Work (SOW).
Aspen Systems Cluster Management
HPC cluster management and support is perhaps one of the most overlooked facets of operating a cluster. Two questions must be answered for a successful cluster deployment: What hardware and software capabilities will be installed on your cluster to facilitate successful HPC management and support? And what are your cluster management, warranty, and support options?
Aspen Systems Cluster Management software comes standard with all of our HPC Clusters, along with our Standard Service Package at no additional cost. Aspen Cluster HPC Management software is compatible with most Linux distributions and is supported for the life of the cluster.
- Node Provisioning – Aspen Cluster Maintenance Environment (ACME) is a network bootable Linux environment independent of the environment installed on a cluster node which is used for deploying images across your cluster, testing and pre-configuration of cluster nodes, and stress testing. Images are created using ‘aspencopy’ and deployed through ACME using ‘aspenrestore’.
- Aspen Tools – Aspen provides command line tools on our clusters for imaging, remote power, sensor programs, and more. These are often used by more advanced cluster users to quickly check status on nodes, remotely power them on or off, or to re-image large groups of nodes.
- Scheduler – Aspen can install and configure several different resource manager and scheduler combinations on your cluster. Some are open source and free of charge, while others are commercial products that you must purchase. Aspen can procure and install these utilities on your cluster for you, or transfer existing licenses you might have.
- Environment Modules – Aspen installs Environment Modules on all of our HPC Clusters. Modules allow users to dynamically modify their environment via modulefiles, and are useful for managing different versions of applications (MPIs, compilers). Modules can also be bundled into metamodules that will load an entire suite of different applications.
- Ganglia – Aspen normally installs and configures Ganglia on your cluster, and can make Ganglia externally available as a default web page for organizations that are used to seeing Ganglia as the front-end web page for their clusters. Ganglia is a popular, scalable distributed monitoring system for clusters and grids, and many HPC customers do not consider a cluster complete without it.
- Monitoring – Aspen can configure your cluster with several monitoring tools to help you and your support team get the most value out of your technology investment. Nearly all aspects of your cluster can be monitored, including performance/utilization, network saturation, power consumption, temperature monitoring and more.
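To make the Environment Modules idea above more concrete, here is a minimal Python sketch of what loading and unloading a module does to a user's environment. It is an illustration only and not the Environment Modules implementation: the module names (`gcc/12`, `openmpi/4`), the metamodule `toolchain/gnu`, and all install paths are hypothetical.

```python
SEP = ":"  # search-path separator used on Linux

# Hypothetical modules: each lists (variable, directory) pairs to prepend.
MODULES = {
    "gcc/12": [("PATH", "/opt/gcc/12/bin"),
               ("LD_LIBRARY_PATH", "/opt/gcc/12/lib64")],
    "openmpi/4": [("PATH", "/opt/openmpi/4/bin"),
                  ("LD_LIBRARY_PATH", "/opt/openmpi/4/lib")],
}

# Hypothetical metamodule: one name that pulls in a whole suite.
METAMODULES = {"toolchain/gnu": ["gcc/12", "openmpi/4"]}


def _without(env, var, value):
    """Return the entries of a search-path variable, minus `value`."""
    return [p for p in env.get(var, "").split(SEP) if p and p != value]


def prepend_path(env, var, value):
    """Return a copy of env with `value` at the front of `var`."""
    updated = dict(env)
    updated[var] = SEP.join([value] + _without(env, var, value))
    return updated


def remove_path(env, var, value):
    """Return a copy of env with `value` removed from `var`."""
    updated = dict(env)
    remaining = _without(env, var, value)
    if remaining:
        updated[var] = SEP.join(remaining)
    else:
        updated.pop(var, None)  # drop the variable once it is empty
    return updated


def load(env, name):
    """Apply a module, or every member of a metamodule, to env."""
    for member in METAMODULES.get(name, [name]):
        for var, value in MODULES[member]:
            env = prepend_path(env, var, value)
    return env


def unload(env, name):
    """Undo the changes made by load() for the same name."""
    for member in METAMODULES.get(name, [name]):
        for var, value in MODULES[member]:
            env = remove_path(env, var, value)
    return env


if __name__ == "__main__":
    env = load({"PATH": "/usr/bin"}, "toolchain/gnu")
    print(env["PATH"])
```

Because each change is a simple prepend that can be reversed, switching between compiler or MPI versions becomes a matter of unloading one module and loading another, which is the convenience the bullet above describes.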
Bright Cluster Manager
Bright Computing is an industry leader in HPC middleware solutions for provisioning and managing HPC clusters, Hadoop clusters, and OpenStack private clouds in your data center or in the cloud. Bright Cluster Manager, the flagship product of Bright Computing, makes Linux clusters easy to install, manage and use, and makes it easy to deploy and manage big data and cloud architectures. In addition to ease of management, Bright Cluster Manager is designed to scale to thousands of nodes. It is designed to be a complete HPC management solution and includes everything a user or system administrator would expect from an advanced cluster management software stack. Contact one of our expert sales engineers today to learn how the HPC solutions from Bright Computing can help you streamline the installation and management of your HPC system.
Open-Source Toolkit for Real and Virtual Clusters
Rocks is an open-source Linux cluster distribution that enables end users to easily build computational clusters, grid endpoints and visualization tiled-display walls. Hundreds of researchers from around the world have used Rocks to deploy their own clusters. With its role-based package manager, deploying applications across the entire cluster is easy and efficient.
The Rocks Cluster Distribution (originally called NPACI Rocks) is a popular open-source Linux cluster distribution based on CentOS and sponsored by a National Science Foundation award. Rocks is a disked cluster deployment and management solution that uses the concept of “rolls”, which are pre-configured sets of Red Hat Package Manager (RPM) packages with specific changes made to integrate into a Rocks cluster. The Rocks goal is to simplify building a cluster, and it succeeds. However, Rocks makes specific assumptions about how your cluster will be configured, and your cluster must be configured in that manner to operate properly. Additionally, rolls released by vendors or user groups may be valid for only certain Rocks versions, and some rolls can conflict with other rolls, so some knowledge is necessary to successfully build and deploy a Rocks solution that fits your needs.
Diskless Computing Made Easy
oneSIS is an open-source software package aimed at simplifying diskless cluster management. It is a simple and highly flexible method for deploying and managing a system image for diskless systems that can turn any supported Linux distribution into a master image capable of being used in a diskless environment. One image is sufficient for serving thousands of nodes. Functional groups of nodes are easy to define, and any single node or group of nodes can easily be configured to behave independently.
Configuration is simple.
All node differences are defined in a central configuration file, providing unprecedented simplicity and clarity for system administrators. oneSIS can be used to manage diskless systems using NFS root, and potentially root over any other network filesystem or network storage system (such as iSCSI, iSER, SRP, Fibre Channel). It can be used to manage the root filesystem in any kind of diskless environment, from desktops to high availability web servers to high performance compute clusters.
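The idea of one shared image plus a central description of node differences can be sketched in a few lines of Python. This is a conceptual illustration under stated assumptions, not oneSIS's actual configuration format or code: the `CONFIG` layout, the node names, and the setting keys are all hypothetical.

```python
# Hypothetical central configuration. This is NOT the real oneSIS syntax;
# it only illustrates "shared defaults, then group and node differences".
CONFIG = {
    "defaults": {"root": "nfs", "role": "compute", "swap": False},
    "groups": {
        "gpu": {"nodes": ["n03", "n04"],
                "settings": {"role": "gpu-compute"}},
        "login": {"nodes": ["n01"],
                  "settings": {"role": "login", "swap": True}},
    },
    # Per-node overrides let a single node behave independently.
    "nodes": {"n04": {"swap": True}},
}


def resolve(config, node):
    """Return the effective settings for one node.

    Precedence, lowest to highest: defaults, group settings, node overrides.
    """
    settings = dict(config["defaults"])
    for group in config["groups"].values():
        if node in group["nodes"]:
            settings.update(group["settings"])
    settings.update(config["nodes"].get(node, {}))
    return settings


if __name__ == "__main__":
    for name in ("n01", "n02", "n04"):
        print(name, resolve(CONFIG, name))
```

Keeping every difference in one place means an administrator can see at a glance why a node deviates from the shared image, which is the clarity the paragraph above refers to.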
Scalable, Modular, Adaptable Systems Management
Warewulf is a scalable systems management suite originally developed to manage large high-performance Linux clusters. Focused on general scalable systems management, it includes a framework for system configuration, management, provisioning/installation, monitoring, event notification, and more via a modular plugin architecture. Install the components and features you need or leverage the existing system configurations stored within Warewulf to create custom solutions to meet your particular needs. Warewulf is a flexible solution that has proven itself to be scalable and easy to use.
Extreme Cluster/Cloud Administration Toolkit
xCAT offers complete management for HPC clusters, RenderFarms, Grids, WebFarms, Online Gaming Infrastructure, Clouds, Datacenters, and whatever tomorrow’s buzzwords may be. It is agile, extensible, and based on years of system administration best practices and experience.
- Provision operating systems on physical or virtual machines: RHEL, CentOS, Fedora, SLES, Ubuntu, AIX, Windows, VMware, KVM, PowerVM, PowerKVM and z/VM
- Provision using scripted install, stateless, statelite, iSCSI, or cloning
- Remotely manage systems: lights-out management, remote console, and distributed shell support
- Quickly configure and control management node services: DNS, HTTP, DHCP, TFTP and NFS
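The distributed shell capability listed above amounts to running the same action on many nodes at once. The Python sketch below shows that fan-out pattern in its simplest form. It does not use or reproduce xCAT's commands or its node selection syntax: the `node[01-04]` bracket notation, `expand_range`, and `fan_out` are hypothetical names chosen for illustration, and the per-node action is a stand-in for a real remote command.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def expand_range(spec):
    """Expand a simple bracketed range such as 'node[01-04]' into names.

    Zero padding follows the width of the first number. A spec without
    brackets is returned unchanged as a single-element list.
    """
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", spec)
    if match is None:
        return [spec]
    prefix, start, end, suffix = match.groups()
    width = len(start)
    return [f"{prefix}{i:0{width}d}{suffix}"
            for i in range(int(start), int(end) + 1)]


def fan_out(nodes, action, max_workers=8):
    """Run action(node) for every node in parallel; return {node: result}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each node
        # with its own result.
        return dict(zip(nodes, pool.map(action, nodes)))


if __name__ == "__main__":
    nodes = expand_range("node[01-04]")
    # Stand-in action; a real tool would run a remote command here.
    print(fan_out(nodes, lambda n: f"{n}: ok"))
```

In practice the action would open a remote session and collect output per node, but the structure (expand a node set, run in parallel, gather results keyed by node) stays the same.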