Distributions

What Linux Distributions can be used?

 

Unlike many cluster vendors, who offer only one distribution with their clusters, Aspen Systems supports several. You may have a site support contract for a particular commercial distribution, or your systems administrators may be more familiar with one specific distribution. Those are valid reasons to select a particular distribution. However, each distribution may limit the capabilities of your cluster. Your unmodified distribution of choice may not support the hardware you have selected for your cluster, or a particular application may not have been ported to it. As always, consult your software vendor or user group for supported distributions and versions, and talk to your Aspen Sales Engineer about our offerings. Aspen Systems recommends the CentOS distribution for most cluster uses because of its wide use in clustering, long support life cycle, large user base, and the large number of HPC utilities it provides.

 

Aspen Supported Distributions

 

 

Some distributions, such as those aimed at general desktop users, are not well supported in the HPC community. If you can't easily find reports of another cluster running your code(s) on a given distribution, that distribution may be unsuitable for your purposes. Aspen can integrate distributions other than those shown above, but they may not have all of the functionality we normally provide. Should you select an unsupported distribution, Aspen will not guarantee that all the capabilities outlined in our standard build will work.

 

The update period, or patch life cycle, is extremely important for Linux distributions deployed in an enterprise environment, but is less of an issue with clusters. A cluster does not normally need to be managed the way you would manage the same number of individual enterprise nodes. Only the master node and any specialty nodes are treated as unique nodes for update purposes. The compute nodes are normally treated as a single upgrade target, and Aspen provides tools on our default distributions to help you do that. Many HPC users stabilize their system(s) on a distribution and an optimized code base that changes very little throughout the life cycle of the system.

 

Do you want to use diskless or single image compute nodes?

 

You may choose a single image cluster, in which a single image is kept on the master node and all compute nodes network boot that image. Aspen Systems supports the Perceus / Warewulf clustering toolkit installed on CentOS or Red Hat Enterprise Linux, as well as stateless boot (NFS root) on Red Hat-derived distributions. Some training is necessary to use a single image system, and you trade some configuration flexibility for scalability.

 

With Perceus / Warewulf, no operating system is installed on the disks in your compute nodes. In this configuration, all user data space resides on the master node or a storage node and is network mounted on the compute nodes. Perceus / Warewulf can be configured with no hard drives at all in the compute nodes (diskless), or with local disks in the compute nodes (hybrid). When no node disks are installed, you must ensure that all your codes are memory-disciplined and can execute in the physical memory installed in the compute nodes with no swap space. The running operating system in a diskless compute node also consumes some physical memory.
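The memory constraint above can be checked with simple arithmetic before committing to diskless nodes. The following is a minimal sketch, not an Aspen or Perceus / Warewulf tool: the function name, the headroom parameter, and all of the figures are illustrative assumptions that you would replace with measurements from your own codes and node image.

```python
def fits_in_memory(node_ram_gb, os_image_gb, per_process_gb,
                   processes_per_node, headroom_gb=1.0):
    """Return True if a job fits in the RAM left over on a diskless node.

    node_ram_gb        -- physical memory installed in the node
    os_image_gb        -- memory consumed by the running operating system
    per_process_gb     -- peak memory of one process of your code
    processes_per_node -- how many such processes run on one node
    headroom_gb        -- assumed safety margin (hypothetical default)
    """
    available_gb = node_ram_gb - os_image_gb - headroom_gb
    required_gb = per_process_gb * processes_per_node
    # With no swap space, anything over the available memory fails outright.
    return required_gb <= available_gb


if __name__ == "__main__":
    # Hypothetical node: 64 GB of RAM, 2 GB used by the OS, 16 processes.
    print(fits_in_memory(64, 2, 3.5, 16))  # 56 GB needed, 61 GB available
    print(fits_in_memory(64, 2, 4.0, 16))  # 64 GB needed, 61 GB available
```

If the check fails for any code you plan to run, a hybrid or disked configuration with swap space is the safer choice.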

 

If local disks are used in a hybrid configuration, they are normally configured with swap and scratch space on the node hard drive. Some applications, Gaussian for instance, need and use local scratch space on each node to speed up calculations and reduce network traffic. Some of the advantages of a single-image system are configuration consistency and ease of expansion. Any change you make in the node image environment on the master can be deployed to your entire cluster by a simple reboot of all your nodes. You have to learn a few more commands to work in the node image environment, but the commands are relatively easy.

 

Adding nodes of the exact same hardware configuration is a simple matter of installing the new hardware, connecting the new nodes to your network, and booting them in the order in which you want them to be identified. Adding nodes with different motherboards and/or different processors, not an uncommon occurrence in cluster upgrades, will almost always require additional and possibly extensive modifications to the node image.

 

One of the advantages of a diskless compute node cluster is higher reliability due to the lack of disks in the compute nodes. Approximately 50% of all node failures are caused by failing hard drives. Configuring your cluster as a hybrid, with the local scratch disks that many applications require, eliminates this advantage.

 

Some of the disadvantages of a single-image system are a certain lack of flexibility, some configuration complexity, and a user experience that is slightly less friendly and assumes more expertise. Perceus / Warewulf clusters are always configured in certain ways, naming is preset, and it is critical that the master node “own” the internal network used by the compute nodes, meaning that only cluster internal nodes should be connected to that network. It might be difficult to add specific applications or utilities to the node images, and some knowledge of kernel modules and boot sequences becomes necessary in more customized environments. Documentation is geared toward the more advanced cluster user, which sometimes makes it difficult to troubleshoot problems.

The alternative is a disked cluster. Aspen clusters are equipped with command line utilities, or a GUI if the Aspen Beowulf Cluster Management System (ABC) is purchased, that allow you to copy one disked node and then quickly re-image all other disked nodes with that copied image. These utilities are installed on all our default distribution selections, and work exactly the same across all environments. A disked cluster gives you configuration simplicity, ease of customization, and ease of expansion while using more traditional Linux administration skills you may already have. For many users, it is more cost efficient and productive than deploying a single image cluster.

 

Only procure a diskless single image cluster if:

  • you intend to scale to a large number of nodes on this cluster
  • you know that your code(s) will easily reside in physical memory without accessing swap space
  • you have few or no site-specific or unusual requirements for the configuration of the cluster
  • you have no current or planned applications that need to use node-local scratch space
  • you have or intend to have slightly more advanced Linux HPC administration skills in your organization

 

Only use a hybrid single image cluster if:

  • you intend to scale to a large number of nodes on this cluster
  • you have few or no site-specific or unusual requirements for the configuration of the cluster
  • you have or intend to have slightly more advanced Linux HPC administration skills in your organization

 

Use a disked cluster if:

  • you have site-specific or unusual requirements for the configuration of your cluster, such as special host naming, network configurations, or unique node configurations
  • you do not have advanced Linux HPC skills in your organization, or you intend to contract Aspen to administer your cluster
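
The three checklists above can be read as a small decision procedure. The sketch below is one possible encoding, not an Aspen tool: the field names are hypothetical, and the fallback to a disked cluster when you do not plan to scale to many nodes is an assumption drawn from the earlier remark that a disked cluster is more cost efficient for many users.

```python
from dataclasses import dataclass


@dataclass
class SiteProfile:
    """Answers to the checklist questions (field names are hypothetical)."""
    scales_to_many_nodes: bool
    fits_in_physical_memory: bool
    has_site_specific_requirements: bool
    needs_local_scratch: bool
    has_advanced_hpc_admin_skills: bool


def recommend_cluster_type(p: SiteProfile) -> str:
    # Either disked-cluster condition is enough to rule out a single image.
    if p.has_site_specific_requirements or not p.has_advanced_hpc_admin_skills:
        return "disked"
    # Assumption: without a need to scale, the single image trade-offs
    # are not worth it, so fall back to a disked cluster.
    if not p.scales_to_many_nodes:
        return "disked"
    # Diskless additionally requires no swap and no node-local scratch.
    if p.fits_in_physical_memory and not p.needs_local_scratch:
        return "diskless single image"
    return "hybrid single image"


if __name__ == "__main__":
    site = SiteProfile(
        scales_to_many_nodes=True,
        fits_in_physical_memory=True,
        has_site_specific_requirements=False,
        needs_local_scratch=True,   # e.g. an application that needs scratch
        has_advanced_hpc_admin_skills=True,
    )
    print(recommend_cluster_type(site))
```

Treat the result as a starting point for a conversation with your Aspen Sales Engineer rather than a final answer.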

 

Do you want to use OpenMosix, Rocks, OSCAR, or OpenSSI?

 

The OpenMosix project ended on March 8th, 2008. OpenSSI supports older versions of Fedora, Debian, and Red Hat 9, which may not have device drivers for the newer hardware in today's clusters. OpenSSI plans to add 64-bit support soon, and plans to support CentOS and Red Hat Enterprise Linux 5 in the future; check the OpenSSI web site for up-to-date information.

 

OSCAR supports Fedora Core 4 and 5, Red Hat Enterprise Linux 4, and CentOS 4. The latest release was on November 12, 2006, although version 5.2 is now in alpha and has been ported to the Debian distribution. Check the OSCAR website for up-to-date information.

 

The Rocks Cluster Distribution (originally called NPACI Rocks) is a popular open-source Linux cluster distribution based on CentOS and sponsored by a National Science Foundation award. Rocks is a disked cluster deployment and management solution, and uses the concept of “rolls”, which are pre-configured sets of Red Hat Package Manager (RPM) packages with specific changes made to integrate into a Rocks cluster. The Rocks goal is to simplify building a cluster, and it succeeds. However, Rocks, much like Perceus / Warewulf, makes specific assumptions about how your cluster will be configured, and your cluster must be configured in that manner if it is to operate properly. Additionally, rolls released by vendors or user groups may be valid for only certain Rocks versions, and some rolls can conflict with other rolls, so some knowledge is necessary to successfully build and deploy a Rocks solution that fits your needs. As with all in-progress development efforts, bugs exist. Newer hardware and driver requirements can also require customization of the Rocks images, and can make deployment of older Rocks versions, perhaps necessitated by roll compatibility, more difficult on the latest hardware configurations.

There are two very good reasons to select Rocks.

 

  1. First, you may belong to a specific user community which has standardized on Rocks. For instance, the Rocks “bio” roll contains a suite of the applications most commonly used by the bio-informatics community, such as mpiBLAST, EMBOSS, Glimmer, HMMER, and NCBI BLAST. If your community routinely uses Rocks to satisfy its HPC needs, then you will most likely have already heard of Rocks clusters being used with your applications when you researched your applications' requirements.
  2. Second, your organization may have standardized on Rocks, and may already have specific administration experience with it.

 

Rocks clusters, like any deployed cluster management solution, rely on standardization, and some customizations may be very difficult or failure-prone. For instance, the standard solution to renaming your cluster master node host name is to re-install the cluster from scratch. Rocks requires very specific directory structures, and almost all rolls are configured in standard ways that may or may not meet your approval.

 

Specific information, such as the permanent IP address and fully qualified domain name of the master node, must be known before we start building your Rocks cluster. If you require a configuration that is unusual for Rocks, or have unique organization or site-specific requirements, we may charge you extra to implement those customizations based on the complexity of your request. Some vendor interconnects and specific HPC utilities may not be supported, and we, and you, will need to carefully research your roll selection, paying special attention to the versions of Rocks your selected rolls support. Speak to your Aspen Sales Engineer for more information.
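
Researching a roll selection amounts to finding a Rocks version that every chosen roll supports. The sketch below illustrates that check under loud assumptions: apart from “bio”, the roll names are hypothetical, and every version number and compatibility entry is an invented placeholder, not data about real rolls. Fill the table in from each roll's own documentation.

```python
def common_rocks_versions(selected_rolls, roll_support):
    """Return the Rocks versions supported by every selected roll."""
    if not selected_rolls:
        return []
    version_sets = [set(roll_support[name]) for name in selected_rolls]
    shared = set.intersection(*version_sets)
    # Sort numerically so that, for example, 5.10 would follow 5.9.
    return sorted(shared, key=lambda v: tuple(int(x) for x in v.split(".")))


# Placeholder compatibility table (all entries are invented for illustration).
ROLL_SUPPORT = {
    "bio": ["5.0", "5.1"],
    "interconnect-x": ["5.1", "5.2"],   # hypothetical vendor roll
    "scheduler-y": ["4.3", "5.0", "5.1"],  # hypothetical roll
    "legacy-z": ["4.3"],                # hypothetical roll
}

if __name__ == "__main__":
    print(common_rocks_versions(["bio", "interconnect-x"], ROLL_SUPPORT))
    print(common_rocks_versions(["bio", "legacy-z"], ROLL_SUPPORT))
```

An empty result means no single Rocks version satisfies the whole selection, so at least one roll has to be dropped or replaced. This check covers version support only; rolls that conflict with each other still need to be researched separately.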

 

Two advantages of Rocks are its ability to quickly add nodes of the same hardware configuration to your cluster or re-image existing nodes, and the availability of packaged HPC applications (rolls) for specific user communities.

 

Rocks images nodes via RPM packages and Kickstart scripts, so any node customizations must be scripted in order to be present in any new image. The Aspen utilities use an actual node image, which contains all the customization that has previously been done to that node, and change only the IP address and host name. This means that a standard disked image cluster using Aspen utilities is more easily customizable than Rocks and just as scalable.
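
To make the image-cloning model concrete: every node receives the same image, and only two values differ per node. The sketch below is not the Aspen utility and does not reflect its interface. The function name, the `node001`-style naming scheme, and the sequential address assignment are all assumptions chosen for illustration.

```python
import ipaddress


def per_node_identities(base_ip, count, name_prefix="node", start_index=1):
    """Generate the only per-node values that differ between cloned nodes.

    Everything else (packages, configuration, customizations) is assumed to
    come unchanged from the copied node image.
    """
    first = ipaddress.IPv4Address(base_ip)
    return [
        {
            "hostname": f"{name_prefix}{start_index + i:03d}",
            "ip": str(first + i),  # consecutive addresses from base_ip
        }
        for i in range(count)
    ]


if __name__ == "__main__":
    for identity in per_node_identities("10.0.0.11", 3):
        print(identity["hostname"], identity["ip"])
```

Because the customizations live in the image itself, nothing needs to be re-scripted when the node is cloned, which is the contrast with the Kickstart approach described above.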

 

