HPC CLUSTER UPGRADES
HPC Cluster Upgrades for Additional Capabilities and Power
HPC cluster upgrades are for new or existing customers who want to add capabilities or computing power to their current cluster. Many customers also request cluster software upgrades to bring their systems up to date with more modern distributions and utilities.
HPC Cluster Upgrades for Hardware
An HPC cluster upgrade from Aspen Systems includes a sales engineer who will work with you to determine your expanded requirements based on your new hardware. Your network and high-speed HPC interconnects may not have enough ports to accommodate the new nodes, and additional power connections may also be needed. If rack space is not available, your upgrade may include additional racks as well. The additional heat load from your new hardware will be calculated and provided to you for facility review. These and other infrastructure upgrades necessary to accommodate your additional nodes will be included in your upgrade quote package.
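As a rough illustration of the heat load and port checks described above, the sketch below converts the added electrical load into BTU/hr and tons of cooling, and counts any shortfall in switch ports. It assumes that all power drawn by a node ends up as heat and uses the standard conversion factors (1 W is about 3.412 BTU/hr, 1 ton of cooling is 12,000 BTU/hr). The function names and the example figures are hypothetical and are not Aspen Systems' actual sizing method.

```python
# Standard conversion factors (not taken from the text above).
WATTS_TO_BTU_PER_HOUR = 3.412   # 1 W is about 3.412 BTU/hr
BTU_PER_HOUR_PER_TON = 12_000   # 1 ton of cooling = 12,000 BTU/hr


def added_heat_load(node_count, watts_per_node):
    """Estimate extra heat load, assuming all power drawn becomes heat."""
    watts = node_count * watts_per_node
    btu_per_hour = watts * WATTS_TO_BTU_PER_HOUR
    return {
        "watts": watts,
        "btu_per_hour": btu_per_hour,
        "cooling_tons": btu_per_hour / BTU_PER_HOUR_PER_TON,
    }


def ports_short(free_ports, new_nodes, ports_per_node=1):
    """Return how many switch ports are missing (0 if there are enough)."""
    return max(0, new_nodes * ports_per_node - free_ports)


if __name__ == "__main__":
    # Hypothetical upgrade: 8 new nodes at 500 W each, 5 free interconnect ports.
    load = added_heat_load(8, 500)
    print(f"{load['watts']} W, {load['btu_per_hour']:.0f} BTU/hr, "
          f"{load['cooling_tons']:.2f} tons of cooling")
    print("ports short:", ports_short(free_ports=5, new_nodes=8))
```

A facility review would normally use measured or nameplate power figures rather than a flat per-node estimate.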
Adding nodes to an Aspen Systems cluster is easy. If your new nodes are identical to your current node configuration, a simple set of commands on your existing administrative node is all that is needed to add them. Aspen Systems clusters are designed to support these easy additions on both diskless and diskful clusters.
If the nodes are significantly different, a little more must be done. The motherboard, chassis, and even processors of your new nodes may have changed since you purchased your original cluster, so a newer or patched kernel might be necessary to use the newer hardware. Aspen Systems will take a current image of a cluster node using our Aspen Systems Utilities package. If we do not have remote access, we use either a current image that you take or our archived disaster recovery image for that cluster. We then modify that node image to boot and function on your new hardware. Aspen Systems tools are used to generate initrd or initramfs images that will successfully boot both the old and the new hardware, which normalizes the node image for future use. If your cluster is a single-image cluster, your image must be modified to be compatible with the newer hardware.
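One way to picture an image that boots both the old and the new hardware: the initramfs needs the union of the boot-critical drivers for both generations. The sketch below computes that union and formats it as a dracut `add_drivers` configuration line. This is only an assumption for illustration. The Aspen Systems tools are not described here and may work quite differently, dracut may not be the initramfs generator on your distribution, and the module lists are hypothetical examples.

```python
def combined_drivers(old_hw_modules, new_hw_modules):
    """Return the sorted union of kernel modules needed by both hardware generations."""
    return sorted(set(old_hw_modules) | set(new_hw_modules))


def dracut_conf_line(modules):
    """Format a module list as a dracut.conf 'add_drivers' line (dracut is an assumption)."""
    return 'add_drivers+=" ' + " ".join(modules) + ' "'


if __name__ == "__main__":
    # Hypothetical driver lists for the original and the new nodes.
    old_nodes = ["ahci", "igb", "mlx4_core"]
    new_nodes = ["ahci", "nvme", "ixgbe", "mlx5_core"]
    print(dracut_conf_line(combined_drivers(old_nodes, new_nodes)))
```

Keeping one driver list for all hardware generations is what lets a single image serve every node type, at the cost of a slightly larger initramfs.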
This type of HPC cluster upgrade maintains compatibility with the system libraries and utilities that are already installed on your current cluster and used by your current codes, and ensures a smooth upgrade path.
If your cluster was built by another vendor, things also become a bit more complex. We will require access to your system to take a current image and to check your current software configuration for compatibility with your new hardware. If remote access is not allowed, we can talk you through taking an image using our software and transferring that image back to Aspen Systems.
In some cases, additional software changes may be required to add your new nodes to your scheduler, extend your preferred authentication scheme so that the new nodes can be used in the cluster, configure monitoring and correction services, and adjust other cluster configurations. All of these changes can be made by an Aspen Systems engineer, either remotely or on-site as the new nodes are installed.
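To make the scheduler step more concrete, the sketch below builds a node definition line for a block of new nodes. It assumes the cluster runs Slurm, which the text above does not state, and the host prefix, node range, CPU count, and memory size are hypothetical. Other schedulers use different configuration formats.

```python
def slurm_node_line(prefix, first, last, cpus, real_memory_mb, width=2):
    """Build a slurm.conf NodeName line for nodes prefix<first>..prefix<last>."""
    if last < first:
        raise ValueError("last must be >= first")
    if first == last:
        names = f"{prefix}{first:0{width}d}"
    else:
        # Slurm hostlist expression, e.g. node[09-16]
        names = f"{prefix}[{first:0{width}d}-{last:0{width}d}]"
    return (f"NodeName={names} CPUs={cpus} "
            f"RealMemory={real_memory_mb} State=UNKNOWN")


if __name__ == "__main__":
    # Hypothetical: eight new nodes, node09 through node16.
    print(slurm_node_line("node", 9, 16, cpus=64, real_memory_mb=256000))
```

The generated line would still need to be added to the scheduler configuration and, typically, to a partition definition before jobs can land on the new nodes.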
Once your new nodes are installed and all software changes are accomplished, regression testing is performed to ensure new node functionality and compatibility with existing infrastructure.
HPC Cluster Upgrades for Software
Many customers have their cluster software upgraded so that they can use more modern kernels, distributions, and utilities. Perhaps you now need to run a new model that will not compile or operate correctly on your current software stack, or you want a feature that is not available in your installed configuration. Existing customers sometimes request this upgrade at the same time they add new hardware to their systems, while new customers may wish to enjoy the flexibility and reliability of an Aspen Systems tuned software stack on their existing hardware. Aspen Systems supports upgrading cluster hardware purchased from other vendors. These upgrades are handled on a case-by-case basis and are documented in our orphaned cluster support section.
How We Perform Software Upgrades
Aspen Systems can deploy an engineer to your site to perform the upgrade. The engineer normally arrives with the baseline upgrade images on a laptop, which can be used to PXE boot the entire cluster and perform the upgrade. If the cluster is in a classified environment and laptops are not allowed, the engineer may use DVD/CD-ROM media to upgrade the master node first. After the upgrade is performed, the Aspen Systems engineer will localize the cluster to operate in your environment and assist your end users in getting codes and applications running in the new environment.
In many cases where the complexity of the upgrade is not high, remote access is not allowed, and extended downtime is acceptable, you can ship a single master node and one or more nodes of each discrete node type back to Aspen Systems for an upgrade at our engineering facility. You will be asked to perform remote testing and system certification and approve the baseline configuration prior to the units being shipped back to your location. After the upgraded units are returned and re-installed, Aspen Systems engineers will use you or other local site resources as eyes and hands to upgrade the remainder of your cluster.
A remotely guided upgrade, performed by you, is the least desirable option, and we do not often recommend it. In this case, Aspen Systems will walk you through upgrading the master with an image we provide. We must have remote access to your cluster to even consider this option, and many customers find that the level of technical interaction and time required of them is undesirable.
A cluster software upgrade is essentially a complete software rebuild of your cluster environment. Depending on your storage configuration, it may be necessary to archive all user data onto other media, although in most cases your important data resides on a partition or RAID that can be disconnected and later re-connected. Aspen Systems will take a full OS image of each discrete node type to be upgraded, excluding your user data, then install and localize your new distribution and utilities on a single unit of each node type in your cluster. Once the basic functionality of these upgraded nodes has been verified, new images are made of these nodes and deployed to all the other nodes in your cluster. Regression testing is then performed across the entire cluster to verify baseline functionality.
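The staged rollout described above (upgrade one unit of each discrete node type, verify it, then deploy its image to the rest) can be sketched as a simple planning step. The inventory format, node names, and node types below are hypothetical. This is a minimal illustration of the grouping logic, not Aspen Systems' tooling.

```python
from collections import defaultdict


def plan_rollout(node_types):
    """Group nodes by type and pick one pilot node per type.

    node_types maps a node name to its hardware type. The pilot is upgraded
    and verified first, and its image is then deployed to the remaining nodes.
    """
    by_type = defaultdict(list)
    for node, node_type in node_types.items():
        by_type[node_type].append(node)

    plan = {}
    for node_type, nodes in by_type.items():
        nodes.sort()  # deterministic choice: first node by name is the pilot
        plan[node_type] = {"pilot": nodes[0], "remaining": nodes[1:]}
    return plan


if __name__ == "__main__":
    # Hypothetical inventory with three discrete node types.
    inventory = {
        "master": "master",
        "node01": "compute-gen1",
        "node02": "compute-gen1",
        "node03": "compute-gen2",
        "node04": "compute-gen2",
    }
    for node_type, step in sorted(plan_rollout(inventory).items()):
        print(node_type, "pilot:", step["pilot"], "then:", step["remaining"])
```

Verifying a single pilot per node type before wider deployment limits the impact of a bad image to one node of each kind.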
In almost all cases, your codes must be re-compiled or applications re-installed to operate correctly in your new environment. Once your new environment is stable, user data is restored or re-connected and user testing commences. If desired, Aspen Systems can familiarize your users with their new environment and help them rebuild codes and re-install applications.