Master Node
What are your master node configuration options?

Your master node is one of the most critical systems in your cluster. Aspen provides tools on all our default distributions to rebuild every other node in the cluster from the master node, but if a failover master is not used, a failed master could require manual reconfiguration and some downtime. Aspen can configure your cluster with a failover master at additional cost. Although the failover master option requires additional hardware as well as specialized software configuration, many customers may find it to be inexpensive insurance that keeps their cluster running if the master's hardware fails.
The master's operating system file systems should always be mirrored across two disks. RAID 1 (mirroring) can be implemented with a hardware RAID card or with software RAID. The master OS RAID contains not only the distribution, the configurations for all your cluster utilities, and the source for your particular utilities or codes (if applicable), but also the single image used to network-boot compute nodes (if a single-image system is used) or the multiple snapshots of node images used to restore or upgrade your compute nodes. This, combined with your data, is your cluster.
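When RAID 1 is done in software on Linux, the md driver reports array state in `/proc/mdstat`, and a status such as `[2/2] [UU]` means both mirror members are active. The Python sketch below is a minimal illustration, assuming Linux md (mdadm) software RAID; it parses a hypothetical sample of that file (the array and device names are invented for illustration) and flags any mirror running on fewer members than configured. Hardware RAID cards report status through their own vendor tools, which this sketch does not cover.

```python
import re

# Hypothetical /proc/mdstat contents for Linux software RAID 1 mirrors.
# md0 is healthy; md1 has lost a member (sda2 marked failed with "(F)").
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      976630336 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sdb2[1] sda2[0](F)
      4190208 blocks super 1.2 [2/1] [_U]

unused devices: <none>
"""

# Array header line, e.g. "md0 : active raid1 sdb1[1] sda1[0]"
ARRAY_RE = re.compile(r"^(md\d+)\s*:\s*\S+\s+(raid\d+)")
# Member status, e.g. "[2/2] [UU]" (configured/active, then per-member flags)
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def mirror_health(mdstat_text):
    """Return {array: (raid_level, healthy)} for each md array found.

    An array counts as healthy when the number of active members equals
    the number of configured members and no member flag is "_".
    """
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            # Assume unhealthy until a status line proves otherwise.
            result[current] = (header.group(2), False)
            continue
        status = STATUS_RE.search(line)
        if status and current is not None:
            configured, active, flags = status.groups()
            healthy = configured == active and "_" not in flags
            result[current] = (result[current][0], healthy)
            current = None
    return result


if __name__ == "__main__":
    for name, (level, ok) in sorted(mirror_health(SAMPLE_MDSTAT).items()):
        print(f"{name} ({level}): {'OK' if ok else 'DEGRADED'}")
```

On a live master you would pass the real contents of `/proc/mdstat` instead of the sample, for example from a periodic monitoring job. The regular expressions assume the common layout shown above and may need adjusting for variants such as arrays reported as `active (auto-read-only)`.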
Cluster images can be highly customized, containing site-, facility-, code-, or performance-specific modifications that Aspen, and you, have spent many hours completing. Figure 5 shows a typical functional master node software stack.

Aspen automatically keeps an image of your cluster, as shipped, on secure storage at our facility, and can retrieve additional images later (for example, after upgrades or site customization) to facilitate disaster recovery should it be needed. Cluster images do not normally contain application or model output data, so your data must be backed up by some other mechanism.
If current images of your master node(s), a compute node, and any additional specialty nodes have been copied to secure storage at your site or at Aspen, we can always reinstall your entire cluster should that be needed. Of course, we hope never to do that, and a properly configured master node helps ensure it never becomes necessary. The Aspen image is also an efficient way to deploy additional clusters should you need them, because any customization we have done for you, and any site localization or customization you have performed, resides in these images.
If you have only a single master node, it should have redundant power supplies connected to different circuit breakers and, if possible, those breakers should be fed from different panels in your facility. If your single master has only one power supply, the failure of that supply can render your entire cluster inoperative. If you have redundant power supplies but they are all connected to a single circuit breaker, your cluster can fail if that breaker becomes faulty. A panel is normally fed through a single main breaker, so whenever possible, ensure your redundant supplies are fed from different electrical panels in your facility.
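The power-feed reasoning above amounts to looking for any single component whose failure removes every source of power to the master. The Python sketch below models that check under simplifying assumptions: each supply is fed by exactly one breaker on exactly one panel, upstream sources such as the utility feed or a UPS are ignored, and the supply, breaker, and panel names are hypothetical labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerFeed:
    """One power supply in the master node and what feeds it.

    Supply, breaker, and panel names are hypothetical labels.
    """
    supply: str
    breaker: str
    panel: str


def single_points_of_failure(feeds):
    """List components whose failure alone would cut all power to the node."""
    if not feeds:
        raise ValueError("node has no power supplies")
    spofs = []
    # A lone supply is itself a single point of failure.
    if len(feeds) == 1:
        spofs.append(f"power supply {feeds[0].supply}")
    # If every supply shares one breaker, that breaker takes them all down.
    breakers = {f.breaker for f in feeds}
    if len(breakers) == 1:
        spofs.append(f"breaker {next(iter(breakers))}")
    # Likewise for a shared panel (and its single main breaker).
    panels = {f.panel for f in feeds}
    if len(panels) == 1:
        spofs.append(f"panel {next(iter(panels))}")
    return spofs


if __name__ == "__main__":
    configs = {
        "one supply": [PowerFeed("PSU1", "B12", "P1")],
        "same breaker": [PowerFeed("PSU1", "B12", "P1"),
                         PowerFeed("PSU2", "B12", "P1")],
        "same panel": [PowerFeed("PSU1", "B12", "P1"),
                       PowerFeed("PSU2", "B14", "P1")],
        "two panels": [PowerFeed("PSU1", "B12", "P1"),
                       PowerFeed("PSU2", "B07", "P2")],
    }
    for name, feeds in configs.items():
        print(f"{name}: {single_points_of_failure(feeds)}")
```

Only the last configuration, with redundant supplies on different breakers in different panels, reports no single point of failure at this level, which matches the recommendation in the paragraph above.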
In a small cluster, your master node also functions as the interactive login node for all your cluster users, so it needs additional memory, especially if you expect to deploy Virtual Network Computing (VNC) servers for individual users or to perform large code compilations. In a small cluster where the master serves all front-end functions, any performance degradation on the master can affect job execution across the entire cluster, so it is better to over-specify the memory in your single master node than to under-specify it.
The master node may be used to burn data sets onto Blu-ray, DVD, or CD-ROM, so an optical burner might be needed as well. A single master can also hold the data storage for the cluster, which we will discuss next.




