Cluster Logical Layout

How should your cluster be laid out (logically)?

 

Aspen builds, sells, and supports HPC, storage, database, visualization, and special-use clusters. With certain exceptions, all of them are configured much like HPC clusters, with some additional capabilities or different hardware. Clusters fall into three general sizes: small, medium, and large. Each type and size has specific uses, and clusters of the same type and size are often configured in similar ways.

 


Small Cluster

 

The “small” cluster has 32 or fewer nodes and is usually used by a small work-group or even a single user. Only one or a few codes run on the cluster, and the problems it is needed to solve are generally not that large.

 

The small cluster can have a low-latency, high-speed interconnect (depending on the application) and is serviced by a single front-end, or “master,” node, which performs multiple functions for the cluster. The master node services interactive user logins and perhaps user compilation needs, performs job scheduling for the entire cluster, can host the long-term data storage and backup systems for cluster data, and performs system monitoring and fault correction. The master node also firewalls the compute nodes from your organizational network.

 

Normally only the master node is visible to the network at your organization. The networks inside the cluster are normally set to private IP space, and used only for internal communications between the nodes. The master can also operate a Network Address Translation (NAT) gateway for the nodes, allowing them to access the outside world while not being visible themselves. Figure 1 illustrates a generic representation of a small cluster.
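As a rough illustration of the private addressing described above, the following Python sketch assigns internal addresses using the standard ipaddress module. The subnet 10.1.0.0/24, the node names, and the function name are assumptions chosen for the example, not values from an actual Aspen configuration.

```python
import ipaddress

# Hypothetical internal cluster network; any private (RFC 1918) range would do.
CLUSTER_NET = ipaddress.ip_network("10.1.0.0/24")

def assign_private_addresses(node_count):
    """Map hypothetical node names to addresses on the internal network."""
    hosts = iter(CLUSTER_NET.hosts())    # usable host addresses, in order
    plan = {"master": next(hosts)}       # first usable address goes to the master
    for i in range(1, node_count + 1):
        plan[f"node{i:03d}"] = next(hosts)
    return plan

plan = assign_private_addresses(4)
for name, ip in plan.items():
    # is_private is True for private address space: these hosts are not
    # reachable from outside and rely on the master's NAT gateway for
    # outbound access.
    print(name, ip, ip.is_private)
```

A /24 network holds at most 254 hosts, so a larger cluster would need a wider prefix in this sketch.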

 

 


Medium Cluster

 

The “medium” cluster has more than 32 and up to 256 nodes and is usually used as an organizational resource, so many codes may run on it. At this size, a single master node may not scale properly, so some functions may be moved off the master node onto separate nodes, such as dedicated storage, login, compilation, display, or administration nodes. A storage node is a dedicated data-serving resource and may or may not be connected to the organizational network for file sharing. An administrative node is used for cluster monitoring and fault correction. Login nodes move the interactive login load generated by cluster users off the master, and dedicated compilation nodes might be used just for compiling various codes or for development.
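The idea of moving functions off the master node can be sketched as a small data model. This is only an illustration: the role names, node names, and the default set of offloaded roles are assumptions, and a real layout depends on your workload.

```python
from dataclasses import dataclass, field

# Service roles mentioned in the text; the identifiers are illustrative.
ROLES = {"login", "compile", "scheduler", "storage", "admin"}

@dataclass
class Node:
    name: str
    roles: set = field(default_factory=set)

def small_layout():
    """Small cluster: a single master node carries every service role."""
    return [Node("master", set(ROLES))]

def medium_layout(offload=("storage", "admin", "login")):
    """Medium cluster: move selected roles from the master to dedicated nodes."""
    offload = set(offload) & ROLES               # ignore unknown role names
    nodes = [Node("master", ROLES - offload)]    # master keeps the remainder
    nodes += [Node(f"{role}01", {role}) for role in sorted(offload)]
    return nodes

for node in medium_layout():
    print(node.name, sorted(node.roles))
```

Passing a different offload tuple models the variations discussed in this section, for example a dedicated compilation node.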

 

 

 

 

Any or all of these node types might be present in a given cluster, depending on your specific needs, and variations of all of these configurations are possible. Some customers run codes that access data kept on external storage, so each node might need an external connection. Alternatively, a router, layer 3 switch, or gateway might be used on a cluster-internal network so that each node is directly reachable from your organization's computers. In some cases, one or more dedicated compilation nodes might be needed to serve development needs, or login nodes might handle both compilation and direct user logins.

 

There are no hard and fast rules. A storage node might be configured in a small cluster because of data sizing or transfer requirements, and an administrative node might not be needed in a medium cluster. The storage node might be connected to the high-speed interconnect for data sharing across that network, and the administrative node might also be configured as a fail-over master. There are myriad options, and your Aspen sales engineer can help you decide which best fit your needs.

 


Large Cluster

 

The “large” cluster has more than 256 nodes, sometimes thousands. At this size, scaling is critical, so the cluster could contain multiple storage nodes, dedicated fail-over masters, and perhaps more than one administration node.
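The three size classes can be summarized in a few lines of Python. The function name is hypothetical, and the sketch assumes that exactly 32 nodes counts as small and exactly 256 as medium, consistent with "more than 256" for large.

```python
def cluster_size_class(node_count):
    """Return the general size class for a cluster with node_count nodes."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    if node_count <= 32:
        return "small"
    if node_count <= 256:
        return "medium"
    return "large"

for n in (8, 32, 33, 256, 257, 2048):
    print(n, cluster_size_class(n))
```

The class is only a starting point, since the text notes there are no hard and fast rules about which node types a given cluster needs.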

 

 

The challenges of large clusters are many, but Aspen engineers can help you design and deploy your large cluster successfully. The scope of the problem sets and the level of detail needed to deploy a large cluster require much closer design interaction than smaller clusters do, because design decisions that would only cause annoyance in a smaller cluster can cause performance, operations, or execution issues in a large one. Your Aspen sales engineer will schedule several requirements meetings with you and will have our senior cluster designers work directly with you and your organization to define your large cluster solution.

 


Storage and Database Clusters

 

A storage cluster is used to meet requirements for larger data space or faster access, and can be configured in multiple ways. Sometimes multiple nodes are connected to a Storage Area Network (SAN), which uses Fibre Channel or InfiniBand switches to connect multiple RAID systems to multiple hosts. A parallel file system might be deployed, which allows multiple hosts to access the same data space simultaneously. Aspen supports the GFS, GPFS, Lustre, OCFS, and PVFS parallel file systems as well as other commercial offerings. If you have a specific parallel file system requirement that is not on this list, ask your Aspen sales engineer. We probably have experience with that file system and have deployed it.

 

 

Alternatively, a single large RAID system with host ports for the storage nodes to connect to might be deployed. This option, while less scalable, removes the cost of a SAN switch along with the associated software and support costs, which can be a significant savings. Storage clusters can themselves be integrated into the HPC cluster to serve data processing needs for the cluster and for other organization computers. Figure 4 shows both the SAN storage cluster option and the smaller single-RAID option with host ports. Database cluster hardware configurations can resemble storage clusters in many ways.

 


Visualization Cluster

 

Visualization clusters often have one or more Graphics Processing Units (GPUs) installed in each compute node. A visualization cluster with more than one GPU in each node normally must be built from 3U or 4U compute nodes. The GPUs may be used for general computation on certain codes, or as graphics processors that perform rendering and push images to front-end display nodes connected to the cluster. Aspen Systems has deployed visualization clusters in multiple environments and can help with your visualization cluster needs as well. If you have visualization questions, use your Aspen Systems sales engineer's expertise to design and optimize your visualization solution.

 



