Cooling
What are your cooling options?

Modern clusters are more powerful than ever before. The amount of processing power, memory, and the interconnect options available in a single node today far exceed anything available even 2 or 3 years ago. However, that additional processing power comes at a price: more power usage. In electronics, power is transformed into heat, so more power usage means more heat to dissipate. As we pack more and more performance into smaller packages, which we then rack as tightly as possible to take advantage of all available space, cooling becomes critical. Inadequate cooling is the primary cause of cluster hardware failures. Aspen Systems partners with APC™ and Liebert™ to solve your cooling needs.
The optimum ambient temperature for your cluster is 68° to 77° F (20° to 25° C). Maximum ambient temperature should not exceed 80.6° F (27° C) for any length of time. While your cluster can be operated above the maximum ambient temperature if necessary, doing so will adversely affect your code's performance and your cluster's long-term hardware reliability. UPS batteries are especially sensitive to higher temperatures. A UPS battery deployed in a 90° F ambient environment might last only 1 to 2 years, versus the normal battery lifetime of 3 to 5 years. System memory and disk drives will also fail significantly sooner than normal at higher temperatures.
Many existing raised floor computing facilities were designed for an estimated heat load of 3 to 4 kilowatts (kW) per rack. Using the loaded power usage of our standard 1U server (322 watts), a rack of 40 of these systems would produce a 12.88 kW heat load, over 3 times the maximum amount of heat the facility was designed to handle in that one rack. Using our twin systems, the heat load would be more than 6 times the amount of heat the facility can reliably dissipate from a single rack. If your systems are heavily utilized and as densely racked as possible, goals most HPC users aspire to, a rack could produce over 27 kW of waste heat. High density racks often overwhelm the cooling capacity in a single area of a traditional computer room, causing hot spots. Aspen can upgrade or supplement your facility's current cooling solutions so that this does not occur.
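The rack arithmetic above can be sketched in a few lines of Python. This is only an illustration: the 322 watt per-server figure and the 4 kilowatt design limit come from the text, while the function names are hypothetical and the sketch assumes that all electrical power drawn ends up as heat.

```python
# Sketch: estimate one rack's heat load and compare it with the facility design limit.
# Assumption: every watt drawn by a server is released as heat.

def rack_heat_load_kw(nodes: int, watts_per_node: float) -> float:
    """Heat load of one rack in kilowatts."""
    return nodes * watts_per_node / 1000.0


def overload_factor(load_kw: float, design_kw: float) -> float:
    """How many times the rack exceeds the per-rack design heat load."""
    return load_kw / design_kw


load = rack_heat_load_kw(40, 322)      # 40 standard 1U servers at 322 W each
factor = overload_factor(load, 4.0)    # against the upper 4 kW design figure
print(f"Rack heat load: {load:.2f} kW, {factor:.2f}x the design limit")
```

Substituting your own node count and measured per-node wattage gives a quick first estimate of whether a given rack position is likely to become a hot spot.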
In non-standard compute facilities such as converted closets, or rooms that were originally designed to house personnel, the situation can be even worse, and can lead to errant code behavior and equipment failure. We call this the “small room” cooling problem, and Aspen has specific recommendations and solutions for this situation.
Small Room Cooling
Let's pretend that we're going to install the example cluster outlined in the physical layout section into a small 10 by 10 foot room that we've re-purposed for our cluster. This is a 46 node InfiniBand cluster, with 2 UPS systems, a 4U storage node and master, an external RAID system, and KVM capabilities. Under full load, this cluster would generate approximately 16.363 kW of waste heat. We have a building air conditioning system, and the room is located in the center of our building, so there is no solar gain or other additional heat source. We would like to have the cluster maintained at an average temperature of no more than 77° F.
Conductive cooling, where heat flows through the walls of the space, can remove approximately 400 watts in a room this size. Passive ventilation, where the heat generated by the cluster flows into cooler air via a door, wall, or ceiling vent without a helper fan, could accommodate approximately 800 watts. Adding a fan to this vent could make this approach accommodate about 2 kW of waste heat. So combined conductive (400 watts) and fan-assisted (2 kW) cooling could remove only about 2.4 kW, roughly one-seventh of the heat removal we need. In addition, the following factors must be taken into account.
- Room size – temperature increases as the room gets smaller
- Walls, ceiling, floor – temperature increases as thermal resistance increases
- AC setback – if your building turns down, or off, the building air conditioning on nights and weekends, your room temperature will increase proportionally
- Exposure – if one or more walls are subject to sun exposure or heat transfer, temperatures will increase
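The gap between what the room can shed on its own and what the cluster produces can be sketched as follows. The wattage figures are the approximate values quoted above for a 10 by 10 foot room. The function name is hypothetical, and the sketch assumes the fan-assisted vent replaces the passive vent rather than adding to it.

```python
# Sketch: how much of the cluster's heat can conduction plus a fan-assisted vent remove?
CLUSTER_HEAT_W = 16363   # full-load waste heat of the example cluster
CONDUCTIVE_W = 400       # heat lost through the walls
FAN_VENT_W = 2000        # vent with a helper fan (supersedes the 800 W passive vent)


def cooling_shortfall_w(heat_w: float, *removal_w: float) -> float:
    """Heat in watts left over after the listed removal paths; never negative."""
    return max(0.0, heat_w - sum(removal_w))


removed = CONDUCTIVE_W + FAN_VENT_W
shortfall = cooling_shortfall_w(CLUSTER_HEAT_W, CONDUCTIVE_W, FAN_VENT_W)
print(f"Removed: {removed} W ({removed / CLUSTER_HEAT_W:.1%} of the load)")
print(f"Shortfall: {shortfall:.0f} W")
```

The factors in the list above (room size, wall construction, AC setback, exposure) would all lower the removal figures in practice, so the real shortfall is likely to be larger than this estimate.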
It is pretty obvious that we will need a dedicated air conditioner of some type for this cluster. How many tons of cooling would be necessary to remove the heat? Multiply your wattage by 3.413 to convert to British Thermal Units per hour (BTU/H), then divide by 12000, the number of BTU/H in one ton of cooling.
16363(w) x 3.413 = 55847 (BTU/H)
55847 (BTU/H) / 12000 = 4.65 tons of cooling
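The same conversion can be written as a small Python sketch, which is handy for repeating the sizing with a different cluster wattage. The 3.413 factor and the 12000 BTU/H per ton figure are the ones used in the calculation above; the function names are hypothetical.

```python
import math

BTU_PER_HOUR_PER_WATT = 3.413   # conversion factor used in the text
BTU_PER_HOUR_PER_TON = 12000    # one ton of cooling


def watts_to_btu_per_hour(watts: float) -> float:
    """Convert a heat load in watts to BTU per hour."""
    return watts * BTU_PER_HOUR_PER_WATT


def cooling_tons(watts: float) -> float:
    """Tons of cooling needed to remove a heat load given in watts."""
    return watts_to_btu_per_hour(watts) / BTU_PER_HOUR_PER_TON


tons = cooling_tons(16363)
print(f"{watts_to_btu_per_hour(16363):.0f} BTU/H")   # 55847 BTU/H
print(f"{tons:.2f} tons of cooling")                  # 4.65 tons of cooling
print(f"Round up to {math.ceil(tons)} tons")          # Round up to 5 tons
```

Rounding up to the next whole ton is an assumption made here for sizing purposes; actual unit sizes and any altitude de-rating should be confirmed for the equipment chosen.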
Modifying your building air conditioning system to provide this 5 tons might be expensive or problematic. A supplementary 5 ton air conditioning unit, perhaps attached to your building's chilled water supply, would adequately cool this cluster in this room, although altitude may play a part, as most air conditioner units are de-rated to lower tonnage numbers at higher altitudes.
If a building return air plenum with sufficient capacity is available, Aspen can provide you with in-row, portable, rack, or ceiling mounted air conditioning or fan assist units. One popular option is a fan-assisted rear door which attaches to every rack, allowing the exhausted heat to be ducted into the building return air system. The APC™ Rack Air Removal Unit can remove up to 16.5 kW of exhaust heat per rack, and can be vented into building return air space.
If there is access to building chilled water, condenser water, or a glycol loop with enough capacity, Aspen can use chilled water, condenser water, or glycol units mounted in-row (in between racks), as rack doors, or mounted on the floor, wall, or ceiling to cool your cluster. These options circulate air within the room and the rack(s) through condenser units which are plumbed to the building supply, and the return fluid carries away the heat to a roof or outside mounted cooling tower or cooling system.
If neither of these options is available, Aspen can design an entire system for you, plumbing racks or in-room cooling units to an outside wall or roof where condensers or fan units are installed.
For a single rack or a few racks, Aspen can integrate your cluster with totally enclosed self-contained rack enclosures such as the Liebert MCR™ or XDF™ systems. These rack enclosures have integrated cooling and optional UPS systems, and in some cases do not need access to building water or remote heat exchange units, so they can provide a totally stand-alone cooling environment for your cluster, requiring only electrical power for the air conditioner. Totally self-contained units are normally limited to 3 tons (10.548 kW) per rack, so our example cluster could be installed in a small room with no external cooling integration by using two of these racking systems.
In most cases, Aspen recommends that directed cooling units, not room cooling units, be installed for your cluster if possible, to alleviate any possible hot spots inside the room. Directed cooling is installed with your cluster, as rear rack doors or in-row cooling units, or as the rack itself, and allows the hot air to be chilled very close to the heat source, your cluster. This increases effectiveness and results in less overall cost.
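As a rough check on the two-enclosure figure, the following Python sketch counts how many self-contained enclosures are needed on heat load alone. It assumes the 3 ton per-rack limit mentioned above and the same 3.413 BTU/H per watt conversion used earlier. The function name is hypothetical, and the sketch ignores whether the equipment physically fits in that many racks.

```python
import math

# One ton of cooling expressed in watts, using the text's conversion factor.
WATTS_PER_TON = 12000 / 3.413   # about 3516 W


def enclosures_needed(heat_w: float, tons_per_enclosure: float = 3.0) -> int:
    """Minimum number of self-contained enclosures needed to remove heat_w watts."""
    capacity_w = tons_per_enclosure * WATTS_PER_TON
    return math.ceil(heat_w / capacity_w)


print(f"Capacity per enclosure: {3.0 * WATTS_PER_TON:.0f} W")   # 10548 W
print(f"Enclosures for the example cluster: {enclosures_needed(16363)}")   # 2
```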
Directed cooling solutions for this small room environment can also be applicable to much larger computer facilities.
Compute Facility Cooling
Larger compute facilities, such as existing raised floor computer rooms, often have cooling issues as well. They often were designed for much lower density, power, and cooling requirements than today's high performance clusters have, so supplemental cooling may be needed.
Fan assist doors, in-row cooling units, rear air conditioner doors (plumbed to existing glycol, chilled water, or condenser water sources), totally enclosed racks, and other cooling assist technologies can help ensure that your cluster runs cool and efficiently. Totally self-enclosed cooling racks, such as the Liebert MCR™, can allow you to place a high density Aspen cluster into an over-stressed compute facility without additional demands on an already over-stressed facility.
For even larger installations, Aspen recommends that you consider “cold aisle / hot aisle” solutions, such as the APC InfraStruXure™ or Liebert XD™ Systems. These solutions concentrate the exhaust heat into “hot aisles”, increasing cooling efficiency, and can be retrofitted into almost any computer room. APC InfraStruXure™ Hot Aisle Containment Systems are integrated racks which are installed back to back, and have row end doors and a roof over the hot aisle between the rack rows. In-row cooling units are used to cool the exhaust air from the hot aisle and circulate it back to the front of the rack for re-use by the equipment. Cabling can be accommodated within the racks, on dedicated cable management trays installed on top of the racks, or run under the floor in a raised floor facility.
No matter what your cooling needs are, Aspen sales engineers can help you design the most cost effective solution for your problem. Talk to them about your cooling needs.








