Disks and Interfaces



 

Hard Disk Drives

 

No matter how your storage is configured, or where it is installed in your HPC solution, it all eventually comes down to storing bits on media, and chances are most of your HPC data will be stored to and read from a hard disk drive (HDD).

 

HDDs store binary information (zeros and ones) by changing the magnetic direction of the location on a platter that represents each bit. An HDD contains one or more platters, which are accessed by drive heads on actuator arms that read or modify the magnetic direction of each location. Most HDDs implement error correcting codes such as Reed-Solomon or LDPC, which correct most of the errors that might occur while reading or writing the stored data. HDDs have been around for over fifty years, but the technology changes and manufacturing sophistication of today's high capacity and enterprise hard drives make any comparison with the drives of ten or twenty years ago almost laughable. Capacity and reliability have increased geometrically, but regrettably HDD access times have not decreased nearly as much. Access time to data on an HDD is bound by physics, and is tied to the rotational speed of the platters, which is currently 5400 RPM, 7200 RPM, 10,000 RPM, or 15,000 RPM.
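To make the link between rotational speed and access time concrete, here is a minimal sketch that computes only the average rotational latency, that is, the time for half a revolution. It assumes the requested sector is, on average, half a turn away from the head, and it deliberately ignores head seek time and transfer time, which add to the total. The function name is illustrative only.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in milliseconds.

    Assumption: on average the platter must turn half a revolution
    before the requested sector passes under the head. Seek time and
    transfer time are not included.
    """
    seconds_per_revolution = 60.0 / rpm
    return 0.5 * seconds_per_revolution * 1000.0


if __name__ == "__main__":
    # The four rotational speeds listed in the text.
    for rpm in (5400, 7200, 10_000, 15_000):
        print(f"{rpm:>6} RPM: {avg_rotational_latency_ms(rpm):.2f} ms")
```

Even at 15,000 RPM the platter needs about 2 ms on average to bring the data under the head, which is why rotational speed sets a floor on HDD access time.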

 

The most common HPC HDD form factor (mid 2009) is called 3.5", a historical reference to a drive size that matches the form factor used by half-height 3.5" floppy drives in the 1980s. The actual size is 4" x 1" x 5.75". A newer format, 2.5", is becoming more and more common. It is 2.75" x 0.6" x 3.9", with the height varying between 0.6" (15 mm) and 0.375" (9.5 mm). 2.5" drives are used in laptops and other portable devices and are becoming more prevalent, but most enterprise class drives still use the 3.5" format.

 

There are two general classes of HDDs: enterprise and consumer grade. Enterprise hard drives are generally engineered differently, with better shock and vibration resistance and different components, and are designed for the higher duty cycle that can be experienced in HPC solutions. A consumer class HDD is expected to be used by a worker approximately 40 hours a week, or roughly a quarter of the total time the drive is installed. However, HPC applications are often run by schedulers, which don't take days off or go home for supper. Some HPC applications can drive I/O duty cycles close to 100%, meaning that the drive is working for almost all of its life. Aspen recommends enterprise class HDDs for all master or storage node disk configurations by default, and often for compute node disk deployment as well, depending on the compute node I/O requirements driven by your applications.
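The duty cycle comparison above is simple arithmetic, sketched here for illustration. The only assumption is that duty cycle means active hours divided by the 168 hours in a week; the function name is hypothetical.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours


def duty_cycle(active_hours_per_week: float) -> float:
    """Fraction of the week during which the drive is in active use."""
    return active_hours_per_week / HOURS_PER_WEEK


if __name__ == "__main__":
    # Office-style use: about 40 hours a week.
    print(f"Consumer pattern:  {duty_cycle(40):.1%}")
    # Scheduler-driven HPC use: potentially around the clock.
    print(f"Scheduler-driven:  {duty_cycle(HOURS_PER_WEEK):.1%}")
```

A 40-hour week works out to a little under a quarter of the installed time, so a drive under a busy scheduler may see roughly four times the activity a consumer design anticipates.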

 

Solid State Drives

 

Solid state drive (SSD) technology has become available in recent years, and is quickly being adopted for many different purposes. An SSD uses the same disk interfaces as an HDD (see Disk Interfaces), but stores the data directly in solid-state memory, using SRAM, DRAM, Flash memory, or a combination of these to present a data space to the interface. SSDs can provide much lower access times than HDDs and good bandwidth. There are no moving parts in an SSD, so reliability can be quite good.

 

Generally, lower priced SSDs utilize MLC flash memory, while more expensive and higher performing drives utilize SLC flash memory technology to increase speed and reliability.

 

Typically SSDs provide extremely fast read performance compared to HDDs, and provide much more performance consistency than HDDs. As there are no moving parts, they are also much less noisy, are easier to cool, and exhibit high mechanical reliability. Given their higher performance envelope, high performance solutions often utilize SSDs to cache metadata or frequently accessed data to increase the solution response speed. However, given their much higher cost and life cycle limitations, they are not often used to satisfy the entire storage space requirement.

 

Life cycle is very important when considering SSDs. Flash based drives have an average life cycle of one to ten thousand write cycles for MLC, or ~100,000 write cycles for SLC technology. Extremely high endurance cells can have an endurance of up to five million write cycles. While this sounds like quite a lot, a standard HDD deployed in an HPC solution can quite easily see more write cycles than this in its lifetime. Manufacturers are introducing wear leveling algorithms and other technology in an effort to extend SSD life cycles and make the technology more usable. Write speed on flash based devices can also be quite low compared to the read speed.
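As a rough illustration of why these write cycle limits matter, the sketch below estimates how long a flash drive might last under a steady write load. It is a back-of-the-envelope model, not a manufacturer's formula. It assumes ideal wear leveling (writes spread evenly over all cells) and a write amplification factor, a term not used in the text above, which stands for the extra internal writes the drive performs per host write. The drive capacity and daily write volume in the example are hypothetical.

```python
def lifetime_years(capacity_gb: float,
                   write_cycles: float,
                   host_writes_gb_per_day: float,
                   write_amplification: float = 1.0) -> float:
    """Estimated years until the rated write cycles are used up.

    Assumptions (illustrative only):
      - ideal wear leveling, so every cell is written equally often
      - a constant daily write volume
      - write_amplification >= 1.0 models extra internal flash writes
    """
    total_write_budget_gb = capacity_gb * write_cycles / write_amplification
    days = total_write_budget_gb / host_writes_gb_per_day
    return days / 365.0


if __name__ == "__main__":
    # Hypothetical 64 GB drive receiving 500 GB of writes per day.
    for label, cycles in (("MLC, 10,000 cycles", 10_000),
                          ("SLC, 100,000 cycles", 100_000)):
        years = lifetime_years(64, cycles, 500)
        print(f"{label}: about {years:.1f} years")
```

Under this simple model the lifetime scales linearly with the rated write cycles, so the tenfold endurance gap between MLC and SLC translates directly into a tenfold gap in expected life for the same workload.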

 

SSD technology is rapidly evolving, and very usable SSDs are available today. However, SSDs must be deployed carefully in order to maximize the cost benefits and have the proper effect on your overall solution data access speed.

 

Disk Interfaces

 

Both HDDs and SSDs are engineered with disk interfaces that standardize their access and allow them to be used in different systems that implement the same interface. There have been many different disk interfaces over the years, but lately the industry has begun to standardize on the SAS and SATA interfaces, which are themselves serial implementations of older parallel protocols and interfaces.

 

 

 

SAS – Serial Attached SCSI – is one of the newer interfaces for HDDs, replacing the older parallel SCSI interface while using the same command set. Serial Attached SCSI is a point-to-point serial bus, and allows speeds up to 6 Gbit/s (750 MB/s raw line rate) per channel. Using SAS expanders, SAS can connect up to 16,384 devices. SAS expanders can connect SATA drives as well. SAS is widely used, and is commonly deployed where high performance is required. The SAS interface is very high performance, but SAS drives generally have smaller capacities than SATA drives. SAS drives also commonly implement higher rotational speeds (up to 15,000 RPM) than SATA drives.

 

Ultra 320 SCSI was the predecessor to SAS, and uses an older parallel bus design. Ultra 320 SCSI is capable of handling 15 drives per channel and has a max throughput of 320 MB/s. Now being replaced by SAS, SCSI was the high performance disk standard for years, and was deployed in most of the storage solutions available until SAS became widely adopted.

 

The SCSI command protocol is quite full featured, and allows devices other than hard drives to be connected to the bus. The SCSI protocol forms the basis of SAS, iSCSI, and the Fibre Channel Protocol, so the command set will be around and in use for years to come.

 

 

 

SATA – Serial ATA – is the newest interface for consumer level mass storage devices. Serial ATA is a point-to-point serial bus, and allows speeds up to 3 Gbit/s (375 MB/s raw line rate) per channel. Unlike SAS, the SATA bus cannot be expanded. However, SAS expanders can interoperate with SATA drives, so SAS expanders are often used to access multiple SATA drives.

 

SATA drives are often used in HPC solutions because they cost less and have higher capacity than SAS drives, but they are not considered as reliable or as fast as SAS drives. SATA was developed as a replacement for PATA, so the design targeted lower cost and simplicity. However, the lines have blurred a bit due to higher quality SATA designs and manufacturing improvements in the platters themselves. Many high performance solutions now utilize SATA drives in addition to, or instead of, SAS drives, as reliability and speed differences can often be masked by modern RAID technologies, making SATA both a cost effective and highly reliable solution. The Serial ATA International Organization (SATA-IO) has now released an updated standard, which raises the SATA interface speed to 6 Gb/s to match SAS.
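The MB/s figures quoted for SAS and SATA above come from dividing the line rate by 8 bits per byte. As a hedged refinement, the sketch below also shows the usable payload rate, assuming the 8b/10b encoding these serial links use at 3 and 6 Gbit/s, where every data byte travels as 10 bits on the wire. The function name and parameters are illustrative only, and protocol overhead above the encoding layer is ignored.

```python
def link_mb_per_s(line_rate_gbit: float, encoded_8b10b: bool = True) -> float:
    """Convert a serial line rate in Gbit/s to MB/s.

    With 8b/10b encoding (assumed here for SAS and SATA at 3 and
    6 Gbit/s), each payload byte occupies 10 bits on the wire.
    With encoded_8b10b=False the raw figure (8 bits per byte) is returned.
    """
    bits_per_byte_on_wire = 10 if encoded_8b10b else 8
    return line_rate_gbit * 1000.0 / bits_per_byte_on_wire


if __name__ == "__main__":
    for name, gbit in (("SATA 3 Gbit/s", 3.0), ("SAS/SATA 6 Gbit/s", 6.0)):
        raw = link_mb_per_s(gbit, encoded_8b10b=False)
        payload = link_mb_per_s(gbit)
        print(f"{name}: {raw:.0f} MB/s raw, {payload:.0f} MB/s payload")
```

Under that assumption a 6 Gbit/s channel carries at most about 600 MB/s of data and a 3 Gbit/s channel about 300 MB/s, still well above the 320 MB/s that Ultra 320 SCSI shared across its whole parallel bus once several channels are in use.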

 

 

 



