Local File Systems



Local file systems are used on each disked node in your cluster. Quite often, the master node or any storage, login, or special use nodes employ a RAID 1 software or hardware mirror to protect the operating system and ensure continuation of service in the event of a disk failure. There are valid reasons to prefer one local file system over another, and this decision is almost always predicated on the data you will store on that partition and its expected usage.

 

Your compute nodes may be disked, with the distribution installed on each compute node's hard drive. In this case, RAID is not cost effective. Instead, compute nodes are designed to be easily restored should a hard drive fail. Aspen normally configures disked compute node partitions as EXT3. Even diskless booted compute nodes may be configured as hybrid compute nodes in order to provide fast local scratch space. In these cases, the file system selection is matched to the projected data usage pattern.

 

For most O.S. partitions, such as / or /boot, Aspen recommends that your systems be configured with the EXT3 file system for reliability and security. On larger RAID file systems where large amounts of input or output data are stored and performance needs are somewhat higher, Aspen will normally configure the XFS or JFS file systems for you based on our understanding of your requirements, or your specific file system preference.
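
Before choosing or changing a file system, it can help to see what each partition on a node uses today. The following is a minimal sketch, not an Aspen tool: it assumes the text it is given follows the Linux /proc/mounts layout (device, mount point, file system type, options, dump, pass), and the function name, the sample devices, and the list of file system types are hypothetical choices made for illustration.

```python
# Types treated as "local file systems" here; this list is an assumption.
LOCAL_FS_TYPES = ("ext3", "ext4", "xfs", "jfs", "reiserfs")


def local_filesystems(mounts_text, wanted=LOCAL_FS_TYPES):
    """Return (mount_point, fs_type) pairs found in /proc/mounts style text."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        if fs_type in wanted:
            found.append((mount_point, fs_type))
    return found


# Hypothetical sample; on a real node, read the text from /proc/mounts.
SAMPLE = (
    "/dev/sda1 /boot ext3 rw 0 0\n"
    "/dev/md0 /data xfs rw,noatime 0 0\n"
    "proc /proc proc rw 0 0\n"
)

for mount_point, fs_type in local_filesystems(SAMPLE):
    print(mount_point, fs_type)
```

On a live system the same function could be given the contents of /proc/mounts instead of the sample string.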

 

Common Local File Systems

 

  • EXT3

     

    EXT3 (or third extended filesystem) is the most common and widely used local file system for Linux distributions. Based on the original Linux EXT2 file system, with journaling added for reliability, EXT3 is probably the best tested of the file systems, and is highly reliable in most situations. Many standard utilities exist to administer EXT3 file systems, and while EXT3 is not the fastest file system, it is quite safe and robust.

     

    EXT3 is a journaling file system, as most of the file systems we will discuss are, which means that the file system maintains a journal of any changes it intends to make. If a crash should occur during a change to the file system, recovery is accomplished by replaying changes from the journal until the file system is again consistent.

     

    EXT3 is quite recoverable in the event of major data corruption. Its metadata is in fixed, known locations, and some redundancy exists in those structures that can be helpful for recovery.

     

    EXT3 does not support extents, dynamic inode allocation, or block sub-allocations, but does support file system quotas. It has a limit of 31,998 sub-directories in each directory structure.

     

  • EXT4

     

    EXT4 (or fourth extended filesystem) is the successor to EXT3. As of kernel version 2.6.28, EXT4 is part of the standard kernel, so it is available on most of the more aggressively updated distributions such as Fedora or Ubuntu. EXT4 is not currently available on many enterprise distributions.

     

    EXT4 adds the capability to support volumes up to 1 exabyte and a maximum file size of 16 TB. EXT4 utilizes extents to achieve greater performance than EXT3, but still provides backward compatibility with EXT3, at the cost of some performance. EXT4 also provides persistent pre-allocation and delayed allocation, and raises the number of sub-directories allowed to 64,000. Journal check-summing, faster file system checking, and more accurate time stamping round out the additional file system features introduced in EXT4.

     

    EXT4 is about as fast as XFS for sequential read performance, and outperforms almost every other local file system for sequential writes. EXT4 is a good upgrade to EXT3, but is clearly an evolutionary step. EXT4 is widely considered an interim solution until Btrfs is released. EXT4 is not generally supported for your boot partition (as of mid 2009).

     

  • ReiserFS

     

    ReiserFS was designed and implemented at Namesys (now out of business) by a team led by Hans Reiser. ReiserFS was used by Novell SUSE as its default file system for quite some time, and was considered quite fast compared to other file systems available then. Regrettably, ReiserFS also garnered a reputation for data corruption.

     

    ReiserFS and EXT3 have been compared many times, and the issue has begun to attain the status of a religious war, with some people extremely happy with ReiserFS, and others just as blindly committed to EXT3. It is still normal to find rabid supporters and detractors on both sides of this somewhat historically acrimonious debate. We're geeks too, so we have our strong opinions as well. Aspen does not recommend using ReiserFS due to its historical data corruption issues and current lack of a wide developer base.

     

    Reiser4 is an ongoing project to develop the successor to ReiserFS, and its developers are beginning to consider the process needed to merge Reiser4 into the mainline Linux kernel tree (as of mid 2009).

     

  • XFS

     

    XFS was developed by SGI for IRIX, and was later ported to the Linux kernel. XFS is quite good at large file transfers, and performs well on large RAID systems. XFS is less efficient at small file creation and deletion.

     

    XFS is a journaling file system that dates from the mid 1990s on IRIX and has been part of the mainline Linux kernel since the early 2000s. XFS is well understood, and available on almost all distributions except Red Hat Enterprise Linux. One of the better performance attributes of XFS is the ability to provide striped allocation, where a stripe unit can be specified during file system creation that aligns with the stripe size of the RAID array the file system resides on. This can greatly increase file system performance. XFS utilizes file system extents, can have variable block sizes, and performs delayed allocation for file creation to increase write performance. It also allows both direct and guaranteed rate I/O, and supports quotas.

     

    XFS is particularly suited to RAID-backed file systems which service large files.

     

  • JFS

     

    JFS was developed by IBM, and is open sourced and supported in the Linux kernel via a kernel module. JFS is fast and reliable, with good consistency in performance. JFS utilizes a journal, supports dynamic inode allocation, and also supports allocation groups, which allow you to divide aggregate disk space in order to assign allocation policies to provide better and/or more consistent I/O performance.

     

    Given these attributes, JFS is quite an acceptable choice for many different HPC applications. However, there is some feeling that IBM may not wish to support JFS long term, although they do still provide patches to the mainline Linux kernel for the file system. JFS quota support was broken in several kernel versions, including some kernels used for enterprise distributions, but that fault has been corrected in 2.6.18 and newer kernels.
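
The journal replay described under EXT3 can be illustrated with a toy model. This is a sketch under stated assumptions, not how any real file system stores its journal: the "disk" is a plain dictionary, each journal entry is a hypothetical transaction with a list of writes and a commit flag, and the block names and values are invented for the example.

```python
def replay_journal(disk, journal):
    """Return a copy of disk with committed journal transactions applied."""
    recovered = dict(disk)  # leave the caller's view of the disk untouched
    for txn in journal:
        if not txn["committed"]:
            # No commit record: the crash interrupted this transaction,
            # so the usable part of the log ends here.
            break
        for block, value in txn["writes"]:
            recovered[block] = value
    return recovered


# Hypothetical state after a crash: one complete and one incomplete change.
disk = {"inode_12": "size=0", "bitmap": "0000"}
journal = [
    {"committed": True,
     "writes": [("inode_12", "size=4096"), ("bitmap", "1000")]},
    {"committed": False,
     "writes": [("inode_12", "size=8192")]},
]

print(replay_journal(disk, journal))
```

The point of the model is that a change is either replayed in full or not at all, which is what returns the file system to a consistent state.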
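
The stripe alignment mentioned under XFS comes down to simple arithmetic. The sketch below derives a value for the `su` (stripe unit) and `sw` (stripe width, counted in data disks) suboptions of `mkfs.xfs -d` from a RAID layout. The function name and the RAID level table are assumptions made for illustration, and the chunk size and disk counts are example values, not recommendations.

```python
# Disks per array that hold parity rather than data, by RAID level.
# RAID 10 is handled separately because half of its disks are mirrors.
PARITY_DISKS = {0: 0, 5: 1, 6: 2}


def xfs_stripe_options(chunk_kib, total_disks, raid_level):
    """Return an 'su=...,sw=...' string for mkfs.xfs -d."""
    if raid_level == 10:
        data_disks = total_disks // 2
    elif raid_level in PARITY_DISKS:
        data_disks = total_disks - PARITY_DISKS[raid_level]
    else:
        raise ValueError(f"unsupported RAID level: {raid_level}")
    if data_disks < 1:
        raise ValueError("array has no data disks")
    # su is the per-disk chunk size; sw is the number of data disks.
    return f"su={chunk_kib}k,sw={data_disks}"


# Example: a six-disk RAID 6 array with a 64 KiB chunk size.
print(xfs_stripe_options(64, 6, 6))
```

The resulting string would be passed to `mkfs.xfs -d` when the file system is created on the array's block device.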

     

We've only covered a few of the most commonly deployed local file systems. In almost all cases, a cluster needs to share a common data space across all of its nodes in order to facilitate parallel computation or other types of calculations. This is where networked file systems such as NFS come into play, or even high performance parallel file systems that provide even greater scalability. Next, we will attempt to outline some of the advantages, disadvantages, and trade-offs you will encounter when choosing a networked or parallel file system for your HPC solution.

 



