NFS
Many HPC clusters, especially small and medium clusters with data access
requirements that can be served by a single host, utilize the Network File
System (NFS) to share data directories with the other nodes in the cluster.
NFS is the de facto standard for data sharing in HPC clusters because of
its ease of configuration and ubiquitous support. An NFS server runs on the
node that serves the data space, and an NFS client runs on all other nodes
that need to access the data.
This allows the user's home directory, or other shared data directories as needed, to be mounted on all compute nodes, so that applications running on the compute nodes have access to the same data the user sees when logged in to the master node.
The NFS directories can be shared over the cluster's administrative network, which is often Gigabit Ethernet, or over the cluster's high-speed interconnect, which can significantly increase data access speed.
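As a rough illustration of the server and client sides described above, the Python sketch below builds an `/etc/exports` entry for the server and an `/etc/fstab` entry for each client. The paths, the hostname `nfs-ib0`, and the subnet `10.10.0.0/16` are hypothetical; pointing clients at the server's interconnect address rather than its administrative Ethernet address is what sends NFS traffic over the faster fabric. The option names are standard Linux `exports(5)` and `nfs(5)` options, but the particular choices are assumptions to adapt, not a recommended configuration.

```python
"""Render illustrative NFS server and client configuration lines.

All hostnames, paths, and subnets here are hypothetical examples.
"""

from __future__ import annotations


def exports_entry(path: str, client_spec: str, options: list[str]) -> str:
    """Return one /etc/exports line: <path> <clients>(<opt>,<opt>,...)."""
    return f"{path} {client_spec}({','.join(options)})"


def fstab_entry(server: str, path: str, mountpoint: str, options: list[str]) -> str:
    """Return one /etc/fstab line that mounts server:path over NFS."""
    # Fields: device, mount point, file system type, options, dump, fsck pass.
    return f"{server}:{path} {mountpoint} nfs {','.join(options)} 0 0"


if __name__ == "__main__":
    # Server side: export /home to the (hypothetical) interconnect subnet.
    print(exports_entry("/home", "10.10.0.0/16", ["rw", "sync", "no_subtree_check"]))

    # Client side: mount through the server's (hypothetical) interconnect
    # hostname, using NFS Version 3 over TCP with 32 KB transfer sizes.
    print(fstab_entry("nfs-ib0", "/home", "/home",
                      ["nfsvers=3", "proto=tcp", "rsize=32768", "wsize=32768", "hard"]))
```

On a typical Linux NFS server, running `exportfs -ra` after editing `/etc/exports` re-reads the file and applies the new exports.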
Projects such as parallel NFS (pNFS) aim to let NFS servers work in parallel, so that more than one NFS server can service a single data share, but that capability isn't yet mainstream. Parallel NFS has been added to the NFS v4.1 standard, and several vendors are developing support for it.
The current NFS implementation relies on a single server to serve any particular data space. This means that the protocol overhead, local RAID speed, bus architecture, network interfaces, memory, and Central Processing Unit (CPU) speed of that server limit the speed at which NFS data can be accessed. If your applications need more bandwidth from that data space than this single server can deliver, the NFS server becomes a performance bottleneck for the cluster.
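The single-server limit can be sketched as a simple check: treat the server as only as fast as its slowest component, then compare that ceiling with the combined demand of the clients. This is a simplification (real components interact), and the component figures and per-client demand below are hypothetical placeholders, not measurements.

```python
"""Rough check of whether one NFS server limits a cluster's data access.

Treats the server as only as fast as its slowest component, a simplification;
all component figures below are hypothetical placeholders.
"""

from __future__ import annotations


def server_ceiling(components: dict[str, float]) -> tuple[str, float]:
    """Return the name and MB/s limit of the slowest server component."""
    slowest = min(components, key=components.get)
    return slowest, components[slowest]


def is_bottleneck(components: dict[str, float], clients: int,
                  mb_s_per_client: float) -> bool:
    """True if the clients' combined demand exceeds the server's ceiling."""
    _, ceiling = server_ceiling(components)
    return clients * mb_s_per_client > ceiling


if __name__ == "__main__":
    server = {"raid": 600.0, "network": 1000.0, "cpu_and_protocol": 800.0}
    print(server_ceiling(server))           # the RAID is the limiting component
    print(is_bottleneck(server, 32, 25.0))  # 32 clients x 25 MB/s = 800 MB/s demand
```

Upgrading any component other than the slowest one does not raise the ceiling in this model, which is why a single NFS server has to be balanced as a whole.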
NFS has been around a long time and has several protocol versions: version 2, version 3, and version 4. NFS Version 2 operates over UDP only and limits reads and writes to 8 KB per transaction, which can be a major performance bottleneck.
Modern Linux distributions support NFS Version 3, which adds TCP support and raises the transaction size to 32 KB or even larger, although the upper limit is architecture and implementation dependent. NFS Version 3 has other performance improvements as well, such as weak cache consistency and safe asynchronous writes. Safe asynchronous writes allow the server to reply before it has saved the data to disk, so that it can combine many small write operations into larger disk transactions.
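To make the transfer-size difference concrete, the sketch below counts how many read or write RPCs are needed to move a 1 GiB file at 8 KB per transaction versus a 32 KB transfer size. It ignores retransmissions, metadata operations, and Version 3 COMMIT calls, so it shows only the per-transaction arithmetic, not real throughput.

```python
"""Count the read or write RPCs needed to move a file at a given transfer size.

Ignores retries, metadata calls, and NFS Version 3 COMMIT calls; only the
per-transaction arithmetic is modeled.
"""

KB = 1024


def rpc_count(file_bytes: int, transfer_bytes: int) -> int:
    """Each RPC moves at most transfer_bytes, so round the division up."""
    return -(-file_bytes // transfer_bytes)


if __name__ == "__main__":
    one_gib = 1024 ** 3
    v2 = rpc_count(one_gib, 8 * KB)   # NFS Version 2 limit of 8 KB per transaction
    v3 = rpc_count(one_gib, 32 * KB)  # a 32 KB NFS Version 3 transfer size
    print(f"v2: {v2} RPCs, v3 at 32 KB: {v3} RPCs ({v2 // v3}x fewer)")
```

Four times fewer transactions means four times fewer round trips and per-call processing steps for the same data, which is where much of the Version 3 gain comes from.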
Aspen recommends NFS Version 3 for almost all HPC uses, unless client or software version constraints make it impossible. NFS Version 4 introduces state control, byte-range locking, share reservations, file delegation, compound RPC calls, upgraded ACLs, security enhancements, and other features that promote wide area use and better file access control. Counterintuitively, NFS Version 4 performance is not generally better than NFS Version 3, because the new capabilities come at the cost of greater overhead and processing.
Parallel file systems can solve many of the issues raised above, so why not jump straight to implementing a parallel file system as part of your next HPC solution? There are some good reasons to choose NFS over a parallel file system.
- NFS is well supported.
NFS has been around since 1984, when Sun Microsystems developed the protocol. It is an open standard, which allows anyone to implement a client, and there are clients for every major operating system, most of which have been well tested and tuned, and are well understood. That counts for a lot, as your file system stability directly affects both the stability and productivity of your solution.
- NFS is familiar.
It's pretty easy to configure and mount an NFS share. It's also easy to tune a client for good performance. NFS server tuning is not quite as easy or well understood, but it's not that hard. Almost any Unix or Linux administrator has at least a passing familiarity with NFS, and knows how to configure and administer it. The same cannot be said of many parallel file system servers and clients, as they have a much smaller installed base and can be quite complex to configure and tune.
- NFS performance might be enough for your needs.
Yes, NFS has inherent limitations built into the protocol that restrict its scalability. One we've already mentioned is that a data share must be served by a single data server, so that server's maximum performance defines the maximum performance of the entire data share. However, there are ways to increase this performance, sometimes to the level of a parallel file system, while retaining the familiarity and ease of use of NFS. We'll discuss those options in the High Performance NFS section.
How good or bad your NFS performance is depends directly on your hardware and software implementation. Suppose you have an NFS server with an internal RAID system, and your local file system performance is approximately 450 MB/s write and 600 MB/s read with random transaction sizes. If your nodes mount that server's data shares over Gigabit Ethernet, that single Gigabit Ethernet link becomes the performance bottleneck and greatly limits how fast your NFS clients can read and write. A single Gigabit Ethernet link will sustain approximately 50 to 90 MB/s, with bursts of a bit over 100 MB/s.
Equipping your cluster with a high-speed interconnect such as InfiniBand, and then using that fabric to mount the NFS shares, removes that bottleneck. In that case, with tuning, you could approach the local file system bandwidth, although you will never match it because of the protocol overhead involved.
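The arithmetic behind this example can be sketched as taking the slower of the disk and the network, then applying a protocol-overhead factor. The disk and Gigabit Ethernet figures are the ones given above; the InfiniBand link rate and the 80% protocol efficiency are hypothetical assumptions, chosen only to show that once the network is no longer the limit, clients approach but never reach local disk speed.

```python
"""Estimate client-visible NFS bandwidth for the example server above.

Disk figures and the Gigabit Ethernet range come from the text; the
InfiniBand link rate and the 80% protocol efficiency are hypothetical.
"""


def effective_mb_s(disk_mb_s: float, link_mb_s: float,
                   efficiency: float = 1.0) -> float:
    """Slower of disk and network, scaled by an assumed protocol efficiency."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return min(disk_mb_s, link_mb_s) * efficiency


if __name__ == "__main__":
    disk_read, disk_write = 600.0, 450.0  # local RAID figures from the example

    # Gigabit Ethernet: roughly 50 to 90 MB/s sustained, per the text, so no
    # extra efficiency factor is applied on top of those observed rates.
    for gige in (50.0, 90.0):
        print(f"GigE at {gige:.0f} MB/s -> read {effective_mb_s(disk_read, gige):.0f} MB/s")

    # Hypothetical InfiniBand link much faster than the disks.
    ib = 2000.0
    print(f"IB read  -> {effective_mb_s(disk_read, ib, 0.8):.0f} MB/s")
    print(f"IB write -> {effective_mb_s(disk_write, ib, 0.8):.0f} MB/s")
```

With Gigabit Ethernet the network caps clients far below the 600 MB/s the disks can read; with the faster link the disks become the limit again, minus the assumed protocol overhead.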
For many codes run on small and medium clusters, this performance envelope is quite sufficient for efficient, cost-effective code execution. More data access performance will inevitably cost more money, so you should pay only for the performance your code actually needs.
High Performance NFS Solutions
So, we've established a few ground rules about the maximum performance a single NFS server can provide. You can always raise a single NFS server's performance, within certain limits, by moving to a bigger server, perhaps with an upgraded RAID system, bigger network pipes to the clients, and more computational power. But there will always be limits to a standard NFS server, because its NFS service is implemented in software. However, there are other options.
It is possible to increase NFS performance substantially by using NFS server appliance solutions. Aspen teams with Isilon and BlueArc to offer these high performance NFS solutions to our customers.
These commercial solutions scale NFS performance beyond what a standard NFS server can achieve, and can give you the performance characteristics of a parallel file system with the ease of administration of NFS. That can be compelling for smaller cluster installations, which normally have fewer administrative staff to manage their resources.
Isilon is a clustered storage solution that implements several great enterprise features. One of the most interesting, called SmartConnect, provides dynamic NFS load balancing and transparent failover and failback, features that are unavailable or extremely difficult to implement in more traditional NFS server implementations.
You can configure load balancing based on your needs, then have the system automatically rebalance periodically. You can even define different bandwidth and service profiles for different node types in your cluster. The failover and load balancing work through virtual IP addresses that all cluster clients use to mount the exported NFS shares. These capabilities require no additional client software, so the standard NFS clients installed on your nodes can be used.
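The virtual IP mechanism described above can be illustrated with a toy model: clients mount a virtual address, and the storage cluster decides which physical node answers for it. This is not Isilon SmartConnect's actual algorithm, whose internals are not described here; the node names, addresses, and simple round-robin policy are hypothetical.

```python
"""Toy model of virtual-IP-based NFS load balancing and failover.

Not Isilon SmartConnect's real algorithm: node names, addresses, and the
round-robin policy are hypothetical, chosen only to illustrate the idea.
"""

from __future__ import annotations

from itertools import cycle


class VipPool:
    """Maps client-facing virtual IPs to the storage node currently serving them."""

    def __init__(self, nodes: list[str], vips: list[str]) -> None:
        self.nodes = list(nodes)
        self.vips = list(vips)
        # Spread the virtual IPs across the storage nodes round-robin.
        self.owner = dict(zip(self.vips, cycle(self.nodes)))

    def vip_for_client(self, client_index: int) -> str:
        """Hand out virtual IPs round-robin so client mounts spread evenly."""
        return self.vips[client_index % len(self.vips)]

    def fail_node(self, failed: str) -> None:
        """Reassign a failed node's virtual IPs to the surviving nodes.

        Clients keep mounting the same address, so nothing changes on the client.
        """
        self.nodes.remove(failed)
        survivors = cycle(self.nodes)
        for vip, node in self.owner.items():
            if node == failed:
                self.owner[vip] = next(survivors)


if __name__ == "__main__":
    pool = VipPool(["node-a", "node-b", "node-c"],
                   ["10.0.0.11", "10.0.0.12", "10.0.0.13"])
    print(pool.vip_for_client(4))  # fifth client is told to mount 10.0.0.12
    pool.fail_node("node-b")
    print(pool.owner)              # 10.0.0.12 is now served by a surviving node
```

Because failover only changes which node owns an address, a standard NFS client needs no extra software to benefit from it, matching the point made above.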
Aggregate performance can be very high, because Isilon lets you simply add more nodes to the storage cluster to increase either capacity or performance. The namespace is global, and the OneFS operating system combines the file system, volume manager, and RAID into one software layer. Total aggregate performance with this system can exceed 45 GB/s.
BlueArc is another Aspen partner, and provides a hardware NFS server appliance solution using FPGA technology. Because of this, performance per watt is extremely high, with reduced access time and enhanced throughput. BlueArc offers both large-scale systems and smaller appliances such as the Titan 3000 or Mercury series, which can offer up to 20 GB/s, 4 petabytes of storage, and 200,000 IOPS. Mercury can scale your total NFS throughput in increments of either 700 MB/s or 1100 MB/s, up to a maximum throughput of approximately 10 GB/s.
The Titan and Mercury series use a BlueArc proprietary file system called SiliconFS to achieve those performance levels.
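For sizing purposes, the Mercury figures above reduce to a ceiling division: how many increments of 700 MB/s or 1100 MB/s are needed to meet a target. The sketch below assumes increments add linearly and treats 10 GB/s as 10,000 MB/s; both are simplifying assumptions rather than vendor sizing rules.

```python
"""Estimate how many Mercury-style throughput increments reach a target.

Increment sizes (700 or 1100 MB/s) and the ~10 GB/s ceiling come from the
text; treating increments as strictly additive is a simplifying assumption,
and 10 GB/s is taken as 10,000 MB/s.
"""

from typing import Optional

MAX_MB_S = 10_000  # approximate maximum throughput cited in the text


def increments_needed(target_mb_s: int, increment_mb_s: int) -> Optional[int]:
    """Number of increments to meet target, or None if it exceeds the ceiling."""
    if target_mb_s > MAX_MB_S:
        return None
    return -(-target_mb_s // increment_mb_s)  # integer ceiling division


if __name__ == "__main__":
    for step in (700, 1100):
        print(step, increments_needed(5_000, step))  # increments for 5 GB/s
    print(increments_needed(12_000, 1100))  # above the ~10 GB/s ceiling: None
```

Larger increments reach a given target in fewer steps, but either way the total is bounded by the appliance's maximum.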
Both of these solutions provide significant performance increases over standard NFS servers, integrate RAID management, and provide snapshots, data migration, and other enterprise data management features, combining ease of management with high performance. They may be the right fit for your HPC data needs.
There can also be advantages to implementing a parallel file system, and we'll look at parallel file system options next.