Web-Based Management
ABC functions as an SSL-encrypted web portal to your cluster for day-to-day user and
administrative functions. You can access your cluster from your organizational
network, or even the Internet, and use ABC to monitor your cluster, submit jobs,
query statuses, and perform many of your daily cluster tasks, all from any
Java-equipped web browser.
Multi-level User Security
ABC uses PAM modules to authenticate at the system level against your cluster's underlying user authentication
scheme, including LDAP, NIS, or Unix password and shadow entries. If a user has an account on your
cluster, they have an ABC account as well.
User and Group Admin
Within ABC, users can be placed into any groups you wish, and specific permissions
can be assigned to each group to fine-tune what your users can see or control. By default,
ABC has two groups, Administrators and All Users, but you can define
and use as many security groups as you wish. Every user is assigned to the All Users group, and root is automatically part of the Administrators group.
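The group model described above can be pictured with a small sketch. It is not ABC's implementation: the two group names come from the text, while the `GroupDirectory` class and the permission names (`view_status`, `power_control`) are hypothetical and exist only for illustration.

```python
# Hypothetical sketch of group-based permissions; not ABC's actual code.
from collections import defaultdict

ALL_USERS = "All Users"
ADMINISTRATORS = "Administrators"


class GroupDirectory:
    def __init__(self):
        self.members = defaultdict(set)      # group name -> usernames
        self.permissions = defaultdict(set)  # group name -> permission names

    def add_user(self, username):
        # Every user joins All Users; root also joins Administrators.
        self.members[ALL_USERS].add(username)
        if username == "root":
            self.members[ADMINISTRATORS].add(username)

    def assign(self, username, group):
        # Place a user in an additional, site-defined group.
        self.add_user(username)
        self.members[group].add(username)

    def grant(self, group, permission):
        self.permissions[group].add(permission)

    def allowed(self, username, permission):
        # A user holds the union of the permissions of all their groups.
        return any(
            username in users and permission in self.permissions.get(group, ())
            for group, users in self.members.items()
        )
```

With `view_status` granted to All Users and `power_control` granted to Administrators, an ordinary user could see status but only root (or anyone else placed in Administrators) could control power.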
Remote Console Access
ABC uses IPMI to provide console access directly to any node in your cluster. If IPMI KVM-over-LAN
options are installed on your nodes, ABC lets you view each node's video display
in your web browser through ABC's web proxy service. Serial over LAN (SOL)
access is supported and proxied as well. If you simply need terminal access
to a running node, use the embedded ABC terminal in your web browser to
access any node on the cluster that your permissions allow.
KVM over LAN Console, BIOS access
ABC allows you to directly access any cluster node's console to troubleshoot or
repair the node, change BIOS settings or the boot order, or perform other system-level
functions. Perhaps you wish to boot the node from an ISO or floppy image. With
virtual media, you can attach that image to a virtual device on the node and
boot the system using that image.
You no longer have to
be in the computer room to troubleshoot a problem node. You can use ABC from the comfort
of your office, your home, or even a hotel room halfway around the world.
(Network security policies and routing must allow access to the ABC server web
interface from your location.)
Remote Power Control
ABC uses IPMI to remotely power off, power on, or power cycle any of your nodes or any
group of nodes
from the ABC interface. Do you have power outages? Set your ABC control node to power on
automatically so that you can access ABC to perform a controlled power up of your cluster without being there physically.
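For readers who want to picture what IPMI power control looks like underneath, here is a minimal sketch built on the standard `ipmitool` utility. It does not show ABC's own code or CLI tools; the function names, the BMC host name, and the user name are assumptions made for illustration.

```python
# Sketch only: builds (and optionally runs) an ipmitool power command.
import os
import subprocess

POWER_ACTIONS = {"on", "off", "cycle", "status"}


def build_power_command(bmc_host, user, action):
    """Return the ipmitool argument list for one node's BMC."""
    if action not in POWER_ACTIONS:
        raise ValueError(f"unsupported power action: {action}")
    # -I lanplus selects IPMI v2.0 over LAN; -E reads the password from
    # the IPMI_PASSWORD environment variable instead of the command line.
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user, "-E",
            "chassis", "power", action]


def power_nodes(bmc_hosts, user, action, password):
    """Run the action against each BMC in turn; return host -> exit code."""
    env = dict(os.environ, IPMI_PASSWORD=password)
    results = {}
    for host in bmc_hosts:
        completed = subprocess.run(build_power_command(host, user, action), env=env)
        results[host] = completed.returncode
    return results
```

Passing the password through the environment keeps it out of the process list, which matters on a shared master node.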
Switched PDU Control
For nodes and peripherals that are not equipped with IPMI, ABC can access
and control switched
PDUs that allow the equipment to be remotely powered on or off.
By seamlessly integrating both IPMI and switched PDUs,
ABC can give you remote power control of every system in your cluster.
This capability has long been a holy grail of HPC cluster administration.
(Your cluster must be installed with switched PDUs connected to every
peripheral device.)
Administrative Automation
ABC provides additional tools to automate common administrative tasks across
your cluster. Using the ABC Package Management function, administrators can
inventory the installed packages on all nodes, compare package versions within
the cluster, and even synchronize the installed packages across the entire
environment.
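The comparison step can be illustrated with a short sketch. It assumes each node's inventory has already been collected into a dictionary; the function name and the sample package names and versions are hypothetical, and this is not how ABC necessarily does it.

```python
# Hypothetical sketch: find packages whose versions differ between nodes.
def find_package_drift(inventories):
    """inventories: node name -> {package name: version string}.

    Returns package name -> {node name: version or None} for every package
    whose version is not identical on all nodes (None means not installed).
    """
    all_packages = set()
    for packages in inventories.values():
        all_packages.update(packages)

    drift = {}
    for name in sorted(all_packages):
        versions = {node: pkgs.get(name) for node, pkgs in inventories.items()}
        if len(set(versions.values())) > 1:
            drift[name] = versions
    return drift
```

A synchronization step would then install or upgrade each drifting package to match a chosen reference node.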
Node Detail
ABC also provides a Plugin capability, which allows commands
to be run in parallel across some or all of the cluster nodes. Plugins can
be simple shell scripts or complex programs that are added by Aspen or your
administrators to automate common tasks you might need to perform.
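As a rough picture of running one command across many nodes in parallel, the sketch below fans a command out over SSH with a thread pool. It is an illustration under assumptions, not ABC's Plugin mechanism: `run_on_nodes` and `ssh_runner` are hypothetical names, and key-based SSH access to each node is assumed.

```python
# Hypothetical sketch of parallel command execution; not ABC's Plugin code.
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_runner(node, command):
    """Run a shell command on a node over SSH; return (exit code, stdout)."""
    completed = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", node, command],
        capture_output=True, text=True,
    )
    return completed.returncode, completed.stdout


def run_on_nodes(nodes, command, runner=ssh_runner, max_workers=16):
    """Run command on every node concurrently; return node -> result."""
    nodes = list(nodes)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with nodes.
        results = pool.map(lambda node: runner(node, command), nodes)
        return dict(zip(nodes, results))
```

The `runner` argument is injectable so the fan-out logic can be exercised without a real cluster.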
Of course, ABC is also a monitoring tool, automatically polling all your
nodes for both hardware and software health and displaying this
data in different ways. Both real-time and historical data collection and
display are supported, and each value threshold is configurable, as is
the alarm action tied to that value. This allows every ABC installation
to be tuned to your cluster's environment and
your specific needs.
Bare Metal Restore
If any of your cluster nodes use disks and an installed operating system, ABC
provides the capability to easily and quickly restore that node from an image.
ABC can snapshot one node, then use that image to restore another, changing
node specific configurations so that the newly imaged system returns to service
with the correct identity and configuration.
A disk failure on a compute node no longer means
hours of work re-building the system and re-configuring it to function in
your cluster again. Replace the failed disk, select the image you want restored
to the node and the node you want restored, and let ABC do it for you.
Recovery from a catastrophic hardware failure is easy too.
You can utilize images taken of your master, storage, compute, or any
other type of node to restore to another system should a failure occur.
Need to perform a major upgrade, or radically change your node configuration?
Upgrade and customize a single node so that it is set up exactly the way you want, then image that node and deploy the image to all the other
nodes you want to upgrade.
Node Diagnostics
(Not yet Released!)
Our engineers don't want us talking about this yet, but it's such a cool
feature that we just have to! One of the most time-consuming tasks a
cluster administrator faces is repairing a node. We use our AIME diagnostics and quality assurance package
to ensure that our delivered systems are of the highest quality, and
AIME is now coming to your cluster, integrated with ABC.
AIME Node Diagnostics
AIME runs multiple automated tests on a node's hardware,
including disks, memory, CPUs, and networking components. AIME will soon
be integrated into ABC to perform this same function for installed nodes
in the field. If you have a node failure, or a node that just appears to be
a little slow or cranky, you'll be able to put the node into AIME testing
directly from your ABC web interface to determine what the problem is.
(And from the command line too. Your friendly engineering team)
AIME will show you the results of these tests and, if there is something
wrong and your security allows it, open a support ticket directly with
Aspen support, detailing all the tests run and their results so that we can
quickly replace hardware components or take corrective action. See? We said
that this was a cool feature!
User Portal
ABC isn't just for your administrators. It simplifies your end users' lives
as well. If you use either Sun Grid Engine or Torque/Maui,
your cluster users can submit
and monitor their jobs directly through the ABC interface.
Scheduler Monitor
ABC can be used to proxy VNC servers for your individual users' X server needs, as well as provide access to Ganglia pages on your cluster. Many users find Ganglia familiar and helpful
when studying the effects of particular code runs on their cluster.
Submit User Job
And of course, your users have access to all the monitoring and real-time
status tools ABC provides, allowing them to quickly see how much of your
cluster is loaded, and if enough nodes are free for a particular job. If
command line access is needed or desired, users can be allowed to open
an ABC SSH terminal to the master or other nodes to perform code compilation
or other tasks directly out of their web browser.
System Alerts
ABC monitors your cluster's health and provides system alerts when things
go wrong. For instance, the system and CPU temperatures in every node
are monitored, and an event is generated when the maximum temperature is
exceeded. The maximum temperature value is fully configurable within
ABC, although ABC comes with reasonable defaults for each of these settings.
Temperature Event
You can define custom actions for each event as well. In some facilities
with frequent cooling outages, Aspen customers have ABC power off nodes
that exceed a maximum temperature and notify them of the problem. This can
save the nodes under ABC control from continuing to operate under high
heat conditions, and remove heat load to help save other
less well-monitored nodes within the facility.
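The threshold-and-action idea above can be sketched as follows. This is an illustration only: the `Threshold` class, the sensor name `cpu_temp`, the action names, and the 85-degree limit are all hypothetical and are not ABC's configuration format or defaults.

```python
# Hypothetical sketch of threshold events with attached actions.
from dataclasses import dataclass


@dataclass
class Threshold:
    sensor: str      # sensor name, e.g. "cpu_temp"
    maximum: float   # limit in degrees Celsius
    actions: tuple   # names of actions to take when the limit is exceeded


def evaluate(readings, thresholds):
    """readings: node -> {sensor: value}. Return a list of event tuples."""
    events = []
    for node, sensors in sorted(readings.items()):
        for rule in thresholds:
            value = sensors.get(rule.sensor)
            if value is not None and value > rule.maximum:
                events.append((node, rule.sensor, value, rule.actions))
    return events


rules = [Threshold("cpu_temp", 85.0, ("notify", "power_off"))]
readings = {"node01": {"cpu_temp": 72.0}, "node02": {"cpu_temp": 91.5}}
events = evaluate(readings, rules)
```

Here only `node02` produces an event, and a dispatcher would then carry out the listed actions, such as sending a notification and powering the node off.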
ABC can be configured to perform periodic S.M.A.R.T. checks on all hard drives in your cluster and report back the results, removing
the need for manual checks or cron-driven scripts to perform the same
function. ABC also monitors free space on your partitions and warns you if
disk usage exceeds the percentage you set.
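The disk-usage warning amounts to a simple percentage check. The sketch below shows one way to express it with Python's standard library; the function names and message format are assumptions, not ABC's.

```python
# Hypothetical sketch of a partition usage check.
import shutil


def percent_used(total_bytes, used_bytes):
    """Return used space as a percentage of total space."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def check_partition(path, limit_percent):
    """Return a warning string if path is fuller than limit_percent, else None."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    used = percent_used(usage.total, usage.used)
    if used > limit_percent:
        return f"{path} is {used:.1f}% full (limit {limit_percent}%)"
    return None
```

Calling `check_partition("/scratch", 90)` on each node, for example, would return a warning only for nodes whose `/scratch` partition is more than 90% full (the path is illustrative).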
ABC monitors many other things about your cluster, including system
fan status, motherboard voltages, and other critical system values by
default. It also integrates with UPS systems that service your cluster.
System warnings are also generated when power events occur, and
you can define who is notified when critical system
events occur.
Web Proxy
HPC clusters normally operate with all the compute nodes and peripherals connected to an internal network. ABC allows the web interfaces of internal peripherals, such as node
IPMI interfaces, RAID devices, UPS systems, PDUs, tape backup units, environmental monitoring systems,
cooling system interfaces, and other devices to be integrated directly into your
ABC web interface. Additional web services on your master or any other internal
node are easy to proxy as well.
ABC Web Proxy for Ganglia
ABC does this by securely proxying each device's web interface from the internal
network through ABC to your browser. This
allows you to seamlessly integrate additional peripherals as needed, and
to access them and authenticate to them through a single ABC interface.
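One piece of any such proxy is mapping a path requested from the portal to a device on the internal network. The sketch below shows only that mapping step, under stated assumptions: the route table, the path prefixes, and the internal addresses are all hypothetical, and a real proxy would also authenticate the user and forward the request and response.

```python
# Hypothetical sketch of proxy route resolution; not ABC's configuration.
PROXY_ROUTES = {
    # public path prefix -> internal base URL (all values hypothetical)
    "/proxy/ganglia/": "http://master.cluster.internal/ganglia/",
    "/proxy/pdu1/": "http://10.0.0.50/",
}


def resolve(public_path, routes=PROXY_ROUTES):
    """Map a path requested from the portal to an internal URL, or None."""
    # Prefer the longest matching prefix so nested routes behave predictably.
    for prefix in sorted(routes, key=len, reverse=True):
        if public_path.startswith(prefix):
            return routes[prefix] + public_path[len(prefix):]
    return None
```

Adding a new peripheral then means adding one entry to the route table rather than opening the internal network to outside browsers.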
This capability makes ABC customizable and extremely extensible. Not only
can additional device web interfaces be added, but also additional custom
web pages that can be used to perform site specific tasks. One
Aspen customer utilizes this capability to operate a web control panel
accessed only by a specific user group.
They in turn use this interface to control and boot specific cluster
nodes into different environments. The possibilities are endless, and allow
you or Aspen to customize your ABC environment to your specific needs.
CLI
What if you're an experienced administrator or a command line
zealot? Is ABC for you?
Many experienced HPC engineers and administrators prefer using command line
utilities to graphical user interfaces for speed and scriptability, and
Aspen engineers are no exception. ABC comes with many command line
tools specifically targeted to the more experienced administrator. ABC CLI
tools exist to remotely control power on all nodes, retrieve sensor data,
and even to automatically restore a node!
CLI Tools
We may not have all the functionality of our web interface
replicated to command line utilities, but we're working on it. Tell us what
function you want to see. We may already have it, and if not,
we have plenty of engineers who love to add more cool features and command
line utilities.
Most of our command line utilities are written in Perl, bash, or Python so
that you can easily see what they're doing, and modify or copy the code
to fit your own preferences or customizations. We love to get your feedback. If you want a command line tool
that we haven't written yet, it's likely that another customer would like that tool as well, so we'll be more than
happy to put your tool on our feature request list.
ABC combines your cluster management tools into a single unified interface, providing a
smoother operating environment and transforming an otherwise complex mixture of discrete
machines and devices into a converged, homogeneous environment. ABC allows less
experienced operators
to easily operate and maintain a Linux-based cluster.
Track your system and node performance using ABC to keep your cluster running consistently at
peak performance and recover quickly in the event of a failure. ABC’s comprehensive
system monitoring provides advance warning of node failures and other problems as well.
The ABC toolkit provides operators the ability to easily manage the entire cluster.
Aspen Systems is a market leader in customized
compute clusters built to fit your business needs. Our extensive experience in cluster
installation, maintenance, and management, combined with our years of experience with other
available cluster management packages, led us to create ABC.
Contact Aspen Systems sales
at 1-800-992-9242 for more information about ABC today!