Adaptable Computing Cluster
General Description
Engineering computer systems almost always involves some
cost-performance analysis.[1]
For high-performance computing systems, a cluster of workstations
with a high-performance network has emerged as a cost-effective
solution for many users. One class of these clusters of workstations,
which is distinguished by its exclusive use of COTS (commodity,
off-the-shelf) parts, is called a Beowulf machine.
Our Adaptable Computing Cluster (ACC) project is exploring the
novel use of an off-the-shelf component called an ACE2card. This
PCI-bus card has reconfigurable computing (RC) resources and, in
our system, a gigabit Ethernet network interface on the mezzanine
adapter. Since the RC is on the critical path to the network switch,
the ACE2card and RC are an integral part of the machine. Although
the ACE2card is not a commodity part, we are using it to explore
its potential in a Beowulf-class machine. Specifically, we are
investigating questions such as how to balance the use of RC for
computation and communication and how to best use the system for
specific application/problem domains.
The current system consists of eight processors (four dual-processor
workstations) connected by a Foundry Net BigIron (gigabit switch).
An NSF award has enabled us to expand the system to 16 nodes, and the
ACE2cards are slated to be delivered in mid-September 2000. Graduate
students are currently developing Linux device drivers for the ACE2card
and the network interface mezzanine card. (Preliminary results show
96 MB/s, roughly 75% of the theoretical bandwidth, through
the line to the switch.)
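As a rough check on the bandwidth figure above, the following minimal sketch
computes the theoretical peak of a gigabit link and the fraction that 96 MB/s
represents. It assumes that "gigabit" means 10**9 bits per second and that
"MB" means 10**6 bytes; neither convention is stated in the text, and the
function names are illustrative only.

```python
# Assumptions (not stated in the source): a gigabit link carries
# 10**9 bits per second, and "MB" means 10**6 bytes.
LINK_RATE_BITS_PER_S = 1_000_000_000
BITS_PER_BYTE = 8
BYTES_PER_MB = 1_000_000


def theoretical_mb_per_s(link_rate_bits_per_s=LINK_RATE_BITS_PER_S):
    """Raw line rate of the link expressed in MB/s."""
    return link_rate_bits_per_s / BITS_PER_BYTE / BYTES_PER_MB


def link_utilization(measured_mb_per_s, link_rate_bits_per_s=LINK_RATE_BITS_PER_S):
    """Fraction of the raw line rate achieved by a measured throughput."""
    return measured_mb_per_s / theoretical_mb_per_s(link_rate_bits_per_s)


if __name__ == "__main__":
    peak = theoretical_mb_per_s()
    util = link_utilization(96.0)  # 96 MB/s is the figure quoted in the text
    print(f"theoretical peak: {peak:.1f} MB/s")
    print(f"utilization at 96 MB/s: {util:.1%}")
```

Under these assumptions the peak is 125 MB/s and 96 MB/s is a little under 77%
of it, which is in line with the "roughly 75%" quoted above. The exact basis
of the quoted percentage (for example, whether protocol overhead is counted)
is not given in the text.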
Several investigations have been organized as part of the ACC project.
The first is a general evaluation of the performance of the machine.
This includes developing network protocols that are implemented in
the RC, user-level network interfaces, and RC components to support
general parallel processing applications via MPI or another
message-passing system. Closely related
is an investigation that proposes to port PVFS (a parallel file system
developed in the PARL lab) to the ACC machine. PVFS on ACC would implement
the PVFS control messages in the RC, which should substantially reduce latency.
A third investigation is in collaboration with the Clemson University
Genomics Institute (CUGI). It is well established that the performance
of genomics applications can be greatly improved by using RC to speed
up individual jobs, and the nature of the applications allows them to
be run in parallel on Beowulf machines. We therefore expect the ACC
machine to show exceptional performance gains, but several important
questions remain. For example, how should the system resources be balanced
between communication and computation to give optimal performance?
[1] The exceptions are the
relatively few `performance at any cost' systems.