HECIOS: The High End Computing I/O Simulator


Motivation

As high end computing (HEC) systems grow to several tens of thousands of nodes, file I/O is becoming a critical performance issue. Current parallel file systems, such as PVFS2, can reasonably stripe data across a hundred nodes and achieve good performance for bulk transfers involving large, aligned accesses. Serious performance limits exist, however, for small unaligned accesses, metadata operations, and accesses affected by consistency semantics (any time one process writes data that is read by another).
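To make the aligned versus unaligned distinction concrete, the following minimal sketch maps a contiguous byte range onto servers under simple round-robin striping. The round-robin layout, the function name servers_touched, and the strip size and server count used here are illustrative assumptions; this is not PVFS2's actual distribution code.

```python
def servers_touched(offset, length, strip_size, num_servers):
    """Split a contiguous byte range into (server, byte_count) pieces.

    Assumes a simple round-robin layout: strip i lives on server
    i % num_servers. This is a hypothetical model for illustration.
    """
    pieces = []
    end = offset + length
    while offset < end:
        strip_index = offset // strip_size
        server = strip_index % num_servers
        strip_end = (strip_index + 1) * strip_size
        # Take bytes up to the end of this strip or the end of the request.
        chunk = min(end, strip_end) - offset
        pieces.append((server, chunk))
        offset += chunk
    return pieces


# Illustrative values only: 64 KiB strips over 4 servers.
aligned = servers_touched(0, 65536, 65536, 4)
unaligned = servers_touched(65000, 1000, 65536, 4)
print(aligned)
print(unaligned)
```

In this model the aligned 64 KiB request is served by one server in a single piece, while the 1000-byte request that straddles a strip boundary is split into two small pieces on two different servers.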

Proposal Summary
Accepted Proposal

Simulator Details

HECIOS is a simulation package being developed at Clemson's PARL to improve parallel file I/O performance. Large cluster computers with proportionally large I/O storage subsystems are presently extremely rare, so we are developing HECIOS, the high end computing I/O simulator, to experiment with high end I/O configurations. Our goal is to provide a freely available cluster storage system simulator capable of extremely detailed simulations of I/O performance for parallel and scientific applications.

HECIOS is implemented using the OmNet++ simulation library and leverages existing networking and disk components to provide an extremely detailed simulator for MPI-IO applications using a cluster storage system such as a parallel file system. OmNet++ provides extensive support for developing generic simulation models, scheduling events, building state machines, running parallel simulations, and capturing and visualizing relevant simulation data.
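As a rough illustration of what event scheduling means in a discrete-event simulator, the sketch below keeps pending events in a priority queue ordered by simulated time and advances the clock from one event to the next. It is written in plain Python and does not use the OmNet++ API; the EventQueue class, the handler names, and the delay values are all hypothetical.

```python
import heapq


class EventQueue:
    """A minimal discrete-event scheduler (illustrative, not OmNet++)."""

    def __init__(self):
        self._heap = []
        self._seq = 0      # tie-breaker so equal-time events stay ordered
        self.now = 0.0     # current simulated time

    def schedule(self, delay, action):
        """Schedule action(sim) to run `delay` time units from now."""
        heapq.heappush(self._heap, (self.now + delay, self._seq, action))
        self._seq += 1

    def run(self):
        """Process events in time order until none remain."""
        while self._heap:
            self.now, _, action = heapq.heappop(self._heap)
            action(self)


log = []


def client_write(sim):
    log.append((sim.now, "client sends write request"))
    sim.schedule(0.5, server_disk_done)   # assumed network + disk delay


def server_disk_done(sim):
    log.append((sim.now, "server completes disk write"))
    sim.schedule(0.25, client_ack)        # assumed reply delay


def client_ack(sim):
    log.append((sim.now, "client receives ack"))


sim = EventQueue()
sim.schedule(1.0, client_write)
sim.run()
for time, message in log:
    print(time, message)
```

Each handler only records what happened and schedules the next event, so the simulated clock jumps directly between events instead of ticking in fixed steps.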

To provide a detailed simulation environment for cluster internetworking, we rely on the INET package provided for use with OmNet++. INET provides flexible network simulation components that simulate link-level detail for TCP/IP traffic over switched Ethernet. We intend to eventually add networking components capable of simulating high-speed cluster interconnects such as Myrinet and InfiniBand.

Members

HECIOS is developed at Clemson University's Parallel Architecture Research Lab (PARL).
Current team members include:
Prof. Walter Ligon
Brad Settlemyer
Michael Bassilly
Pooja Verma

Support

HECIOS is being developed under the NSF CISE/CCF Division's HECURA program, award #CCF-0621441.
Project Title: HECURA: Improving Scalability in Parallel File Systems for High End Computing

Resources

PVFS2 Website
OmNet++ Community Site
OmNet++ Manual
INET Documentation
UMd I/O Trace files