BeoSim
A Multi-cluster Computational Grid Simulator for Parallel Job Scheduling Research
Initially developed and maintained in the Parallel Architecture Research Laboratory
Continuing research in the Electrical Engineering Department at the United States Naval Academy
Papers and Presentations      
  • CCGrid 2008: Resilience 2008
    • "Application Resilience: Making Progress in Spite of Failure",
      (Paper: PDF, Local: PDF) (Presentation: PPT)
  • CSC 2008: Invited Talk
    • "Nuclear Stockpile Stewardship, Trials and Tribulations: A Computing Perspective",
      (Presentation: PDF, PPT)
  • Middleware 2007: MGC 2007
    • "Using Checkpointing to Recover from Poor Multi-site Parallel Job Scheduling Decisions",
      (Paper: PDF) (Presentation: PDF)
  • PDCS 2007
    • "The Impact of Error in User-Provided Bandwidth Estimates on Multi-site Parallel Job Scheduling Performance",
      (Paper: PDF) (Presentation: PDF)
  • ICPADS 2006: SRMPDS
    • "Ensuring Fairness Among Participating Clusters During Multi-site Parallel Job Scheduling",
      (Paper: PDF) (Presentation: PDF)
  • ICPADS 2006
    • "The Impact of Information Availability and Workload Characteristics on the Performance of Job Co-allocation in Multi-clusters",
      (Paper: PDF) (Presentation: PDF)
  • Ph.D. Dissertation, Dec. 2005
    • "Improving Parallel Job Scheduling Performance in Multi-clusters Through Selective Job Co-allocation",
      (Manuscript: PDF, PS) (Presentation: PDF)
  • CAEFF: Site Visit 2005
    • "Parallel Job Scheduling in a Mini-Grid",
      (Poster: PPT, PDF)
  • Journal of Supercomputing: Vol. 34
    • "Characterization of Bandwidth-aware Meta-schedulers for Co-allocating Jobs Across Multiple Clusters",
      (Paper: PDF) (Local draft: PDF, PS)
  • SC 2004: NASA Booth
  • CLUSTER 2004
    • "Bandwidth-aware Co-allocating Meta-schedulers for Mini-grid Architectures",
      (Paper: PDF, PS) (Presentation: PDF)
  • SURE: 2004 Program
    • "Java Based Visualizer for BeoSim",
      (Presentation: PPT) (Poster: PPT)
  • IPDPS 2004: PMEO-PDS
    • "Job Communication Characterization and its Impact on Meta-scheduling Co-allocated Jobs in a Mini-grid",
      (Paper: PDF, PS) (Presentation: PDF)
  • CAEFF: Site Visit 2004
    • "Parallel Job Scheduling in a Mini-Grid",
      (Poster: PPT, PDF)
  • CAEFF: Site Visit 2003
    • "Meta-scheduling for Mini-grids",
      (Poster: PPT, PDF)
  • SC 2002: NASA Booth (GSFC)
    • "Beowulf/Mini-Grid System Software",
      (Poster: PDF) (Whitepaper: PDF)
  • PARL TR's: PARL-2002-009
    • "Computational Mini-Grid Research at Clemson University",
      (Report: PS, PDF)
    BeoViz: BeoSim's Front-end Visualizer

    Introduction      
    Computational multi-clusters are an important emerging class of supercomputing architectures. As multi-cluster systems become more prevalent, techniques for efficiently exploiting these resources become increasingly significant. A critical aspect of exploiting these resources is scheduling. To maximize job throughput, a multi-cluster scheduler must simultaneously leverage the collective computational resources of all of its participating clusters. By doing so, jobs that would otherwise wait for nodes to become available on a single cluster can potentially run earlier by aggregating disjoint resources throughout the multi-cluster. This procedure, job co-allocation, can result in dramatic reductions in queue waiting times.
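
    The idea of aggregating disjoint resources can be illustrated with a minimal Python sketch. It is not BeoSim code: the function name `allocate`, the cluster names, and the largest-fragment-first policy are assumptions made purely for illustration.

```python
def allocate(job_nodes, free_nodes):
    """Return a {cluster: nodes} mapping for the job, or None if it must wait.

    free_nodes maps a (hypothetical) cluster name to its idle node count.
    """
    # Prefer a single cluster: no inter-cluster traffic is generated.
    for cluster, free in free_nodes.items():
        if free >= job_nodes:
            return {cluster: job_nodes}
    # Not enough idle nodes anywhere in the multi-cluster: the job waits.
    if sum(free_nodes.values()) < job_nodes:
        return None
    # Otherwise aggregate idle nodes across clusters (co-allocation),
    # taking the largest fragments first to span as few clusters as possible.
    mapping, remaining = {}, job_nodes
    for cluster, free in sorted(free_nodes.items(), key=lambda kv: -kv[1]):
        take = min(free, remaining)
        if take > 0:
            mapping[cluster] = take
            remaining -= take
        if remaining == 0:
            break
    return mapping
```

    For example, with 10 idle nodes on one cluster and 4 on another, a 12-node job that would wait under single-cluster scheduling can start immediately by spanning both.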

    The main caveat of this approach is that by mapping jobs across cluster boundaries, inter-cluster network resources are also consumed. If the inter-cluster network links become too saturated with traffic, any co-allocated jobs may experience degraded runtime performance due to the communication bottleneck present in the network infrastructure. This degradation in runtime performance can potentially offset the benefit of performing job co-allocation in the first place. More precisely, the increase in job runtime due to link saturation can rapidly outweigh the decrease in queue waiting time, thus resulting in poorer overall system performance.
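
    The trade-off described above can be made concrete with a small worked example in Python. The function names and the numbers are hypothetical; the only assumption is that a job's turnaround time is its queue waiting time plus its runtime, and that co-allocation multiplies the runtime by some slowdown factor of at least 1.

```python
def turnaround(wait, runtime):
    """Turnaround time of a job: time in the queue plus time running."""
    return wait + runtime


def coallocation_pays_off(wait_single, wait_coalloc, runtime, slowdown):
    """True if co-allocating yields a shorter turnaround than waiting.

    slowdown (>= 1.0) is the assumed factor by which link saturation
    stretches the runtime of the co-allocated job.
    """
    return (turnaround(wait_coalloc, runtime * slowdown)
            < turnaround(wait_single, runtime))


# A one-hour job that would otherwise wait 30 minutes for a single cluster.
mild = coallocation_pays_off(1800, 0, 3600, slowdown=1.25)    # 4500 s vs 5400 s
severe = coallocation_pays_off(1800, 0, 3600, slowdown=1.6)   # about 5760 s vs 5400 s
```

    In the first case co-allocation wins; in the second, the increase in runtime outweighs the waiting time that was saved.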

    Multi-cluster schedulers must make use of all available information about job communication structure, network topology, and network utilization in order to improve job throughput while mitigating any negative impact on job runtime performance due to network congestion. Additionally, these schedulers must make reasonable co-allocation decisions in the absence of specific job and network information, as this information is not always available.

    In this research, we have developed a bandwidth-centric job communication model that captures the interaction and impact of simultaneously co-allocating jobs across multiple clusters. We compare our dynamic model with previous research that utilizes a fixed execution time penalty for co-allocated jobs. We explore the interaction of simultaneously co-allocated jobs and the contention they often create in the network infrastructure of a dedicated computational multi-cluster.
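
    The difference between a fixed execution time penalty and a bandwidth-centric view can be sketched as follows. This is a deliberately simplified illustration and not the model used in BeoSim: it assumes a single shared inter-cluster link, that each job spends a known fraction of its runtime (`comm_frac`) communicating, and that an oversubscribed link is shared proportionally so that communication time stretches by demand divided by capacity. All names and parameters are hypothetical.

```python
def fixed_penalty_runtime(base_runtime, penalty=0.25):
    """Fixed-penalty view: every co-allocated job slows by the same factor."""
    return base_runtime * (1.0 + penalty)


def bandwidth_runtimes(jobs, link_capacity):
    """Bandwidth-centric view: slowdown depends on who else uses the link.

    jobs is a list of dicts with keys "runtime" (seconds at full bandwidth),
    "comm_frac" (fraction of runtime spent communicating) and "bw"
    (bandwidth demand, in the same units as link_capacity).
    """
    demand = sum(j["bw"] for j in jobs)
    # Communication only stretches once the link is oversubscribed.
    stretch = max(1.0, demand / link_capacity)
    return [
        j["runtime"] * ((1.0 - j["comm_frac"]) + j["comm_frac"] * stretch)
        for j in jobs
    ]


jobs = [
    {"runtime": 100.0, "comm_frac": 0.5, "bw": 60.0},
    {"runtime": 200.0, "comm_frac": 0.25, "bw": 60.0},
]
contended = bandwidth_runtimes(jobs, link_capacity=100.0)
uncontended = bandwidth_runtimes(jobs, link_capacity=200.0)
```

    Under the fixed penalty both jobs slow by the same factor regardless of load, whereas in the sketch the slowdown appears only when the combined demand exceeds the link capacity, and it grows with each job's communication fraction.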

    We have also developed several bandwidth-aware co-allocating meta-schedulers. These schedulers take inter-cluster network utilization into account in order to mitigate degraded job runtime performance. By making use of a bandwidth-centric parallel job communication model, we are able to evaluate the performance of multi-cluster scheduling algorithms that focus not only on node resource allocation, but also on shared inter-cluster network bandwidth.
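
    A bandwidth-aware placement decision might look roughly like the Python sketch below. It is an assumption-laden illustration rather than any of the meta-schedulers studied here: the function `place`, the single shared link, and the rule "co-allocate only if the job's demand fits in the unused link capacity" are all hypothetical.

```python
def place(job, free_nodes, link_capacity, link_in_use):
    """Decide how to place a job: ("local", cluster), ("coalloc", None),
    or ("wait", None).

    job has keys "nodes" and "bw" (estimated inter-cluster bandwidth demand).
    """
    # A single-cluster placement consumes no inter-cluster bandwidth.
    for cluster, free in free_nodes.items():
        if free >= job["nodes"]:
            return ("local", cluster)
    enough_nodes = sum(free_nodes.values()) >= job["nodes"]
    # Co-allocate only if the link would not become oversubscribed.
    link_ok = link_in_use + job["bw"] <= link_capacity
    if enough_nodes and link_ok:
        return ("coalloc", None)
    return ("wait", None)
```

    With 10 and 4 idle nodes on two clusters and a link of capacity 100, a 12-node job demanding 30 units is co-allocated when 50 units are already in use, but is held in the queue when 80 units are in use.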

    BeoSim is a discrete-event simulator that has been implemented for the purpose of studying multi-site parallel job scheduling algorithms in the context of a multi-cluster computational grid.
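
    For readers unfamiliar with discrete-event simulation, the following minimal Python sketch shows the general technique on a much simpler problem: first-come-first-served scheduling on a single cluster, with no co-allocation and no network model. It does not reflect BeoSim's actual implementation; the function `simulate` and the job tuple layout are assumptions for illustration only.

```python
import heapq


def simulate(jobs, total_nodes):
    """Simulate FCFS scheduling and return {job_index: start_time}.

    jobs is a list of (arrival_time, nodes, runtime) tuples.
    """
    # Event tuples are (time, order, kind, job_index); order 0 makes
    # "finish" events run before "arrive" events at the same timestamp.
    events = []
    for i, (arrival, _, _) in enumerate(jobs):
        heapq.heappush(events, (arrival, 1, "arrive", i))
    free, queue, starts = total_nodes, [], {}
    while events:
        # Jump the simulation clock straight to the next event.
        now, _, kind, i = heapq.heappop(events)
        if kind == "arrive":
            queue.append(i)
        else:  # "finish": the job releases its nodes
            free += jobs[i][1]
        # FCFS: start jobs strictly in arrival order while the head fits.
        while queue and jobs[queue[0]][1] <= free:
            j = queue.pop(0)
            free -= jobs[j][1]
            starts[j] = now
            heapq.heappush(events, (now + jobs[j][2], 0, "finish", j))
    return starts


starts = simulate([(0, 4, 10), (1, 4, 5), (2, 2, 3)], total_nodes=6)
```

    In this example the second job must wait for the first to finish at time 10, and the third job waits behind it even though two nodes are idle, because strict FCFS does not backfill.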

    Current Team      
  • William M. Jones, Ph.D.
    • Project Leader, Currently at US Naval Academy
    Former Members      
  • Louis W. Pang
    • Lead Programmer, Currently at OPNET
  • Michael F. Bassily
    • Visualizer Programmer, Currently at Clemson University
  • Nishant Shrivastava
    • Workload Generation, Currently at Cisco
  • Walter B. Ligon III, Ph.D.
    • PhD Advisor, Currently at Clemson University
  • Daniel Stanzione, Ph.D.
    • Colleague, Currently at ASU
    Related Research      
  • Dynamic Virtual Clustering
  • Parallel Job Scheduling
  • Parallel Job Scheduling Strategies Workshop
  • Parallel Workload Archive
  • Computer Security Conference 2008
  • Proposal Goals
  • Current Goals
  • Previous Goals
  • Proposal
  • Initial problem addressed
  • Visualizer specification
    Possible Publication Venues
  • ISPASS 2006 -- Due Oct. 7, 2005
  • ICPADS 2006 -- Due Jan. 15, 2006
  • ICPP 2006 -- Due Feb. 1, 2006
  • TCPP Announce -- Archives
  • Cluster and Grid Computing -- Journals and Confs
  • Cluster and Grid Computing - Local (Modified) Copy (Fall 2006)
  • Conferences
  • Conferences - Local Copy