Alternatives to Coscheduling a Network of Workstations

Shailabh Nagar, Ajit Banerjee, Anand Sivasubramaniam, Chita R. Das

Research output: Contribution to journal › Article

23 Scopus citations

Abstract

Efficient scheduling of processes on the processors of a Network of Workstations (NOW) is essential for good system performance. However, designing such schedulers is challenging because of the complex interaction among several system and workload parameters. Coscheduling, though desirable, is impractical in such a loosely coupled environment. Instead, two operations, waiting for a message and the arrival of a message, can be used to take remedial actions that steer the system toward coscheduled behavior using only local information. We present a taxonomy of three possible actions for each of these two operations, yielding a design space of 3×3 = 9 scheduling mechanisms, and we report an extensive implementation and evaluation of these mechanisms. Following the philosophy that scheduling and communication are intertwined and should be studied together, we developed a complete communication substrate for UltraSPARC workstations connected by Myrinet and running Solaris 2.5.1. The platform provides the full Message Passing Interface (MPI) through protected, low-latency, user-level messaging, so off-the-shelf MPI applications run readily, and several applications can use the interface concurrently. We used this platform to design, implement, and uniformly evaluate nine scheduling strategies on a mix of concurrently running real applications with varying communication intensities. These include five new schemes presented in this paper: Periodic Boost, Periodic Boost with Spin Block, Spin Yield, Periodic Boost with Spin Yield, and Dynamic Coscheduling with Spin Yield. Besides evaluating the pros and cons of each mechanism in terms of throughput, response time, CPU utilization, and fairness, we show that Periodic Boost is a promising approach for scheduling processes on a NOW.
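As a rough illustration of the 3×3 design space, the sketch below pairs each action taken while waiting for a message with each action taken when a message arrives, and names the resulting scheme. Spin Block, Spin Yield, Dynamic Coscheduling, and Periodic Boost are taken from the scheme names in the abstract; the two baseline options (`busy_wait`, `none`), the label for their combination, and every identifier in the code are assumptions made for illustration, not the paper's implementation.

```python
from itertools import product

# Actions while a process waits for a message. "busy_wait" is an ASSUMED
# baseline name; Spin Block and Spin Yield are named in the abstract.
WAIT_ACTIONS = ("busy_wait", "spin_block", "spin_yield")

# Actions when a message arrives. "none" is an ASSUMED baseline name;
# Dynamic Coscheduling and Periodic Boost are named in the abstract.
ARRIVAL_ACTIONS = ("none", "dynamic_coscheduling", "periodic_boost")

# Hypothetical display labels; an empty label means "baseline, no extra action".
WAIT_LABEL = {"busy_wait": "", "spin_block": "Spin Block", "spin_yield": "Spin Yield"}
ARRIVAL_LABEL = {
    "none": "",
    "dynamic_coscheduling": "Dynamic Coscheduling",
    "periodic_boost": "Periodic Boost",
}

# The five schemes the abstract lists as new.
NEW_SCHEMES = {
    "Periodic Boost",
    "Periodic Boost with Spin Block",
    "Spin Yield",
    "Periodic Boost with Spin Yield",
    "Dynamic Coscheduling with Spin Yield",
}


def scheme_name(wait: str, arrival: str) -> str:
    """Combine one waiting-side and one arrival-side action into a scheme name."""
    w, a = WAIT_LABEL[wait], ARRIVAL_LABEL[arrival]
    if w and a:
        return f"{a} with {w}"
    # Hypothetical label for the pure baseline (no remedial action on either side).
    return a or w or "Local (no remedial action)"


def design_space() -> list[tuple[str, str, str]]:
    """Enumerate all 3 x 3 (wait action, arrival action, scheme name) triples."""
    return [(w, a, scheme_name(w, a)) for w, a in product(WAIT_ACTIONS, ARRIVAL_ACTIONS)]


if __name__ == "__main__":
    for wait, arrival, name in design_space():
        tag = "(new)" if name in NEW_SCHEMES else ""
        print(f"{wait:>12} x {arrival:<22} -> {name} {tag}")
```

Under these assumptions the enumeration produces nine distinct schemes, of which the five listed as new in the abstract are a subset; the remaining four are the combinations that do not involve Periodic Boost or Spin Yield.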

Original language: English (US)
Pages (from-to): 302–327
Number of pages: 26
Journal: Journal of Parallel and Distributed Computing
Volume: 59
Issue number: 2
State: Published (Nov 1, 1999)

All Science Journal Classification (ASJC) codes

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computer Networks and Communications
  • Artificial Intelligence