Checkpointing with mutable checkpoints

Guohong Cao, Mukesh Singhal

Research output: Contribution to journal › Article

39 Citations (Scopus)

Abstract

There are two approaches to reducing the overhead of coordinated checkpointing. The first is to minimize the number of synchronization messages and the number of checkpoints; the second is to make the checkpointing process non-blocking. In our previous work (IEEE Trans. Parallel Distributed Systems 9 (12) (1998) 1213), we proved that no non-blocking algorithm exists that forces only a minimum number of processes to take their checkpoints. In this paper, we present two algorithms. The first is a min-process algorithm that relaxes the non-blocking condition while trying to minimize the blocking time. The second is a non-blocking algorithm that relaxes the min-process condition while minimizing the number of checkpoints saved on stable storage. The proposed non-blocking algorithm is based on the concept of a "mutable checkpoint", which is neither a tentative checkpoint nor a permanent checkpoint. By using mutable checkpoints, our non-blocking algorithm avoids the avalanche effect and forces only a minimum number of processes to take their checkpoints on stable storage.
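The abstract combines two ideas. First, only the processes the initiator depends on need to checkpoint. Second, "mutable" checkpoints are held locally until it is known whether they are needed. A toy Python model can illustrate both.

This is a simplified sketch under stated assumptions, not the authors' protocol. The following names are hypothetical and do not come from the paper:
- the dependency map `depends_on`
- the helpers `min_process_set` and `finish_round`
- the `CkptState` enum

The model ignores message handling and failures, and it assumes every required process checkpoints successfully.

```python
from collections import deque
from enum import Enum


class CkptState(Enum):
    MUTABLE = "mutable"      # kept locally (e.g. in memory); cheap to discard
    TENTATIVE = "tentative"  # written to stable storage, awaiting commit
    PERMANENT = "permanent"  # committed on stable storage


def min_process_set(initiator, depends_on):
    """Return the initiator plus every process it transitively depends on.

    Hypothetical model: depends_on[p] lists processes that p received
    messages from since their last checkpoints.
    """
    required = {initiator}
    queue = deque([initiator])
    while queue:  # breadth-first search over the dependency graph
        p = queue.popleft()
        for q in depends_on.get(p, ()):
            if q not in required:
                required.add(q)
                queue.append(q)
    return required


def finish_round(mutable, required):
    """Resolve one simplified checkpointing round.

    mutable: processes that took a mutable checkpoint during the round.
    required: the min-process set computed for the initiator.
    Returns (final states of checkpoints on stable storage, discarded set).
    """
    states = {p: CkptState.MUTABLE for p in mutable}
    for p in required:
        # Promote an existing mutable checkpoint, or take a tentative one
        # directly; either way the checkpoint is written to stable storage.
        states[p] = CkptState.TENTATIVE
    discarded = {p for p, s in states.items() if s is CkptState.MUTABLE}
    for p in discarded:
        del states[p]  # dropped locally; never reaches stable storage
    # Commit phase (simplified: assumes every required process succeeded).
    return {p: CkptState.PERMANENT for p in states}, discarded


if __name__ == "__main__":
    deps = {"P1": ["P2"], "P2": ["P3"], "P4": ["P1"]}
    required = min_process_set("P1", deps)
    states, discarded = finish_round({"P3", "P4"}, required)
    print(sorted(required))   # ['P1', 'P2', 'P3']
    print(sorted(discarded))  # ['P4']
```

In this example P1 initiates, and its dependency set is {P1, P2, P3}. P4 took a mutable checkpoint but is not in that set. Its checkpoint is therefore discarded without ever touching stable storage, which mirrors the goal of writing only a minimum number of checkpoints to stable storage.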

Original language: English (US)
Pages (from-to): 1127-1148
Number of pages: 22
Journal: Theoretical Computer Science
Volume: 290
Issue number: 2
DOI: 10.1016/S0304-3975(02)00566-2
State: Published - Jan 2 2003

Fingerprint

Checkpointing
Checkpoint
Synchronization
Minimise
Avalanche
Parallel Systems
Distributed Systems

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Cao, Guohong; Singhal, Mukesh. / Checkpointing with mutable checkpoints. In: Theoretical Computer Science. 2003; Vol. 290, No. 2. pp. 1127-1148.
@article{031d0c3d5d3b4c04b2c73e80cd50d540,
title = "Checkpointing with mutable checkpoints",
author = "Guohong Cao and Mukesh Singhal",
year = "2003",
month = "1",
day = "2",
doi = "10.1016/S0304-3975(02)00566-2",
language = "English (US)",
volume = "290",
pages = "1127--1148",
journal = "Theoretical Computer Science",
issn = "0304-3975",
publisher = "Elsevier",
number = "2",

}

Checkpointing with mutable checkpoints. / Cao, Guohong; Singhal, Mukesh.

In: Theoretical Computer Science, Vol. 290, No. 2, 02.01.2003, p. 1127-1148.

TY - JOUR

T1 - Checkpointing with mutable checkpoints

AU - Cao, Guohong

AU - Singhal, Mukesh

PY - 2003/1/2

Y1 - 2003/1/2

UR - http://www.scopus.com/inward/record.url?scp=0037413288&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0037413288&partnerID=8YFLogxK

U2 - 10.1016/S0304-3975(02)00566-2

DO - 10.1016/S0304-3975(02)00566-2

M3 - Article

AN - SCOPUS:0037413288

VL - 290

SP - 1127

EP - 1148

JO - Theoretical Computer Science

JF - Theoretical Computer Science

SN - 0304-3975

IS - 2

ER -