Selective checkpointing and rollbacks in multithreaded distributed systems

M. Kasbekar, Chitaranjan Das

Research output: Contribution to conference (Paper)

8 Citations (Scopus)

Abstract

Modern distributed systems are often multithreaded and object-oriented in their design. They require efficient techniques to checkpoint and restore their state for improving fault-tolerance properties. The traditional process-based techniques of distributed checkpointing and rollback algorithms suffer from the problem of false dependencies, which makes them very rigid and inefficient for use with modern systems. In this paper, we develop protocols that can selectively checkpoint (and rollback) some threads of a distributed system, while leaving others untouched and yet ensuring the consistency of state resulting from such a partial rollback.
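
The false-dependency problem mentioned in the abstract can be illustrated with a small sketch. This is not the authors' protocol; it only contrasts how far a rollback propagates when message dependencies are tracked per process versus per thread. The message log, the process and thread names, and the helpers `build_deps` and `rollback_set` are all hypothetical, and the sketch assumes that threads within one process do not interact and that every logged message was sent after the sender's last checkpoint.

```python
from collections import defaultdict

# Hypothetical message log: (sender, receiver), each a (process, thread) pair.
messages = [
    (("P1", "t1"), ("P2", "t1")),
    (("P2", "t2"), ("P3", "t1")),
]

def build_deps(messages, key):
    """Map each sending unit to the units that received its messages.

    `key` picks the tracking granularity: the whole (process, thread)
    pair for thread-level tracking, or just the process name.
    """
    deps = defaultdict(set)
    for sender, receiver in messages:
        deps[key(sender)].add(key(receiver))
    return deps

def rollback_set(failed, deps):
    """Return every unit forced to roll back when `failed` rolls back.

    A receiver must roll back if any unit it received a message from
    rolls back; this follows those edges transitively.
    """
    seen = {failed}
    to_visit = [failed]
    while to_visit:
        unit = to_visit.pop()
        for receiver in deps.get(unit, ()):
            if receiver not in seen:
                seen.add(receiver)
                to_visit.append(receiver)
    return seen

# Thread granularity: only the thread that actually received the message.
thread_level = rollback_set(("P1", "t1"), build_deps(messages, key=lambda t: t))

# Process granularity: P2's two unrelated threads are merged into one unit,
# so P3 appears to depend on P1 (a false dependency).
process_level = rollback_set("P1", build_deps(messages, key=lambda t: t[0]))

print(sorted(thread_level))   # [('P1', 't1'), ('P2', 't1')]
print(sorted(process_level))  # ['P1', 'P2', 'P3']
```

Under these assumptions, process-level tracking rolls back all three processes, while thread-level tracking touches only two threads and leaves P3 alone.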

Original language: English (US)
Pages: 39-46
Number of pages: 8
State: Published - Jan 1 2001
Event: 21st IEEE International Conference on Distributed Computing Systems - Mesa, AZ, United States
Duration: Apr 16 2001 to Apr 19 2001

Fingerprint

Fault tolerance

All Science Journal Classification (ASJC) codes

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

Cite this

Kasbekar, M., & Das, C. (2001). Selective checkpointing and rollbacks in multithreaded distributed systems. 39-46. Paper presented at 21st IEEE International Conference on Distributed Computing Systems, Mesa, AZ, United States.
Kasbekar, M.; Das, Chitaranjan. / Selective checkpointing and rollbacks in multithreaded distributed systems. Paper presented at 21st IEEE International Conference on Distributed Computing Systems, Mesa, AZ, United States. 8 p.
@conference{6e04a3991a6248d682f413732ddd50b4,
title = "Selective checkpointing and rollbacks in multithreaded distributed systems",
abstract = "Modern distributed systems are often multithreaded and object-oriented in their design. They require efficient techniques to checkpoint and restore their state for improving fault-tolerance properties. The traditional process-based techniques of distributed checkpointing and rollback algorithms suffer from the problem of false dependencies, which makes them very rigid and inefficient for use with modern systems. In this paper, we develop protocols that can selectively checkpoint (and rollback) some threads of a distributed system, while leaving others untouched and yet ensuring the consistency of state resulting from such a partial rollback.",
author = "M. Kasbekar and Chitaranjan Das",
year = "2001",
month = "1",
day = "1",
language = "English (US)",
pages = "39--46",
note = "21st IEEE International Conference on Distributed Computing Systems ; Conference date: 16-04-2001 Through 19-04-2001",

}

Kasbekar, M & Das, C 2001, 'Selective checkpointing and rollbacks in multithreaded distributed systems', Paper presented at 21st IEEE International Conference on Distributed Computing Systems, Mesa, AZ, United States, 4/16/01 - 4/19/01, pp. 39-46.

Selective checkpointing and rollbacks in multithreaded distributed systems. / Kasbekar, M.; Das, Chitaranjan.

2001. 39-46 Paper presented at 21st IEEE International Conference on Distributed Computing Systems, Mesa, AZ, United States.

TY - CONF

T1 - Selective checkpointing and rollbacks in multithreaded distributed systems

AU - Kasbekar, M.

AU - Das, Chitaranjan

PY - 2001/1/1

Y1 - 2001/1/1

N2 - Modern distributed systems are often multithreaded and object-oriented in their design. They require efficient techniques to checkpoint and restore their state for improving fault-tolerance properties. The traditional process-based techniques of distributed checkpointing and rollback algorithms suffer from the problem of false dependencies, which makes them very rigid and inefficient for use with modern systems. In this paper, we develop protocols that can selectively checkpoint (and rollback) some threads of a distributed system, while leaving others untouched and yet ensuring the consistency of state resulting from such a partial rollback.

AB - Modern distributed systems are often multithreaded and object-oriented in their design. They require efficient techniques to checkpoint and restore their state for improving fault-tolerance properties. The traditional process-based techniques of distributed checkpointing and rollback algorithms suffer from the problem of false dependencies, which makes them very rigid and inefficient for use with modern systems. In this paper, we develop protocols that can selectively checkpoint (and rollback) some threads of a distributed system, while leaving others untouched and yet ensuring the consistency of state resulting from such a partial rollback.

UR - http://www.scopus.com/inward/record.url?scp=0034998071&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0034998071&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:0034998071

SP - 39

EP - 46

ER -

Kasbekar M, Das C. Selective checkpointing and rollbacks in multithreaded distributed systems. 2001. Paper presented at 21st IEEE International Conference on Distributed Computing Systems, Mesa, AZ, United States.