Cross-failure bug detection in persistent memory programs

Sihang Liu, Korakit Seemakhupt, Yizhou Wei, Thomas Wenisch, Aasheesh Kolli, Samira Khan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Persistent memory (PM) technologies, such as Intel's Optane memory, deliver high performance, byte-addressability, and persistence, allowing programs to manipulate persistent data directly in memory without OS intermediaries. An important requirement of these programs is that persistent data must remain consistent across a failure, which we refer to as the crash consistency guarantee. However, maintaining crash consistency is not trivial. We observe that consistent recovery depends critically not only on the execution before the failure, but also on the recovery and resumption after it. We refer to these stages as the pre- and post-failure execution stages. To detect crash consistency bugs holistically, we categorize the underlying causes of inconsistent recovery that arise from incorrect interactions between the pre- and post-failure execution. First, a program is not crash-consistent if the post-failure stage reads from locations that are not guaranteed to be persisted in all possible access interleavings of the pre-failure stage; we refer to the race caused by this type of programming error as a cross-failure race. Second, a program is not crash-consistent if the post-failure stage reads persistent data that the pre-failure stage has left semantically inconsistent, such as a stale log or uncommitted data. We refer to this type of bug as a cross-failure semantic bug. Together, these two categories form the cross-failure bugs in PM programs. In this work, we present XFDetector, a tool that detects cross-failure bugs by automatically injecting failures into the pre-failure execution and checking for cross-failure races and semantic bugs in the post-failure continuation. XFDetector has detected four new bugs in three pieces of PM software: one of PMDK's examples, a PM-optimized Redis database, and a PMDK library function.
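To make the notion of a cross-failure race concrete, the following Python sketch models a single-threaded pre-failure trace of stores, flushes, and fences (loosely mirroring x86 stores, CLWB, and SFENCE), injects a failure before each fence and at the end of the trace, and runs a recovery routine against each resulting crash state. A post-failure read of a location whose latest store is not guaranteed to be persisted is reported as a race. This is an illustrative toy under stated assumptions, not XFDetector's implementation: the names `PMModel`, `find_cross_failure_races`, `recover`, `valid`, and `data` are hypothetical; unwritten locations are assumed to hold a persistent initial value of 0; only the two extreme persistence outcomes (nothing unguaranteed persisted, or everything persisted) are explored; and locations annotated as commit variables are assumed safe to read either way, so they are exempted from race reports. Semantic bugs are not modeled.

```python
from dataclasses import dataclass, field
from typing import AbstractSet, Callable, Dict, List, Set, Tuple

# Hypothetical trace ops: ("store", addr, value) | ("flush", addr) | ("fence",)
Op = Tuple


@dataclass
class PMModel:
    """Tracks which stores are guaranteed to be persisted (single thread, per-address)."""
    latest: Dict[str, int] = field(default_factory=dict)   # most recent store per address
    durable: Dict[str, int] = field(default_factory=dict)  # value guaranteed to be in PM
    flushed: Set[str] = field(default_factory=set)         # flushed, awaiting the next fence

    def apply(self, op: Op) -> None:
        kind = op[0]
        if kind == "store":
            _, addr, value = op
            self.latest[addr] = value
            self.flushed.discard(addr)  # an earlier flush does not cover a newer store
        elif kind == "flush":
            if op[1] in self.latest:
                self.flushed.add(op[1])
        elif kind == "fence":
            for addr in self.flushed:   # the fence makes every pending flush durable
                self.durable[addr] = self.latest[addr]
            self.flushed.clear()
        else:
            raise ValueError(f"unknown op {kind!r}")

    def guaranteed(self, addr: str) -> bool:
        # Unwritten addresses keep their (assumed persistent) initial value.
        return addr not in self.latest or self.durable.get(addr) == self.latest[addr]


def find_cross_failure_races(
    pre_failure: List[Op],
    recover: Callable[[Callable[[str], int]], None],
    commit_vars: AbstractSet[str] = frozenset(),
) -> Set[Tuple[int, str]]:
    """Inject a failure before each fence and at the end; return (failure_point, addr) races."""
    points = [i for i, op in enumerate(pre_failure) if op[0] == "fence"]
    points.append(len(pre_failure))
    races: Set[Tuple[int, str]] = set()
    for point in points:
        model = PMModel()
        for op in pre_failure[:point]:  # replay the trace up to the injected failure
            model.apply(op)
        # Explore two extreme outcomes for unguaranteed data: none persisted, or all persisted.
        for all_persisted in (False, True):
            def read(addr: str) -> int:
                if not model.guaranteed(addr) and addr not in commit_vars:
                    races.add((point, addr))
                if all_persisted:
                    return model.latest.get(addr, 0)
                return model.durable.get(addr, 0)
            recover(read)
    return races


def recover(read: Callable[[str], int]) -> None:
    # Hypothetical recovery: trust `data` only if the commit flag `valid` is set.
    if read("valid") == 1:
        read("data")


buggy = [("store", "valid", 1), ("flush", "valid"), ("fence",),   # flag persisted first
         ("store", "data", 42), ("flush", "data"), ("fence",)]
fixed = [("store", "data", 42), ("flush", "data"), ("fence",),    # data persisted first
         ("store", "valid", 1), ("flush", "valid"), ("fence",)]

print(find_cross_failure_races(buggy, recover, {"valid"}))  # {(5, 'data')}
print(find_cross_failure_races(fixed, recover, {"valid"}))  # set()
```

In the buggy trace, a failure just before the second fence lets recovery observe `valid == 1` and then read a `data` value that may never have reached PM; persisting `data` before setting the flag removes the race. Without the commit-variable exemption, the sketch would also flag the read of `valid` itself, which is why real tools need some way to distinguish such variables.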

Original language: English (US)
Title of host publication: ASPLOS 2020 - 25th International Conference on Architectural Support for Programming Languages and Operating Systems
Publisher: Association for Computing Machinery
Pages: 1187-1202
Number of pages: 16
ISBN (Electronic): 9781450371025
DOIs: https://doi.org/10.1145/3373376.3378452
State: Published - Mar 9 2020
Event: 25th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2020 - Lausanne, Switzerland
Duration: Mar 16 2020 → Mar 20 2020

Publication series

Name: International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS

Conference

Conference: 25th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2020
Country: Switzerland
City: Lausanne
Period: 3/16/20 → 3/20/20

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
  • Hardware and Architecture


Cite this

Liu, S., Seemakhupt, K., Wei, Y., Wenisch, T., Kolli, A., & Khan, S. (2020). Cross-failure bug detection in persistent memory programs. In ASPLOS 2020 - 25th International Conference on Architectural Support for Programming Languages and Operating Systems (pp. 1187-1202). (International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS). Association for Computing Machinery. https://doi.org/10.1145/3373376.3378452