SSD failures in datacenters: What? When? and Why?

Iyswarya Narayanan, Di Wang, Myeongjae Jeon, Bikash Sharma, Laura Caulfield, Anand Sivasubramaniam, Ben Cutler, Jie Liu, Badriddine Khessib, Kushagra Vaid

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

54 Scopus citations

Abstract

Despite the growing popularity of Solid State Disks (SSDs) in the datacenter, little is known about their reliability characteristics in the field. What little is known is mainly vendor supplied, and such information does not help in understanding how SSD failures manifest and affect the operation of production systems, which is needed to take appropriate remedial measures. Besides actual failure data and the symptoms exhibited by SSDs before failing, a detailed characterization effort requires a wide set of data about factors influencing SSD failures, from provisioning factors to operational ones. This paper presents an extensive SSD failure characterization by analyzing a wide spectrum of data from over half a million SSDs that span multiple generations and are spread across several datacenters hosting a wide spectrum of workloads, over nearly 3 years. By studying the effect of a diverse set of design, provisioning and operational factors on failures, and their symptoms, our work provides the first comprehensive analysis of the what, when and why characteristics of SSD failures in production datacenters.

Original language: English (US)
Title of host publication: SYSTOR 2016 - Proceedings of the 9th ACM International Systems and Storage Conference
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9781450343817
DOIs: https://doi.org/10.1145/2928275.2928278
State: Published - Jun 6 2016
Event: 9th ACM International Systems and Storage Conference, SYSTOR 2016 - Haifa, Israel
Duration: Jun 6 2016 to Jun 8 2016

Publication series

Name: SYSTOR 2016 - Proceedings of the 9th ACM International Systems and Storage Conference

Other

Other: 9th ACM International Systems and Storage Conference, SYSTOR 2016
Country: Israel
City: Haifa
Period: 6/6/16 to 6/8/16

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Electrical and Electronic Engineering
  • Hardware and Architecture
  • Software

Cite this

    Narayanan, I., Wang, D., Jeon, M., Sharma, B., Caulfield, L., Sivasubramaniam, A., Cutler, B., Liu, J., Khessib, B., & Vaid, K. (2016). SSD failures in datacenters: What? When? and Why? In SYSTOR 2016 - Proceedings of the 9th ACM International Systems and Storage Conference [2928278] (SYSTOR 2016 - Proceedings of the 9th ACM International Systems and Storage Conference). Association for Computing Machinery, Inc. https://doi.org/10.1145/2928275.2928278