TY - JOUR
T1 - Harnessing Correlations in Distributed Erasure-Coded Key-Value Stores
AU - Ali, Ramy E.
AU - Cadambe, Viveck R.
N1 - Funding Information:
Manuscript received October 2, 2018; revised March 9, 2019 and May 17, 2019; accepted June 6, 2019. Date of publication June 17, 2019; date of current version September 16, 2019. This work is supported by NSF grant No. CCF 1553248. This paper was presented in part at the Proceedings of the 2016 IEEE Information Theory Workshop [1]. The associate editor coordinating the review of this paper and approving it for publication was R. F. Schaefer. (Corresponding author: Ramy E. Ali.) The authors are with the School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, PA 16802 USA (e-mail: ramy.ali@psu.edu; viveck@engr.psu.edu).
Publisher Copyright:
© 1972-2012 IEEE.
PY - 2019/9
Y1 - 2019/9
N2 - Motivated by applications of distributed storage systems to key-value stores, the multi-version coding problem has been formulated to efficiently store frequently updated data in asynchronous decentralized storage systems. Inspired by consistency requirements in distributed systems, the main goal in the multi-version coding problem is to ensure that the latest possible version of the data is decodable even if the data updates have not reached all the servers in the system. In this paper, we study the storage cost of ensuring consistency for the case where the data versions are correlated, in contrast to previous work where the data versions were treated as being independent. We provide multi-version code constructions that show that the storage cost can be significantly smaller than the previous constructions depending on the degree of correlation, despite the asynchrony and the decentralized nature. Our achievability results are based on Reed-Solomon codes and random binning. Through an information-theoretic converse, we show that our multi-version codes are asymptotically nearly optimal, within a factor of 2, in certain interesting regimes.
UR - http://www.scopus.com/inward/record.url?scp=85077438522&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85077438522&partnerID=8YFLogxK
U2 - 10.1109/TCOMM.2019.2923616
DO - 10.1109/TCOMM.2019.2923616
M3 - Article
AN - SCOPUS:85077438522
VL - 67
SP - 5907
EP - 5920
JO - IEEE Transactions on Communications
JF - IEEE Transactions on Communications
SN - 1558-0857
IS - 9
M1 - 8737969
ER -