Key-value stores are an integral infrastructure component of many modern web-based applications, including retail stores, multi-player games, reservation systems, news feeds, and social and professional networks. Cloud computing service providers commonly implement key-value stores over large-scale distributed data storage systems. At the heart of key-value store implementations in distributed storage systems are carefully crafted algorithms that expose a consistent, current view of the stored data to a user who reads the data. The main purpose of this project is to undertake a formal study of the storage costs incurred in distributed storage systems that aim to present a consistent, current view of the stored data. The project has the long-term potential to aid the development of new data storage techniques that can benefit key-value store implementations by reducing their storage cost and energy consumption.
An important requirement of a distributed data storage system is fault tolerance: the data must remain accessible even if some system components fail. In applications of distributed storage to distributed computing and the implementation of key-value stores, a second property, known as consistency, is also critical: when the data is being constantly updated, a user who reads from the system should obtain the latest version of the data. Algorithms that ensure consistency and fault tolerance in storage systems have been extensively studied in distributed systems theory and practice. The goal of this project is to obtain, for the first time, an information-theoretic understanding of the storage costs incurred in consistent, fault-tolerant distributed storage systems.
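As a rough illustration of what "storage cost" means under fault tolerance alone, the sketch below compares full replication with a maximum distance separable (MDS) erasure code for a single, unchanging version of an object. It is a minimal sketch under stated assumptions, not a model from the project: the function names are hypothetical, cost is measured in units of one object's size, and consistency under concurrent updates is deliberately left out.

```python
from fractions import Fraction


def replication_cost(n_servers: int) -> Fraction:
    """Total storage, in units of object size, when every server keeps a full copy."""
    if n_servers < 1:
        raise ValueError("need at least one server")
    return Fraction(n_servers)


def mds_cost(n_servers: int, max_failures: int) -> Fraction:
    """Total storage for an (n, k) MDS code with k = n - max_failures.

    Each server stores 1/k of the object, and any k surviving servers
    suffice to recover it, so up to max_failures servers may fail.
    Assumption: a single static version, with no concurrent writes.
    """
    k = n_servers - max_failures
    if max_failures < 0 or k < 1:
        raise ValueError("need 0 <= max_failures < n_servers")
    return Fraction(n_servers, k)


if __name__ == "__main__":
    # Five servers, tolerating two failures.
    print(replication_cost(5))  # 5
    print(mds_cost(5, 2))       # 5/3
```

For a single version, the coded scheme stores far less than replication at the same fault tolerance. Whether such savings persist once the system must also serve the latest version during ongoing updates is the kind of question the project sets out to study; this sketch does not address it.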
Building on preliminary work by the investigator, the project will develop and study several new information-theoretic formulations inspired by distributed systems theory and practice. The proposed formulations naturally expose trade-offs among the degree of redundancy, the degree of consistency, and other physical parameters of storage systems. New coding schemes and information-theoretic converses for the proposed formulations will be developed using tools from algebra, combinatorics, and network information theory. The project will also pursue new bounds for network coding, which apply naturally to the newly proposed formulations and to other families of codes, including locally repairable codes and regenerating codes. Finally, the project will develop an education plan with the long-term goal of training interdisciplinary researchers and engineers in information theory, coding theory, and the theory and design of distributed systems.
Effective start/end date: 2/15/16 → 1/31/23
Funding: National Science Foundation, $497,847.00