TY - JOUR

T1 - Subset Source Coding

AU - MolavianJazi, Ebrahim

AU - Yener, Aylin

N1 - Funding Information:
Manuscript received June 6, 2016; revised September 9, 2017; accepted June 5, 2018. Date of publication July 9, 2018; date of current version August 16, 2018. This work was supported by the U.S. Army Research Laboratory through the Network Science Collaborative Technology Alliance under Grant W911NF-09-2-0053. This paper was presented in part at the 2015 Allerton Conference on Communication, Control, and Computing and in part at the 2016 Information Theory and Applications Workshop.

PY - 2018/9

Y1 - 2018/9

AB - This paper studies the fundamental limits of storage for structured data, where statistics and structure are both critical to the application. Accordingly, a framework is proposed for optimal lossless and lossy compression of subsets of the possible realizations of a discrete memoryless source (DMS). For the lossless subset-compression problem, it turns out that the optimal source code may not index the conventional source-typical sequences, but rather index certain subset-typical sequences consistent with the subset of interest. Building upon an achievability and a strong converse, an analytic expression is given, based on the Shannon entropy, relative entropy, and subset entropy, which identifies such subset-typical sequences for a broad class of subsets of a DMS. Intuitively, subset-typical sequences belong to those typical sets which highly intersect the subset of interest but are still closest to the source distribution in the sense of relative entropy. For the lossy subset-compression problem, an upper bound is derived on the subset rate-distortion function in terms of the subset mutual information optimized over the set of conditional distributions that satisfy the expected distortion constraint with respect to the subset-typical distribution and over a set of certain auxiliary subsets. By proving a strong converse result, this upper bound is shown to be tight for a class of symmetric subsets. As shown in our numerical examples, more often than not, one achieves a gain in the fundamental limits, in that the optimal compression rate for the subset in both the lossless and lossy settings can be strictly smaller than the source entropy and the source rate-distortion function, respectively, although exceptions are also possible.

UR - http://www.scopus.com/inward/record.url?scp=85049787426&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85049787426&partnerID=8YFLogxK

U2 - 10.1109/TIT.2018.2854601

DO - 10.1109/TIT.2018.2854601

M3 - Article

AN - SCOPUS:85049787426

VL - 64

SP - 5989

EP - 6012

JO - IEEE Transactions on Information Theory

JF - IEEE Transactions on Information Theory

SN - 0018-9448

IS - 9

M1 - 8408809

ER -