### Abstract

Emerging applications, including semantic information processing, impose priorities on the possible realizations of information sources, so that not all source sequences are important. This paper proposes an initial framework for optimal lossless compression of subsets of the output of a discrete memoryless source (DMS). It turns out that the optimal source code may not index the conventional source-typical sequences, but rather certain subset-typical sequences determined by the source statistics as well as the subset structure. Building upon an achievability result and a strong converse, an analytic expression is given, based on the Shannon entropy, relative entropy, and subset entropy, which identifies such subset-typical sequences for a broad class of subsets of a DMS. Interestingly, one often achieves a gain in the fundamental limit, in that the optimal compression rate for the subset can be strictly smaller than the source entropy, although this is not always the case.

Original language | English (US) |
---|---|
Title of host publication | 2015 53rd Annual Allerton Conference on Communication, Control, and Computing, Allerton 2015 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 857-864 |
Number of pages | 8 |
ISBN (Electronic) | 9781509018239 |
DOIs | https://doi.org/10.1109/ALLERTON.2015.7447096 |
State | Published - Apr 4 2016 |
Event | 53rd Annual Allerton Conference on Communication, Control, and Computing, Allerton 2015 - Monticello, United States |
Event duration | Sep 29 2015 → Oct 2 2015 |

### Other

Other | 53rd Annual Allerton Conference on Communication, Control, and Computing, Allerton 2015 |
---|---|
Country | United States |
City | Monticello |
Period | 9/29/15 → 10/2/15 |

### All Science Journal Classification (ASJC) codes

- Computer Networks and Communications
- Computer Science Applications
- Control and Systems Engineering

### Cite this

*2015 53rd Annual Allerton Conference on Communication, Control, and Computing, Allerton 2015* (pp. 857-864). [7447096] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ALLERTON.2015.7447096

**Subset source coding.** / Molavianjazi, Ebrahim; Yener, Aylin.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

TY - GEN

T1 - Subset source coding

AU - Molavianjazi, Ebrahim

AU - Yener, Aylin

PY - 2016/4/4

Y1 - 2016/4/4

UR - http://www.scopus.com/inward/record.url?scp=84969754425&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84969754425&partnerID=8YFLogxK

U2 - 10.1109/ALLERTON.2015.7447096

DO - 10.1109/ALLERTON.2015.7447096

M3 - Conference contribution

SP - 857

EP - 864

BT - 2015 53rd Annual Allerton Conference on Communication, Control, and Computing, Allerton 2015

PB - Institute of Electrical and Electronics Engineers Inc.

ER -