
JSAIT Cover Art, December 2022

Modern Compression

Guest Editors: Meir Feder, Tsachy Weissman

Modern computation environments are struggling to store, communicate and process data in unprecedented volumes. These data, which come in new and evolving structures and formats, necessitate compression, lossless and lossy. Recent years have witnessed the emergence of new techniques, approaches, and modes for data compression. This special issue will focus on cutting-edge research in this space.

Meir Feder    Tsachy Weissman

Modern computation and networking environments are struggling to store, communicate and process data in unprecedented volumes. These data, which come in new and evolving structures and formats, necessitate compression, lossless and lossy. Recent years have witnessed the emergence of new techniques, approaches, architectures and modes for data compression. This special issue is dedicated to cutting-edge research geared toward providing information-theoretic insight into this space.

Daniel Severo    James Townsend    Ashish Khisti    Alireza Makhzani    Karen Ullrich

Current methods which compress multisets at an optimal rate have computational complexity that scales linearly with alphabet size, making them too slow to be practical in many real-world settings. We show how to convert a compression algorithm for sequences into one for multisets, in exchange for an additional complexity term that is quasi-linear in sequence length. This allows us to compress multisets of exchangeable symbols at an optimal rate, with computational complexity decoupled from the alphabet size. The key insight is to avoid encoding the multiset directly, and instead compress a proxy sequence, using a technique called ‘bits-back coding’. We demonstrate the method experimentally on tasks which are intractable with previous optimal-rate methods: compression of multisets of images and JavaScript Object Notation (JSON) files. Code for our experiments is available at https://github.com/facebookresearch/multiset-compression.
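As a toy illustration of the rate argument (an accounting sketch only, not the authors' codec), the Python snippet below computes how many bits bits-back coding can recover when a multiset rather than one fixed ordering is compressed: $\log_2 n! - \sum_i \log_2 n_i!$ bits, the log of the number of distinct orderings, where $n_i$ are the symbol multiplicities. The filenames in the usage example are hypothetical.

```python
from collections import Counter
from math import lgamma, log

def log2_factorial(m):
    """log2(m!) via the log-gamma function."""
    return lgamma(m + 1) / log(2)

def bits_back_saving(items):
    """Bits saved by compressing the multiset of `items` instead of one fixed
    ordering of it: log2 of the number of distinct orderings,
    n! / (n_1! * ... * n_k!). Bits-back coding recovers these bits by encoding
    a proxy ordering with bits drawn from the compressed stream; this is only
    an accounting of the rate, not the authors' implementation."""
    counts = Counter(items)
    n = sum(counts.values())
    return log2_factorial(n) - sum(log2_factorial(c) for c in counts.values())

if __name__ == "__main__":
    symbols = ["cat.png", "dog.png", "cat.png", "bird.png"]   # hypothetical items
    print(f"{bits_back_saving(symbols):.2f} bits saved")      # log2(4!/2!) = log2(12) ≈ 3.58
```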

Kengo Hashimoto    Ken-ichi Iwata

For a given independent and identically distributed (i.i.d.) source, a Huffman code achieves the optimal average codeword length in the class of instantaneous codes with a single code table. However, it is known that there exist time-variant encoders, which achieve a shorter average codeword length than the Huffman code, using multiple code tables and allowing at most $k$-bit decoding delay for $k = 2, 3, 4, \ldots$. On the other hand, it is not known whether there exists a 1-bit delay decodable code that achieves a shorter average length than the Huffman code. This paper proves that for a given i.i.d. source, a Huffman code achieves the optimal average codeword length in the class of 1-bit delay decodable codes with a finite number of code tables.
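For reference, the single-code-table baseline discussed above can be built with the textbook Huffman construction sketched below; this is standard material, not code from the paper, and the time-variant $k$-bit-delay encoders the paper studies are not shown.

```python
import heapq

def huffman_code(pmf):
    """Binary Huffman code for a dict {symbol: probability}. This is the
    single-table instantaneous baseline; returns {symbol: codeword}."""
    # heap entries: (probability, tie-breaker, {symbol: codeword-so-far})
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(pmf.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}      # prepend bit for left subtree
        merged.update({s: "1" + w for s, w in c1.items()})  # prepend bit for right subtree
        heapq.heappush(heap, (p0 + p1, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

if __name__ == "__main__":
    pmf = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
    code = huffman_code(pmf)
    print(code, sum(pmf[s] * len(w) for s, w in code.items()))  # 1.75 bits, the entropy of this dyadic source
```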

Maryam Hosseini    Narayana Santhanam

We formalize the tail redundancy of a collection $\mathcal{P}$ of distributions over a countably infinite alphabet, and show that this fundamental quantity characterizes the asymptotic per-symbol minimax redundancy of universally compressing sequences generated i.i.d. from $\mathcal{P}$. Contrary to the worst-case formulations of universal compression, finite single-letter minimax (average-case) redundancy of $\mathcal{P}$ does not automatically imply that the expected minimax redundancy of describing length-$n$ strings sampled i.i.d. from $\mathcal{P}$ grows sublinearly with $n$. Instead, we prove that universal compression of length-$n$ i.i.d. sequences from $\mathcal{P}$ is characterized by how well the tails of distributions in $\mathcal{P}$ can be universally described, showing that the asymptotic per-symbol redundancy of i.i.d. strings is equal to the tail redundancy.
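For orientation, the standard average-case minimax redundancy referred to above can be written as $R_{n}(\mathcal{P}) = \inf_{q}\sup_{p\in \mathcal{P}} \mathbb{E}_{p}\big[\log \tfrac{p^{n}(X^{n})}{q(X^{n})}\big]$, with per-symbol redundancy $R_{n}(\mathcal{P})/n$; this textbook definition is quoted here only for context, while the precise definition of tail redundancy is the paper's own contribution and is not reproduced.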

Shubham Jha

We consider universal quantization with side information for Gaussian observations, where the side information is a noisy version of the sender’s observation with noise variance unknown to the sender. In this paper, we propose a universally rate-optimal and practical quantization scheme for all values of the unknown noise variance. Our scheme uses polar lattices from prior work, and proceeds based on a structural decomposition of the underlying auxiliaries so that even when recovery fails in a round, the parties agree on a common “reference point” that is closer than the previous one. We also present a finite-blocklength analysis showing sub-exponential convergence for distortion and exponential convergence for rate. The overall complexity of our scheme is $\mathcal{O}(N^{2}\log^{2} N)$ for any target distortion and fixed rate larger than the rate-distortion bound.

Aviv Adler    Jennifer Tang    Yury Polyanskiy

A number of engineering and scientific problems require representing and manipulating probability distributions over large alphabets, which we may think of as long vectors of reals summing to 1. In some cases it is required to represent such a vector with only $b$ bits per entry. A natural choice is to partition the interval $[0,1]$ into $2^{b}$ uniform bins and quantize entries to each bin independently. We show that a minor modification of this procedure, applying an entrywise non-linear function (compander) $f(x)$ prior to quantization, yields an extremely effective quantization method. For example, for $b=8$ (16) and $10^{5}$-sized alphabets, the quality of representation improves from a loss (under KL divergence) of $0.5$ ($0.1$) bits/entry to $10^{-4}$ ($10^{-9}$) bits/entry. Compared to floating-point representations, our compander method improves the loss from $10^{-1}$ ($10^{-6}$) to $10^{-4}$ ($10^{-9}$) bits/entry. These numbers hold for both real-world data (word frequencies in books and DNA $k$-mer counts) and for synthetic randomly generated distributions. Theoretically, we analyze a minimax optimality criterion and show that the closed-form compander $f(x) \propto \mathrm{ArcSinh}(\sqrt{c_{K} (K \log K) x})$ is (asymptotically as $b\to \infty$) optimal for quantizing probability distributions over a $K$-letter alphabet. Non-asymptotically, such a compander (substituting $1/2$ for $c_{K}$ for simplicity) has KL-quantization loss bounded by $8\cdot 2^{-2b} \log^{2} K$. Interestingly, a similar minimax criterion for the quadratic loss on the hypercube shows optimality of the standard uniform quantizer. This suggests that the ArcSinh quantizer is as fundamental for KL-distortion as the uniform quantizer for quadratic distortion.
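A minimal NumPy sketch of the companded quantizer described above is given below, using the simplified constant $c_{K}=1/2$ and the common recipe of companding, uniform quantization to bin midpoints, inversion, and renormalization; details of the paper's exact construction may differ.

```python
import numpy as np

def compander_quantize(p, b, c=0.5):
    """Quantize a probability vector p to b bits per entry with the ArcSinh
    compander (simplified constant c_K = 1/2): compand, uniformly quantize to
    bin midpoints, invert, renormalize. A sketch, not the paper's exact code."""
    K = p.size
    scale = c * K * np.log(K)

    def f(x):                                  # compander
        return np.arcsinh(np.sqrt(scale * x))

    def f_inv(y):                              # inverse compander
        return np.sinh(y) ** 2 / scale

    y = f(p) / f(1.0)                          # companded values, mapped into [0, 1]
    levels = 2 ** b
    q = (np.floor(y * levels) + 0.5) / levels  # midpoint of each uniform bin
    p_hat = f_inv(q * f(1.0))
    return p_hat / p_hat.sum()                 # renormalize to a distribution

def kl_bits(p, q):
    """KL divergence D(p || q) in bits."""
    return float(np.sum(p * np.log2(p / q)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = rng.dirichlet(np.ones(10_000))         # a random 10^4-letter distribution
    print(f"KL loss at b=8: {kl_bits(p, compander_quantize(p, b=8)):.2e} bits")
```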

Jun Chen    Lei Yu    Jia Wang    Wuxian Shi    Yiqun Ge    Wen Tong

Rate-distortion-perception theory extends Shannon’s rate-distortion theory by introducing a constraint on the perceptual quality of the output. The perception constraint complements the conventional distortion constraint and aims to enforce distribution-level consistencies. In this new theory, the information-theoretic limit is characterized by the rate-distortion-perception function. Although a coding theorem for the rate-distortion-perception function has recently been established, the fundamental nature of the optimal coding schemes remains unclear, especially regarding the role of randomness in encoding and decoding. It is shown in the present work that except for certain extreme cases, the rate-distortion-perception function is achievable by deterministic codes. This paper also clarifies the subtle differences between two notions of perfect perceptual quality and explores some alternative formulations of the perception constraint.
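In one common formulation of this theory, the rate-distortion-perception function is $R(D,P) = \min_{p_{\hat{X}|X}} I(X;\hat{X})$ subject to $\mathbb{E}[d(X,\hat{X})] \le D$ and $\delta(p_{X}, p_{\hat{X}}) \le P$, where $d$ is a distortion measure and $\delta$ is a divergence between distributions; this standard form is quoted here only for orientation, and the paper's operational definitions, in particular the role of common randomness at the encoder and decoder, may differ in details.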

Eric Lei    Hamed Hassani    Shirin Saeedi Bidokhti

A fundamental question in designing lossy data compression schemes is how well one can do in comparison with the rate-distortion function, which describes the known theoretical limits of lossy compression. Motivated by the empirical success of deep neural network (DNN) compressors on large, real-world data, we investigate methods to estimate the rate-distortion function on such data, which would allow DNN compressors to be compared against the theoretical optimum. While one could use the empirical distribution of the data and apply the Blahut-Arimoto algorithm, this approach presents several computational challenges and inaccuracies when the datasets are large and high-dimensional, as is the case for modern image datasets. Instead, we reformulate the rate-distortion objective and solve the resulting functional optimization problem using neural networks. We apply the resulting rate-distortion estimator, called NERD, to popular image datasets, and provide evidence that NERD can accurately estimate the rate-distortion function. Using our estimate, we show that the rate-distortion achievable by DNN compressors is within several bits of the rate-distortion function for real-world datasets. Additionally, NERD provides access to the rate-distortion-achieving channel, as well as samples from its output marginal. Therefore, using recent results in reverse channel coding, we describe how NERD can be used to construct an operational one-shot lossy compression scheme with guarantees on the achievable rate and distortion. Experimental results demonstrate competitive performance with DNN compressors.
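For a small finite-alphabet source, the Blahut-Arimoto baseline mentioned above looks like the NumPy sketch below (a textbook version, included only to make concrete why it scales poorly to large, high-dimensional empirical distributions; NERD itself is not shown).

```python
import numpy as np

def blahut_arimoto(p_x, dist, beta, n_iter=200):
    """Classical Blahut-Arimoto iteration for the rate-distortion function of
    a finite source. p_x: source pmf (n,), dist: distortion matrix (n, m),
    beta: Lagrange multiplier trading rate against distortion."""
    n, m = dist.shape
    q_y = np.full(m, 1.0 / m)                    # output marginal, initialized uniform
    for _ in range(n_iter):
        w = q_y[None, :] * np.exp(-beta * dist)  # unnormalized optimal test channel
        p_y_given_x = w / w.sum(axis=1, keepdims=True)
        q_y = p_x @ p_y_given_x                  # update the output marginal
    rate = np.sum(p_x[:, None] * p_y_given_x *
                  np.log2(p_y_given_x / q_y[None, :]))   # I(X; Y) in bits
    distortion = np.sum(p_x[:, None] * p_y_given_x * dist)
    return rate, distortion

if __name__ == "__main__":
    # Binary symmetric source with Hamming distortion: R(D) = 1 - H(D) for D <= 1/2
    p_x = np.array([0.5, 0.5])
    dist = np.array([[0.0, 1.0], [1.0, 0.0]])
    print(blahut_arimoto(p_x, dist, beta=3.0))
```

Each sweep costs on the order of the alphabet size squared, which is exactly what becomes intractable when the "alphabet" is the empirical support of a modern image dataset.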

Nirmal V. Shende    Aaron B. Wagner

We introduce a new distortion measure for point processes called functional-covering distortion. It is inspired by intensity theory and is related to both the covering of point processes and logarithmic-loss distortion. We obtain the distortion-rate function with feedforward under this distortion measure for a large class of point processes. For Poisson processes, the rate-distortion function is obtained under a general condition called constrained functional-covering distortion, of which both covering and functional-covering are special cases. Also for Poisson processes, we characterize the rate-distortion region for a two-encoder CEO problem and show that feedforward does not enlarge this region.

Rony Bou Rouphael    Maël Le Treust

In decentralized and decision-oriented communication paradigms, autonomous devices strategically implement information compression policies. In this work, we study a strategic communication game between an encoder and two decoders. An i.i.d. information source, observed by the encoder, is transmitted to the decoders via two perfect links, one reaching the first decoder only and the other reaching both decoders, as in the successive refinement setup. All three communicating devices are assumed to be rational, i.e., they want to minimize their respective cost functions, which depend on the source variable and the output symbols of both decoders. The game takes place as follows: the encoder commits to implementing an encoding strategy, which induces a Bayesian game among the two decoders. The encoder is the Stackelberg leader and the two decoders are the Stackelberg followers; they simultaneously select the output sequences that minimize their respective long-run costs. We characterize the asymptotic behavior of the long-run optimal cost of the encoder when the decoders implement decoding strategies that form a Bayes-Nash equilibrium. We show that this optimal cost converges to a single-letter expression which involves two auxiliary random variables and single-letter incentive constraints of the decoders.

Shashank Vatedka    Venkat Chandar    Aslan Tchamkerten

A recent result says that the lossless compression of a single source $X^{n}$ is achievable with a strong locality property; any $X_{i}$ can be decoded from a constant number of compressed bits, with a probability of error vanishing in $n$. By contrast, we show that for two separately encoded sources $(X^{n},Y^{n})$, lossless compression and strong locality are generally not simultaneously possible. Specifically, we show that for the class of “confusable” sources, strong locality cannot be achieved whenever one of the sources is compressed below its entropy. Irrespective of $n$, there always exists at least one index $i$ for which the probability of incorrectly decoding $(X_{i},Y_{i})$ is lower bounded by $2^{-O(\mathtt{d})}$, where $\mathtt{d}$ denotes the worst-case (over indices) number of compressed bits accessed by the local decoder. Conversely, if the source is not confusable, strong locality is possible even if one of the sources is compressed below its entropy. The results extend to an arbitrary number of sources.

Tze-Yang Tung    David Burth Kurka    Mikolaj Jankowski    Deniz Gündüz

Recent works have shown that modern machine learning techniques can provide an alternative approach to the long-standing joint source-channel coding (JSCC) problem. Very promising initial results, superior to popular digital schemes that utilize separate source and channel codes, have been demonstrated for wireless image and video transmission using deep neural networks (DNNs). However, end-to-end training of such schemes requires a differentiable channel input representation; hence, prior works have assumed that any complex value can be transmitted over the channel. This can prevent the application of these codes in scenarios where the hardware or protocol can only admit certain sets of channel inputs, prescribed by a digital constellation. Herein, we propose DeepJSCC-Q, an end-to-end optimized JSCC solution for wireless image transmission using a finite channel input alphabet. We show that DeepJSCC-Q can achieve similar performance to prior works that allow any complex valued channel input, especially when high modulation orders are available, and that the performance asymptotically approaches that of unconstrained channel input as the modulation order increases. Importantly, DeepJSCC-Q preserves the graceful degradation of image quality in unpredictable channel conditions, a desirable property for deployment in mobile systems with rapidly changing channel conditions.
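A common way to keep a constellation-constrained channel input differentiable during end-to-end training is a straight-through estimator; the PyTorch sketch below illustrates that generic trick under stated assumptions (real I/Q pairs, nearest-point mapping), and is not necessarily the soft-quantization mechanism DeepJSCC-Q actually uses.

```python
import torch

def map_to_constellation(z, points):
    """Map each channel symbol to its nearest constellation point while
    letting gradients pass straight through (forward: hard decision,
    backward: identity). z: (..., 2) real tensor of I/Q components;
    points: (M, 2) tensor of constellation symbols. A generic
    straight-through sketch, assumed here for illustration only."""
    d = torch.cdist(z.reshape(-1, 2), points)          # pairwise Euclidean distances
    hard = points[d.argmin(dim=1)].reshape(z.shape)    # nearest constellation point
    return z + (hard - z).detach()                     # straight-through estimator

if __name__ == "__main__":
    qpsk = torch.tensor([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]) / 2 ** 0.5
    z = torch.randn(8, 2, requires_grad=True)          # stand-in for encoder outputs
    x = map_to_constellation(z, qpsk)
    x.sum().backward()                                 # gradients reach the encoder output z
    print(x.detach(), z.grad)
```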

Sourya Basu    Lav R. Varshney

Deep neural networks have shown incredible performance for inference tasks in a variety of domains, but require significant storage space, which limits scaling and use for on-device intelligence. This paper is concerned with finding universal lossless compressed representations of deep feedforward networks with synaptic weights drawn from discrete sets, and directly performing inference without full decompression. The basic insight that allows a lower rate than naïve approaches is recognizing that the bipartite graph layers of feedforward networks have a kind of permutation invariance to the labeling of nodes, in terms of inferential operation. We provide efficient algorithms to dissipate this irrelevant uncertainty and then use arithmetic coding to nearly achieve the entropy bound in a universal manner. We also provide experimental results of our approach on several standard datasets.
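As a back-of-the-envelope illustration of the "irrelevant uncertainty" mentioned above: each hidden layer with $m$ nodes has up to $m!$ functionally equivalent labelings, so up to roughly $\log_2 m!$ bits per hidden layer can in principle be saved. The snippet below (illustrative accounting only, not the authors' arithmetic-coding algorithm) just totals that quantity for a hypothetical architecture.

```python
from math import lgamma, log

def permutation_savings_bits(layer_widths):
    """Upper bound on bits saved by treating the m! labelings of each hidden
    layer's m nodes as equivalent when storing a feedforward network:
    sum over hidden layers of log2(m!)."""
    hidden = layer_widths[1:-1]                         # input/output labels stay fixed
    return sum(lgamma(m + 1) / log(2) for m in hidden)  # log2(m!) per hidden layer

if __name__ == "__main__":
    print(f"{permutation_savings_bits([784, 512, 512, 10]):,.0f} bits")  # hypothetical MLP
```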

Payam Delgosha    Venkat Anantharam

Many modern applications involve accessing and processing graphical data, i.e., data that is naturally indexed by graphs. Examples come from Internet graphs, social networks, genomics and proteomics, and other sources. The typically large size of such data motivates seeking efficient ways for its compression and decompression. The current compression methods are usually tailored to specific models, or do not provide theoretical guarantees. In this paper, we introduce a low-complexity lossless compression algorithm for sparse marked graphs, i.e., graphical data indexed by sparse graphs, which is capable of universally achieving the optimal compression rate defined on a per-node basis. The time and memory complexities of our compression and decompression algorithms are optimal within logarithmic factors. In order to define universality we employ the framework of local weak convergence, which allows one to make sense of a notion of stochastic processes for sparse graphs. Moreover, we investigate the performance of our algorithm through some experimental results on both synthetic and real-world data.

Travis C. Cuvelier    Takashi Tanaka    Robert W. Heath

Motivated by control with communication constraints, in this work we develop a time-invariant data compression architecture for linear-quadratic-Gaussian (LQG) control with minimum bitrate prefix-free feedback. For any fixed control performance, the approach we propose nearly achieves known directed information (DI) lower bounds on the time-average expected codeword length. We refine the analysis of a classical achievability approach, which required quantized plant measurements to be encoded via a time-varying lossless source code. We prove that the sequence of random variables describing the quantizations has a limiting distribution and that the quantizations may be encoded with a fixed source code optimized for this distribution without added time-asymptotic redundancy. Our result follows from analyzing the long-term stochastic behavior of the system, and permits us to additionally guarantee that the time-average codeword length (as opposed to expected length) is almost surely within a few bits of the minimum DI. To our knowledge, this time-invariant achievability result is the first in the literature.

Osama A. Hanna    Lin F. Yang    Christina Fragouli

The multi-armed bandit (MAB) problem is one of the most well-known active learning frameworks. The aim is to select the best among a set of actions by sequentially observing rewards that come from an unknown distribution. Recently, a number of distributed bandit applications have become popular over wireless networks, where agents geographically separated from a learner collect and communicate the observed rewards. In this paper we propose a compression scheme, that compresses the rewards collected by the distributed agents. By providing nearly matching upper and lower bounds, we tightly characterize the number of bits needed per reward for the learner to accurately learn without suffering additional regret. In particular, we establish a generic reward quantization algorithm, $QuBan$ , that can be applied on top of any (no-regret) MAB algorithm to form a new communication-efficient counterpart. $QuBan$ requires only a few (converging to as low as 3 bits as the number of iterations increases) bits to be sent per reward while preserving the same regret bound as uncompressed rewards. Our lower bound is established via constructing hard instances from a subGaussian distribution. Our theory is further corroborated by numerical experiments.
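To make the setting concrete, the sketch below shows a generic few-bit, unbiased reward quantizer (stochastic rounding around a reference point); QuBan's actual construction, including how it centers the grid and handles heavy-tailed rewards within its bit budget, is given in the paper and is not reproduced here.

```python
import numpy as np

def stochastic_quantize(reward, center, step=1.0, rng=None):
    """Unbiased stochastic rounding of a scalar reward onto a grid of spacing
    `step` centered at a running estimate `center`. A generic building block
    for few-bit reward feedback in distributed bandits; not QuBan itself."""
    rng = rng or np.random.default_rng()
    z = (reward - center) / step
    lo = np.floor(z)
    q = lo + (rng.random() < z - lo)   # round up with probability equal to the fractional part
    return center + q * step           # the expected output equals the true reward

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    samples = [stochastic_quantize(0.37, center=0.0, rng=rng) for _ in range(10_000)]
    print(np.mean(samples))            # close to 0.37: quantization adds noise but no bias
```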

Francesco Pase    Deniz Gündüz    Michele Zorzi

We consider a rate-constrained contextual multi-armed bandit (RC-CMAB) problem, in which a group of agents are solving the same contextual multi-armed bandit (CMAB) problem. However, the contexts are observed by a remotely connected entity, i.e., the decision-maker, that updates the policy to maximize the returned rewards, and communicates the arms to be sampled by the agents to a controller over a rate-limited communications channel. This framework can be applied to personalized ad placement, whenever the content owner observes the website visitors, and hence has the context, but needs to transmit the ads to be shown to a controller that is in charge of placing the marketing content. Consequently, the RC-CMAB problem requires the study of lossy compression schemes for the policy to be employed whenever the constraint on the channel rate does not allow the uncompressed transmission of the decision-maker’s intentions. We characterize the fundamental information-theoretic limits of this problem by letting the number of agents go to infinity, and study the regret that can be achieved, identifying the two distinct rate regions leading to linear and sub-linear regrets, respectively. We then analyze the optimal compression scheme achievable in the limit with infinite agents, when using the forward and reverse KL divergence as distortion metric. Based on this, we also propose a practical coding scheme, and provide numerical results.

Yucheng Liu    Lawrence Ong    Parastoo Sadeghi    Sarah Johnson    Joerg Kliewer    Phee Lep Yeoh

Index coding can be viewed as a compression problem with multiple decoders with side information. In such a setup, an encoder compresses a number of messages into a common codeword such that every decoder can decode its requested messages with the help of knowing some other messages as side information. In this paper, we study how much information is leaked to a guessing adversary observing the codeword in index coding, where some messages in the system are sensitive and others are not. The non-sensitive messages can be used by the encoder in a manner similar to secret keys to mitigate leakage of the sensitive messages to the adversary. We first characterize the optimal information leakage rate of a given index coding problem by the optimal compression rate of a related problem, which is constructed by adding an extra decoder with certain parameters to the original problem. Both the achievability and converse of the characterization are derived from a graph-theoretic perspective based on confusion graphs (Alon et al. 2008). In particular, the achievable coding scheme is a randomized mapping exploiting certain symmetrical properties of the confusion graph. Our second main result is a practical deterministic linear coding scheme, developed from the rank minimization method based on fitting matrices (Bar-Yossef et al. 2011). The linear scheme leads to an upper bound on the optimal leakage rate, which is proved to be tight over all deterministic scalar linear codes. While it is shown through an example that simultaneously achieving optimal compression and leakage rates is not always possible, time-sharing between different schemes could be used to balance the compression and leakage rates. Finally, we show how our results can be applied to different variants of index coding.

Berivan Isik    Tsachy Weissman

Storage-efficient privacy-preserving learning is crucial due to the increasing amounts of sensitive user data required for modern learning tasks. We propose a framework for reducing the storage cost of user data while at the same time providing privacy guarantees, without essential loss in the utility of the data for learning. Our method comprises noise injection followed by lossy compression. We show that, when appropriately matching the lossy compression to the distribution of the added noise, the compressed examples converge, in distribution, to that of the noise-free training data as the sample size of the training data (or the dimension of the training data) increases. In this sense, the utility of the data for learning is essentially maintained, while reducing storage and privacy leakage by quantifiable amounts. We present experimental results on the CelebA dataset for gender classification and find that our suggested pipeline delivers in practice on the promise of the theory: the individuals in the images are unrecognizable (or less recognizable, depending on the noise level), the overall storage of the data is substantially reduced, and there is no essential loss (and in some cases a slight boost) to the classification accuracy. As an added bonus, our experiments suggest that our method yields a substantial boost to robustness in the face of adversarial test data.
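A toy NumPy/Pillow rendering of the noise-then-compress pipeline is sketched below, with JPEG standing in for a lossy compressor matched to the added noise; the matching, and the convergence-in-distribution guarantee, are the substance of the paper and are not captured here. The path, noise level, and quality setting are placeholder parameters.

```python
import io

import numpy as np
from PIL import Image

def noise_then_compress(image_path, sigma=25.0, quality=30):
    """Toy noise-injection-then-lossy-compression pipeline: add i.i.d.
    Gaussian pixel noise, then JPEG-compress. JPEG is only a stand-in for a
    compressor matched to the noise distribution. Returns compressed bytes."""
    img = np.asarray(Image.open(image_path).convert("RGB"), dtype=np.float64)
    noisy = np.clip(img + np.random.default_rng().normal(0.0, sigma, img.shape), 0, 255)
    buf = io.BytesIO()
    Image.fromarray(noisy.astype(np.uint8)).save(buf, format="JPEG", quality=quality)
    return buf.getvalue()              # smaller, privatized representation of the example
```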

Sourya Basu    Daewon Seo    Lav R. Varshney

This work investigates functional source coding problems with maximal distortion, motivated by approximate function computation in many modern applications. The maximal distortion treats imprecise reconstruction of a function value as good as perfect computation if it deviates less than a tolerance level, while treating reconstruction that differs by more than that level as a failure. Using a geometric understanding of the maximal distortion, we propose a hypergraph-based source coding scheme for function computation that is constructive in the sense that it gives an explicit procedure for finding optimal or good auxiliary random variables. Moreover, we find that the hypergraph-based coding scheme achieves the optimal rate-distortion function in the setting of coding for computing with side information and achieves the Berger-Tung sum-rate inner bound in the setting of distributed source coding for computing. It also achieves the El Gamal-Cover inner bound for multiple description coding for computing and is optimal for successive refinement and cascade multiple description problems for computing. Lastly, the benefit of complexity reduction of finding a forward test channel is shown for a class of Markov sources.
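Concretely, one natural way to write the maximal-distortion criterion described above is $d_{\max}(x,\hat{z}) = \mathbf{1}\{|f(x)-\hat{z}| > \Delta\}$ for a tolerance level $\Delta$, so that requiring the (maximal) distortion to vanish means every reconstruction lies within $\Delta$ of the true function value $f(x)$; this is an illustrative rendering, and the paper's formal definition may differ in details.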