Â鶹´«Ă˝AV

Cover art for June 2022 JSAIT issue showing variable coded batch matrix multiplication
2022
Distributed Coding and Computation
Guest editors
Mohammad Ali Maddah-Ali
Salman Avestimehr
Ravi Tandon
Changho Suh
Ayfer Ozgur
Chao Tian
Tara Javidi
Giuseppe Caire

This special issue will focus on information theoretic aspects of distributed coding and computing. While applications and platforms such as distributed learning, cloud storage and computing, content delivery networks, and distributed ledgers are increasingly popular, there is a tremendous need to evaluate the fundamental limits of the existing solutions and develop efficient approaches to run them. This is particularly important considering the growing list of constrains and requirements in terms available resources, scalability, privacy, security, fault tolerance, speed, accuracy, and verifiability. In this context, information theory and coding can play a major role in expanding and employing various tools and techniques to deal with those challenging tasks. This special issue aims to attract contributions investigating the fundamental limits of distributed information systems and developing efficient coding techniques to meet those limits, satisfying the essential constraints.

Farzan Farnia    Amirhossein Reisizadeh    Ramtin Pedarsani    Ali Jadbabaie

Federated learning is a distributed machine learning paradigm, which aims to train a model using the local data of many distributed clients. A key challenge in federated learning is that the data samples across the clients may not be identically distributed. To address this challenge, personalized federated learning with the goal of tailoring the learned model to the data distribution of every individual client has been proposed. In this paper, we focus on this problem and propose a novel personalized Federated Learning scheme based on Optimal Transport (FedOT) as a learning algorithm that learns the optimal transport maps for transferring data points to a common distribution as well as the prediction model under the applied transport map. To formulate the FedOT problem, we extend the standard optimal transport task between two probability distributions to multi-marginal optimal transport problems with the goal of transporting samples from multiple distributions to a common probability domain. We then leverage the results on multi-marginal optimal transport problems to formulate FedOT as a min-max optimization problem and analyze its generalization and optimization properties. We discuss the results of several numerical experiments to evaluate the performance of FedOT under heterogeneous data distributions in federated learning problems.

Xinghan Wang    Anusha Lalitha    Tara Javidi    Farinaz Koushanfar

This paper proposes a federated supervised learning framework over a general peer-to-peer network with agents that act in a variational Bayesian fashion. The proposed framework consists of local agents where each of which keeps a local “posterior probability distribution” over the parameters of a global model; the updating of the posterior over time happens in a local fashion according to two subroutines of: 1) variational model training given (a batch of) local labeled data, and 2) asynchronous communication and model aggregation with the 1-hop neighbors. Inspired by the popular federated learning (model averaging), the framework allows the training data to remain distributed on mobile devices while utilizing a peer-to-peer model aggregation in a social network. The proposed framework is shown to allow for a systematic treatment of model aggregation over any arbitrary connected graph with consistent (in general, non-iid) local labeled data. Specifically, under mild technical conditions, the proposed algorithm allows agents with local data to learn a shared model explaining the global training data in a decentralized fashion over an arbitrary peering/connectivity graph. Furthermore, the rate of convergence is characterized and shown to be a function of each individual agent’s data quality weighted by its eigenvector centrality. Empirically, the proposed methodology is shown to work well with efficient variation Bayesian inference techniques to train Bayesian neural networks in a decentralized manner even when the local data batches are not identically distributed.

Rajarshi Saha    Mert Pilanci    Andrea J. Goldsmith

We study first-order optimization algorithms under the constraint that the descent direction is quantized using a pre-specified budget of $R$ -bits per dimension, where $R \in (0,\infty)$ . We propose computationally efficient optimization algorithms with convergence rates matching the information-theoretic performance lower bounds for: (i) Smooth and Strongly-Convex objectives with access to an Exact Gradient oracle, as well as (ii) General Convex and Non-Smooth objectives with access to a Noisy Subgradient oracle. The crux of these algorithms is a polynomial complexity source coding scheme that embeds a vector into a random subspace before quantizing it. These embeddings are such that with high probability, their projection along any of the canonical directions of the transform space is small. As a consequence, quantizing these embeddings followed by an inverse transform to the original space yields a source coding method with optimal covering efficiency while utilizing just $R$ -bits per dimension. Our algorithms guarantee optimality for arbitrary values of the bit-budget $R$ , which includes both the sub-linear budget regime ( $R < 1$ ), as well as the high-budget regime ( $R \geq 1$ ), while requiring $O(n^{2})$ multiplications, where $n$ is the dimension. We also propose an efficient relaxation of this coding scheme using Hadamard subspaces that requires a near-linear time, i.e., $O(n \log n)$ additions. Furthermore, we show that the utility of our proposed embeddings can be extended to significantly improve the performance of gradient sparsification schemes. Numerical simulations validate our theoretical claims. Our implementations are available at https://github.com/rajarshisaha95/DistOptConstrComm.

Amirhossein Reisizadeh    Isidoros Tziotis    Hamed Hassani    Aryan Mokhtari    Ramtin Pedarsani

Federated learning is a novel paradigm that involves learning from data samples distributed across a large network of clients while the data remains local. It is, however, known that federated learning is prone to multiple system challenges including system heterogeneity where clients have different computation and communication capabilities. Such heterogeneity in clients’ computation speed has a negative effect on the scalability of federated learning algorithms and causes significant slow-down in their runtime due to slow devices (stragglers). In this paper, we propose FLANP, a novel straggler-resilient federated learning meta-algorithm that incorporates statistical characteristics of the clients’ data to adaptively select the clients in order to speed up the learning procedure. The key idea of FLANP is to start the training procedure with faster nodes and gradually involve the slower ones in the model training once the statistical accuracy of the current participating nodes’ data is reached, while the final model for each stage is used as a warm-start model for the next stage. Our theoretical results characterize the speedup provided by the meta-algorithm FLANP in comparison to standard federated benchmarks for strongly convex losses and i.i.d. samples. For particular instances, FLANP slashes the overall expected runtime by a factor of $\mathcal {O}(\ln (Ns))$ , where $N$ and $s$ denote the total number of nodes and the number of samples per node, respectively. In experiments, FLANP demonstrates significant speedups in wall-clock time -up to $6 \times $ – compared to standard federated learning benchmarks.

Jinho Choi

In this paper, we study communication-efficient distributed stochastic gradient descent (SGD) with data sets of users distributed over a certain area and communicating through wireless channels. Since the time for one iteration in the proposed approach is independent of the number of users, it is well-suited to scalable distributed SGD. Furthermore, since the proposed approach is based on preamble-based random access, which is widely adopted for machine-type communication (MTC), it can be easily employed for training models with a large number of devices where MTC is used for their connectivity. For fading channel, we show that noncoherent combining can be used. As a result, no channel state information (CSI) estimation is required. For over-the-air computation, the proposed approach takes advantage of random access, where the access probability is to encode the norm of local gradient vector (without using any additional bits) and the preamble is to encode the quantized normalized gradient vector. From analysis and simulation results, we can confirm that the proposed approach is not only scalable, but also provides improved performance as the number of devices increases. From analysis and simulation results, we can confirm that the proposed approach is not only scalable, but also provides improved performance as the number of devices increases.

Shubham K. Jha    Prathamesh Mayekar    Himanshu Tyagi

We consider over-the-air convex optimization on a $d$ dimensional space where coded gradients are sent over an additive Gaussian noise channel with variance $\sigma ^{2}$ . The codewords satisfy an average power constraint $P$ , resulting in the signal-to-noise ratio (SNR) of $P/\sigma ^{2}$ . We derive bounds for the convergence rates for over-the-air optimization. Our first result is a lower bound for the convergence rate showing that any code must slowdown the convergence rate by a factor of roughly $\sqrt {d/\log (1+ \mathtt {SNR})}$ . Next, we consider a popular class of schemes called analog coding, where a linear function of the gradient is sent. We show that a simple scaled transmission analog coding scheme results in a slowdown in convergence rate by a factor of $\sqrt {d(1+1/ \mathtt {SNR})}$ . This matches the previous lower bound up to constant factors for low SNR, making the scaled transmission scheme optimal at low SNR. However, we show that this slowdown is necessary for any analog coding scheme. In particular, a slowdown in convergence by a factor of $\sqrt {d}$ for analog coding remains even when SNR tends to infinity. Remarkably, we present a simple quantize-and-modulate scheme that uses Amplitude Shift Keying and almost attains the optimal convergence rate at all SNRs.

Animesh Sakorikar    Lele Wang

Gradient coding is a coding theoretic framework to provide robustness against slow or unresponsive machines, known as stragglers, in distributed machine learning applications. Recently, Kadhe et al. (2019) proposed a gradient code based on a combinatorial design, called balanced incomplete block design (BIBD), which is shown to outperform many existing gradient codes in worst-case adversarial straggling scenarios. However, parameters for which such BIBD constructions exist are very limited (Colbourn and Dinitz, 2006). In this paper, we aim to overcome such limitations and construct gradient codes which exist for a wide range of system parameters while retaining the superior performance of BIBD gradient codes. Two such constructions are proposed, one based on a probabilistic construction that relax the stringent BIBD gradient code constraints, and the other based on taking the Kronecker product of existing gradient codes. The proposed gradient codes allow flexible choices of system parameters while retaining comparable error performance.

Anindya Bijoy Das    Aditya Ramamoorthy

The overall execution time of distributed matrix computations is often dominated by slow worker nodes (stragglers) within the computation clusters. Recently, coding-theoretic techniques have been utilized to mitigate the effect of stragglers where worker nodes are assigned the job of processing encoded submatrices of the original matrices. In many machine learning or optimization problems the relevant matrices are often sparse. Several prior coded computation methods operate with dense linear combinations of the original submatrices; this can significantly increase the worker node computation times and consequently the overall job execution time. Moreover, several existing techniques treat the stragglers as failures (erasures) and discard their computations. In this work, we present a coding approach which operates with limited encoding of the original submatrices and utilizes the partial computations done by the slower workers. While our scheme can continue to have the optimal threshold of prior work, it also allows us to trade off the straggler resilience with the worker computation speed for sparse input matrices. Extensive numerical experiments done over cloud platforms confirm that the proposed approach enhances the speed of the worker computations (and thus the whole process) significantly.

Jinbao Zhu    Songze Li

We consider the problems of Private and Secure Matrix Multiplication (PSMM) and Fully Private Matrix Multiplication (FPMM), for which matrices privately selected by a master node are multiplied at distributed worker nodes without revealing the indices of the selected matrices, even when a certain number of workers collude with each other. We propose a novel systematic approach to solve PSMM and FPMM with colluding workers, which leverages solutions to a related Secure Matrix Multiplication (SMM) problem where the data (rather than the indices) of the multiplied matrices are kept private from colluding workers. Specifically, given an SMM strategy based on polynomial codes or Lagrange codes, one can exploit the special structure inspired by the matrix encoding function to design private coded queries for PSMM/FPMM, such that the algebraic structure of the computation result at each worker resembles that of the underlying SMM strategy. Adopting this systematic approach provides novel insights in private query designs for private matrix multiplication, substantially simplifying the processes of designing PSMM and FPMM strategies. Furthermore, the PSMM and FPMM strategies constructed following the proposed approach outperform the state-of-the-art strategies in one or more performance metrics including recovery threshold (minimal number of workers the master needs to wait for before correctly recovering the multiplication result), communication cost, and computation complexity, demonstrating a more flexible tradeoff in optimizing system efficiency.

Christoph Hofmeister    Rawad Bitar    Marvin Xhemrishi    Antonia Wachter-Zeh

We consider the problem of designing secure and private codes for distributed matrix-matrix multiplication. A master server owns two private matrices and hires worker nodes to help compute their product. The matrices should remain information-theoretically private from the workers. Some of the workers are malicious and return corrupted results to the master. We design a framework for security against malicious workers in private matrix-matrix multiplication. The main idea is a careful use of Freivalds’ algorithm to detect erroneous matrix multiplications. Our main goal is to apply this security framework to schemes with adaptive rates. Adaptive schemes divide the workers into clusters and thus provide flexibility in trading decoding complexity for efficiency. Our new scheme, SRPM3, provides a computationally efficient security check per cluster that detects the presence of one or more malicious workers with high probability. An additional per worker check is used to identify the malicious nodes. SRPM3 can tolerate the presence of an arbitrary number of malicious workers. We provide theoretical guarantees on the complexity of the security checks and simulation results on both, the missed detection rate as well as on the time needed for the integrity check.

Shahrzad Kiani    Stark C. Draper

Coded distributed computing was recently introduced to mitigate the effect of stragglers on distributed computing systems. This paper combines ideas of approximate and coded computing to further accelerate computation. We propose successive approximation coding (SAC) techniques that realize a tradeoff between accuracy and speed, allowing the distributed computing system to produce approximations that increase in accuracy over time. If a sufficient number of compute nodes finish their tasks, SAC exactly recovers the desired computation. We theoretically provide design guidelines for our SAC techniques, and numerically show that SAC achieves a better accuracy-speed tradeoff in comparison with previous methods.

Lev Tauz    Lara Dolecek

A majority of coded matrix-matrix computation literature has broadly focused in two directions: matrix partitioning for computing a single computation task and batch processing of multiple distinct computation tasks. While these works provide codes with good straggler resilience and fast decoding for their problem spaces, these codes would not be able to take advantage of the natural redundancy of re-using matrices across batch jobs. In this paper, we introduce the Variable Coded Distributed Batch Matrix Multiplication (VCDBMM) problem which tasks a distributed system to perform batch matrix multiplication where matrices are not necessarily distinct among batch jobs. Inspired in part by Cross-Subspace Alignment codes, we develop Flexible Cross-Subspace Alignments (FCSA) codes that are flexible enough to utilize this redundancy. We provide a full characterization of FCSA codes which allow for a wide variety of system complexities including good straggler resilience and fast decoding. We theoretically demonstrate that, under certain practical conditions, FCSA codes are within a factor of 2 of the optimal solution when it comes to straggler resilience. Furthermore, our simulations demonstrate that our codes can achieve even better optimality gaps in practice, even going as low as 1.7.

Yinbin Ma    Daniela Tuninetti

Coded caching aims to minimize the network’s peak-time communication load by leveraging the information pre-stored in the local caches at the users. The original setting by Maddah-Ali and Niesen, which considered single file retrieval, has been recently extended to general Scalar Linear Function Retrieval (SLFR) by Wan et al., who proposed a linear scheme that surprisingly achieves the same optimal load under the constraint of uncoded cache placement as in single file retrieval. This paper’s goal is to characterize the conditions under which a general SLFR linear scheme is optimal and gain insights into why the specific choices made by Wan et al. work. This paper shows that the optimal decoding coefficients are necessarily the product of two terms, one only involving the encoding coefficients and the other only the demands of the users. In addition, the algebraic relationships among the encoding coefficients of an optimal code are shown to be captured by the cycles of a universal graph. Thus, a general linear scheme for the SLFR problem can be found by solving a spanning tree problem for the universal graph. The proposed framework readily extends to caching-like problems, such as the problem of finding a general linear scheme for Sun et al.’s private function computation.

Ruida Zhou    Chao Tian    Hua Sun    James S. Plank

In the conventional robust $T$ -colluding private information retrieval (PIR) system, the user needs to retrieve one of the possible messages while keeping the identity of the requested message private from any $T$ colluding servers. Motivated by the possible heterogeneous privacy requirements for different messages, we consider the $(N, T_{1}~:~K_{1}, T_{2}~:~K_{2})$ two-level PIR system with a total of $K_{2}$ messages in the system, where $T_{1}\geq T_{2}$ and $K_{1}\leq K_{2}$ . Any one of the $K_{1}$ messages needs to be retrieved privately against $T_{1}$ colluding servers, and any one of the full set of $K_{2}$ messages needs to be retrieved privately against $T_{2}$ colluding servers. We obtain a lower bound to the capacity by proposing two novel coding schemes, namely the non-uniform successive cancelation scheme and the non-uniform block cancelation scheme. A capacity upper bound is also derived. The gap between the upper bound and the lower bounds is analyzed, and shown to vanish when $T_{1}=T_{2}$ . Lastly, we show that the upper bound is in general not tight by providing a stronger bound for a special setting.

Zhusheng Wang    Sennur Ulukus

We consider the problem of symmetric private information retrieval (SPIR) with user-side common randomness. In SPIR, a user retrieves a message out of $K$ messages from $N$ non-colluding and replicated databases in such a way that no single database knows the retrieved message index (user privacy), and the user gets to know nothing further than the retrieved message (database privacy), i.e., the privacy constraint between the user and the databases is symmetric. SPIR has the following three properties: its capacity is smaller than the capacity of PIR which requires only user privacy; it is infeasible in the case of a single database; and it requires presence of shared common randomness among the databases. We introduce a new variant of SPIR where the user is provided with a random subset of the shared database common randomness, which is unknown to the databases. We determine the exact capacity region of the triple $(d, \rho _{S}, \rho _{U})$ , where $d$ is the download cost, $\rho _{S}$ is the amount of shared database (server) common randomness, and $\rho _{U}$ is the amount of available user-side common randomness. We show that with a suitable amount of $\rho _{U}$ , this new SPIR achieves the capacity of the conventional PIR. As a corollary, single-database SPIR becomes feasible. Further, the presence of user-side $\rho _{U}$ reduces the amount of required server-side $\rho _{S}$ .

Shreya Gupta    Bh. Rekha Devi    V. Lalitha

In distributed storage systems, cooperative regenerating codes tradeoff storage for repair bandwidth in the case of multiple node failures. In rack-aware distributed storage systems, there is no cost associated with transferring symbols within a rack. Hence, the repair bandwidth will only take into account cross-rack transfer. Rack-aware regenerating codes for the case of single node failures have been studied and their repair bandwidth tradeoff characterized. In this paper, we consider the framework of rack-aware cooperative regenerating codes for the case of multiple node failures where the node failures are uniformly distributed among a certain number of racks. We characterize the storage repair-bandwidth tradeoff as well as derive the minimum storage and minimum repair bandwidth points of the tradeoff. We also provide explicit constructions of minimum bandwidth rack-aware cooperative regenerating codes and minimum storage rack-aware cooperative regenerating codes for a large range of parameters. We also consider another extension of minimum storage cooperative regenerating codes, in which we design slightly sub-optimal cooperative regenerating codes with much lower sub-packetization. $\epsilon $ -MSR codes are a class of codes introduced to tradeoff sub-packetization level for a slight increase in the repair bandwidth for the case of single node failures. We introduce the framework of $\epsilon $ -MSCR codes which allow for a similar tradeoff for the case of multiple node failures. We present a construction of $\epsilon $ -MSCR codes, which can recover from two node failures, by concatenating a class of MSCR codes and scalar linear codes. We give a repair procedure to repair the $\epsilon $ -MSCR codes in the event of two node failures and calculate the repair bandwidth for the same. We characterize the increase in repair bandwidth incurred by the method in comparison with the optimal repair bandwidth. Finally, we show the subpacketization level of $\epsilon $ -MSCR codes scales logarithmically in the number of nodes.

Hua Sun

The compound secure groupcast problem is considered, where the key variables at $K$ receivers are designed so that a transmitter can securely groupcast a message to any $N$ out of the $K$ receivers through a noiseless broadcast channel. The metric is the information theoretic tradeoff between key storage $\alpha $ , i.e., the number of bits of the key variable stored at each receiver per message bit, and broadcast bandwidth $\beta $ , i.e., the number of bits of the broadcast information sent by the transmitter per message bit. We have three main results. First, when broadcast bandwidth is minimized, i.e., when $\beta = 1$ , we show that the minimum key storage is $\alpha = N$ . Second, when key storage is minimized, i.e., when $\alpha = 1$ , we show that broadcast bandwidth $\beta = \min (N, K-N+1)$ is achievable and is optimal (minimum) if $N=2$ or $K-1$ . Third, when $N=2$ , the optimal key storage and broadcast bandwidth tradeoff is characterized as $\alpha +\beta \geq 3, \alpha \geq 1, \beta \geq 1$ .

Maiki Fujita    Takeshi Koshiba    Kenji Yasunaga

Secure Message Transmission (SMT) is a two-party cryptographic protocol by which the sender can securely and reliably transmit messages to the receiver using multiple channels. An adversary can corrupt a subset of the channels and commit eavesdropping and tampering attacks over the channels. In this work, we introduce a game-theoretic security model for SMT in which adversaries have some preferences for protocol execution. We define rational “timid” adversaries who prefer to violate security requirements but do not prefer the tampering to be detected. First, we consider the basic setting where a single adversary attacks the protocol. We construct perfect SMT protocols against any rational adversary corrupting all but one of the channels. Since minority corruption is required in the traditional setting, our results demonstrate a way of circumventing the cryptographic impossibility results by a game-theoretic approach. Next, we study the setting in which all the channels can be corrupted by multiple adversaries who do not cooperate. Since we cannot hope for any security if a single adversary corrupts all the channels or multiple adversaries cooperate maliciously, the scenario can arise from a game-theoretic model. We also study the scenario in which both malicious and rational adversaries exist.

Canran Wang    Netanel Raviv

Although blockchain, the supporting technology of various cryptocurrencies, has offered a potentially effective framework for numerous decentralized trust management systems, its performance is still sub-optimal in real-world networks. With limited bandwidth, the communication complexity for nodes to process a block scales with the growing network size and hence becomes the limiting factor of blockchain’s performance. In this paper, we suggest a re-design of existing blockchain systems, which addresses the issue of the communication burden. First, by employing techniques from Coded Computation, our scheme guarantees correct verification of transactions while reducing the bit complexity dramatically such that it grows logarithmically with the number of nodes. Second, with the adoption of techniques from Information Dispersal and State Machine Replication, the system is resilient to Byzantine faults and achieves linear message complexity. Third, we propose a novel 2-dimensional sharding strategy, which inherently supports cross-shard transactions, alleviating the need for complicated communication protocols between shards, while keeping the computation and storage benefits of sharding.

Amir Behrouzi-Far    Emina Soljanin

Distributed computing systems implement redundancy to reduce the job completion time and variability. Despite a large body of work about computing redundancy, the analytical performance evaluation of redundancy techniques in queuing systems is still an open problem. In this work, we take one step forward to analyze the performance of scheduling policies in systems with redundancy. In particular, we study the pattern of shared servers among replicas of different jobs. To this end, we employ combinatorics and graph theory and define and derive performance indicators using the statistics of the overlaps. We consider two classical nonadaptive scheduling policies: random and round-robin. We then propose a scheduling policy based on combinatorial block designs. Compared with conventional scheduling, the proposed scheduling improves the performance indicators. We study the expansion property of the graphs associated with round-robin and block design-based policies. It turns out the superior performance of the block design-based policy results from better expansion properties of its associated graph. As indicated by the performance indicators, the simulation results show that the block design-based policy outperforms random and round-robin scheduling in different scenarios. Specifically, it reduces the average waiting time in the queue to up to 25% compared to the random policy and up to 100% compared to the round-robin policy.