IEEE Journal on Selected Areas in Information Theory (JSAIT)
2024
Information-Theoretic Methods for Trustworthy and Reliable Machine Learning
Guest editors
Lalitha Sankar
Oliver Kosut
Alon Orlitsky
Flavio Calmon
Lele Wang
Ayfer Ozgur
Ofer Shayevitz

Over the past decade, machine learning (ML), the process of enabling computing systems to take in data and produce decisions, has enabled tremendously exciting technologies. Such technologies can assist humans in making a variety of decisions by processing complex data to identify patterns, detect anomalies, and make inferences. At the same time, these automated decision-making systems raise questions about the security and privacy of the user data that drive ML, the fairness of the resulting decisions, and the reliability of automated systems making complex decisions that can affect humans in significant ways. In short, how can ML models be deployed in a responsible and trustworthy manner that ensures fair and reliable decision-making? This requires ensuring that the entire ML pipeline provides security, reliability, robustness, fairness, and privacy. Information theory can shed light on each of these challenges by providing a rigorous framework to quantify these desiderata and to rigorously evaluate and certify them. From its beginnings, information theory has been devoted to a theoretical understanding of the limits of engineered systems. As such, it is a vital tool in guiding machine learning advances.

Antonious M. Girgis    Suhas Diggavi

We study the distributed mean estimation (DME) problem under privacy and communication constraints in the local differential privacy (LDP) and multi-message shuffled (MMS) privacy frameworks. DME has wide applications in both federated learning and analytics. We propose a communication-efficient and differentially private algorithm for DME of bounded $\ell _{2}$-norm and $\ell _{\infty }$-norm vectors. We analyze our proposed DME schemes, showing that our algorithms have order-optimal privacy-communication-performance trade-offs. Our algorithms are designed by giving unequal privacy assignments to different resolutions of the vector (through its binary expansion) and appropriately combining them with coordinate sampling. These results are directly applied to give guarantees on private federated learning algorithms. We also numerically evaluate the performance of our private DME algorithms.
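
As a rough illustration of the ingredients mentioned above (coordinate sampling plus a randomized one-bit report), the following sketch implements a much simpler unbiased LDP mean estimator for $\ell _{\infty }$-bounded vectors; it is not the paper's scheme, and all names and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def ldp_dme(X, eps):
    """Toy LDP distributed mean estimation: each client samples one coordinate,
    quantizes it to a single bit, and reports the bit via eps-randomized response."""
    n, d = X.shape
    p = np.exp(eps) / (1 + np.exp(eps))            # keep-probability of randomized response
    est = np.zeros(d)
    for x in X:                                    # one client per row, entries in [-1, 1]
        j = rng.integers(d)                        # coordinate sampling
        b = rng.random() < (1 + x[j]) / 2          # unbiased one-bit stochastic quantization
        b = b if rng.random() < p else not b       # eps-LDP randomized response
        est[j] += (2 * int(b) - 1) / (2 * p - 1)   # debias the random flip
    return d * est / n                             # debias the coordinate sampling

X = rng.uniform(-1, 1, size=(20000, 5))
print(np.round(ldp_dme(X, eps=2.0), 2))            # private estimate
print(np.round(X.mean(axis=0), 2))                 # true mean
```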

Anand Jerry George    Lekshmi Ramesh    Aditya Vikram Singh    Himanshu Tyagi

We consider the problem of continually releasing an estimate of the population mean of a stream of samples that is user-level differentially private (DP). At each time instant, a user contributes a sample, and the users can arrive in arbitrary order. Until now, these requirements of continual release and user-level privacy were considered in isolation. In practice, however, both requirements arise together, since users often contribute data repeatedly and multiple queries are made. We provide an algorithm that outputs a mean estimate at every time instant $t$ such that the overall release is user-level $\varepsilon $-DP and has the following error guarantee: denoting by $m_{t}$ the maximum number of samples contributed by a user, as long as $\tilde {\Omega }(1/\varepsilon)$ users have $m_{t}/2$ samples each, the error at time $t$ is $\tilde {O}(1/\sqrt {t}+\sqrt {m_{t}}/(t\varepsilon))$. This is a universal error guarantee that is valid for all arrival patterns of the users. Furthermore, it (almost) matches the existing lower bounds for the single-release setting at all time instants when users have contributed an equal number of samples.

Matteo Zecchin    Sangwoo Park    Osvaldo Simeone

In many real-world problems, predictions are leveraged to monitor and control cyber-physical systems, demanding guarantees on the satisfaction of reliability and safety requirements. However, predictions are inherently uncertain, and managing prediction uncertainty presents significant challenges in environments characterized by complex dynamics and forking trajectories. In this work, we assume access to a pre-designed probabilistic implicit or explicit sequence model, which may have been obtained using model-based or model-free methods. We introduce probabilistic time series-conformal risk prediction (PTS-CRC), a novel post-hoc calibration procedure that operates on the predictions produced by any pre-designed probabilistic forecaster to yield reliable error bars. In contrast to existing art, PTS-CRC produces predictive sets based on an ensemble of multiple prototype trajectories sampled from the sequence model, supporting the efficient representation of forking uncertainties. Furthermore, unlike the state of the art, PTS-CRC can satisfy reliability definitions beyond coverage. This property is leveraged to devise a novel model predictive control (MPC) framework that addresses open-loop and closed-loop control problems under general average constraints on the quality or safety of the control policy. We experimentally validate the performance of PTS-CRC prediction and control by studying a number of use cases in the context of wireless networking. Across all the considered tasks, PTS-CRC predictors are shown to provide more informative predictive sets, as well as safe control policies with larger returns.
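
For readers unfamiliar with post-hoc conformal calibration, the sketch below shows a generic conformal risk control rule for a bounded loss that is nonincreasing in a threshold parameter; PTS-CRC additionally builds its predictive sets from multiple sampled prototype trajectories, which is not reproduced here, and all names and values below are illustrative.

```python
import numpy as np

def conformal_risk_threshold(cal_losses, lambdas, alpha, B=1.0):
    """Pick the smallest lambda whose adjusted calibration risk is <= alpha.
    cal_losses[i, k] = loss of calibration example i at candidate lambdas[k],
    assumed nonincreasing in lambda and bounded by B."""
    n = cal_losses.shape[0]
    adjusted = (cal_losses.sum(axis=0) + B) / (n + 1)
    ok = np.where(adjusted <= alpha)[0]
    return lambdas[ok[0]] if ok.size else lambdas[-1]

# Toy usage: the loss is miscoverage of a forecast error by a ball of radius lambda.
rng = np.random.default_rng(1)
lambdas = np.linspace(0.0, 4.0, 81)
errors = np.abs(rng.normal(size=(500, 1)))                 # |forecast error| per calibration series
cal_losses = (errors > lambdas[None, :]).astype(float)     # nonincreasing in lambda, bounded by 1
print("calibrated radius:", conformal_risk_threshold(cal_losses, lambdas, alpha=0.1))
```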

Chen Xu    Jonghyeok Lee    Xiuyuan Cheng    Yao Xie

We present a computationally efficient framework, called FlowDRO, for solving flow-based distributionally robust optimization (DRO) problems with Wasserstein uncertainty sets, aiming to find a continuous worst-case distribution (also called the least favorable distribution, LFD) and sample from it. The requirement that the LFD be continuous allows the algorithm to scale to problems with larger sample sizes and improves the generalization capability of the induced robust algorithms. To tackle the computationally challenging infinite-dimensional optimization problem, we leverage flow-based models and continuous-time invertible transport maps between the data distribution and the target distribution, and we develop a Wasserstein proximal gradient flow type algorithm. In theory, we establish the equivalence of the solution by optimal transport map to the original formulation, as well as the dual form of the problem, through Wasserstein calculus and Brenier's theorem. In practice, we parameterize the transport maps by a sequence of neural networks progressively trained in blocks by gradient descent. We demonstrate its usage in adversarial learning, distributionally robust hypothesis testing, and a new mechanism for data-driven distribution perturbation differential privacy, where the proposed method gives strong empirical performance on high-dimensional real data.

Zirui Yan    Arpan Mukherjee    Burak Varıcı    Ali Tajer

The sequential design of experiments for optimizing a reward function in causal systems can be effectively modeled by the sequential design of interventions in causal bandits (CBs). In the existing literature on CBs, a critical assumption is that the causal models remain constant over time. However, this assumption does not necessarily hold in complex systems, which constantly undergo temporal model fluctuations. This paper addresses the robustness of CBs to such model fluctuations. The focus is on causal systems with linear structural equation models (SEMs). The SEMs and the time-varying pre- and post-interventional statistical models are all unknown. Cumulative regret is adopted as the design criterion, based on which the objective is to design a sequence of interventions that incur the smallest cumulative regret with respect to an oracle aware of the entire causal model and its fluctuations. First, it is established that the existing approaches fail to maintain regret sub-linearity with even a few instances of model deviation. Specifically, when the number of instances with model deviation is as few as $T^{\frac {1}{2L}}$, where $T$ is the time horizon and $L$ is the length of the longest causal path in the graph, the existing algorithms will have linear regret in $T$. For instance, when $T=10^{5}$ and $L=3$, model deviations in 6 out of $10^{5}$ instances result in linear regret. Next, a robust CB algorithm is designed and its regret is analyzed, with upper and information-theoretic lower bounds on the regret established. Specifically, in a graph with $N$ nodes and maximum degree $d$, under a general measure of model deviation $C$, the cumulative regret is upper bounded by $\tilde {\mathcal {O}}\left(d^{L-\frac {1}{2}}(\sqrt {NT} + NC)\right)$ and lower bounded by $\Omega \left(d^{\frac {L}{2}-2}\max \{\sqrt {T}, d^{2}C\}\right)$. Comparing these bounds establishes that the proposed algorithm achieves nearly optimal $\tilde{\mathcal {O}} (\sqrt {T})$ regret when $C$ is $o(\sqrt {T})$ and maintains sub-linear regret for a broader range of $C$.
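
The numerical example in the abstract can be checked directly: with $T=10^{5}$ and $L=3$, the deviation budget $T^{\frac {1}{2L}}$ evaluates to roughly 6.8, i.e., about 6 corrupted instances out of $10^{5}$ rounds.

```python
T, L = 10**5, 3
print(T ** (1 / (2 * L)))   # ~6.81: the deviation count at which existing algorithms incur linear regret
```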

Ruida Zhou    Chao Tian    Tie Liu

We provide a new information-theoretic generalization error bound that is exactly tight (i.e., matching even the constant) for the canonical quadratic Gaussian (location) problem. Most existing bounds are order-wise loose in this setting, which has raised concerns about the fundamental capability of information-theoretic bounds in characterizing the generalization behavior of machine learning. The proposed new bound adopts the individual-sample-based approach proposed by Bu et al., but also has several key new ingredients. Firstly, instead of applying the change-of-measure inequality to the loss function, we apply it to the generalization error function itself; secondly, the bound is derived in a conditional manner; lastly, a reference distribution is introduced. The combination of these components produces a KL-divergence-based generalization error bound. We show that although the latter two new ingredients can help make the bound exactly tight, removing them does not significantly degrade the bound, leading to an asymptotically tight mutual-information-based bound. We further consider the vector Gaussian setting, where a direct application of the proposed bound again does not lead to tight bounds except in special cases. A refined bound is then proposed via a decomposition of the loss function, leading to a tight bound for the vector setting.

Alireza Sadeghi    Gang Wang    Georgios B. Giannakis

Successful training of data-intensive deep neural networks critically relies on vast, clean, and high-quality datasets. In practice, however, model reliability diminishes, particularly with noisy, outlier-corrupted data samples encountered at test time. This challenge intensifies when dealing with anonymized, heterogeneous data sets stored across geographically distinct locations due to, e.g., privacy concerns. This paper introduces robust learning frameworks tailored to centralized and federated learning scenarios. Our goal is to fortify model resilience, with a focus on (i) addressing distribution shifts from training to inference time, and (ii) ensuring test-time robustness when a trained model may encounter outliers or adversarially contaminated test data samples. To this aim, we start with a centralized setting where the true data distribution is considered unknown but resides within a Wasserstein ball centered at the empirical distribution. We obtain robust models by minimizing the worst-case expected loss within this ball, yielding an intractable infinite-dimensional optimization problem. By leveraging the strong duality condition, we arrive at a tractable surrogate learning problem. We develop two stochastic primal-dual algorithms to solve the resultant problem: one that solves the convex sub-problems to $\epsilon $-accuracy and another that takes a single gradient ascent step. We further develop a distributionally robust federated learning framework to learn robust models using heterogeneous data sets stored at distinct locations by solving each learner’s sub-problems locally, offering robustness with modest computational overhead while accounting for the data distribution. Numerical tests corroborate the merits of our training algorithms against distributional uncertainties and adversarially corrupted test data samples.
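
The sketch below illustrates the single-gradient-ascent-step variant of the dual surrogate on a toy logistic-regression task; the function names, step sizes, and exact update order are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def wdro_logreg(X, y, rho=0.1, lam=1.0, eta_z=0.5, eta_th=0.1, epochs=20, seed=0):
    """Toy Wasserstein-DRO logistic regression via the dual surrogate
        min_{theta, lam >= 0}  lam*rho + (1/n) sum_i sup_z [ loss(theta; z, y_i) - lam*||z - x_i||^2 ],
    with the inner sup approximated by a single gradient-ascent step started at x_i
    (the -lam*||z - x_i||^2 term has zero gradient at the starting point)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            x, lab = X[i], y[i]                                        # lab in {-1, +1}
            z = x + eta_z * (-sigmoid(-lab * x @ theta) * lab * theta)  # single ascent step on the sample
            g_th = -sigmoid(-lab * z @ theta) * lab * z                 # loss gradient at the perturbed sample
            theta -= eta_th * g_th                                      # primal descent on theta
            lam = max(0.0, lam - eta_th * (rho - np.sum((z - x) ** 2))) # projected descent on lam
    return theta

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = np.sign(X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200))
print(np.round(wdro_logreg(X, y), 2))   # learned direction roughly aligned with (1, -2, 0.5)
```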

Hyun-Young Park    Seung-Hyun Nam    Si-Hyeon Lee

In this paper, we propose a new class of local differential privacy (LDP) schemes based on combinatorial block designs for discrete distribution estimation. This class not only recovers many known LDP schemes in a unified framework of combinatorial block design, but also suggests a novel way of finding new schemes achieving the exactly optimal (or near-optimal) privacy-utility trade-off with lower communication costs. Indeed, we find many new LDP schemes that achieve the exactly optimal privacy-utility trade-off, with the minimum communication cost among all unbiased or consistent schemes, for certain input data sizes and LDP constraints. Furthermore, to partially resolve the sparse existence issue of block design schemes, we consider a broader class of LDP schemes based on regular and pairwise-balanced designs, called RPBD schemes, which relax one of the symmetry requirements on block designs. By considering this broader class of RPBD schemes, we can find LDP schemes achieving a near-optimal privacy-utility trade-off with reasonably low communication costs for a much larger set of input data sizes and LDP constraints.

Osama Hanna    Antonious M. Girgis    Christina Fragouli    Suhas Diggavi

In this paper, we propose differentially private algorithms for the problem of stochastic linear bandits in the central, local, and shuffled models. In the central model, we achieve almost the same regret as the optimal non-private algorithms, which means we get privacy for free. In particular, we achieve a regret of $\tilde {O}\left(\sqrt {T}+\frac {1}{\varepsilon }\right)$ matching the known lower bound for private linear bandits, while the best previously known algorithm achieves $\tilde {O}\left(\frac {1}{\varepsilon }\sqrt {T}\right)$. In the local case, we achieve a regret of $\tilde {O}\left(\frac {1}{\varepsilon }\sqrt {T}\right)$, which matches the non-private regret for constant $\varepsilon $ but suffers a regret penalty when $\varepsilon $ is small. In the shuffled model, we also achieve a regret of $\tilde {O}\left(\sqrt {T}+\frac {1}{\varepsilon }\right)$, while the best previously known algorithm suffers a regret of $\tilde {O}\left(\frac {1}{\varepsilon }T^{3/5}\right)$. Our numerical evaluation validates our theoretical results. Our results generalize to contextual linear bandits with known context distributions.

Neophytos Charalambides    Hessam Mahdavifar    Mert Pilanci    Alfred O. Hero

Linear regression is a fundamental primitive in supervised machine learning, with applications ranging from epidemiology to finance. In this work, we propose methods for speeding up distributed linear regression. We do so by leveraging randomized techniques, while also ensuring security and straggler resiliency in asynchronous distributed computing systems. Specifically, we randomly rotate the basis of the system of equations and then subsample blocks, to simultaneously secure the information and reduce the dimension of the regression problem. In our setup, the basis rotation corresponds to an encoded encryption in an approximate gradient coding scheme, and the subsampling corresponds to the responses of the non-straggling servers in the centralized coded computing framework. This results in a distributed, iterative, stochastic approach to matrix compression and steepest descent.
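
A toy version of the pipeline described above (random basis rotation standing in for the encoded encryption, block subsampling standing in for the responsive servers, and steepest descent on the compressed system) might look as follows; sizes and step sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def rotated_block_descent(A, b, n_blocks=10, frac=0.6, iters=200):
    """Rotate the least-squares system with a random orthonormal matrix, split the rotated
    rows into blocks ('servers'), and at each iteration take a steepest-descent step using
    only the blocks that responded (a random subset, mimicking non-stragglers)."""
    n, d = A.shape
    Q, _ = np.linalg.qr(rng.normal(size=(n, n)))          # random orthonormal rotation
    A_enc, b_enc = Q @ A, Q @ b                           # rotated (encoded) system
    blocks = np.array_split(np.arange(n), n_blocks)
    x, lr = np.zeros(d), 1.0 / np.linalg.norm(A, 2) ** 2
    for _ in range(iters):
        alive = rng.random(n_blocks) < frac               # which servers respond this round
        idx = np.concatenate([blk for blk, a in zip(blocks, alive) if a]) if alive.any() else blocks[0]
        g = A_enc[idx].T @ (A_enc[idx] @ x - b_enc[idx])  # approximate (subsampled) gradient
        x -= lr * (n / len(idx)) * g
    return x

A = rng.normal(size=(300, 5))
x_true = rng.normal(size=5)
b = A @ x_true + 0.01 * rng.normal(size=300)
print(np.round(rotated_block_descent(A, b), 2))
print(np.round(x_true, 2))
```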

Sahel Torkamani    Javad B. Ebrahimi    Parastoo Sadeghi    Rafael G. L. D’Oliveira    Muriel MĂ©dard

We present the notion of reasonable utility for binary mechanisms, which applies to all utility functions in the literature. This notion induces a partial ordering on the performance of all binary differentially private (DP) mechanisms. DP mechanisms that are maximal elements of this ordering are optimal DP mechanisms for every reasonable utility. By viewing differential privacy as a randomized graph coloring, we characterize these optimal DP mechanisms in terms of their behavior on a certain subset of the boundary datasets that we call a boundary hitting set. In the process of establishing our results, we also introduce a useful notion that generalizes DP conditions for binary-valued queries, which we coin suitable pairs. Suitable pairs abstract away the algebraic roles of $\varepsilon ,\delta $ in the DP framework, making the derivations and understanding of our proofs simpler. Additionally, the notion of a suitable pair can potentially capture privacy conditions in frameworks other than DP and may be of independent interest.

Xinying Zou    Samir M. Perlaza    Iñaki Esnaola    Eitan Altman    H. Vincent Poor

The worst-case data-generating (WCDG) probability measure is introduced as a tool for characterizing the generalization capabilities of machine learning algorithms. Such a WCDG probability measure is shown to be the unique solution to two different optimization problems: $(a)$ The maximization of the expected loss over the set of probability measures on the datasets whose relative entropy with respect to a reference measure is not larger than a given threshold; and $(b)$ The maximization of the expected loss with regularization by relative entropy with respect to the reference measure. Such a reference measure can be interpreted as a prior on the datasets. The WCDG cumulants are finite and bounded in terms of the cumulants of the reference measure. To analyze the concentration of the expected empirical risk induced by the WCDG probability measure, the notion of $(\epsilon, \delta )$ -robustness of models is introduced. Closed-form expressions are presented for the sensitivity of the expected loss for a fixed model. These results lead to a novel expression for the generalization error of arbitrary machine learning algorithms. This exact expression is provided in terms of the WCDG probability measure and leads to an upper bound that is equal to the sum of the mutual information and the lautum information between the models and the datasets, up to a constant factor. This upper bound is achieved by a Gibbs algorithm. This finding reveals that an exploration into the generalization error of the Gibbs algorithm facilitates the derivation of overarching insights applicable to any machine learning algorithm.
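
The relative-entropy-regularized problem in item $(b)$ has a Gibbs-type solution: the maximizer of $\mathbb{E}_{P}[\ell] - \beta\, D(P\|Q)$ over probability measures is the exponential tilting $P^{\star}(\mathrm{d}z) \propto e^{\ell(z)/\beta}\, Q(\mathrm{d}z)$, where $\beta$ and $Q$ denote the regularization parameter and the reference measure (notation mine, not the paper's). The snippet below checks this numerically on a small finite support.

```python
import numpy as np

rng = np.random.default_rng(0)
loss = rng.uniform(0, 1, size=6)          # loss of a fixed model on 6 candidate datasets
Q = np.full(6, 1 / 6)                     # reference (prior) measure on the datasets
beta = 0.5

# Closed-form worst-case data-generating measure: exponential tilting of Q by loss / beta.
P_star = Q * np.exp(loss / beta)
P_star /= P_star.sum()

def objective(P):
    return P @ loss - beta * np.sum(P * np.log(P / Q))   # E_P[loss] - beta * KL(P || Q)

# The tilted measure attains at least the best objective among 20000 random candidates.
candidates = rng.dirichlet(np.ones(6), size=20000)
best_random = np.max(candidates @ loss - beta * np.sum(candidates * np.log(candidates / Q), axis=1))
print(round(objective(P_star), 4), round(best_random, 4))
```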

Songtao Feng    Ming Yin    Ruiquan Huang    Yu-Xiang Wang    Jing Yang    Yingbin Liang

Function approximation has experienced significant success in the field of reinforcement learning (RL). Despite some progress on developing theory for nonstationary RL with function approximation under structural assumptions, existing work on nonstationary RL with general function approximation is still limited. In this work, we investigate two different approaches for nonstationary RL with general function approximation: a confidence-set based algorithm and a UCB-type algorithm. For the first approach, we introduce a new complexity measure called dynamic Bellman Eluder (DBE) for nonstationary MDPs, and then propose a confidence-set based algorithm, SW-OPEA, built on this complexity metric. SW-OPEA features a sliding-window mechanism and a novel confidence set design for nonstationary MDPs. For the second approach, we propose a UCB-type algorithm, LSVI-Nonstationary, following the popular least-square-value-iteration (LSVI) framework, which mitigates the computational efficiency challenge of the confidence-set based approach. LSVI-Nonstationary features a restart mechanism and a new design of the bonus term to handle nonstationarity. The two proposed algorithms outperform the existing algorithms for nonstationary linear and tabular MDPs in the small variation budget setting. To the best of our knowledge, these are the first confidence-set based and UCB-type algorithms in the context of nonstationary MDPs.

Elena Ghazi    Ibrahim Issa

The framework of approximate differential privacy is considered and augmented by leveraging the notion of the total variation of a (privacy-preserving) mechanism (denoted by $\eta $-TV). With this refinement, an exact composition result is derived and shown to be significantly tighter than the optimal bounds for differential privacy (which do not consider the total variation). Furthermore, it is shown that $(\varepsilon ,\delta )$-DP with $\eta $-TV is closed under subsampling. The induced total variation of commonly used mechanisms is computed. Moreover, the notion of total variation of a mechanism is studied in the local privacy setting, and privacy-utility tradeoffs are investigated. In particular, total variation distance and KL divergence are considered as utility functions and studied through the lens of contraction coefficients. Finally, the results are compared and connected to the locally differentially private setting.
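
For intuition, the total variation induced on neighboring datasets by the standard Gaussian and Laplace mechanisms has simple closed forms, $2\Phi(\Delta/2\sigma)-1$ and $1-e^{-\Delta/2b}$; the snippet below verifies these standard facts numerically (it does not reproduce the paper's composition or subsampling results, and the parameter values are illustrative).

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

Delta, sigma, b = 1.0, 1.2, 0.8    # sensitivity and noise scales (illustrative values)

tv_gauss = 2 * stats.norm.cdf(Delta / (2 * sigma)) - 1     # closed-form TV of the Gaussian mechanism
tv_lap = 1 - np.exp(-Delta / (2 * b))                      # closed-form TV of the Laplace mechanism

# Numerical check: TV = 0.5 * integral of |p0(x) - p1(x)| dx over the output space.
tv_gauss_num = 0.5 * quad(lambda x: abs(stats.norm.pdf(x, 0, sigma) - stats.norm.pdf(x, Delta, sigma)),
                          -20, 20, points=[Delta / 2])[0]
tv_lap_num = 0.5 * quad(lambda x: abs(stats.laplace.pdf(x, 0, b) - stats.laplace.pdf(x, Delta, b)),
                        -20, 20, points=[Delta / 2])[0]
print(round(tv_gauss, 4), round(tv_gauss_num, 4))
print(round(tv_lap, 4), round(tv_lap_num, 4))
```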

Daria Reshetova    Wei-Ning Chen    Ayfer Ă–zgĂĽr

Local differential privacy is a powerful method for privacy-preserving data collection. In this paper, we develop a framework for training Generative Adversarial Networks (GANs) on differentially privatized data. We show that entropic regularization of optimal transport, a popular regularization method in the literature that has often been leveraged for its computational benefits, enables the generator to learn the raw (unprivatized) data distribution even though it only has access to privatized samples. We prove that, at the same time, this leads to fast statistical convergence at the parametric rate. This shows that entropic regularization of optimal transport uniquely enables the mitigation of both the effects of privatization noise and the curse of dimensionality in statistical convergence. We provide experimental evidence to support the efficacy of our framework in practice.
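
The snippet below only illustrates the entropic regularization itself, via plain Sinkhorn iterations between empirical samples (one set Laplace-privatized); it does not implement the paper's GAN training or its analysis, and all distributions and scales are illustrative.

```python
import numpy as np

def entropic_ot(x, y, reg=0.5, iters=500):
    """Entropic-regularized optimal transport between two 1-D empirical measures
    via Sinkhorn iterations; returns the transport cost under the entropic plan."""
    a, b = np.full(len(x), 1 / len(x)), np.full(len(y), 1 / len(y))
    C = (x[:, None] - y[None, :]) ** 2       # squared-distance cost
    K = np.exp(-C / reg)                     # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]          # entropic transport plan
    return float(np.sum(P * C))

rng = np.random.default_rng(0)
raw = rng.normal(0.0, 1.0, 400)
privatized = raw + rng.laplace(0.0, 1.0, 400)       # LDP-style noisy release of the raw samples
generated = rng.normal(0.2, 1.1, 400)               # samples from a candidate generator
print(round(entropic_ot(generated, privatized), 3), round(entropic_ot(generated, raw), 3))
```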

Mark Beliaev    Payam Delgosha    Hamed Hassani    Ramtin Pedarsani

Over the past two decades, the adoption of neural networks has surged in parallel with their performance. Concurrently, we have observed the inherent fragility of these prediction models: small changes to the inputs can induce classification errors across entire datasets. In the following study, we examine perturbations constrained by the $\ell _{0}$-norm, a potent attack model in the domains of computer vision, malware detection, and natural language processing. To combat this adversary, we introduce a novel defense technique composed of two components: “truncation” and “adversarial training”. Subsequently, we conduct a theoretical analysis of the Gaussian mixture setting and establish the asymptotic optimality of our proposed defense. Based on this insight, we broaden the application of our technique to neural networks. Lastly, we empirically validate our results in the domain of computer vision, demonstrating substantial enhancements in the robust classification error of neural networks.
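
One simple way to picture "truncation" against an $\ell _{0}$-bounded adversary, for a linear decision statistic, is to discard the largest-magnitude per-coordinate contributions, which caps the influence of any $k$ corrupted coordinates. The sketch below conveys only this intuition on a toy Gaussian-mixture example; it is not the paper's construction and omits the adversarial-training component.

```python
import numpy as np

def truncated_score(x, w, k):
    """Linear score sum_i w_i * x_i with the k largest-magnitude contributions removed,
    limiting how far an adversary editing at most k coordinates can move the score."""
    contrib = w * x
    keep = np.argsort(np.abs(contrib))[:-k] if k > 0 else np.arange(len(x))
    return contrib[keep].sum()

rng = np.random.default_rng(0)
d, k = 200, 5
w = np.ones(d)
x = rng.normal(1.0, 1.0, d)                          # sample from the +1 class of a Gaussian mixture
x_adv = x.copy()
x_adv[rng.choice(d, k, replace=False)] = -50.0       # l0 attack on k coordinates
print(np.sign(x @ w), np.sign(x_adv @ w))            # plain linear score: prediction gets flipped
print(np.sign(truncated_score(x, w, k)),
      np.sign(truncated_score(x_adv, w, k)))         # truncated score survives the attack
```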

Gholamali Aminian    Saeed Masiha    Laura Toni    Miguel R. D. Rodrigues

Generalization error bounds are essential for understanding how well machine learning models work. In this work, we suggest a novel method, the Auxiliary Distribution Method, that leads to new upper bounds on expected generalization errors that are appropriate for supervised learning scenarios. We show that our general upper bounds can be specialized, under some conditions, to new bounds involving the $\alpha $-Jensen-Shannon and $\alpha $-Rényi ($0\lt \alpha \lt 1$) information between a random variable modeling the set of training samples and another random variable modeling the set of hypotheses. Our upper bounds based on $\alpha $-Jensen-Shannon information are also finite. Additionally, we demonstrate how our auxiliary distribution method can be used to derive upper bounds on the excess risk of some learning algorithms in the supervised learning context, and on the generalization error under the distribution mismatch scenario in supervised learning, where the distribution mismatch is modeled as the $\alpha $-Jensen-Shannon or $\alpha $-Rényi divergence between the distributions of the test and training data samples. We also outline the conditions under which our proposed upper bounds might be tighter than other earlier upper bounds.

Baturalp Buyukates    Jinhyun So    Hessam Mahdavifar    Salman Avestimehr

Secure aggregation protects the local models of the users in federated learning by not allowing the server to obtain any information beyond the aggregate model at each iteration. Naively implementing secure aggregation fails to protect the integrity of the aggregate model in the possible presence of a malicious server forging the aggregation result, which motivates verifiable aggregation in federated learning. Existing verifiable aggregation schemes either have a complexity that is linear in the model size or require time-consuming reconstruction at the server that is quadratic in the number of users in the case of likely user dropouts. To overcome these limitations, we propose LightVeriFL, a lightweight and communication-efficient secure verifiable aggregation protocol that provides the same guarantees of verifiability against a malicious server, data privacy, and dropout-resilience as the state-of-the-art protocols without incurring substantial communication and computation overheads. The proposed LightVeriFL protocol utilizes homomorphic hash and commitment functions of constant length, which are independent of the model size, to enable verification at the users. In case of dropouts, LightVeriFL uses a one-shot aggregate hash recovery of the dropped-out users, instead of a one-by-one recovery, making the verification process significantly faster than existing approaches. Comprehensive experiments show the advantage of LightVeriFL in practical settings.
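
As a rough picture of why a constant-length homomorphic hash enables aggregate verification, the sketch below uses a Pedersen-style vector hash over a toy prime-order group, for which the product of individual hashes equals the hash of the summed updates; the parameters are toy-sized and this is not claimed to be LightVeriFL's exact construction.

```python
import random

# Toy group: p = 2q + 1 is a safe prime; squaring random elements lands in the order-q subgroup.
p, q = 2039, 1019
rng = random.Random(0)
gens = [pow(rng.randrange(2, p - 1), 2, p) for _ in range(4)]   # one generator per model coordinate

def vec_hash(update):
    """Constant-length additively homomorphic hash: H(x) = prod_j g_j^{x_j} mod p."""
    h = 1
    for g, xj in zip(gens, update):
        h = (h * pow(g, xj % q, p)) % p
    return h

u1 = [3, 1, 4, 1]                     # toy (integer) model updates of two users
u2 = [2, 7, 1, 8]
aggregate = [a + b for a, b in zip(u1, u2)]

# Verification: the product of the users' hashes matches the hash of the aggregate.
print((vec_hash(u1) * vec_hash(u2)) % p == vec_hash(aggregate))
```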

Banghua Zhu    Ziao Wang    Nadim Ghaddar    Jiantao Jiao    Lele Wang

We consider the problem of computing a function of $n$ variables using noisy queries, where each query is incorrect with some fixed and known probability $p \in (0,1/2)$. Specifically, we consider the computation of the $\textsf {OR}$ function of $n$ bits (where queries correspond to noisy readings of the bits) and the $\textsf {MAX}$ function of $n$ real numbers (where queries correspond to noisy pairwise comparisons). We show that an expected number of queries of $(1 \pm o(1)) \frac {n\log \frac {1}{\delta }}{D_{\textsf {KL}}(p \| 1-p)}$ is both sufficient and necessary to compute both functions with a vanishing error probability $\delta = o(1)$, where $D_{\textsf {KL}}(p \| 1-p)$ denotes the Kullback-Leibler divergence between $\textsf {Bern}(p)$ and $\textsf {Bern}(1-p)$ distributions. Compared to previous work, our results tighten the dependence on $p$ in both the upper and lower bounds for the two functions.
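
For concreteness, the leading-order query count can be evaluated numerically; with illustrative values $n=10^{3}$, $p=0.3$, and $\delta =10^{-2}$ it comes to roughly $1.4\times 10^{4}$ queries.

```python
import numpy as np

n, p, delta = 1000, 0.3, 1e-2
d_kl = p * np.log(p / (1 - p)) + (1 - p) * np.log((1 - p) / p)   # D_KL(Bern(p) || Bern(1-p))
print(n * np.log(1 / delta) / d_kl)                              # ~1.36e4 noisy queries to leading order
```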

Mohamed Seif    Yanxi Chen    Andrea J. Goldsmith    H. Vincent Poor

We study the community detection problem over binary symmetric stochastic block models (SBMs) while preserving the privacy of the individual connections between the vertices. We present and analyze the associated information-theoretic tradeoff for differentially private exact recovery of the underlying communities, deriving sufficient separation conditions between the intra-community and inter-community connection probabilities that take the privacy budget into account. We also consider graph sketching as a speed-up technique to improve the computational complexity of maximum likelihood (ML) based recovery algorithms.

Ruizhi Zhang

Detection of sparse signals arises in many modern applications such as signal processing, bioinformatics, finance, and disease surveillance. However, in many of these applications, the data may contain sensitive personal information, which it is desirable to protect during data analysis. In this article, we consider the problem of $(\epsilon,\delta)$-differentially private detection of a general sparse mixture, with a focus on how privacy affects the detection power. By investigating the nonasymptotic upper bound on the sum of error probabilities, we find that no $(\epsilon,\delta)$-differentially private test can detect the sparse signal if the privacy constraint is too strong or if the model parameters are in the undetectable region (Cai and Wu, 2014). Moreover, we study the private clamped log-likelihood ratio test proposed by Canonne et al. (2019) and show that it achieves vanishing error probabilities under certain conditions on the model parameters and privacy parameters. Then, for the case when the null distribution is a standard normal distribution, we propose an adaptive $(\epsilon,\delta)$-differentially private test, which achieves vanishing error probabilities in the same detectable region (Cai and Wu, 2014) when the privacy parameters satisfy certain sufficient conditions. Several numerical experiments are conducted to verify our theoretical results and illustrate the performance of our proposed test.
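
The sketch below conveys the flavor of a clamped log-likelihood-ratio test made private with Laplace noise: clamping bounds the per-sample sensitivity to $2A$, so noise of scale $2A/\epsilon$ gives $(\epsilon,0)$-DP for the released statistic. The mixture parameters and threshold are illustrative, and this is not claimed to match the exact test analyzed in the paper.

```python
import numpy as np

def private_clamped_llr_stat(x, eps, A=2.0, mu=4.0, frac=0.1, seed=0):
    """Per-sample log-likelihood ratio of the sparse mixture (1-frac)N(0,1) + frac*N(mu,1)
    against N(0,1), clamped to [-A, A] (so replacing one sample moves the sum by at most 2A),
    summed, and released with Laplace(2A/eps) noise."""
    rng = np.random.default_rng(seed)
    llr = np.log((1 - frac) + frac * np.exp(mu * x - mu**2 / 2))
    return np.clip(llr, -A, A).sum() + rng.laplace(0, 2 * A / eps)

rng = np.random.default_rng(1)
null_sample = rng.normal(0, 1, 500)
alt_sample = np.where(rng.random(500) < 0.1, rng.normal(4, 1, 500), rng.normal(0, 1, 500))
# The private statistic falls well below zero under the null and well above zero for this alternative.
print(round(private_clamped_llr_stat(null_sample, eps=1.0), 1),
      round(private_clamped_llr_stat(alt_sample, eps=1.0), 1))
```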

Faisal Hamman    Erfaun Noorani    Saumitra Mishra    Daniele Magazzeni    Sanghamitra Dutta

There is an emerging interest in generating robust algorithmic recourse that would remain valid if the model is updated or changed even slightly. Towards finding robust algorithmic recourse (or counterfactual explanations), existing literature often assumes that the original model $m$ and the new model $M$ are bounded in the parameter space, i.e., $\|\text {Params}(M){-}\text {Params}(m)\|{\lt }\Delta $. However, models can often change significantly in the parameter space with little to no change in their predictions or accuracy on the given dataset. In this work, we introduce a mathematical abstraction termed naturally-occurring model change, which allows for arbitrary changes in the parameter space such that the change in predictions on points that lie on the data manifold is limited. Next, we propose a measure, which we call Stability, to quantify the robustness of counterfactuals to potential model changes for differentiable models, e.g., neural networks. Our main contribution is to show that counterfactuals with a sufficiently high value of Stability, as defined by our measure, will remain valid after potential “naturally-occurring” model changes with high probability (leveraging concentration bounds for Lipschitz functions of independent Gaussians). Since our quantification depends on the local Lipschitz constant around a data point, which is not always available, we also examine estimators of our proposed measure and derive a fundamental lower bound on the sample size required to obtain a precise estimate. We explore methods of using stability measures to generate robust counterfactuals that are close, realistic, and remain valid after potential model changes. This work also has interesting connections with model multiplicity, also known as the Rashomon effect.

Zinan Lin    Shuaiqi Wang    Vyas Sekar    Giulia Fanti

We study a setting where a data holder wishes to share data with a receiver without revealing certain summary statistics of the data distribution (e.g., mean, standard deviation). The data holder achieves this by passing the data through a randomization mechanism. We propose summary statistic privacy, a metric for quantifying the privacy risk of such a mechanism based on the worst-case probability of an adversary guessing the distributional secret within some threshold. Defining distortion as a worst-case Wasserstein-1 distance between the real and released data, we prove lower bounds on the tradeoff between privacy and distortion. We then propose a class of quantization mechanisms that can be adapted to different data distributions. We show that the quantization mechanism’s privacy-distortion tradeoff matches our lower bounds under certain regimes, up to small constant factors. Finally, we demonstrate on real-world datasets that the proposed quantization mechanisms achieve better privacy-distortion tradeoffs than alternative privacy mechanisms.
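
A deterministic caricature of a quantization-style mechanism for a mean secret is sketched below: the released samples are shifted so that their empirical mean snaps to a coarse grid, revealing the secret only up to a quantization bin while keeping the Wasserstein-1 distortion at most half the bin width. The actual mechanisms and privacy metric in the paper involve a prior over the secret and are not reproduced here.

```python
import numpy as np

def quantized_mean_release(x, bin_width):
    """Shift all samples by the same amount so the empirical mean lands at the center of its
    quantization bin; every sample moves by at most bin_width / 2, bounding the W1 distortion."""
    m = x.mean()
    m_quantized = (np.floor(m / bin_width) + 0.5) * bin_width
    return x + (m_quantized - m)

rng = np.random.default_rng(0)
data = rng.normal(3.7, 1.0, 1000)
released = quantized_mean_release(data, bin_width=1.0)
print(round(data.mean(), 3), round(released.mean(), 3))   # true vs. released (quantized) mean
print(round(np.abs(released - data).max(), 3))            # per-sample shift <= bin_width / 2
```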

Shahab Asoodeh    Huanyu Zhang

We investigate the contraction properties of locally differentially private mechanisms. More specifically, we derive tight upper bounds on the divergence between the output distributions $P{\mathsf K}$ and $Q{\mathsf K}$ of an $\varepsilon $-LDP mechanism $\mathsf K$ in terms of a divergence between the corresponding input distributions $P$ and $Q$, respectively. Our first main technical result presents a sharp upper bound on the $\chi ^{2}$-divergence $\chi ^{2}(P{\mathsf K}\|Q{\mathsf K})$ in terms of $\chi ^{2}(P\|Q)$ and $\varepsilon $. We also show that the same result holds for a large family of divergences, including KL-divergence and squared Hellinger distance. The second main technical result gives an upper bound on $\chi ^{2}(P{\mathsf K}\|Q{\mathsf K})$ in terms of the total variation distance ${\textsf {TV}}(P, Q)$ and $\varepsilon $. We then utilize these bounds to establish locally private versions of the van Trees inequality, Le Cam’s method, Assouad’s method, and the mutual information method, which are powerful tools for bounding minimax estimation risks. These results are shown to lead to tighter privacy analyses than the state of the art in several statistical problems, such as entropy and discrete distribution estimation, non-parametric density estimation, and hypothesis testing.
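
A quick numerical illustration of such contraction, using binary randomized response as the $\varepsilon $-LDP mechanism: both the $\chi ^{2}$-divergence and the total variation between the output distributions shrink relative to the inputs as $\varepsilon $ decreases. The specific bounds derived in the paper are not reproduced here, and the input distributions below are illustrative.

```python
import numpy as np

def chi2(p, q):
    return float(np.sum((p - q) ** 2 / q))

def tv(p, q):
    return 0.5 * float(np.sum(np.abs(p - q)))

eps = 1.0
t = np.exp(eps) / (1 + np.exp(eps))
K = np.array([[t, 1 - t],          # binary randomized response: an eps-LDP channel
              [1 - t, t]])

P, Q = np.array([0.8, 0.2]), np.array([0.4, 0.6])
PK, QK = P @ K, Q @ K
print(round(chi2(PK, QK) / chi2(P, Q), 3))   # contraction factor for the chi-squared divergence
print(round(tv(PK, QK) / tv(P, Q), 3))       # contraction factor for the total variation distance
```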

Yauhen Yakimenka    Chung-Wei Weng    Hsuan-Yin Lin    Eirik Rosnes    Jörg Kliewer

We consider the straggler problem in decentralized learning over a logical ring while preserving user data privacy. In particular, we extend the recently proposed framework of differential privacy (DP) amplification by decentralization of Cyffers and Bellet to include overall training latency, comprising both computation and communication latency. Analytical results on both the convergence speed and the DP level are derived for both a skipping scheme (which ignores the stragglers after a timeout) and a baseline scheme that waits for each node to finish before the training continues. A trade-off between overall training latency, accuracy, and privacy, parameterized by the timeout of the skipping scheme, is identified and empirically validated for logistic regression on a real-world dataset and for image classification using the MNIST and CIFAR-10 datasets.

Bhagyashree Puranik    Ozgur Guldogan    Upamanyu Madhow    Ramtin Pedarsani

While much of the rapidly growing literature on fair decision-making focuses on metrics for one-shot decisions, recent work has raised the intriguing possibility of designing sequential decision-making to positively impact long-term social fairness. In selection processes such as college admissions or hiring, biasing slightly towards applicants from under-represented groups is hypothesized to provide positive feedback that increases the pool of under-represented applicants in future selection rounds, thus enhancing fairness in the long term. In this paper, we examine this hypothesis and its consequences in a setting in which multiple agents are selecting from a common pool of applicants. We propose the Multi-agent Fair-Greedy policy, which balances greedy score maximization and fairness. Under this policy, we prove that the resource pool and the admissions converge to a long-term fairness target set by the agents when the score distributions across the groups in the population are identical. We provide empirical evidence of the existence of equilibria under non-identical score distributions through synthetic and adapted real-world datasets. We then sound a cautionary note for more complex applicant pool evolution models, under which uncoordinated behavior by the agents can cause negative reinforcement, leading to a reduction in the fraction of under-represented applicants. Our results indicate that, while positive reinforcement is a promising mechanism for long-term fairness, policies must be designed carefully to be robust to variations in the evolution model, with a number of open issues that remain to be explored by algorithm designers, social scientists, and policymakers.

Shahrzad Kiani    Franziska Boenisch    Stark C. Draper

Federated Learning (FL) is the standard protocol for collaborative learning. In FL, multiple workers jointly train a shared model. They exchange model updates calculated on their data, while keeping the raw data itself local. Since workers naturally form groups based on common interests and privacy policies, we are motivated to extend standard FL to reflect a setting with multiple, potentially overlapping groups. In this setup, where workers can belong to and contribute to more than one group at a time, complexities arise in understanding privacy leakage and in adhering to privacy policies. To address these challenges, we propose differentially private overlapping grouped learning (DP-OGL), a novel method to implement privacy guarantees within overlapping groups. Under the honest-but-curious threat model, we derive novel privacy guarantees between arbitrary pairs of workers. These privacy guarantees describe and quantify two key effects of privacy leakage in DP-OGL: propagation delay, i.e., the fact that information from one group leaks to other groups only with a temporal offset through the common workers, and information degradation, i.e., the fact that noise added to model updates limits information leakage between workers. Our experiments show that applying DP-OGL enhances utility while maintaining strong privacy compared to standard FL setups.

Monica Welfert    Gowtham R. Kurri    Kyle Otstot    Lalitha Sankar

Generative adversarial networks (GANs), modeled as a zero-sum game between a generator (G) and a discriminator (D), allow generating synthetic data with formal guarantees. Noting that D is a classifier, we begin by reformulating the GAN value function using class probability estimation (CPE) losses. We prove a two-way correspondence between CPE loss GANs and f-GANs, which minimize f-divergences. We also show that all symmetric f-divergences are equivalent in convergence. In the finite sample and model capacity setting, we define and obtain bounds on estimation and generalization errors. We specialize these results to $\alpha $-GANs, defined using $\alpha $-loss, a tunable CPE loss family parametrized by $\alpha \in (0,\infty]$. We next introduce a class of dual-objective GANs to address training instabilities of GANs by modeling each player’s objective using $\alpha $-loss to obtain $(\alpha _{D},\alpha _{G})$-GANs. We show that the resulting non-zero-sum game simplifies to minimizing an f-divergence under appropriate conditions on $(\alpha _{D},\alpha _{G})$. Generalizing this dual-objective formulation using CPE losses, we define and obtain upper bounds on an appropriately defined estimation error. Finally, we highlight the value of tuning $(\alpha _{D},\alpha _{G})$ in alleviating training instabilities for the synthetic 2D Gaussian mixture ring as well as the large publicly available Celeb-A and LSUN Classroom image datasets.
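
For readers unfamiliar with the $\alpha $-loss family referenced above, a common form (assumed here) is $\ell_{\alpha}(p) = \frac{\alpha}{\alpha-1}\left(1 - p^{(\alpha-1)/\alpha}\right)$ for the probability $p$ assigned to the correct label, recovering the log-loss as $\alpha \to 1$ and approaching the soft 0-1 loss $1-p$ as $\alpha \to \infty$; the snippet checks both limits numerically.

```python
import numpy as np

def alpha_loss(p_true, alpha):
    """alpha-loss of the probability assigned to the correct label (assumed standard form):
    (alpha / (alpha - 1)) * (1 - p_true ** ((alpha - 1) / alpha)), with -log(p_true) at alpha = 1."""
    if np.isclose(alpha, 1.0):
        return -np.log(p_true)
    return (alpha / (alpha - 1)) * (1 - p_true ** ((alpha - 1) / alpha))

p = 0.7
print(round(alpha_loss(p, 1.0001), 4), round(-np.log(p), 4))   # alpha -> 1 recovers the log-loss
print(round(alpha_loss(p, 1e6), 4), round(1 - p, 4))           # alpha -> infinity approaches 1 - p
```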