Minimax Estimation of Divergences Between Discrete Distributions

Submitted by admin on Mon, 10/28/2024 - 01:24

We study the minimax estimation of α-divergences between discrete distributions for integer α ≥ 1, which include the Kullback-Leibler divergence and the χ²-divergences as special examples. Dropping the usual theoretical tricks to acquire independence, we construct the first minimax rate-optimal estimator which does not require any Poissonization, sample splitting, or explicit construction of approximating polynomials.

A Unified Approach to Translate Classical Bandit Algorithms to the Structured Bandit Setting

We consider a finite-armed structured bandit problem in which mean rewards of different arms are known functions of a common hidden parameter $\theta^{*}$. Since we do not place any restrictions on these functions, the problem setting subsumes several previously studied frameworks that assume linear or invertible reward functions. We propose a novel approach to gradually estimate the hidden $\theta^{*}$ and use the estimate together with the mean reward functions to substantially reduce exploration of sub-optimal arms.

rTop-k: A Statistical Estimation Approach to Distributed SGD

The large communication cost for exchanging gradients between different nodes significantly limits the scalability of distributed training for large-scale learning models. Motivated by this observation, there has been significant recent interest in techniques that reduce the communication cost of distributed Stochastic Gradient Descent (SGD), with gradient sparsification techniques such as top-k and random-k shown to be particularly effective.
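For readers unfamiliar with the two sparsifiers named above, the following is a minimal sketch of top-k and random-k applied to a single gradient vector. It illustrates only these two baseline operators, not the rTop-k method of the article; the function names `top_k` and `random_k` and the use of NumPy are assumptions made for illustration.

```python
import numpy as np


def top_k(grad, k):
    """Keep the k entries of largest magnitude and zero out the rest."""
    out = np.zeros_like(grad)
    # argpartition places the indices of the k largest |grad| values last.
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    out[idx] = grad[idx]
    return out


def random_k(grad, k, rng):
    """Keep k entries chosen uniformly at random and zero out the rest."""
    out = np.zeros_like(grad)
    idx = rng.choice(grad.size, size=k, replace=False)
    out[idx] = grad[idx]
    return out


# Hypothetical 1-D gradient; only the k retained entries would be transmitted.
grad = np.array([0.1, -3.0, 0.5, 2.0, -0.2])
sparse_top = top_k(grad, 2)
sparse_rand = random_k(grad, 2, np.random.default_rng(0))
```

Both functions assume a one-dimensional array and 1 ≤ k ≤ `grad.size`. In a distributed setting, each node would send only the retained indices and values rather than the dense vector.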

Generalized Autoregressive Linear Models for Discrete High-Dimensional Data

Fitting multivariate autoregressive (AR) models is fundamental for time-series data analysis in a wide range of applications. This article considers the problem of learning a $p$-lag multivariate AR model where each time step involves a linear combination of the past $p$ states followed by a probabilistic, possibly nonlinear, mapping to the next state. The problem is to learn the linear connectivity tensor from observations of the states. We focus on the sparse setting, which arises in applications with a limited number of direct connections between variables.
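The model class described above can be made concrete with a small simulation. The sketch below assumes one particular instance, a Bernoulli next state with a logistic link, which is an illustrative choice and not necessarily the setting analysed in the article. The names `simulate_bernoulli_ar`, `A` (the connectivity tensor) and `nu` (an offset vector) are hypothetical.

```python
import numpy as np


def simulate_bernoulli_ar(A, nu, T, rng):
    """Simulate T steps of a p-lag Bernoulli AR process.

    A  : array of shape (p, M, M); A[l] weights the state l + 1 steps back.
    nu : array of shape (M,), a constant offset (assumed for illustration).
    Returns an array of shape (T, M) with entries in {0, 1}.
    """
    p, M, _ = A.shape
    X = np.zeros((T + p, M))  # the first p rows are an all-zero initial history
    for t in range(p, T + p):
        # Linear combination of the past p states.
        z = nu + sum(A[l] @ X[t - 1 - l] for l in range(p))
        # Probabilistic, nonlinear mapping to the next state (logistic link).
        prob = 1.0 / (1.0 + np.exp(-z))
        X[t] = rng.random(M) < prob
    return X[p:]


# Sparse connectivity: only two direct connections among M = 3 variables.
A = np.zeros((2, 3, 3))
A[0, 0, 1] = 1.5   # variable 1 excites variable 0 at lag 1
A[1, 2, 0] = -1.0  # variable 0 inhibits variable 2 at lag 2
nu = np.full(3, -0.5)
X = simulate_bernoulli_ar(A, nu, T=50, rng=np.random.default_rng(0))
```

The learning problem is the reverse direction: recovering the mostly-zero tensor `A` from the observed states `X`.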

Fast Variational Inference for Joint Mixed Sparse Graphical Models

Mixed graphical models are widely used to capture interactions among different types of variables. To simultaneously learn the topology of multiple mixed graphical models and encourage common structure, a variational maximum likelihood inference approach has been developed, which takes advantage of the log-determinant relaxation. In this article, we further improve the computational efficiency of this method by exploiting the block diagonal structure of the solution.

Generalization Bounds via Information Density and Conditional Information Density

We present a general approach, based on an exponential inequality, to derive bounds on the generalization error of randomized learning algorithms. Using this approach, we provide bounds on the average generalization error as well as bounds on its tail probability, for both the PAC-Bayesian and single-draw scenarios. Specifically, for the case of sub-Gaussian loss functions, we obtain novel bounds that depend on the information density between the training data and the output hypothesis.

Global Multiclass Classification and Dataset Construction via Heterogeneous Local Experts

In the domains of dataset construction and crowdsourcing, a notable challenge is to aggregate labels from a heterogeneous set of labelers, each of whom is potentially an expert in some subset of tasks (and less reliable in others). To reduce costs of hiring human labelers or training automated labeling systems, it is of interest to minimize the number of labelers while ensuring the reliability of the resulting dataset.

Recovering Data Permutations From Noisy Observations: The Linear Regime

This article considers a noisy data structure recovery problem. The goal is to investigate the following question: given a noisy observation of a permuted data set, according to which permutation was the original data sorted? The focus is on scenarios where data is generated according to an isotropic Gaussian distribution, and the noise is additive Gaussian with an arbitrary covariance matrix. This problem is posed within a hypothesis testing framework.
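The question posed above can be illustrated with a short simulation. The sketch below draws isotropic Gaussian data, adds Gaussian noise with a chosen covariance, and uses the permutation that sorts the noisy observation as a guess for the permutation that sorts the original data. This is a natural baseline under stated assumptions, not necessarily the decision rule studied in the article; the names `sorting_permutation` and `observe` and the particular covariance are hypothetical.

```python
import numpy as np


def sorting_permutation(v):
    """Return the index order that sorts v in ascending order."""
    return np.argsort(v, kind="stable")


def observe(x, cov, rng):
    """Return x plus zero-mean Gaussian noise with covariance matrix cov."""
    noise = rng.multivariate_normal(np.zeros(x.size), cov)
    return x + noise


rng = np.random.default_rng(0)
n = 5
x = rng.standard_normal(n)  # isotropic Gaussian data
cov = 0.01 * np.eye(n)      # illustrative noise covariance (an assumption)
y = observe(x, cov, rng)

true_perm = sorting_permutation(x)  # the hypothesis to be recovered
est_perm = sorting_permutation(y)   # guess based on the noisy observation
recovered = np.array_equal(true_perm, est_perm)
```

Each of the n! permutations corresponds to one hypothesis, and `recovered` records whether the guess matched in this single trial. Repeating the trial many times would give an empirical estimate of the probability of error for this baseline.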