鶹ýAV

Bayesian Algorithms for Decentralized Stochastic Bandits

Submitted by admin on Mon, 10/28/2024 - 01:24

We study a decentralized cooperative multi-agent multi-armed bandit (MAB) problem with K arms and N agents connected over a network. In this model, each arm's reward distribution is the same for every agent, and rewards are drawn independently across agents and over time steps. At each iteration, agents independently choose an arm to play and exchange at most poly(K) real-valued messages with their neighbors.

Nonparametric Iterated-Logarithm Extensions of the Sequential Generalized Likelihood Ratio Test

Submitted by admin on Mon, 10/28/2024 - 01:24

We develop a nonparametric extension of the sequential generalized likelihood ratio (GLR) test and corresponding time-uniform confidence sequences for the mean of a univariate distribution. By utilizing a geometric interpretation of the GLR statistic, we derive a simple analytic upper bound on the probability that it exceeds any prespecified boundary; these are intractable to approximate via simulations due to infinite horizon of the tests and the composite nonparametric nulls under consideration.

Cautious Reinforcement Learning via Distributional Risk in the Dual Domain

Submitted by admin on Mon, 10/28/2024 - 01:24

We study the estimation of risk-sensitive policies in reinforcement learning (RL) problems defined by a Markov Decision Process (MDPs) whose state and action spaces are countably finite. Prior efforts are predominately afflicted by computational challenges associated with the fact that risk-sensitive MDPs are time-inconsistent, and hence modifications of Bellman's equations are required, which often do not permit efficient fixed point schemes.

Asynchronous Decentralized Accelerated Stochastic Gradient Descent

Submitted by admin on Mon, 10/28/2024 - 01:24

In this paper, we introduce an asynchronous decentralized accelerated stochastic gradient descent type of algorithm for decentralized stochastic optimization. Considering communication and synchronization costs are the major bottlenecks for decentralized optimization, we attempt to reduce these costs from an algorithmic design aspect, in particular, we are able to reduce the number of agents involved in one round of update via randomization.

Curiosity Killed or Incapacitated the Cat and the Asymptotically Optimal Agent

Submitted by admin on Mon, 10/28/2024 - 01:24

Reinforcement learners are agents that learn to pick actions that lead to high reward. Ideally, the value of a reinforcement learner’s policy approaches optimality—where the optimal informed policy is the one which maximizes reward. Unfortunately, we show that if an agent is guaranteed to be “asymptotically optimal” in any (stochastically computable) environment, then subject to an assumption about the true environment, this agent will be either “destroyed” or “incapacitated” with probability 1.

Asynchronous Delayed Optimization With Time-Varying Minibatches

Submitted by admin on Mon, 10/28/2024 - 01:24

Large-scale learning and optimization problems are often solved in parallel. In a master-worker distributed setup, worker nodes are most often assigned fixed-sized minibatches of data points to process. However, workers may take different amounts of time to complete their per-batch calculations. To deal with such variability in processing times, an alternative approach has recently been proposed wherein each worker is assigned a fixed duration to complete the calculations associated with each batch.

Robust Change Detection via Information Projection

Submitted by admin on Mon, 10/28/2024 - 01:24

We study the robust transient and quickest change detection problems with unknown post-change distributions under finite alphabets. When the distribution after the change point is unknown, the change in the distribution of observations may occur in multiple ways without much structure on the observations, whereas, before the change point, a false alarm is highly structured, following a particular sample path with respect to the distribution of observations and the detection scheme.

Bandit-Based Monte Carlo Optimization for Nearest Neighbors

Submitted by admin on Mon, 10/28/2024 - 01:24

The celebrated Monte Carlo method estimates an expensive-to-compute quantity by random sampling. Bandit-based Monte Carlo optimization is a general technique for computing the minimum of many such expensive-to-compute quantities by adaptive random sampling. The technique converts an optimization problem into a statistical estimation problem which is then solved via multi-armed bandits.

On No-Sensing Adversarial Multi-Player Multi-Armed Bandits With Collision Communications

Submitted by admin on Mon, 10/28/2024 - 01:24

We study the notoriously difficult no-sensing adversarial multi-player multi-armed bandits (MP-MAB) problem from a new perspective. Instead of focusing on the hardness of multiple players, we introduce a new dimension of hardness, called attackability. All adversaries can be categorized based on the attackability and we introduce Adversary-Adaptive Collision-Communication (A2C2), a family of algorithms with forced-collision communication among players.

Quickest Detection of Moving Anomalies in Sensor Networks

Submitted by admin on Mon, 10/28/2024 - 01:24

The problem of sequentially detecting a moving anomaly is studied, in which the anomaly affects different parts of a sensor network over time. Each network sensor is characterized by a pre- and post-change distribution. Initially, the observations of each sensor are generated according to the corresponding pre-change distribution. After some unknown but deterministic time instant, a moving anomaly emerges, affecting different sets of sensors as time progresses.