Harmless Interpolation of Noisy Data in Regression

A continuing mystery in understanding the empirical success of deep neural networks is their ability to achieve zero training error and generalize well, even when the training data is noisy and there are more parameters than data points. We investigate this overparameterized regime in linear regression, where all solutions that minimize training error interpolate the data, including noise.
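
As a minimal illustration of this interpolating regime (a toy sketch, not the paper's experiments; the dimensions, sparsity pattern, and noise level below are arbitrary choices), the minimum-norm least-squares solution in an overparameterized linear model fits noisy training labels exactly while still tracking the underlying signal:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 500                      # more parameters (d) than data points (n)
beta_true = np.zeros(d)
beta_true[:10] = 1.0                # a sparse "signal" direction (arbitrary choice)

X = rng.standard_normal((n, d))
y = X @ beta_true + 0.5 * rng.standard_normal(n)    # noisy labels

# Minimum-l2-norm interpolator: among all solutions with X beta = y,
# the pseudoinverse picks the one with the smallest Euclidean norm.
beta_hat = np.linalg.pinv(X) @ y

train_err = np.mean((X @ beta_hat - y) ** 2)        # ~0: the noise is interpolated
X_test = rng.standard_normal((1000, d))
test_err = np.mean((X_test @ beta_hat - X_test @ beta_true) ** 2)

print(f"train MSE: {train_err:.2e}, test MSE vs. noiseless target: {test_err:.3f}")
```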

Qsparse-Local-SGD: Distributed SGD With Quantization, Sparsification, and Local Computations

The communication bottleneck has been identified as a significant issue in the distributed optimization of large-scale learning models. Recently, several approaches have been proposed to mitigate this problem, including different forms of gradient compression and computing local models that are mixed iteratively. In this paper, we propose the Qsparse-local-SGD algorithm, which combines aggressive sparsification with quantization and local computation, along with error compensation that keeps track of the difference between the true and compressed gradients.
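
The ingredients listed above can be sketched as a gradient compressor with error feedback. The top-k sparsifier, the sign-and-scale quantizer, and the memory update below are generic illustrations under assumed forms, not the exact operators or parameters of Qsparse-local-SGD:

```python
import numpy as np

def compress(v, k):
    """Keep the k largest-magnitude entries, then quantize them to sign * mean(|.|)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    scale = np.mean(np.abs(v[idx]))
    out[idx] = scale * np.sign(v[idx])
    return out

def worker_step(grad, memory, k):
    """Error-compensated compression: compress grad plus the carried-over error,
    and keep what was lost in compression as the new memory."""
    corrected = grad + memory
    compressed = compress(corrected, k)
    new_memory = corrected - compressed      # difference between true and compressed update
    return compressed, new_memory

# Toy usage: one worker compressing a stream of random "gradients".
rng = np.random.default_rng(1)
memory = np.zeros(1000)
for _ in range(5):
    g = rng.standard_normal(1000)
    c, memory = worker_step(g, memory, k=50)
print("fraction of nonzero entries sent:", np.count_nonzero(c) / c.size)
```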

On Distributed Quantization for Classification

We consider the problem of distributed feature quantization, where the goal is to enable a pretrained classifier at a central node to carry out its classification on features that are gathered from distributed nodes over communication-constrained channels. We propose distributed quantization schemes specifically tailored to the classification task: unlike quantization schemes designed to help the central node reconstruct the original signal as accurately as possible, our focus is not reconstruction accuracy but correct classification.
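
A one-dimensional toy example (ours, not the paper's) makes the distinction concrete: for a simple threshold classifier, a one-bit quantizer whose boundary coincides with the decision threshold preserves every label, whereas a reconstruction-oriented quantizer at the same rate achieves lower squared error but misclassifies points near the boundary. The threshold value 0.3 below is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, 10000)          # feature observed at a remote node
labels = (x > 0.3).astype(int)             # central classifier: threshold at 0.3 (toy choice)

# 1-bit reconstruction-oriented quantizer: split the range in half, send the cell midpoint.
x_rec = np.where(x < 0.0, -0.5, 0.5)

# 1-bit task-aware quantizer: place the single boundary at the classifier's threshold.
x_task = np.where(x < 0.3, -0.35, 0.65)    # midpoints of the two decision regions

acc_rec  = np.mean((x_rec  > 0.3).astype(int) == labels)
acc_task = np.mean((x_task > 0.3).astype(int) == labels)
mse_rec  = np.mean((x - x_rec) ** 2)
mse_task = np.mean((x - x_task) ** 2)

print(f"reconstruction-oriented: MSE {mse_rec:.3f}, accuracy {acc_rec:.3f}")
print(f"task-aware:              MSE {mse_task:.3f}, accuracy {acc_task:.3f}")
```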

Inference With Deep Generative Priors in High Dimensions

Deep generative priors offer powerful models for complex-structured data, such as images, audio, and text. Using these priors in inverse problems typically requires estimating the input and/or hidden signals in a multi-layer deep neural network from observation of its output. While these approaches have been successful in practice, rigorous performance analysis is complicated by the non-convex nature of the underlying optimization problems.
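
As a small sketch of the kind of non-convex problem being analyzed (with a toy generator and arbitrary sizes, noise level, and step size, none of which come from the paper), one can try to recover the input of a fixed two-layer ReLU network from a noisy observation of its output by gradient descent on the squared error:

```python
import numpy as np

rng = np.random.default_rng(3)
k, m, p = 5, 20, 40                                  # latent, hidden, and output dimensions
W1 = rng.standard_normal((m, k)) / np.sqrt(k)        # fixed, known generator weights
W2 = rng.standard_normal((p, m)) / np.sqrt(m)

def G(z):
    """A fixed two-layer ReLU 'generative prior', used only for illustration."""
    return W2 @ np.maximum(W1 @ z, 0.0)

z_true = rng.standard_normal(k)
y = G(z_true) + 0.01 * rng.standard_normal(p)        # noisy observation of the network output

# Gradient descent on f(z) = 0.5 * ||y - G(z)||^2, which is non-convex in z due to the ReLU.
z = 0.1 * rng.standard_normal(k)                     # random start (an all-zero start is dead)
lr = 0.02
for _ in range(5000):
    h = W1 @ z
    r = G(z) - y                                     # residual
    grad = W1.T @ ((h > 0) * (W2.T @ r))             # chain rule through the ReLU
    z -= lr * grad

print("data fit      ||y - G(z_hat)|| :", np.linalg.norm(y - G(z)))
print("noise level   ||y - G(z_true)||:", np.linalg.norm(y - G(z_true)))
```

Because of the ReLU, the objective is non-convex in z, so plain gradient descent can stall at spurious stationary points; this is exactly the difficulty that complicates rigorous performance analysis.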

Deepcode: Feedback Codes via Deep Learning

The design of codes for communicating reliably over a statistically well-defined channel is an important endeavor involving deep mathematical research and wide-ranging practical applications. In this work, we present the first family of codes obtained via deep learning, which significantly outperforms state-of-the-art codes designed over several decades of research.

DeepJSCC-f: Deep Joint Source-Channel Coding of Images With Feedback

We consider the wireless transmission of images in the presence of channel output feedback. From a Shannon-theoretic perspective, feedback does not improve the asymptotic end-to-end performance, and separate source coding followed by capacity-achieving channel coding, which ignores the feedback signal, achieves the optimal performance.
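
Stated loosely, and only as background for the sentence above, this claim rests on two classical facts, written here in generic notation: feedback does not increase the capacity of a memoryless channel, and source-channel separation is asymptotically optimal, so the best achievable end-to-end distortion is the same with or without feedback.

```latex
% Feedback does not increase memoryless-channel capacity:
C_{\mathrm{fb}} \;=\; C \;=\; \max_{p(x)} I(X;Y)

% Separation: with \rho = n/k channel uses per source sample, the asymptotically
% optimal end-to-end distortion, with or without feedback, is
D^{*} \;=\; D(\rho\, C),
```

where D(·) denotes the distortion-rate function of the source.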

LEARN Codes: Inventing Low-Latency Codes via Recurrent Neural Networks

Designing channel codes under low-latency constraints is one of the most demanding requirements in 5G standards. However, a sharp characterization of the performance of traditional codes is available only in the large-block-length limit. Guided by such asymptotic analyses, existing code designs require large block lengths, and hence high latency, to achieve the desired error rate.

Tightening Mutual Information-Based Bounds on Generalization Error

An information-theoretic upper bound on the generalization error of supervised learning algorithms is derived. The bound is constructed in terms of the mutual information between each individual training sample and the output of the learning algorithm. The bound is derived under more general conditions on the loss function than in existing studies; nevertheless, it provides a tighter characterization of the generalization error.
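
A representative bound of this form, written in generic notation for a loss that is σ-sub-Gaussian (see the paper for the exact conditions and statement), is

```latex
\Bigl|\, \mathbb{E}\bigl[ L_\mu(W) - L_S(W) \bigr] \,\Bigr|
  \;\le\; \frac{1}{n} \sum_{i=1}^{n} \sqrt{\, 2\sigma^{2} \, I(W; Z_i) \,},
```

where W is the output of the learning algorithm trained on the sample S = (Z_1, ..., Z_n), L_S is the empirical risk, and L_μ is the population risk; the sum involves the mutual information between W and each individual sample Z_i rather than the full sample S.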

Toward Moderate Overparameterization: Global Convergence Guarantees for Training Shallow Neural Networks

Many modern neural network architectures are trained in an overparameterized regime where the number of parameters exceeds the size of the training dataset. Sufficiently overparameterized neural network architectures in principle have the capacity to fit any set of labels, including random noise. However, given the highly nonconvex nature of the training landscape, it is not clear what level and kind of overparameterization is required for first-order methods to converge to a global optimum that perfectly interpolates the labels.
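
A toy experiment in this spirit (the sizes, step size, and iteration count are arbitrary choices, not the paper's setting) trains a one-hidden-layer ReLU network by plain gradient descent on completely random labels; with sufficient width the training loss is driven toward zero:

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, m = 20, 5, 200              # 20 samples, width-200 hidden layer: far more weights than data

X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)    # unit-norm inputs
y = rng.choice([-1.0, 1.0], size=n)              # completely random labels

W = rng.standard_normal((m, d))                  # hidden-layer weights
a = rng.standard_normal(m) / np.sqrt(m)          # output weights

lr = 0.01
for t in range(10001):
    H = X @ W.T                                  # (n, m) pre-activations
    S = np.maximum(H, 0.0)                       # ReLU features
    r = S @ a - y                                # residuals
    if t % 2000 == 0:
        print(f"iter {t:5d}   train MSE {np.mean(r ** 2):.2e}")
    grad_a = S.T @ r / n                         # gradient of 0.5 * mean squared error
    grad_W = ((r[:, None] * (H > 0)) * a[None, :]).T @ X / n
    a -= lr * grad_a
    W -= lr * grad_W
```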

Stochastic Gradient Coding for Straggler Mitigation in Distributed Learning

We consider distributed gradient descent in the presence of stragglers. Recent work on gradient coding and approximate gradient coding has shown how to add redundancy in distributed gradient descent to guarantee convergence even if some workers are stragglers, that is, slow or non-responsive. In this work, we propose an approximate gradient coding scheme called Stochastic Gradient Coding (SGC), which works when the stragglers are random.
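
A simplified replication-based sketch (not the exact SGC construction or its assignment scheme) illustrates the mechanism: each data partition is copied to several randomly chosen workers, each non-straggling worker returns a scaled partial gradient, and the aggregate is an unbiased estimate of the full gradient when stragglers occur at random:

```python
import numpy as np

rng = np.random.default_rng(5)
num_parts, num_workers, reps = 30, 10, 3      # each partition replicated on `reps` workers
p_straggle = 0.3                              # each worker independently straggles

# Toy "gradients": one vector per data partition; the target is their sum.
part_grads = rng.standard_normal((num_parts, 4))
full_grad = part_grads.sum(axis=0)

# Random assignment: each partition goes to `reps` distinct workers.
assignment = [rng.choice(num_workers, size=reps, replace=False) for _ in range(num_parts)]

def one_round():
    alive = rng.random(num_workers) > p_straggle
    agg = np.zeros(4)
    for part, workers in enumerate(assignment):
        # Scale each surviving copy so the estimate is unbiased: a copy survives
        # with probability (1 - p_straggle), and there are `reps` copies in total.
        for w in workers:
            if alive[w]:
                agg += part_grads[part] / (reps * (1.0 - p_straggle))
    return agg

estimates = np.array([one_round() for _ in range(2000)])
print("true gradient:     ", np.round(full_grad, 3))
print("mean of estimates: ", np.round(estimates.mean(axis=0), 3))
```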