Â鶹´«Ã½AV

Submitted by admin on Mon, 10/28/2024 - 01:24

We consider distributed computation of a sequence of $J$ gradients $\{\mathbf {g}(0), \ldots,\mathbf {g}(J-1)\}$ . Each worker node computes a fraction of $\mathbf {g}(t)$ in round- $t$ and attempts to communicate the result to a master. Master is required to obtain the full gradient $\mathbf {g}(t)$ by the end of round- $(t+T)$ . The goal here is to finish all the $J$ gradient computations, keeping the cumulative processing time as short as possible. Delayed availability of results from individual workers causes bottlenecks in this setting. These delays can be due to factors such as processing delay of workers and packet losses. Gradient coding (GC) framework introduced by Tandon et al. uses coding theoretic techniques to mitigate the effect of delayed responses from workers. In this paper, we primarily target mitigating communication-level delays. In contrast to the classical GC approach which performs coding only across workers ( $T=0$ ), the proposed sequential gradient coding framework is more general, as it allows for coding across workers as well as time. We present a new sequential gradient coding scheme which offers improved resiliency against communication-level delays compared to the GC scheme, without increasing computational load. Our experimental results establish performance improvement offered by the new coding scheme.

M. Nikhil Krishnan
Erfan Hosseini
Ashish Khisti