Online optimization

Block layer decomposition schemes for training deep neural networks

Weight estimation for deep feedforward neural networks (DFNNs) relies on the solution of a very large nonconvex optimization problem that may have many local (non-global) minimizers, saddle points and large plateaus. Furthermore, the time needed to find good solutions to the training problem depends heavily on both the number of samples and the number of weights (variables). In this work, we show how block coordinate descent (BCD) methods can be fruitfully applied to the DFNN weight optimization problem and embedded in online frameworks, possibly avoiding bad stationary points.
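
A minimal sketch of the layer-wise idea, not the authors' scheme: a two-layer network trained with NumPy where, at each outer iteration, only one layer's block of weights is updated while the other is held fixed. The toy regression task, layer sizes, step size and block ordering are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression data (illustrative).
X = rng.normal(size=(200, 1))
y = np.sin(3 * X) + 0.1 * rng.normal(size=(200, 1))

# Two-layer feedforward network: hidden layer (W1, b1), output layer (W2, b2).
H = 16
params = {
    "W1": rng.normal(scale=0.5, size=(1, H)), "b1": np.zeros(H),
    "W2": rng.normal(scale=0.5, size=(H, 1)), "b2": np.zeros(1),
}
blocks = [("W1", "b1"), ("W2", "b2")]  # one block of variables per layer

def forward(p, X):
    Z1 = X @ p["W1"] + p["b1"]
    A1 = np.tanh(Z1)
    out = A1 @ p["W2"] + p["b2"]
    return A1, out

def loss(p):
    _, out = forward(p, X)
    return 0.5 * np.mean((out - y) ** 2)

def grads(p):
    A1, out = forward(p, X)
    n = X.shape[0]
    d_out = (out - y) / n                 # dL/d(out) for the mean-squared loss
    g = {"W2": A1.T @ d_out, "b2": d_out.sum(0)}
    dZ1 = (d_out @ p["W2"].T) * (1 - A1 ** 2)   # backprop through tanh
    g["W1"] = X.T @ dZ1
    g["b1"] = dZ1.sum(0)
    return g

# Block coordinate descent: sweep over the layer blocks, updating one block
# at a time with the gradient evaluated at the current point.
lr = 0.2
for it in range(201):
    for block in blocks:
        g = grads(params)
        for name in block:                # touch only this layer's variables
            params[name] -= lr * g[name]
    if it % 50 == 0:
        print(f"iter {it:3d}  loss {loss(params):.4f}")
```

In an online setting the same sweep would be performed on mini-batches of samples rather than on the full dataset.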

On the convergence of a Block-Coordinate Incremental Gradient method

In this paper, we study the convergence of a block-coordinate incremental gradient method. Under suitable assumptions on the objective function, we show that the block-coordinate incremental gradient method can be viewed as a gradient method with errors, and convergence follows by showing that the error at each iteration satisfies some standard conditions.
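
As a sketch of the standard "gradient method with errors" framework being referred to (the paper's exact assumptions may differ), one cycle of block-coordinate incremental updates can be written as a perturbed gradient step, with convergence guaranteed when the stepsizes and perturbations satisfy conditions of the following Bertsekas-Tsitsiklis type:

```latex
% One cycle of block-coordinate incremental updates viewed as a perturbed
% gradient step on the full objective f.
\[
  w^{k+1} = w^{k} - \alpha_k \left( \nabla f(w^{k}) + e^{k} \right)
\]
% Standard sufficient conditions: diminishing, non-summable stepsizes and
% errors bounded relative to the stepsize,
\[
  \sum_{k} \alpha_k = \infty, \qquad
  \sum_{k} \alpha_k^{2} < \infty, \qquad
  \| e^{k} \| \le \alpha_k \bigl( p + q\, \| \nabla f(w^{k}) \| \bigr),
\]
% for some constants $p, q \ge 0$, under which $\nabla f(w^{k}) \to 0$.
```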
