We benchmark improvements in the performance of deep neural networks (DNNs) on the MNIST data set upon implementing two simple modifications to the algorithm, both with little computational overhead. The first is GPU parallelization on a commodity graphics card; the second is initializing the DNN with random orthogonal weight matrices prior to optimization. Eigenspectrum analysis of the weight matrices reveals that the initially orthogonal matrices remain nearly orthogonal after training. The probability distributions from which these orthogonal matrices are drawn are also shown to significantly affect the performance of the networks.
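As a concrete sketch of the orthogonal-initialization step, the snippet below draws a random orthogonal matrix by QR-decomposing a Gaussian matrix and verifies near-orthogonality through the eigenspectrum, the same diagnostic the abstract describes for trained weights. This is an illustrative sketch, not the paper's implementation: the matrix size, the random seed, and the use of NumPy are assumptions made for this example.

    import numpy as np

    def random_orthogonal(n, rng):
        """Sample a random n x n orthogonal matrix.

        QR-decomposing a Gaussian matrix and sign-correcting Q by the
        diagonal of R yields a draw from the Haar (uniform) distribution
        over the orthogonal group; other distributions over orthogonal
        matrices exist and, per the abstract, affect network performance.
        """
        a = rng.standard_normal((n, n))
        q, r = np.linalg.qr(a)
        return q * np.sign(np.diag(r))  # make diag(R) positive for uniformity

    rng = np.random.default_rng(seed=0)
    W = random_orthogonal(128, rng)  # 128 is a hypothetical layer width

    # Eigenspectrum check: the eigenvalues of an orthogonal matrix lie on
    # the unit circle, so |lambda| close to 1 for every eigenvalue indicates
    # (near-)orthogonality, the property tracked before and after training.
    lam = np.linalg.eigvals(W)
    print(f"|eigenvalues| lie in [{abs(lam).min():.8f}, {abs(lam).max():.8f}]")

If the network is built in Keras, an equivalent initialization is available through keras.initializers.Orthogonal, passed as the kernel_initializer argument of a layer; whether the original study used this built-in or a custom sampler is not stated in the abstract.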
