This paper deals with the distribution of singular values of the input–output Jacobian of deep untrained neural networks in the limit of infinite width. The Jacobian is a product of random matrices in which independent weight matrices alternate with diagonal matrices whose entries depend on the corresponding column of the neighboring weight matrix. The problem has been considered in several recent studies, both for Gaussian weights and biases and for weights that are Haar distributed orthogonal matrices with Gaussian biases. Based on a free probability argument, those papers claimed that, in the limit of infinite width (matrix size), the singular value distribution of the Jacobian coincides with that of an analogous product in which the diagonal matrices are random but independent of the weights, a case well known in random matrix theory. In this paper, we justify the claim for Haar distributed weight matrices and Gaussian biases. This, in particular, justifies the validity of the mean-field approximation in the infinite width limit for deep untrained neural networks and extends the macroscopic universality of random matrix theory to this new class of random matrices.
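The product structure described above is easy to make concrete numerically. Below is a minimal sketch, under assumptions not taken from the paper (width n, depth L, tanh activation, the standard recursion x^l = φ(W_l x^{l-1} + b_l), and SciPy's ortho_group for Haar sampling): it builds the Jacobian J = D_L W_L ⋯ D_1 W_1 of an untrained network with Haar orthogonal weights and Gaussian biases and computes its singular values. Note that each D_l depends on W_l through the pre-activations, which is exactly the coupling the paper addresses.

```python
# Minimal numerical sketch (illustrative assumptions, not the paper's setup).
# With the convention z = W x + b used here, the i-th diagonal entry of D
# depends on the i-th row of W; the paper's convention may transpose this.
import numpy as np
from scipy.stats import ortho_group

rng = np.random.default_rng(0)
n, L = 200, 10                       # width and depth (illustrative values)
phi = np.tanh                        # activation
dphi = lambda z: 1.0 - np.tanh(z) ** 2   # its derivative

x = rng.standard_normal(n)           # network input
J = np.eye(n)                        # accumulates the input-output Jacobian
for _ in range(L):
    W = ortho_group.rvs(n, random_state=rng)  # Haar distributed orthogonal weights
    b = rng.standard_normal(n)                # Gaussian biases
    z = W @ x + b                             # pre-activations
    D = np.diag(dphi(z))                      # diagonal matrix coupled to W through z
    J = D @ W @ J                             # chain rule: J <- D_l W_l J
    x = phi(z)                                # post-activations feed the next layer

singular_values = np.linalg.svd(J, compute_uv=False)
print(singular_values[:5])
```

Replacing D = np.diag(dphi(z)) with a diagonal matrix drawn independently of W (with entries of the matching limiting distribution) yields the weight-independent comparison model; the paper's result is that, as n → ∞, the two constructions have the same singular value distribution.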
