Field-programmable gate array (FPGA)-based accelerators have attracted growing interest because of their performance and energy efficiency. This paper presents an optimized FPGA-based accelerator that uses a systolic array for matrix multiplication. In a systolic array, many identical processing elements (PEs) are arranged in a regular structure and connected to one another, and data flows synchronously between neighboring elements in different directions. Each PE is an arithmetic logic unit (ALU) with attached working registers and local memory. The accelerator was coded in Verilog and simulated in Quartus Prime, with the PowerPlay Power Analyzer tool used for power optimization. In addition, a 3-bit ALU was implemented using the Synopsys electronic design automation (EDA) tool, and the schematic diagram, layout, and verification of the complete ALU design were carried out. The paper reports detailed results and analyses of the power dissipation and area density of the systolic array and the ALU, based on the method and circuit implementation. The results show that the power consumption efficiency of the accelerator improved after optimization. The power dissipation and area of the 3-bit ALU were reduced by 0.2 mW and 4.55%, respectively, at the back-end.
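
The dataflow described in the abstract can be illustrated with a short cycle-by-cycle software model. The following is only a sketch under stated assumptions, not the paper's Verilog design: it assumes an output-stationary arrangement in which PE (i, j) accumulates one element of the product, operands from the first matrix move rightward, and operands from the second matrix move downward, one PE per cycle. The function name `systolic_matmul` and the input skewing scheme are illustrative choices and do not come from the paper.

```python
def systolic_matmul(A, B):
    """Cycle-level model of an output-stationary systolic array (assumed layout).

    A is n x k and B is k x m, both given as lists of lists.
    PE (i, j) keeps a running sum that ends up equal to C[i][j].
    """
    n, k, m = len(A), len(B), len(B[0])
    a_reg = [[0] * m for _ in range(n)]  # operand of A currently held in each PE
    b_reg = [[0] * m for _ in range(n)]  # operand of B currently held in each PE
    acc = [[0] * m for _ in range(n)]    # per-PE accumulator

    # The last useful product reaches PE (n-1, m-1) at cycle n + m + k - 3.
    for t in range(n + m + k - 2):
        # A operands shift one PE to the right; row i is fed with a delay of i cycles.
        for i in range(n):
            for j in range(m - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
            s = t - i
            a_reg[i][0] = A[i][s] if 0 <= s < k else 0

        # B operands shift one PE downward; column j is fed with a delay of j cycles.
        for j in range(m):
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
            s = t - j
            b_reg[0][j] = B[s][j] if 0 <= s < k else 0

        # Every PE performs one multiply-accumulate per cycle, in lockstep.
        for i in range(n):
            for j in range(m):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]

    return acc
```

For example, `systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]` after four modeled cycles. The skewed feeding ensures that matching operands of the two matrices meet in the same PE on the same cycle, which is what lets every PE work from purely local, neighbor-to-neighbor connections.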
