Although OpenMP is the de facto standard for parallel programming on shared memory systems, little is known about how its schedule type and chunk size affect the parallel performance of shared memory multicore processors. Performance analyses in the literature have overlooked the effects of different schedule types and chunk sizes, possibly because these were simply not the focus of the research. Often, researchers did not specify the schedule type explicitly, which results in the default assignment of loop iterations among threads: the static schedule is used with a chunk size equal to the ratio of the total number of iterations to the number of threads. In contrast, this research proposes a guideline for selecting the appropriate schedule type and chunk size to achieve optimum performance on different shared memory multicore platforms for balanced and imbalanced workloads. Three multicore processors, namely the Intel Core i5-2410M, AMD A12-9700P, and ARM Cortex-A53, are used for this work. The speedup obtained after turning certain multicore technologies on or off and selecting the number of active cores per processor is analyzed. The results of the analysis enable users to justify and exercise trade-offs in selecting the OpenMP schedule type and chunk size, and in choosing the multicore technologies, to meet the desired performance gain. Results analyzed over various configurations of multicore platforms and workloads suggest that, under certain constraints, different schedule types and chunk sizes lead to better speedup.
