Predicting motorcyclist severity is a critical task for transportation systems and an active research topic in road safety studies. Machine learning models have gained popularity in recent years because of their strong prediction accuracy. This study therefore compares the predictive performance of several machine learning models, in terms of both prediction accuracy and estimated variable importance. Crash data from Malaysia are used to predict motorcyclist severity from variables such as road type, speed limit, location type and collision type. The analysis begins by using random forest (RF) to select the important features for prediction. Three commonly used machine learning models, namely multinomial logistic regression (MLR), decision tree (DT) and support vector machine (SVM), are then applied and their performance is evaluated. The results indicate that the most important features for predicting motorcyclist severity are the number of drivers killed and environmental factors such as traffic system, collision type and light condition. Among the three models, SVM performed best, with 82.14% accuracy, outperforming DT and MLR.
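The accuracy comparison described above can be illustrated with a minimal sketch. It assumes each model has already produced one predicted severity class per crash record; the labels, the three prediction lists and the helper names `accuracy` and `rank_models` are hypothetical and are not taken from the study's data or code.

```python
def accuracy(y_true, y_pred):
    """Fraction of records whose predicted severity class matches the observed class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot score an empty sample")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def rank_models(y_true, predictions):
    """Return (model_name, accuracy) pairs sorted from most to least accurate."""
    scores = {name: accuracy(y_true, y_pred) for name, y_pred in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy labels for illustration only; NOT the Malaysian crash data
# and NOT the results reported in the study.
y_true = ["fatal", "serious", "slight", "slight", "fatal", "slight", "serious", "slight"]
predictions = {
    "MLR": ["fatal", "slight", "slight", "slight", "serious", "slight", "slight", "slight"],
    "DT": ["fatal", "serious", "slight", "fatal", "fatal", "slight", "slight", "slight"],
    "SVM": ["fatal", "serious", "slight", "slight", "fatal", "slight", "slight", "slight"],
}

ranking = rank_models(y_true, predictions)
for name, score in ranking:
    print(f"{name}: {score:.2%}")
```

Accuracy alone can be misleading when severity classes are imbalanced, so a fuller comparison would likely also report per-class precision and recall.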
