The problem of atmospheric air pollution is of great concern for human health, a clean environment and the climate, and is a topical subject in environmental research. The major air pollutants include PM10, PM2.5, SO2 and ozone (O3). Each geographical region has its own specific pollution sources, as well as conditions that can keep harmful emissions in the air for extended periods. Bulgaria is a European Union member state in which poor air quality, relative to the statutory limits set in European legislation, is persistently reported. Significant exceedances of the regulatory limits have been measured for many of the major air pollutants. This paper focuses on applying and comparing two machine learning methods, boosted trees and regularized regression, to investigate the influence of meteorological, atmospheric and other factors on air quality based on empirical data. Hybrid models combining the two methods have also been built and tested. The modeling uses daily average concentrations of ground-level ozone (O3) and PM10 in the city of Ruse, Bulgaria, measured at licensed automated stations under the control of the Executive Environment Agency of Bulgaria. The result is a set of validated models with high goodness-of-fit statistics, such as the coefficient of determination and the root mean square error. The best models agree with the measured data to the order of 80-90%. Short-term forecasts of future pollution levels have also been made. The hybrid boosted-tree and regularized-regression models show a slight advantage over the models generated by either method alone.
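The two goodness-of-fit indicators named in the abstract, the coefficient of determination (R²) and the root mean square error (RMSE), can be computed as in the minimal Python sketch below. It uses the standard textbook definitions, R² = 1 - SS_res / SS_tot and RMSE = sqrt(mean squared residual). The function names and the sample values are hypothetical illustrations; they are not data from the Ruse stations and not output of the authors' modeling software.

```python
import math
from typing import Sequence


def _check(observed: Sequence[float], predicted: Sequence[float]) -> None:
    # Both series must be non-empty and aligned day by day.
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error: sqrt of the mean squared residual."""
    _check(observed, predicted)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(ss_res / len(observed))


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check(observed, predicted)
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical daily average concentrations (e.g. in µg/m³), for illustration only.
    observed = [40.0, 55.0, 62.0, 48.0]
    predicted = [42.0, 53.0, 60.0, 50.0]
    print(f"RMSE = {rmse(observed, predicted):.3f}")      # RMSE = 2.000
    print(f"R^2  = {r_squared(observed, predicted):.3f}")  # R^2  = 0.940
```

RMSE is expressed in the units of the pollutant concentration, while R² is dimensionless, which is why the two are usually reported together when comparing models such as boosted trees, regularized regression and their hybrids.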
