The telecommunications industry faces challenges with its datasets, chiefly high dimensionality but also imbalanced classes and missing values. When these deficiencies are not handled properly, they lead to inaccurate predictions and degraded model performance. Because the churned-customer class is far smaller than the active-customer class, the accuracy paradox arose: although the model's accuracy reached 90%, this figure merely mirrored the actual class distribution rather than reflecting genuine predictive ability. In addition, the large number of features significantly prolonged training and computation time, because redundant and irrelevant features introduced noise and hindered the learning process. The purpose of this study was therefore to determine the effect of feature selection, data imputation, and techniques for handling imbalanced data on model performance. The study proposed improving the development of voluntary churn models by combining techniques for handling imbalanced and missing data with techniques for handling high dimensionality. Compared with the other model combinations, Decision Trees + Mode Imputation + SMOTE with Random Undersampling, with Random Forest as the classifier, produced the highest classification accuracy, AUC, and F1-score. The study also suggests using Dask or PySpark to process the large telecommunication dataset, so that other machine learning algorithms can run faster and more effectively in Python through parallel computing.
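The preprocessing steps named in the abstract (mode imputation, SMOTE followed by random undersampling) and the accuracy paradox can be illustrated with a minimal, dependency-free NumPy sketch. It is not the authors' implementation: the Poisson-distributed count features, the 900/100 class split, the 5% missing rate, k = 5 neighbours, and the resampling targets (SMOTE up to 450 churners, then random undersampling of active customers down to 450) are all illustrative assumptions, and the helper names `mode_impute`, `smote`, and `random_undersample` are hypothetical. Decision-tree feature selection and the Random Forest classifier are left out; libraries such as imbalanced-learn provide maintained implementations of `SMOTE` and `RandomUnderSampler`.

```python
import numpy as np


def mode_impute(X):
    """Fill NaNs in each column with that column's most frequent observed value."""
    X = X.copy()
    for j in range(X.shape[1]):
        col = X[:, j]  # a view, so writes go straight into X
        missing = np.isnan(col)
        values, counts = np.unique(col[~missing], return_counts=True)
        col[missing] = values[np.argmax(counts)]
    return X


def smote(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority rows by interpolating between a random
    minority row and one of its k nearest minority neighbours (plain SMOTE)."""
    rng = np.random.default_rng(rng)
    # Pairwise squared Euclidean distances within the minority class.
    d = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # a row is never its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    nbr = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])


def random_undersample(X, y, majority_label, n_keep, rng=None):
    """Keep n_keep randomly chosen majority rows plus every other row."""
    rng = np.random.default_rng(rng)
    majority = np.flatnonzero(y == majority_label)
    kept = rng.choice(majority, size=n_keep, replace=False)
    keep = np.concatenate([kept, np.flatnonzero(y != majority_label)])
    return X[keep], y[keep]


# Hypothetical data: 900 active (0) vs 100 churned (1) customers, 4 count features.
rng = np.random.default_rng(0)
X = np.vstack([rng.poisson(3.0, size=(900, 4)),
               rng.poisson(5.0, size=(100, 4))]).astype(float)
y = np.array([0] * 900 + [1] * 100)
X[rng.random(X.shape) < 0.05] = np.nan  # inject roughly 5% missing values

# Accuracy paradox: always predicting "active" is 90% accurate yet finds no churner.
baseline_acc = np.mean(y == 0)

X = mode_impute(X)
X_syn = smote(X[y == 1], n_new=350, k=5, rng=1)  # oversample churners to 450
X_os = np.vstack([X, X_syn])
y_os = np.concatenate([y, np.ones(len(X_syn), dtype=int)])
X_bal, y_bal = random_undersample(X_os, y_os, majority_label=0, n_keep=450, rng=2)

print(baseline_acc, np.bincount(y_bal))
```

The majority-class baseline shows why 90% accuracy alone says little when 90% of customers are active, which motivates reporting AUC and F1-score alongside accuracy. In a real evaluation, the imputation statistics and the resampling should be computed on the training split only, so that imputed or synthetic values never leak into the test set. Because SMOTE interpolates between rows, it suits numeric features; purely nominal features would need a variant such as SMOTE-NC.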
5th International Conference on Mathematical Sciences (ICMS5), 16–17 May 2023, Bangi, Malaysia
Research Article | Published 13 September 2024
Big data analytics approaches for treatment of imbalance and missing values problems on high dimensionality dataset
Muhammed Haziq Muhammed Nor (1, a); Mohd Aftar Abu Bakar (1, b); Noratiqah Mohd Ariff (1, c); Hasmirah Hassan (1, d); Siti Amira Nadia Ahmad Tajudin (1, e)
1 Department of Mathematical Sciences, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor DE, Malaysia
a)Corresponding author: [email protected]
AIP Conf. Proc. 3150, 050004 (2024)
Citation
Muhammed Haziq Muhammed Nor, Mohd Aftar Abu Bakar, Noratiqah Mohd Ariff, Hasmirah Hassan, Siti Amira Nadia Ahmad Tajudin; Big data analytics approaches for treatment of imbalance and missing values problems on high dimensionality dataset. AIP Conf. Proc. 13 September 2024; 3150 (1): 050004. https://doi.org/10.1063/5.0228054