Count data arise in many applications, such as traffic accident analysis, car insurance claims, hospital admission records and adverse vaccine reaction data. In some cases these data have high zero counts and/or heavy tails; excess zeros arise, for example, from policies with no claim in insurance data or from sites with no incident in accident data. In regression modelling, the data can be formidably large because of a large sample size, a large number of covariates (predictors), or both, which leads to computational challenges. Although computer engineering solutions such as supercomputers and parallel computing are available, their use is limited by cost and accessibility. Statistical solutions have therefore been considered to ameliorate the challenges posed by regression modelling of big data. In general, these statistical solutions are classified as divide-and-conquer approaches, fine-to-coarse methods and subsampling methods. To address the problem of large data size in count regression modelling, we propose a stratified subsampling strategy, with strata defined by frequency classes, combined with shrinkage leveraging for statistical inference. An attractive feature of this strategy is its ability to preserve characteristics of the data such as overdispersion, high zero counts and/or heavy tails. A Monte Carlo simulation study is conducted to investigate the performance of the proposed stratified subsampling method in regression modelling with big count data. The regression analysis is illustrated using a family of mixed Poisson regression models, which have been shown to be flexible in modelling count data with high zero counts and/or a long tail.
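To make the stratification idea concrete, the following is a minimal Python/NumPy sketch, not the authors' implementation. It assumes four hypothetical frequency classes ({0}, {1}, {2, 3, 4} and {5, 6, ...}), proportional allocation of the subsample across classes, and shrinkage leveraging in the usual sense of mixing leverage-based and uniform sampling probabilities, with a hypothetical shrinkage weight `alpha = 0.5`. For simplicity the leverage scores come from the ordinary least-squares hat matrix of `X` rather than from a fitted count regression model. All function names are illustrative only, and the demo data are negative binomial counts generated as a Poisson-gamma mixture, one member of the mixed Poisson family.

```python
import numpy as np


def frequency_class(y, cutoffs=(1, 2, 5)):
    """Label each count by frequency class (hypothetical cut-offs).

    With the default cut-offs the classes are {0}, {1}, {2, 3, 4} and {5, 6, ...},
    so the zero class and the upper tail each form their own stratum.
    """
    return np.digitize(y, bins=cutoffs)


def leverage_scores(X):
    """Diagonal of the OLS hat matrix X (X^T X)^{-1} X^T, via a thin QR decomposition."""
    Q, _ = np.linalg.qr(X)
    return np.sum(Q ** 2, axis=1)


def stratified_shrinkage_subsample(X, y, r, alpha=0.5, cutoffs=(1, 2, 5), rng=None):
    """Draw about r rows, stratified by the frequency class of y (illustrative sketch).

    Within stratum k (size N_k), row i is drawn with replacement with probability
        pi_i = alpha * h_i / sum_{j in k} h_j + (1 - alpha) / N_k,
    i.e. leverage-based probabilities shrunk towards uniform sampling.
    Returns the sampled row indices and inverse-probability weights 1 / (r_k * pi_i).
    """
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y)
    labels = frequency_class(y, cutoffs)
    h = leverage_scores(X)
    n = len(y)
    idx_parts, w_parts = [], []
    for k in np.unique(labels):
        members = np.flatnonzero(labels == k)
        n_k = members.size
        # Proportional allocation, keeping at least one row from every non-empty class
        r_k = max(1, int(round(r * n_k / n)))
        h_k = h[members]
        # Shrinkage leveraging probabilities within the stratum; they sum to 1
        probs = alpha * h_k / h_k.sum() + (1.0 - alpha) / n_k
        pos = rng.choice(n_k, size=r_k, replace=True, p=probs)
        idx_parts.append(members[pos])
        w_parts.append(1.0 / (r_k * probs[pos]))
    return np.concatenate(idx_parts), np.concatenate(w_parts)


# Demo on simulated overdispersed counts (hypothetical design and coefficients)
rng = np.random.default_rng(2021)
N = 20_000
X = np.column_stack([np.ones(N), rng.standard_normal(N), rng.standard_normal(N)])
mu = np.exp(X @ np.array([0.2, 0.5, -0.3]))
# Poisson counts with gamma-distributed means: a negative binomial (mixed Poisson) response
y = rng.poisson(mu * rng.gamma(shape=1.0, scale=1.0, size=N))

idx, w = stratified_shrinkage_subsample(X, y, r=2_000, rng=rng)
print("zero proportion, full data:", np.mean(y == 0))
print("zero proportion, subsample:", np.mean(y[idx] == 0))
print("total count, full data:", y.sum(), " weighted estimate:", np.sum(w * y[idx]))
```

Because every draw stays inside its own frequency class, the zero proportion and the upper tail of the subsample track those of the full data, and the returned weights `1 / (r_k * pi_i)` give Hansen-Hurwitz estimates of full-data totals. A weighted count regression on `X[idx]`, `y[idx]` with weights `w` would be the natural next step; that fitting step is not shown here.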
28 June 2023
FIRST INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE & DATA ANALYTICS: Incorporating the 1st South-East Asia Workshop on Computational Physics and Data Analytics (CPDAS 2021)
21–24 November 2021
Kuala Lumpur, Malaysia
Research Article
Stratified subsampling for regression analysis of big count response data
Yeh-Ching Low (1,a), Seng-Huat Ong (2)
1 Department of Computing and Information Systems, School of Engineering and Technology, Sunway University, 47500 Selangor, Malaysia
2 Institute of Actuarial Science and Data Analytics, UCSI University, 56000 Kuala Lumpur, Malaysia
a) Corresponding author
AIP Conf. Proc. 2756, 040010 (2023)
Citation
Yeh-Ching Low, Seng-Huat Ong; Stratified subsampling for regression analysis of big count response data. AIP Conf. Proc. 28 June 2023; 2756 (1): 040010. https://doi.org/10.1063/5.0140504