Sains Malaysiana 47(8)(2018): 1931–1940
http://dx.doi.org/10.17576/jsm-2018-4708-35
The Extra Zeros in Traffic Accident Data: A Study on the Mixture of Discrete Distributions
(Lebihan Sifar dalam Data Kemalangan Jalan Raya: Satu Kajian bagi Taburan Diskret Campuran)
ZAMIRA HASANAH ZAMZURI*, MOHD SYAFIQ SAPUAN & KAMARULZAMAN IBRAHIM
Pusat Pengajian Sains Matematik, Fakulti Sains dan Teknologi,
Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor Darul Ehsan,
Malaysia
Received: 29 March 2018/Accepted: 2 April 2018
ABSTRACT
The presence of extra zeros is commonly observed in traffic
accident count data. Past research has favoured zero-altered models and
attributed the zeros to under-reporting. However, this explanation has been
disputed, since the zeros could instead arise from a Poisson trial process.
Motivated by this argument, we explore the possibility that mixing several
discrete distributions can contribute to the presence of extra zeros. Four
simulation studies were conducted based on two accident scenarios and two
discrete distributions, Poisson and negative binomial, by considering six
combinations of mixing proportions for components with low, moderate and high
mean values. The results of the simulation studies support the claim: extra
zeros are detected in most cases of mixed Poisson and mixed negative binomial
data. Data sets dominated by a Poisson (or negative binomial) component with a
low mean show a clear excess of zeros even when the sample size is only 30. An
illustration using a real data set supports the same findings. Hence, it is
essential to consider mixed discrete distributions as candidate distributions
when dealing with count data with extra zeros. This study contributes to
raising awareness of possible alternative distributions for count data with
extra zeros, especially in traffic accident applications.
Keywords: Hurdle models; negative binomial; Poisson; proportion; simulation study;
traffic accident; zero-inflated models
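The mechanism described in the abstract can be illustrated with a minimal sketch. It assumes a three-component Poisson mixture whose component means (0.5, 3.0, 10.0) and mixing proportions (0.7, 0.2, 0.1) are hypothetical values chosen only for illustration; they are not the settings used in the study. The sketch compares the probability of a zero under the mixture with that under a single Poisson distribution having the same overall mean, then draws one sample of size 30.

```python
import math
import random


def mixture_zero_prob(means, props):
    """P(Y = 0) under a finite Poisson mixture."""
    return sum(p * math.exp(-m) for m, p in zip(means, props))


def single_poisson_zero_prob(means, props):
    """P(Y = 0) under one Poisson with the same overall mean as the mixture."""
    overall_mean = sum(p * m for m, p in zip(means, props))
    return math.exp(-overall_mean)


def draw_poisson(mean, rng):
    """Draw one Poisson variate by Knuth's multiplication method (small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_mixture(means, props, n, rng):
    """Pick a component for each observation, then draw from that component."""
    components = rng.choices(means, weights=props, k=n)
    return [draw_poisson(m, rng) for m in components]


# Hypothetical low, moderate and high means; mixture dominated by the low mean.
MEANS = (0.5, 3.0, 10.0)
PROPS = (0.7, 0.2, 0.1)

p_mix = mixture_zero_prob(MEANS, PROPS)            # about 0.43
p_single = single_poisson_zero_prob(MEANS, PROPS)  # about 0.14

rng = random.Random(2018)  # arbitrary seed, for reproducibility only
sample = simulate_mixture(MEANS, PROPS, 30, rng)
observed_zeros = sample.count(0)
expected_zeros_single = 30 * p_single

print(f"P(0) mixture = {p_mix:.3f}, single Poisson = {p_single:.3f}")
print(f"zeros in sample of 30: {observed_zeros}, "
      f"expected under single Poisson: {expected_zeros_single:.1f}")
```

Because exp(-x) is convex, the mixture's zero probability is never smaller than that of a single Poisson with the same mean, so a data set dominated by a low-mean component appears zero-inflated when judged against a single Poisson fit. The same comparison could be repeated with negative binomial components.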
ABSTRAK
Kehadiran lebihan sifar sering dicerap dalam data
bilangan kemalangan jalan raya. Kajian lepas cenderung kepada
penggunaan model dengan ubah suaian sifar dan menjelaskan bahawa
lebihan sifar ini berpunca daripada keadaan kemalangan tidak terlapor.
Walau bagaimanapun, terdapat tentangan terhadap pernyataan ini dengan
kehadiran lebihan sifar ini boleh berpunca daripada campuran beberapa
taburan diskret yang mewakili taburan bagi masa atau lokasi berbeza.
Maka, kajian ini bermatlamat untuk meneroka teori bahawa taburan
diskret tercampur boleh menyumbang kepada lebihan sifar dalam data
bilangan. Empat kajian simulasi dijalankan berdasarkan
dua senario kemalangan dan dua taburan diskret: Poisson dan binomial negatif;
dengan mengambil kira enam gabungan nilai perkadaran bagi nilai
purata rendah, sederhana dan tinggi dalam taburan tersebut. Keputusan
kajian bersetuju dengan teori tersebut dengan kehadiran lebihan
sifar dapat dikenal pasti dalam kebanyakan kes data Poisson
tercampur dan binomial negatif tercampur. Set data yang didominasi
oleh Poisson (atau binomial negatif) dengan nilai purata
rendah menunjukkan bilangan lebihan sifar yang ketara walaupun
saiz sampel hanyalah 30. Oleh itu, adalah
amat penting bagi pengkaji untuk mengambil kira taburan diskret
tercampur ini apabila berhadapan data bilangan dengan lebihan
sifar. Kajian ini menyumbang dalam
mencetus kesedaran berkenaan potensi taburan alternatif untuk
data bilangan terlebih sifar terutamanya dalam aplikasi kemalangan
jalan raya.
Kata kunci: Binomial negatif;
kajian simulasi; kemalangan jalan raya; model lebihan sifar; model
terpangkas; perkadaran; Poisson
*Corresponding author; email: zamira@ukm.edu.my