Sains Malaysiana 50(7)(2021): 2085-2094
http://doi.org/10.17576/jsm-2021-5007-22
Fast Improvised Influential Distance for the Identification of Influential Observations in Multiple Linear Regression
(Penambahbaikan Pantas Jarak Pengaruh bagi Pengecaman Cerapan Berpengaruh dalam Regresi Linear Berganda)
HABSHAH MIDI1*, MUHAMMAD SANI2, SHELAN SAIED ISMAEEL3 & JAYANTHI ARASAN1
1Department of Mathematics, Faculty of Science and Institute for Mathematical Research, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor Darul Ehsan, Malaysia
2Department of Mathematical Sciences, Federal University Dutsin-Ma, Katsina State, Nigeria
3Department of Mathematics, Faculty of Science, University of Zakho, Zakho, Iraq
Received: 5 February 2020/Accepted: 19 November 2020
ABSTRACT
Influential observations (IO) are observations that are responsible for misleading conclusions about the fit of a multiple linear regression model. Existing IO identification methods, such as the influential distance (ID), are not very successful in detecting IO. It is suspected that the ID employs an inefficient method, with a long computational running time, to identify the suspected IO at the initial step. Moreover, this method declares good leverage observations as IO, resulting in misleading conclusions. In this paper, we propose the fast improvised influential distance (FIID), which can successfully identify IO, good leverage observations, and regular observations with a shorter computational running time. A Monte Carlo simulation study and real data examples show that the FIID correctly identifies genuine IO in a multiple linear regression model, with no masking and a negligible swamping rate.
Keywords: Bad leverage point; good leverage point; influential distance; influential observations
*Corresponding author; email: habshahmidi@gmail.com