Sains Malaysiana 40(12)(2011): 1437–1447
A Monte Carlo Simulation Study on High Leverage Collinearity-Enhancing Observation and its Effect on Multicollinearity Pattern
(A Monte Carlo Simulation Study of High Leverage Point Observations that Enhance Collinearity and their Effect on the Multicollinearity Pattern)
Habshah Midi¹,², Arezoo Bagheri¹* & A.H.M. Rahmatullah Imon³
¹Faculty of Science, Universiti Putra Malaysia, 43400 Serdang, Selangor D.E., Malaysia
²Institute for Mathematical Research (INSPEM), Universiti Putra Malaysia, 43400 Serdang, Selangor D.E., Malaysia
³Department of Mathematical Sciences, Ball State University, Muncie, IN 47306, USA
Received: 5 February 2010 / Accepted: 11 November 2010
ABSTRACT
Outliers in the X-direction, or high leverage points, are the most recently recognized source of multicollinearity. Multicollinearity is the nonorthogonality of two or more explanatory variables in a multiple regression model, and it can seriously affect the interpretation of a fitted regression model. In this paper, we performed Monte Carlo simulation studies with two main objectives. The first was to study how the magnitude and percentage of high leverage points, the two factors that determine whether such points become collinearity-enhancing observations, affect the multicollinearity pattern of the data. The second was to investigate the situations in which these points produce different degrees of multicollinearity, such as moderate or severe. According to the simulation results, high leverage points must be of large magnitude in at least two explanatory variables to guarantee that they are the cause of multicollinearity problems. We also propose a practical Lower Bound (LB) and Upper Bound (UB) for the High Leverage Collinearity Influential Measure (HLCIM), an essential measure for detecting the degree of multicollinearity. A well-known example is used to confirm the simulation results.
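To make the central claim of the abstract concrete, here is a minimal Monte Carlo sketch in Python with NumPy. It is not the authors' simulation design: the sample size, the number of predictors, the contamination scheme (overwriting a fraction of rows with a constant), the seed, and the helper names `condition_number`, `contaminate` and `simulate` are all assumptions made for illustration. The condition number is computed on columns scaled to unit length without centering, as the ratio of the largest to the smallest singular value.

```python
import numpy as np


def condition_number(X):
    """Condition number of X after scaling each column to unit length
    (no centering): ratio of the largest to the smallest singular value,
    i.e. sqrt(lambda_max / lambda_min) of the scaled X'X."""
    Xs = X / np.linalg.norm(X, axis=0)
    s = np.linalg.svd(Xs, compute_uv=False)  # singular values, descending
    return s[0] / s[-1]


def contaminate(X, frac, magnitude, cols):
    """Hypothetical contamination scheme (an assumption, not the paper's):
    overwrite the first `frac` share of rows in the columns `cols` with the
    constant `magnitude`, so those rows become high leverage points in
    exactly those explanatory variables."""
    Xc = X.copy()
    m = int(round(frac * X.shape[0]))
    Xc[:m, list(cols)] = magnitude
    return Xc


def simulate(n=100, p=3, frac=0.10, magnitude=50.0, cols=(0, 1),
             reps=200, seed=2011):
    """Average condition number over `reps` replications, for clean standard
    normal data and for the same data after contamination."""
    rng = np.random.default_rng(seed)
    clean, dirty = [], []
    for _ in range(reps):
        X = rng.standard_normal((n, p))
        clean.append(condition_number(X))
        dirty.append(condition_number(contaminate(X, frac, magnitude, cols)))
    return float(np.mean(clean)), float(np.mean(dirty))


kappa_clean, kappa_two = simulate(cols=(0, 1))  # large values in 2 variables
_, kappa_one = simulate(cols=(0,))              # large values in 1 variable
_, kappa_two_small = simulate(magnitude=10.0, cols=(0, 1))
print(f"clean: {kappa_clean:.2f}  one var: {kappa_one:.2f}  "
      f"two vars (M=10): {kappa_two_small:.2f}  "
      f"two vars (M=50): {kappa_two:.2f}")
```

Under these assumed settings, placing large values in two columns drives the average condition number far above the clean baseline, placing them in one column barely moves it, and a larger `magnitude` widens the gap. The sketch implies no particular cut-offs for "moderate" or "severe" collinearity.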
Keywords: Collinearity influential measure; collinearity influential observations; condition number; diagnostic Robust Generalized Potential (DRGP) method; high leverage points
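The keywords refer to collinearity-influential measures, and the abstract uses HLCIM to grade the degree of multicollinearity. The HLCIM formula and its LB/UB are not reproduced in this section, so the sketch below uses a simpler quantity assumed purely for illustration: the relative change in the condition number when a group of suspected high leverage rows is deleted. The function name `deletion_change` is hypothetical, the suspect rows are taken as known (in practice a leverage diagnostic such as DRGP would flag them, and DRGP is not implemented here), and this is not the paper's HLCIM.

```python
import numpy as np


def condition_number(X):
    """Condition number after unit-length column scaling (no centering)."""
    Xs = X / np.linalg.norm(X, axis=0)
    s = np.linalg.svd(Xs, compute_uv=False)
    return s[0] / s[-1]


def deletion_change(X, suspect_rows):
    """Hypothetical group-deletion diagnostic (not the paper's HLCIM):
    relative change in the condition number when `suspect_rows` are removed.
    Strongly negative -> the rows were inflating collinearity
    (collinearity-enhancing); strongly positive -> they were hiding it
    (collinearity-reducing); near zero -> little effect."""
    keep = np.setdiff1d(np.arange(X.shape[0]), suspect_rows)
    kappa_full = condition_number(X)
    kappa_deleted = condition_number(X[keep])
    return (kappa_deleted - kappa_full) / kappa_full


rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
X_hl = X.copy()
X_hl[:10, [0, 1]] = 50.0  # assumed high leverage rows in two variables
suspects = np.arange(10)  # assumed known for this sketch
print(f"with high leverage rows: {deletion_change(X_hl, suspects):+.2f}")
print(f"clean data:              {deletion_change(X, suspects):+.2f}")
```

Deleting the planted rows collapses the inflated condition number, so the change is strongly negative; deleting the same rows from clean data changes it very little.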
ABSTRAK (Malay abstract, in English translation)
Outliers in the X-direction, or high leverage points, are the most recently recognized source of multicollinearity. Multicollinearity occurs when two or more independent variables in a multiple regression model are nonorthogonal, which may have an important influence on the interpretation of the fitted regression model. In this paper, we carry out a Monte Carlo simulation study to achieve two main objectives. The first objective is to study the effect of a given magnitude and percentage of high leverage points on the pattern of the data; these are the two key factors that turn high leverage points into collinearity-enhancing observations. The second objective is to examine the situations in which these high leverage points produce different degrees of multicollinearity, such as moderate or high. Based on the simulation results, high leverage points must have a large magnitude in at least two independent variables to ensure that they are the cause of the multicollinearity problem. We also propose a Lower Bound (LB) and an Upper Bound (UB) for the High Leverage Collinearity Influential Measure (HLCIM), which is an important measure for detecting the degree of multicollinearity. A well-known example is used to verify the simulation results.
Keywords: Collinearity-enhancing observations; Diagnostic Robust Generalized Potential (DRGP) method; condition number; high leverage points; collinearity influential measure
*Corresponding author; email: abagheri_000@yahoo.com