Sains Malaysiana 43(12)(2014): 1973–1977

 

Eigenstructure-Based Angle for Detecting Outliers in Multivariate Data

(Sudut Berasaskan Struktur Eigen untuk Mengesan Titik Terpencil dalam Data Multivariat)

 

 

NAZRINA AZIZ*

UUM College of Arts and Sciences, Universiti Utara Malaysia, 06010 Sintok, Kedah, Malaysia

 

Received: 20 February 2013/Accepted: 2 May 2014

 

ABSTRACT

Outlier detection has two main motivations. The first is the researcher's own interest in the outlying observations; see, for example, the case of Mr Hadlum in Barnett and Lewis (1994). The second is the effect that outliers have on an analysis. This article does not differentiate between these justifications for outlier detection. The aim is to alert the analyst to observations that are isolated from the other observations in the data set. We introduce an eigenstructure-based angle for outlier detection. The method is simple and is effective in dealing with masking and swamping problems. It is illustrated on several data sets and compared with the Mahalanobis distance.

 

Keywords: Angle; eigenstructure; masking; outliers; swamping
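
The abstract names the Mahalanobis distance as the comparison baseline but does not give its formula. As a reading aid, the following is a minimal sketch of the textbook (classical, non-robust) version, assuming the sample mean and sample covariance as estimates. The function names `mahalanobis_sq` and `flag_outliers`, the simulated data, and the chi-square cutoff are illustrative assumptions and are not taken from the article. The eigenstructure-based angle itself is not sketched because the abstract does not define it.

```python
import numpy as np


def mahalanobis_sq(X):
    """Squared Mahalanobis distance of each row of X from the sample mean.

    Uses the classical sample mean and sample covariance (ddof=1).
    These estimates are themselves affected by outliers, which is one
    source of the masking problem mentioned in the abstract.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    # Solve cov @ z = centered.T instead of forming the inverse explicitly.
    solved = np.linalg.solve(cov, centered.T)
    # Row-wise quadratic form: d2[i] = centered[i] @ inv(cov) @ centered[i]
    return np.einsum("ij,ji->i", centered, solved)


def flag_outliers(X, cutoff):
    """Return the squared distances and a boolean mask of rows above cutoff."""
    d2 = mahalanobis_sq(X)
    return d2, d2 > cutoff


# Hypothetical data: 30 bivariate standard normal points plus one planted
# point far from the rest (assumption for illustration only).
rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((30, 2)), [[10.0, 10.0]]])

# A common convention (not stated in the source) is a chi-square quantile
# with p degrees of freedom. For p = 2 the quantile has the closed form
# -2 * log(1 - q), so no extra library is needed here.
cutoff = -2.0 * np.log(1.0 - 0.975)

d2, flags = flag_outliers(X, cutoff)
print("flagged rows:", np.flatnonzero(flags))
```

Because the mean and covariance are estimated from all rows, a cluster of outliers can inflate the covariance and hide itself; robust variants replace these estimates, which is a different technique from the one sketched here.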

 

ABSTRAK

Terdapat dua sebab utama yang mendorong orang ramai untuk mengesan titik terpencil, yang pertama adalah hasrat penyelidik; lihat contoh kes Encik Hadlum di Barnett dan Lewis. Yang kedua adalah kesan titik terpencil ke atas analisis. Kertas ini tidak membezakan antara pelbagai justifikasi untuk mengesan titik terpencil. Tujuannya adalah untuk berkongsi dengan penganalisis mengenai cerapan yang terpencil daripada cerapan lain dalam set data. Dalam kertas ini, kami memperkenalkan sudut berasaskan struktur eigen untuk mengesan titik terpencil. Kaedah ini adalah mudah dan berkesan dalam berurusan dengan masalah litupan dan limpahan. Kaedah yang dicadangkan digambarkan dan dibandingkan dengan jarak Mahalanobis menggunakan beberapa set data.

 

Kata kunci: Limpahan; litupan; struktur eigen; sudut; titik terpencil

REFERENCES

 

Atkinson, A.C. 1994. Fast very robust methods for the detection of multiple outliers. Journal of the American Statistical Association 89(428): 1329-1339.

Barnett, V. & Lewis, T. 1994. Outliers in Statistical Data. New York: Wiley and Sons.

Caroni, C. & Billor, N. 2007. Robust detection of multiple outliers in grouped multivariate data. Journal of Applied Statistics 34(10): 1241-1250.

Chatterjee, S. & Hadi, A.S. 1988. Sensitivity Analysis in Linear Regression. United States: John Wiley.

Cook, R.D. & Weisberg, S. 1982. Residuals and Influence in Regression. New York: Chapman and Hall.

Franklin, S., Thomas, S. & Brodeur, M. 2000. Robust multivariate outlier detection using Mahalanobis distance and modified Stahel-Donoho estimators. Proceedings of the International Conference on Establishment Surveys, New York. pp. 697-706.

Gao, S., Li, G. & Wang, D.Q. 2005. A new approach for detecting multivariate outliers. Communications in Statistics - Theory and Methods 34: 1857-1865.

Hadi, A.S. 1992. Identifying multiple outliers in multivariate data. Journal of the Royal Statistical Society, Series B 54(3): 761-777.

Hampel, F.R. 1971. A general qualitative definition of robustness. Annals of Mathematical Statistics 42(6): 1887-1896.

Hawkins, D.M. 1980. Identification of Outliers. London: Chapman and Hall.

Hawkins, D.M., Bradu, D. & Kass, G.V. 1984. Location of several outliers in multiple regression data using elemental sets. Technometrics 26(3): 197-208.

Hodge, V.J. & Austin, J. 2004. A survey of outlier detection methodologies. Artificial Intelligence Review 22(2): 85-126.

Mertens, B.J.A. 1998. Exact principal component influence measure applied to the analysis of spectroscopic data on rice. Applied Statistics 47(4): 527-542.

Peña, D. & Prieto, F.J. 2001. Multivariate outlier detection and robust covariance matrix estimation. Technometrics 43(3): 286-299.

Quinn, G.P. & Keough, M.J. 2002. Experimental Design and Data Analysis for Biologists. Cambridge: Cambridge University Press.

Rocke, D.M. & Woodruff, D.L. 1996. Identification of outliers in multivariate data. Journal of the American Statistical Association 91(435): 1047-1061.

Rousseeuw, P.J. & Van Driessen, K. 1999. A fast algorithm for the minimum covariance determinant estimator. Technometrics 41(3): 212-223.

Rousseeuw, P.J. & Leroy, A.M. 1987. Robust Regression and Outlier Detection. New York: John Wiley.

Rousseeuw, P.J. & van Zomeren, B.C. 1990. Unmasking multivariate outliers and leverage points. Journal of the American Statistical Association 85(411): 633-639.

Shapiro, S.S. & Wilk, M.B. 1965. An analysis of variance test for normality (complete samples). Biometrika 52: 591-611.

Siotani, M. 1959. The extreme value of the generalized distance of the individual points in the multivariate normal sample. Annals of the Institute of Statistical Mathematics 10: 183-208.

Wang, S.G. & Liski, E.P. 1993. Effects of observations on the eigensystem of a sample covariance matrix. Journal of Statistical Planning and Inference 36: 215-226.

Wang, S.G. & Nyquist, H. 1991. Effects on the eigenstructure of a data matrix when deleting an observation. Computational Statistics and Data Analysis 11(2): 179-188.

Wulder, M. 2002. A Practical Guide to the Use of Selected Multivariate Statistics. Victoria: Canadian Forest Service.

Wilks, S.S. 1963. Multivariate statistical outliers. Sankhya 25: 407-426.

 

 

*Corresponding author; email: nazrina@uum.edu.my

 

 
