Sains Malaysiana 51(2)(2022): 599-607

http://doi.org/10.17576/jsm-2022-5102-23

 

Outlier Detection in Balanced Replicated Linear Functional Relationship Model

(Pengesanan Data Terpencil dalam Model Hubungan Fungsian Linear Bereplika Seimbang)

 

AZURAINI MOHD ARIF1,2, YONG ZULINA ZUBAIRI3* & ABDUL GHAPOR HUSSIN4

 

1Institute of Advanced Studies, University of Malaya, 50603 Kuala Lumpur, Federal Territory, Malaysia

 

2Centre for Foundation Studies, National Defence University of Malaysia, 57000 Kuala Lumpur, Federal Territory, Malaysia

 

3Centre for Foundation Studies in Science, University of Malaya, 50603 Kuala Lumpur, Federal Territory, Malaysia

 

4Faculty of Defence Science and Technology, National Defence University of Malaysia, 57000 Kuala Lumpur, Federal Territory, Malaysia

 

Received: 2 February 2021/Accepted: 19 June 2021

 

ABSTRACT

Identifying outliers in a dataset is important because their presence affects parameter estimation. Building on the COVRATIO statistic, we modify the procedure to detect outliers in the replicated linear functional relationship model (LFRM). We assume that the replicated model is balanced, that is, each group contains the same number of observations. We also derive the covariance matrices for the balanced replicated LFRM from the Fisher information matrices. The cut-off points and the power of the procedure are then obtained through a simulation study. The simulation results suggest that the proposed procedure detects outliers well in the balanced replicated LFRM, and we illustrate it with an application to a real data set. The study implies that, with some modification, the COVRATIO procedure can be used to identify outliers when modelling the balanced replicated LFRM, a setting that has not been explored before.

 

Keywords: Covariance matrix; COVRATIO; influential observation; linear functional relationship model; outliers
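
For readers unfamiliar with the COVRATIO idea, the following is a minimal sketch of the statistic in its classical setting, ordinary least squares regression, and not the balanced replicated LFRM procedure proposed in the paper. It compares the determinant of the estimated coefficient covariance matrix with and without each observation; values far from 1 suggest an influential point. The function names (`ols_cov`, `covratio_distance`), the simulated data, and the planted outlier are all assumptions made for illustration. In the paper's setting the covariance matrix would instead come from the Fisher information of the LFRM, and the cut-off would come from simulation.

```python
import numpy as np

def ols_cov(X, y):
    """Estimated covariance matrix of the OLS coefficients (hypothetical helper)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)          # residual variance estimate
    return s2 * np.linalg.inv(X.T @ X)

def covratio_distance(X, y):
    """|COVRATIO_(-i) - 1| for every observation i, by leave-one-out refitting."""
    full_det = np.linalg.det(ols_cov(X, y))
    n = X.shape[0]
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i          # drop observation i
        reduced_det = np.linalg.det(ols_cov(X[keep], y[keep]))
        out[i] = abs(reduced_det / full_det - 1.0)
    return out

# Simulated straight-line data with one planted outlier (illustrative only).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 30)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=x.size)
y[12] += 8.0                              # planted outlier
X = np.column_stack([np.ones_like(x), x])

scores = covratio_distance(X, y)
flagged = int(np.argmax(scores))          # index with the largest departure from 1
print(flagged, scores[flagged])
```

The observation with the largest score is the most suspicious; deciding whether it is actually an outlier requires a cut-off point, which the paper obtains by simulation rather than from a fixed rule.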

 

ABSTRAK

Pengesanan data terpencil di dalam set data adalah penting kerana kewujudannya akan mengganggu penganggaran nilai parameter. Berdasarkan idea statistik COVRATIO, kami mengubah suai prosedur tersebut supaya bersesuaian bagi model hubungan fungsian linear bereplika dan seimbang dalam pengesanan data terpencil. Setiap unsur dalam kumpulan adalah sama dan seimbang dalam model replikasi ini. Pembentukan matriks kovarians melalui matrik maklumat Fisher juga diberikan bagi model ini. Seterusnya, titik potongan dan kuasa prestasi bagi kaedah yang dicadangkan diperoleh melalui kajian simulasi. Hasil keputusan daripada kajian simulasi menunjukkan prosedur yang dicadangkan berfungsi dengan baik dalam pengesanan data terpencil untuk model hubungan fungsian linear bereplika dan seimbang dan kami memberikan contoh ke atas set data sebenar. Implikasi daripada kajian ini menunjukkan bahawa kita boleh mengesan data terpencil dengan sedikit pengubahsuaian terhadap prosedur COVRATIO bagi model hubungan fungsian linear bereplika dan seimbang kerana pengesanan data terpencil menggunakan model ini masih belum lagi diteroka.

 

Kata kunci: Covratio; data berpengaruh; matriks kovarians; model hubungan fungsian linear; terpencil

 

 

*Corresponding author; email: yzulina@um.edu.my

 

       

 
