Sains Malaysiana 50(6)(2021): 1787-1798
http://doi.org/10.17576/jsm-2021-5006-24
Comparative Study of Clustering-Based Outliers Detection Methods in Circular-Circular Regression Model
(Kajian Perbandingan Kaedah Penetapan Titik Terpencil Berasaskan Kelompok dalam Model Pendaftaran Lingkaran)
SITI ZANARIAH SATARI1*, NUR FARAIDAH MUHAMMAD DI1*, YONG ZULINA ZUBAIRI2 & ABDUL GHAPOR HUSSIN3
1Centre for Mathematical Sciences, College of Computing & Applied Sciences, Universiti Malaysia Pahang, 26300 Kuantan, Pahang Darul Makmur, Malaysia
2Centre for Foundation Studies in Sciences, University of Malaya, 50603 Kuala Lumpur, Federal Territory, Malaysia
3Faculty of Defence Sciences and Technology, National Defence University of Malaysia, Sungai Besi Camp, 57000 Kuala Lumpur, Federal Territory, Malaysia
Received: 7 May 2019/Accepted: 14 October 2020
ABSTRACT
This paper compares several clustering-based algorithms for detecting
multiple outliers in the circular-circular regression model. Three similarity
measures based on the circular distance were used to build a cluster tree with
agglomerative hierarchical methods. A stopping rule based on the mean direction
and circular standard deviation of the tree heights served as the cutoff point:
cluster groups that exceeded it were classified as potential outliers. The
performance of the algorithms was assessed through simulation studies covering
several outlier scenarios with a certain degree of contamination. Applications
to real wind data and to a simulated data set are given for illustration.
Satari’s algorithm (S-SL algorithm) was found to perform well for any value of
the sample size n and the error concentration parameter. The algorithms are not
limited to identifying one or a few outliers; they can detect multiple outliers
at once.
Keywords: Circular distance; circular-circular regression model; clustering; outliers; stopping rule
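The procedure summarized in the abstract can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the authors' implementation. It assumes that observations are pairs of angles in radians, that the dissimilarity between two observations is the sum of the circular distances on each coordinate, that the tree is built by single linkage, and that the cutoff is the mean direction of the merge heights plus alpha times their circular standard deviation. The function names, the constant alpha and the sample data are hypothetical.

```python
import math

def circular_distance(a, b):
    """Shortest arc between two angles (radians); the result lies in [0, pi]."""
    return math.pi - abs(math.pi - (abs(a - b) % (2 * math.pi)))

def mean_direction(angles):
    """Direction of the resultant vector, reduced to [0, 2*pi)."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c) % (2 * math.pi)

def circular_sd(angles):
    """Circular standard deviation sqrt(-2 ln R), R = mean resultant length."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    r = min(1.0, math.hypot(s, c) / len(angles))  # guard against rounding above 1
    return math.sqrt(-2.0 * math.log(r))

def agglomerate(dist, cutoff=None):
    """Naive single-linkage clustering on a full distance matrix.

    Returns the merge heights and the clusters that remain. If a cutoff is
    given, merging stops before the first merge whose height exceeds it.
    """
    clusters = [[i] for i in range(len(dist))]
    heights = []
    while len(clusters) > 1:
        # Closest pair of clusters under single linkage (minimum pairwise distance).
        d, p, q = min(
            (min(dist[i][j] for i in clusters[p] for j in clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        if cutoff is not None and d > cutoff:
            break
        heights.append(d)
        clusters[p] += clusters[q]
        del clusters[q]
    return heights, clusters

def flag_outliers(points, alpha=2.0):
    """Indices of observations outside the largest cluster after cutting the tree.

    alpha is a hypothetical tuning constant for the stopping rule.
    """
    dist = [[sum(circular_distance(a, b) for a, b in zip(x, y)) for y in points]
            for x in points]
    heights, _ = agglomerate(dist)                 # full tree, all merge heights
    cutoff = mean_direction(heights) + alpha * circular_sd(heights)
    _, clusters = agglomerate(dist, cutoff)        # tree cut at the stopping rule
    largest = max(clusters, key=len)
    return sorted(i for c in clusters if c is not largest for i in c)

# Hypothetical data: eight observations near the line v = u and two displaced ones.
points = [(0.10, 0.12), (0.20, 0.21), (0.30, 0.33), (0.40, 0.38),
          (0.50, 0.52), (0.60, 0.61), (0.70, 0.69), (0.80, 0.83),
          (0.35, 2.90), (0.45, 3.00)]
print(flag_outliers(points))  # [8, 9]
```

In this sketch the two displaced observations join the main group only at a merge height far above the cutoff, so they remain in a separate cluster and are flagged as potential outliers. The cubic-time agglomeration is chosen for readability and is only suitable for small samples.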
ABSTRAK
This paper discusses a comparative study of several algorithms that detect
multiple outliers in the circular regression model based on clustering
algorithms. Three similarity measures based on the circular distance were used
to obtain a cluster tree using agglomerative hierarchical algorithms. A cutoff
value for the cluster tree, based on the mean direction and circular standard
deviation of the tree height, was used to classify cluster groups exceeding
this cutoff point as outliers. The performance of the algorithms was tested in
a simulation study that considers several outlier scenarios at different
levels. For illustration, an application to real data using wind data and a
simulated data set are given. We found that Satari’s algorithm (S-SL
algorithm) performs well for any value of the sample size and concentration
parameter. The algorithm is good at identifying single or multiple outliers at
one time.
Keywords: Clustering algorithm; circular distance; circular regression model; cutoff value; outliers
*Corresponding author; email: zanariah@ump.edu.my