Sains Malaysiana 51(12)(2022): 4153-4160
http://doi.org/10.17576/jsm-2022-5112-22
Performance Analysis and Discrimination Procedure of Two-Group Location Model with Some Continuous and High-Dimensional of Binary Variables
(Analisis Prestasi dan Prosedur Pembezaan Model Lokasi Dua Kumpulan dengan Sebilangan Pemboleh Ubah Selanjar dan Dimensi Tinggi Pemboleh Ubah Binari)
HASHIBAH HAMID1,*, FRIDAY ZINZENDOFF OKWONU2, NOR AISHAH AHAD1 & HASLIZA ABDUL RAHIM3
1School of Quantitative Sciences, UUM College of Arts and Sciences, Universiti Utara Malaysia, 06010 UUM Sintok, Kedah, Malaysia
2Department of Mathematics, Delta State University, Abraka, Delta, P.M.B.1, Nigeria
3School of Computer and Communication Engineering, Universiti Malaysia Perlis, 02600 UniMAP Arau, Perlis, Malaysia
Received: 23 April 2022/Accepted: 26 August 2022
Abstract
The primary goal of this research was to evaluate the performance of recently constructed smoothed location models (SLMs) for discrimination, in which the SLM is combined with two kinds of multiple correspondence analysis (MCA) to handle the high dimensionality arising from the binary variables. A previous study of the SLM together with MCA and principal component analysis (PCA) showed that the misclassification rate remained very high when the number of binary variables was large. Two new SLMs are therefore constructed in this paper to address this problem. The first combines the SLM with Burt MCA (denoted SLM+Burt), and the second combines it with joint correspondence analysis (denoted SLM+JCA). The findings showed that both models performed well for all sample sizes (n) and all numbers of binary variables (b) under investigation, except for the case n=60 and b=25 under the SLM+JCA model. Overall, the SLM+JCA model outperformed the SLM+Burt model. Moreover, the concept and discrimination procedures for the two-group classification conducted in this paper can be extended to multi-class classification, as practitioners often deal with many groups and complex sets of variables.
Keywords: Discrimination; large binary variables; misclassification rate; multiple correspondence analysis; smoothed location model
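The dimensionality problem mentioned in the abstract can be illustrated with a short sketch. It assumes the usual formulation of the location model, in which b binary variables define 2^b multinomial cells per group, and it treats the misclassification rate as the plain proportion of wrongly assigned objects. The function names below are hypothetical and are not taken from the paper; only the case n=60 and b=25 comes from the abstract.

```python
from typing import Sequence


def location_model_cells(b: int) -> int:
    """Number of multinomial cells formed by b binary variables (2**b).

    Assumption: the standard location-model setup, where each distinct
    pattern of the b binary variables defines one cell.
    """
    if b < 0:
        raise ValueError("b must be non-negative")
    return 2 ** b


def mean_objects_per_cell(n: int, b: int) -> float:
    """Average number of sample objects available per cell."""
    return n / location_model_cells(b)


def misclassification_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Proportion of objects assigned to the wrong group."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


if __name__ == "__main__":
    n, b = 60, 25  # the case singled out in the abstract
    print(location_model_cells(b))      # number of cells before any reduction
    print(mean_objects_per_cell(n, b))  # far below one object per cell
    print(misclassification_rate([1, 1, 2, 2], [1, 2, 2, 2]))
```

With n=60 and b=25 almost every cell is empty, which is why the binary variables are reduced (here via Burt MCA or JCA) before the smoothed location model is fitted.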
Abstrak
Matlamat utama penyelidikan ini adalah untuk menilai analisis prestasi model lokasi terlicin (SLMs) yang dibina sebelum ini untuk tujuan pembezaan dengan menggabungkan dua jenis analisis kesepadanan berganda (MCA) bagi menangani masalah dimensi tinggi yang berlaku daripada pemboleh ubah binari.
Kajian terdahulu mengenai SLM bersama-sama dengan MCA serta analisis komponen utama (PCA), menunjukkan bahawa kadar salah pengelasan masih sangat tinggi dengan sejumlah besar bilangan pemboleh ubah binari.
Oleh itu, dalam kajian ini, dua SLMs baharu dibina untuk menyelesaikan masalah khusus ini. Model pertama terhasil daripada gabungan SLM dengan Burt MCA (ditandakan sebagai SLM+Burt), dan yang kedua adalah dengan analisis kesepadanan bersama (ditandakan sebagai SLM+JCA).
Hasil kajian menunjukkan bahawa kedua-dua model menunjukkan prestasi yang baik untuk semua saiz sampel (n) dan semua pemboleh ubah binari (b) di bawah kajian, kecuali untuk kes n=60 dan b=25 bagi model SLM+JCA. Secara keseluruhan, model SLM+JCA menghasilkan prestasi yang lebih baik berbanding model SLM+Burt. Selain itu, konsep dan prosedur pembezaan untuk pengelasan dua kumpulan yang dijalankan dalam kajian ini boleh diperluaskan kepada pengelasan berbilang kumpulan kerana pengamal sering berurusan dengan banyak kumpulan dan kerumitan pemboleh ubah.
Kata kunci: Analisis kesepadanan berganda; diskriminasi; kadar salah pengelasan; model lokasi terlicin; pembezaan; pemboleh ubah binari besar
*Corresponding author; email: hashibah@uum.edu.my