Sains Malaysiana 46(2)(2017): 255–265
http://dx.doi.org/10.17576/jsm-2017-4602-10
Feature Selection Algorithms for Malaysian Dengue Outbreak Detection Model
(Pemilihan Ciri Algoritma untuk Model Pengesanan Wabak Denggi)
HUSAM I.S. ABUHAMAD1, AZURALIZA ABU BAKAR1, SUHAILA ZAINUDIN1*, MAZRURA SAHANI2 & ZAINUDIN MOHD ALI3
1Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600, UKM Bangi, Selangor Darul Ehsan, Malaysia
2Faculty of Health Sciences, Universiti Kebangsaan Malaysia, Jalan Raja Muda Abd Aziz, 50300 Kuala Lumpur, Wilayah Persekutuan, Malaysia
3Public Health Department, Ministry of Health, Jalan Rasah, 70300 Seremban, Negeri Sembilan Darul Khusus, Malaysia
Received: 11 March 2016 / Accepted: 8 June 2016
ABSTRACT
Dengue fever is one of the most common mosquito-borne diseases worldwide. Dengue outbreak detection supports practical efforts to curb the rapid spread of the disease by providing the knowledge needed to predict the next outbreak. Many studies have modeled and predicted dengue outbreaks using different data mining techniques. This research aimed to identify the features that lead to better predictive accuracy for dengue outbreaks using three feature selection algorithms: particle swarm optimization (PSO), genetic algorithm (GA) and rank search (RS). Based on the selected features, three predictive modeling techniques (J48, DTNB and Naive Bayes) were applied for dengue outbreak detection. The dataset used in this research was obtained from the Public Health Department, Seremban, Negeri Sembilan, Malaysia. The experimental results showed that predictive accuracy improved when a feature selection process was applied before predictive modeling. The study also identified a set of features to represent dengue outbreak detection for Malaysian health agencies.
Keywords: Feature selection; dengue outbreak; knowledge discovery
from databases; nature-based algorithms; outbreak detection
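The pipeline summarized in the abstract (select features first, then train a classifier on the selected subset) can be illustrated with a minimal sketch of rank search feeding a Naive Bayes model. This is not the authors' implementation. The ranking criterion (information gain), the subset score (leave-one-out accuracy), the function names and the eight-row toy dataset are all assumptions made for illustration. The toy data has no connection to the Seremban dataset.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(column, labels):
    """Information gain of one discrete feature with respect to the class."""
    n = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def nb_predict(train_X, train_y, row, features):
    """Categorical Naive Bayes with Laplace smoothing on the chosen features."""
    best_class, best_score = None, -math.inf
    for cls, cls_count in sorted(Counter(train_y).items()):
        score = math.log(cls_count / len(train_y))  # log prior
        members = [x for x, y in zip(train_X, train_y) if y == cls]
        for f in features:
            n_values = len({x[f] for x in train_X} | {row[f]})
            matches = sum(1 for x in members if x[f] == row[f])
            score += math.log((matches + 1) / (cls_count + n_values))
        if score > best_score:
            best_class, best_score = cls, score
    return best_class

def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of the Naive Bayes model on a feature subset."""
    hits = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += nb_predict(train_X, train_y, X[i], features) == y[i]
    return hits / len(X)

def rank_search(X, y):
    """Rank features by information gain, then keep the best top-k prefix."""
    n_features = len(X[0])
    ranking = sorted(range(n_features),
                     key=lambda f: info_gain([x[f] for x in X], y),
                     reverse=True)
    best_subset, best_acc = [], -1.0
    for k in range(1, n_features + 1):
        acc = loo_accuracy(X, y, ranking[:k])
        if acc > best_acc:  # strict comparison keeps the smaller subset on ties
            best_subset, best_acc = ranking[:k], acc
    return ranking, best_subset, best_acc

# Hypothetical toy data: feature 0 matches the outbreak label exactly,
# feature 1 matches it in 6 of 8 rows, feature 2 is unrelated to it.
X = [[1, 1, 1], [1, 1, 0], [1, 1, 1], [1, 0, 0],
     [0, 0, 1], [0, 0, 0], [0, 0, 1], [0, 1, 0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

ranking, selected, accuracy = rank_search(X, y)
print(ranking, selected, accuracy)  # prints [0, 1, 2] [0] 1.0
```

On this toy data the search keeps only the single informative feature, since adding the noisier ones cannot raise the leave-one-out accuracy. PSO and GA would replace the ranked-prefix loop with a population-based search over feature subsets while reusing the same kind of subset score.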
ABSTRAK
Demam denggi merupakan penyakit bawaan nyamuk
yang wujud di merata dunia. Pengesanan wabak denggi bermanfaat sebagai satu
usaha praktikal mengawal penyebaran penyakit ini dengan menyediakan pengetahuan
untuk meramal kejadian wabak yang seterusnya. Penyelidikan
lepas telah dijalankan untuk memodel dan meramal pengesanan wabak denggi
menggunakan pelbagai teknik perlombongan data. Penyelidikan
ini bertujuan untuk mengenal pasti ciri yang meningkatkan ketepatan ramalan
wabak denggi menggunakan tiga algoritma pemilihan ciri; particle swarm optimization (PSO), genetic algorithm (GA)
dan rank search (RS). Berdasarkan ciri yang dipilih, tiga teknik permodelan
ramalan (J48, DTNB dan Naive Bayes) dijalankan untuk peramalan wabak
denggi. Set data yang digunakan dalam penyelidikan ini diperoleh dari
Jabatan Kesihatan Awam, Negeri Sembilan, Malaysia. Keputusan kajian menunjukkan
bahawa ketepatan ramalan meningkat apabila proses pemilihan ciri dijalankan
sebelum proses permodelan. Kajian ini turut menghasilkan set ciri baru untuk
mewakilkan pengesanan wabak denggi untuk agensi berkaitan kesihatan di
Malaysia.
Kata kunci: Algoritma
berasaskan alam; pemilihan ciri; penemuan ilmu dari pangkalan data; pengawalan
wabak; wabak denggi
*Corresponding author; email: suhaila.zainudin@ukm.edu.my