Sains Malaysiana 49(5)(2020): 1165-1174
http://dx.doi.org/10.17576/jsm-2020-4905-22
Imputation Techniques for Incomplete Load Data Based on Seasonality and Orientation of the Missing Values
(Teknik Pengimputan untuk Data Beban tak Lengkap Berdasarkan Kemusiman dan Orientasi Nilai yang Hilang)
NUR ARINA BAZILAH KAMISAN1*, MUHAMMAD HISYAM LEE1, ABDUL GHAPOR HUSSIN2 & YONG ZULINA ZUBAIRI3
1Mathematics Department, Faculty of Science, Universiti Teknologi Malaysia, 81310 UTM Skudai, Johor Darul Takzim, Malaysia
2Faculty of Science and Defence Technology, Universiti Pertahanan Nasional Malaysia, 50300 Kuala Lumpur, Federal Territory, Malaysia
3Pusat Asasi Sains Universiti Malaya, Universiti Malaya, 50300 Kuala Lumpur, Federal Territory, Malaysia
Received: 12 August 2019/Accepted: 24 January 2020
ABSTRACT
Missing values are a recurring problem in load data. Because load data follow a seasonal pattern by day, the load usage for the next day is predictable most of the time. For this reason, a new model has been developed based on these characteristics. Data containing missing values are divided according to their seasonal pattern and, for each subdivision, the mean, the mean plus the standard deviation, and the third quartile are calculated before being rearranged to form a new set of values that will replace the missing values. These three values are used as imputations for the missing values. To examine how the orientation of the missing values affects the choice of imputation, the missing values are placed in three positions: at the front, in the middle, and at the end of the data, with 5%, 15%, and 25% of the values missing. The root mean square error and mean absolute error show that the proposed techniques, particularly the mean and the third quartile, are superior to the other, more complex methods when dealing with missing values. Mean imputation is adequate when the missing values are at the front and in the middle of the data, while the third quartile is superior when the missing values are at the end of the data.
Keywords: Data orientation; missing values; multiple imputation;
seasonal load data; seasonality
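The procedure summarised in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that "seasonality" means grouping observations by their position within a fixed cycle, that "mean with standard deviation" means the mean plus one sample standard deviation, and that the third quartile is computed with Python's default `statistics.quantiles` method. The function names (`seasonal_impute`, `rmse`, `mae`), the `period` parameter, and the toy series are hypothetical. The abstract does not say how the values are rearranged, so that step is not reproduced here.

```python
import math
import statistics


def seasonal_impute(values, period, stat="mean"):
    """Fill None entries with a statistic of the observed values that
    share the same seasonal position (index modulo `period`)."""
    filled = list(values)
    for pos in range(period):
        # Observed (non-missing) values at this position of the cycle.
        observed = [v for v in values[pos::period] if v is not None]
        if len(observed) < 2:
            continue  # too few points to summarise; leave the gaps as None
        if stat == "mean":
            fill = statistics.mean(observed)
        elif stat == "mean_sd":
            # Assumption: mean plus one sample standard deviation.
            fill = statistics.mean(observed) + statistics.stdev(observed)
        elif stat == "q3":
            # Third quartile (default 'exclusive' method of the stdlib).
            fill = statistics.quantiles(observed, n=4)[2]
        else:
            raise ValueError(f"unknown stat: {stat}")
        for i in range(pos, len(values), period):
            if values[i] is None:
                filled[i] = fill
    return filled


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(
        statistics.mean((a - p) ** 2 for a, p in zip(actual, predicted))
    )


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return statistics.mean(abs(a - p) for a, p in zip(actual, predicted))


if __name__ == "__main__":
    # Toy series with a cycle of length 3; the last cycle (25% of the
    # data) is missing at the end of the series.
    series = [10, 20, 30, 12, 22, 32, 11, 21, 31, None, None, None]
    truth = [13, 23, 33]  # hypothetical true values of the missing points
    for stat in ("mean", "mean_sd", "q3"):
        imputed = seasonal_impute(series, period=3, stat=stat)[9:]
        print(stat, imputed, rmse(truth, imputed), mae(truth, imputed))
```

Moving the block of `None` entries to the front or the middle of the series, and changing its length, gives a rough way to mimic the orientation and missing-percentage settings described in the abstract.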
ABSTRAK
Dalam data beban, masalah
kehilangan data selalu berlaku dalam satu set data. Memandangkan ia mempunyai
corak bermusim mengikut hari, kebanyakan masa, penggunaan beban untuk hari
berikutnya boleh diramal. Atas sebab ini, satu model baru telah dibangunkan
berdasarkan ciri-ciri ini. Data yang mengandungi nilai yang hilang yang
dibahagikan kepada bentuk pola bermusimnya dan bagi setiap subdata, nilai min,
min bersama hasil tambah sisihan piawai dan kuartil ketiga dihitung sebelum disusun
semula untuk membentuk satu set nilai baru yang akan menggantikan nilai data
yang hilang. Ketiga-tiga nilai ini akan digunakan sebagai pengimputan untuk nilai yang hilang.
Untuk mengkaji kesan kedudukan nilai-nilai yang hilang dengan pilihan pengimputan,
nilai-nilai yang hilang daripada data dibahagikan kepada tiga bahagian iaitu: di bahagian depan data, di
tengah data dan di akhir data dengan 5%, 15% dan 25% nilai yang hilang.
Keputusan daripada ralat min punca kuasa dan ralat min mutlak menunjukkan bahawa teknik
yang dicadangkan, terutamanya pengimputan nilai min dan kuartil ketiga, memberikan
hasil yang lebih bagus daripada kaedah kompleks lain ketika berurusan dengan
nilai yang hilang. Pengimputan min adalah bagus apabila nilai-nilai yang hilang
berada di hadapan dan di tengah data manakala nilai kuartil ketiga lebih bagus
apabila nilai-nilai yang hilang berada pada bahagian akhir data.
Kata kunci: Data beban bermusim; data orientasi; kepelbagaian pengimputan; nilai
yang hilang; kemusiman
*Corresponding author; email: nurarinabazilah@utm.my