Sains Malaysiana 45(11)(2016): 1755–1761
The Performance of Multiple Imputations for Different Number of Imputations
(Prestasi Pelbagai Imputasi untuk Bilangan Imputasi Berlainan)
GAZEL SER1*, SIDDIK KESKIN2 & M. CAN YILMAZ1
1Department of Animal Science, Biometry and Genetics Unit, Faculty of Agriculture, University of Yuzuncu Yil, 65080 Van, Turkey
2Department of Biostatistics, Faculty of Medicine, University of Yuzuncu Yil, 65080 Van, Turkey
Received: 17 September 2015/Accepted: 17 March 2016
ABSTRACT
Multiple imputation is widely used in missing data analysis. The method consists of three stages: imputation, analysis and pooling. The number of imputations selected in the first stage is important. Hence, this study aimed to examine the performance of the multiple imputation method at different numbers of imputations. A monotone missing data pattern was created by deleting approximately 24% of the observations from a continuous response variable with complete data. In the first stage of the multiple imputation method, monotone regression imputation was performed at different numbers of imputations (m=3, 5, 10 and 50). In the second stage, parameter estimates and their standard errors were obtained by applying a general linear model to each of the completed data sets. In the final stage, the results were pooled and the effect of the number of imputations on the parameter estimates and their standard errors was evaluated. In conclusion, the efficiency of the parameter estimates at m=50 imputations was about 99%. Hence, at the missing observation rate studied, the efficiency and performance of the multiple imputation method increased as the number of imputations increased.
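The pooling stage and the efficiency figure can be illustrated with a short sketch. It is not the authors' SAS analysis; it assumes the standard combining rules usually attributed to Rubin (pooled estimate = mean of the m estimates, total variance = within-imputation variance + (1 + 1/m) x between-imputation variance) and the usual relative efficiency formula 1 / (1 + gamma / m), where gamma is the fraction of missing information. The function names and the use of the 24% missing rate as a stand-in for gamma are assumptions made only for illustration.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors.

    estimates: one parameter estimate per completed data set
    variances: the squared standard error of each estimate
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching lengths")
    q_bar = mean(estimates)        # pooled point estimate
    u_bar = mean(variances)        # within-imputation variance
    b = variance(estimates)        # between-imputation variance (divides by m - 1)
    t = u_bar + (1 + 1 / m) * b    # total variance
    return q_bar, math.sqrt(t)


def relative_efficiency(gamma, m):
    """Efficiency of m imputations relative to infinitely many."""
    return 1 / (1 + gamma / m)


if __name__ == "__main__":
    gamma = 0.24  # assumption: missing rate used as a proxy for gamma
    for m in (3, 5, 10, 50):
        print(m, round(relative_efficiency(gamma, m), 4))
```

Under that assumption the efficiency rises from roughly 0.93 at m=3 to above 0.99 at m=50, which is in line with the figure of about 99% reported above.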
Keywords: Multiple imputation; number of imputations; relative efficiency
*Corresponding author; email: gazelser@gmail.com