Sains Malaysiana 45(11)(2016): 1755–1761

 

The Performance of Multiple Imputations for Different Number of Imputations

(Prestasi Pelbagai Imputasi untuk Bilangan Imputasi Berlainan)

 

GAZEL SER¹*, SIDDIK KESKIN² & M. CAN YILMAZ¹

 

¹Department of Animal Science, Biometry and Genetics Unit, Faculty of Agriculture, University of Yuzuncu Yil, 65080 Van, Turkey

 

²Department of Biostatistics, Faculty of Medicine, University of Yuzuncu Yil, 65080 Van, Turkey

 

Received: 17 September 2015/Accepted: 17 March 2016

 

 

ABSTRACT

Multiple imputation is a widely used method for analysing missing data. It consists of three stages: imputation, analysis and pooling. The number of imputations chosen in the first stage is important. This study therefore examined the performance of multiple imputation at different numbers of imputations. A monotone missing data pattern was created by deleting approximately 24% of the observations from a continuous outcome variable in a complete data set. In the first stage, monotone regression imputation was performed with different numbers of imputations (m = 3, 5, 10 and 50). In the second stage, parameter estimates and their standard errors were obtained by fitting a general linear model to each of the completed data sets. In the final stage, the results were pooled and the effect of the number of imputations on the parameter estimates and their standard errors was evaluated. The efficiency of the parameter estimates at m = 50 was about 99%. Hence, at the missing observation rate studied, the efficiency and performance of multiple imputation increased as the number of imputations increased.

 

Keywords: Multiple imputation; number of imputations; relative efficiency
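
The pooling stage and the relative-efficiency measure named in the keywords can be illustrated with a minimal Python sketch. It is not the authors' SAS procedure. It assumes the standard pooling rules and relative-efficiency formula attributed to Rubin, RE = 1/(1 + γ/m), where γ is the fraction of missing information. The value γ = 0.24 is an assumption made only for illustration, taken from the reported missing rate; the true fraction of missing information generally differs from the missing rate. The function names are hypothetical.

```python
from statistics import mean, variance


def relative_efficiency(gamma, m):
    """Efficiency of m imputations relative to infinitely many.

    gamma is the fraction of missing information (assumed, not estimated here).
    """
    return 1.0 / (1.0 + gamma / m)


def pool(estimates, std_errors):
    """Combine m completed-data estimates of one parameter with Rubin's rules.

    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    q_bar = mean(estimates)                        # pooled point estimate
    within = mean([se ** 2 for se in std_errors])  # mean within-imputation variance
    between = variance(estimates)                  # between-imputation variance (divisor m - 1)
    total = within + (1.0 + 1.0 / m) * between     # total variance
    return q_bar, total ** 0.5


if __name__ == "__main__":
    GAMMA = 0.24  # illustrative assumption: gamma set equal to the 24% missing rate
    for m in (3, 5, 10, 50):
        print(f"m = {m:2d}: relative efficiency = {relative_efficiency(GAMMA, m):.4f}")
```

Under this assumption the sketch gives efficiencies of roughly 92.6%, 95.4%, 97.7% and 99.5% for m = 3, 5, 10 and 50, which agrees with the approximately 99% reported for m = 50.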

 

 

 

 

*Corresponding author; email: gazelser@gmail.com

 

 

 
