Sains Malaysiana 44(10)(2015): 1417–1422

 

Outlier Detection using Generalized Linear Model in Malaysian Breast Cancer Data

(Pengesanan Nilai Tersisih menggunakan Model Linear Teritlak dalam Data Kanser Payudara Malaysia)

 

 

M. NAWAMA1, A.I.N. IBRAHIM1*, I.B. MOHAMED1, M.S. YAHYA1 & N.A.M. TAIB2

 

1Institute of Mathematical Sciences, University of Malaya, 59100 Kuala Lumpur, Malaysia

 

2Department of Surgery, University of Malaya Medical Centre, 59100 Kuala Lumpur, Malaysia

 

Received: 22 March 2013 / Accepted: 15 June 2015

 

 

ABSTRACT

We consider the problem of outlier detection in bivariate exponential data fitted using the generalized linear model via a Bayesian approach. We closely follow the work outlined by Unnikrishnan (2010) and present every step of the detection procedure in detail. Because the resulting joint posterior distribution is complex, we obtain information on it from samples generated by Markov chain Monte Carlo sampling, using either the Gibbs sampler or the Metropolis-Hastings algorithm. We use data from local breast cancer patients to illustrate the implementation of the method.

 

Keywords: Bayesian; Gibbs sampler; Metropolis-Hastings algorithm; Outlier

 

ABSTRAK

Kami mempertimbangkan masalah pengesanan nilai tersisih dalam data bivariat eksponen dengan menggunakan model linear teritlak melalui pendekatan Bayesian. Kami mengikuti secara rapat kajian yang digariskan oleh Unnikrishnan (2010) dan membentangkan setiap langkah prosedur pengesanan secara terperinci. Disebabkan kerumitan taburan posterior tercantum yang terhasil, kami mendapatkan maklumat mengenai taburan posterior tersebut daripada sampel yang dijana oleh pensampelan Markov Chain Monte Carlo, khususnya, menggunakan sama ada kaedah pensampelan Gibbs atau algoritma Metropolis-Hastings yang umum. Kami menggunakan data tempatan iaitu data pesakit kanser payudara untuk menggambarkan pelaksanaan kaedah tersebut.

 

Kata kunci: Algoritma Metropolis-Hastings; Bayesian; kaedah pensampelan Gibbs; nilai tersisih

REFERENCES

Anscombe, F.J. & Guttman, I. 1960. Rejection of outliers. Technometrics 2: 123-147.

Barnett, V. & Lewis, T. 1983. Outliers in Statistical Data. Chichester: John Wiley & Sons.

Bayarri, M.J. & Morales, J. 2003. Bayesian measures of surprise for outlier detection. Journal of Statistical Planning and Inference 111: 3-22.

Collett, D. 2003. Modelling Survival Data in Medical Research. Boca Raton, FL: Chapman & Hall/CRC.

Ferguson, T.S. 1961. Rules for rejection of outliers. Review of the International Statistical Institute 29: 29-43.

Freeman, P.R. 1980. On the number of outliers in data from a linear model. In Bayesian Statistics, edited by Bernardo, J.M., DeGroot, M.H., Lindley, D.V. & Smith, A.F.M. pp. 349-65. Valencia: University Press.

Ishwaran, H. 1999. Applications of hybrid Monte Carlo to Bayesian generalized linear models: quasicomplete separation and neural networks. Journal of Computational and Graphical Statistics 8: 779-799.

Kuhnt, S. & Pawlitschko, J. 2003. Outlier Identification Rules for Generalized Linear Models. Technical Report no 12, Department of Statistics, University of Dortmund.

Maller, R.A. & Zhou, S. 1994. Testing for sufficient follow-up and outliers in survival data. Journal of the American Statistical Association 89: 1499-509.

Marshall, E.C. & Spiegelhalter, D.J. 2007. Identifying outliers in Bayesian hierarchical models: A simulation-based approach. Bayesian Analysis 2: 409-444.

Nardi, A. & Schemper, M. 1999. New residuals for Cox regression and their application to outlier screening. Biometrics 55(2): 523-529.

Page, G.L. & Dunson, D.B. 2011. Bayesian local contamination models for multivariate outliers. Technometrics 53: 152-162.

Pettit, L.I. 1994. Bayesian approaches to the detection of outliers in Poisson samples. Communications in Statistics - Theory and Methods 23: 1785-1795.

Taib, N.A., Akmal, M.N., Mohamed, I.B. & Yip, C.H. 2011. Improvement in survival of breast cancer patients: Trends in survival over two time periods in a single institution in an Asia Pacific country, Malaysia. Asian Pacific J. of Can. Prev. 12: 345-349.

Taib, N.A., Yip, C.H. & Mohamed, I. 2008. Survival analysis of Malaysian women with breast cancer: Results from the University of Malaya Medical Centre. Asian Pacific J. of Can. Prev. 9: 197-202.

Therneau, T.M., Grambsch, P.M. & Fleming, T.R. 1990. Martingale-based residuals for survival models. Biometrika 77(1): 147-160.

Unnikrishnan, N.K. 2010. Bayesian analysis for outliers in survey sampling. Computational Statist. and Data Analysis 54: 1962-1974.

Williams, D.A. 1987. Generalized linear model diagnostics using the deviance and single case deletions. Appl. Statistics 36: 181-191.

Zeger, S.L. & Karim, M.R. 1991. Generalized linear models with random effects: A Gibbs sampling approach. Journal of the American Statistical Association 86: 79-86.

 

 

 

*Corresponding author; email: adrianaibrahim@um.edu.my

 

 
