Sains Malaysiana 51(3)(2022): 911-927

http://doi.org/10.17576/jsm-2022-5103-24

 

Improved Spatial Outlier Detection Method Within a River Network

(Kaedah Pengesanan Pencilan Reruang DiPerbaik dalam Suatu Jaringan Sungai)

 

NUR FATIHAH MOHD ALI1, ROSSITA MOHAMAD YUNUS1,*, IBRAHIM MOHAMED1 & FARIDAH OTHMAN2

 

1Institute of Mathematical Sciences, Faculty of Science, Universiti Malaya, 50603 Kuala Lumpur, Federal Territory, Malaysia

  2Department of Civil Engineering, Faculty of Engineering, Universiti Malaya, 50603 Kuala Lumpur, Federal Territory, Malaysia

 

Received: 1 February 2021/Accepted: 13 August 2021

 

Abstract

A spatial outlier is an observation whose non-spatial attribute values differ significantly from those of its neighbors. Such observations can occur in water quality data collected at monitoring stations within a river network. However, existing spatial outlier detection procedures that rely on distance measures such as the Euclidean distance between monitoring stations do not take the river network topology into account. In general, water quality in the lower streams is affected by the flow from the upper streams, whereas water quality in one tributary may have little influence on other tributaries. Hence, we propose a method for identifying spatial outliers in a river network that accounts for river flow connectivity when determining the neighbors of each monitoring station. Both the existing and the proposed methods use the robust Mahalanobis distance, but the proposed method uses river distance instead of the Euclidean distance. A simulation study on a synthetic river dataset shows that the proposed method performs better. For illustration, we apply the proposed method to the 2016 water quality data from the Sg. Klang Basin provided by the Department of Environment, Malaysia. The results better identify stations whose water quality differs significantly from that of their neighboring stations. Such information is useful to the authorities in planning the environmental monitoring of water quality in these areas.
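
To make the contrast between Euclidean distance and river distance concrete, the following minimal Python sketch compares the neighbors selected under each distance on a toy network. It is an illustration under stated assumptions, not the authors' implementation: the five stations, their coordinates, the segment lengths, the choice of k = 2, and the rule that stations which are not flow-connected are never neighbors are all hypothetical. The outlier scoring step based on the robust Mahalanobis distance is not shown.

```python
import math

# Hypothetical stations: name -> (x, y) planar coordinates in km.
coords = {
    "A": (0.0, 4.0),   # headwater of tributary 1
    "B": (0.0, 2.0),   # tributary 1, downstream of A
    "C": (1.0, 2.0),   # headwater of tributary 2, close to B in the plane
    "D": (0.5, 0.0),   # confluence of the two tributaries
    "E": (0.5, -2.0),  # outlet
}

# Hypothetical topology: station -> (next station downstream, channel length in km).
downstream = {
    "A": ("B", 2.5),
    "B": ("D", 3.0),
    "C": ("D", 2.5),
    "D": ("E", 2.0),
    "E": None,
}


def path_to_outlet(station):
    """Map each station reached by following the flow to its distance from `station`."""
    travelled = 0.0
    reached = {station: 0.0}
    while downstream[station] is not None:
        station, length = downstream[station]
        travelled += length
        reached[station] = travelled
    return reached


def river_distance(s, t):
    """Along-channel distance if s and t are flow-connected, otherwise infinity."""
    from_s, from_t = path_to_outlet(s), path_to_outlet(t)
    if t in from_s:      # t lies downstream of s
        return from_s[t]
    if s in from_t:      # s lies downstream of t
        return from_t[s]
    return math.inf      # different tributaries: assumed not to be neighbors


def euclidean_distance(s, t):
    """Straight-line distance between two stations."""
    return math.dist(coords[s], coords[t])


def nearest_neighbors(station, dist_fn, k=2):
    """Return the k closest stations that lie at a finite distance under dist_fn."""
    candidates = [t for t in coords
                  if t != station and math.isfinite(dist_fn(station, t))]
    return sorted(candidates, key=lambda t: dist_fn(station, t))[:k]


print(nearest_neighbors("B", euclidean_distance))  # ['C', 'A']
print(nearest_neighbors("B", river_distance))      # ['A', 'D']
```

In this toy network, station C is the closest station to B in the plane, yet it sits on a different tributary and shares no flow with B, so the river-distance rule replaces it with the downstream station D.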

 

Keywords: Euclidean distance; river distance; robust multivariate; spatial outlier; water quality

 

Abstrak

Reruang terpencil merujuk kepada cerapan dengan nilai atribut reruang berbeza secara signifikan berbanding daripada nilai kejiranannya. Cerapan ini boleh dikesan daripada data kualiti air yang dikumpul di stesen-stesen dalam jaringan sungai. Walau bagaimanapun, kaedah semasa untuk mengenal pasti pencilan reruang menggunakan jarak yang diukur antara stesen seperti jarak Euclidean tidak mengambil kira aspek topologi jaringan sungai. Secara umumnya, aras kualiti air pada hilir jaringan sungai dipengaruhi oleh aliran daripada hulu sungai. Begitu juga, kualiti air pada sesuatu jaringan sungai mungkin mempengaruhi sedikit kualiti air pada jaringan sungai yang berbeza. Kaedah dalam mengenal pasti reruang terpencil dalam jaringan sungai dengan mengambil kira kesan terhadap hubung kait aliran sungai bagi menentukan kejiranan sesebuah stesen dicadangkan. Walaupun penganggar kukuh jarak Mahalanobis digunakan dalam kedua-dua kaedah, tetapi kaedah yang dicadangkan ini menggunakan jarak aliran sungai dan bukannya jarak Euclidean. Berpandukan kaedah simulasi set data sungai sintetik, prestasi kaedah yang diperkenalkan ini terbukti lebih baik. Sebagai ilustrasi, kaedah yang diperkenalkan ini diterapkan pada data kualiti air yang diperoleh daripada Sg. Klang pada tahun 2016 yang disediakan oleh Jabatan Alam Sekitar, Malaysia. Keputusan daripada hasil kajian dapat membantu mengenal pasti kualiti air di beberapa buah stesen yang jauh lebih baik daripada stesen berdekatan. Maklumat ini sangat berguna kepada pihak berwajib dalam merancang pemantauan kualiti air di kawasan sekitarnya.

 

Kata kunci: Jarak aliran sungai; jarak Euclidean; kualiti air; penganggar multivariat; reruang terpencil

 

 

*Corresponding author; email: rossita@um.edu.my

 

 
