Sains Malaysiana 51(3)(2022): 911-927
http://doi.org/10.17576/jsm-2022-5103-24
Improved Spatial Outlier Detection Method Within a River Network
(Kaedah Pengesanan Pencilan Reruang DiPerbaik dalam Suatu Jaringan Sungai)
NUR FATIHAH MOHD ALI1, ROSSITA MOHAMAD YUNUS1,*, IBRAHIM MOHAMED1 & FARIDAH OTHMAN2
1Institute of Mathematical Sciences, Faculty of Science, Universiti Malaya, 50603 Kuala Lumpur, Federal Territory, Malaysia
2Department of Civil Engineering, Faculty of Engineering, Universiti Malaya, 50603 Kuala Lumpur, Federal Territory, Malaysia
Received: 1 February 2021/Accepted: 13 August 2021
Abstract
A spatial outlier is an observation whose non-spatial attribute values differ significantly from those of its neighbors. Such observations can also occur in water quality data collected at monitoring stations within a river network. However, existing spatial outlier detection procedures based on distance measures such as the Euclidean distance between monitoring stations do not take the river network topology into account. In general, water quality in downstream reaches is affected by the flow from upstream, while the water quality in one tributary may have little influence on other tributaries. Hence, we propose a method for identifying spatial outliers in a river network that accounts for the effect of river flow connectivity when determining the neighbors of each monitoring station. Both the existing and the proposed methods use the robust Mahalanobis distance, but the proposed method uses river distance instead of the Euclidean distance. A simulation study on a synthetic river dataset shows that the proposed method performs better. For illustration, we apply the proposed method to the 2016 water quality data from the Sg. Klang Basin provided by the Department of Environment, Malaysia. The method better identifies stations whose water quality differs significantly from that of their neighboring stations. Such information is useful to the authorities in planning environmental monitoring of water quality in these areas.
Keywords: Euclidean distance; river distance; robust multivariate; spatial outlier; water quality
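The contrast between Euclidean and river-distance neighbors can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the toy network, the station names, the coordinates and channel lengths, and the choice to treat only flow-connected stations (one lies downstream of the other) as candidate neighbors. The robust Mahalanobis step is not shown.

```python
from math import hypot

# Hypothetical toy network: station -> (x, y, downstream station, channel length to it).
# "OUT" is the outlet; A1 -> A2 -> OUT and B1 -> B2 -> OUT are two separate tributaries.
STATIONS = {
    "OUT": (0.0, 0.0, None, 0.0),
    "A2": (-1.0, 2.0, "OUT", 2.5),
    "A1": (-1.0, 4.0, "A2", 2.2),
    "B2": (1.0, 2.0, "OUT", 2.5),
    "B1": (0.0, 4.0, "B2", 2.4),
}


def path_to_outlet(station):
    """Map the station and every station downstream of it to the channel distance from it."""
    dist, path = 0.0, {station: 0.0}
    _, _, down, length = STATIONS[station]
    while down is not None:
        dist += length
        path[down] = dist
        _, _, down, length = STATIONS[down]
    return path


def river_distance(a, b):
    """Channel distance if a and b are flow-connected, otherwise None (assumed rule)."""
    path_a, path_b = path_to_outlet(a), path_to_outlet(b)
    if b in path_a:
        return path_a[b]
    if a in path_b:
        return path_b[a]
    return None


def euclidean_distance(a, b):
    """Straight-line distance between station coordinates."""
    (xa, ya, *_), (xb, yb, *_) = STATIONS[a], STATIONS[b]
    return hypot(xa - xb, ya - yb)


def nearest(station, k, distance):
    """Return up to k closest stations under the given distance, skipping undefined ones."""
    others = [(distance(station, s), s) for s in STATIONS if s != station]
    ranked = sorted((d, s) for d, s in others if d is not None)
    return [s for _, s in ranked[:k]]


print(nearest("A1", 2, euclidean_distance))  # neighbors by straight-line distance
print(nearest("A1", 2, river_distance))      # neighbors along the flow path only
```

In this toy layout the station closest to A1 in a straight line sits on the other tributary, whereas the river-distance neighbors of A1 all lie on its own flow path, which is the situation the abstract describes.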
Abstrak
Reruang terpencil merujuk kepada cerapan dengan nilai atribut reruang berbeza secara signifikan berbanding daripada nilai kejiranannya. Cerapan ini boleh dikesan daripada data kualiti air yang dikumpul di stesen-stesen dalam jaringan sungai. Walau bagaimanapun, kaedah semasa untuk mengenal pasti pencilan reruang menggunakan jarak yang diukur antara stesen seperti jarak Euclidean tidak mengambil kira aspek topologi jaringan sungai. Secara umumnya, aras kualiti air pada hilir jaringan sungai dipengaruhi oleh aliran daripada hulu sungai. Begitu juga, kualiti air pada sesuatu jaringan sungai mungkin mempengaruhi sedikit kualiti air pada jaringan sungai yang berbeza. Kaedah dalam mengenal pasti reruang terpencil dalam jaringan sungai dengan mengambil kira kesan terhadap hubung kait aliran sungai bagi menentukan kejiranan sesebuah stesen dicadangkan. Walaupun penganggar kukuh jarak Mahalanobis digunakan dalam kedua-dua kaedah, tetapi kaedah yang dicadangkan ini menggunakan jarak aliran sungai dan bukannya jarak Euclidean. Berpandukan kaedah simulasi set data sungai sintetik, prestasi kaedah yang diperkenalkan ini terbukti lebih baik. Sebagai ilustrasi, kaedah yang diperkenalkan ini diterapkan pada data kualiti air yang diperoleh daripada Sg. Klang pada tahun 2016 yang disediakan oleh Jabatan Alam Sekitar, Malaysia. Keputusan daripada hasil kajian dapat membantu mengenal pasti kualiti air di beberapa buah stesen yang jauh lebih baik daripada stesen berdekatan. Maklumat ini sangat berguna kepada pihak berwajib dalam merancang pemantauan kualiti air di kawasan sekitarnya.
Kata kunci: Jarak aliran sungai; jarak Euclidean; kualiti air; penganggar multivariat; reruang terpencil
*Corresponding author; email: rossita@um.edu.my