Sains Malaysiana 48(12)(2019): 2737–2747
http://dx.doi.org/10.17576/jsm-2019-4812-15
Automatic Speech Intelligibility Detection for Speakers with Speech Impairments: The Identification of Significant Speech Features
(Pengesanan Kecerdasan Pertuturan Automatik untuk Penutur dengan Ketaksempurnaan Pertuturan: Pengenalpastian Ciri Pertuturan Penting)
FADHILAH ROSDI1*, MUMTAZ BEGUM MUSTAFA2, SITI SALWAH SALIM2 & NOR AZAN MAT ZIN1
1Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 46300 UKM Bangi, Selangor Darul Ehsan, Malaysia
2Faculty of Computer Science and Information Technology, University of Malaya, 50603 Kuala Lumpur, Federal Territory, Malaysia
Received: 17 October 2018/Accepted: 2 October 2019
ABSTRACT
Selecting relevant features is important for discriminating speech in a
detection-based ASR system, because it contributes to improved detector
performance. In the context of speech impairments, speech errors can be
discriminated from regular speech by adopting speech features that
discriminate well between the impaired group and the control group. However,
the identification of suitable discriminative speech features for error
detection in impaired speech has not been well investigated in the literature.
The characteristics of impaired speech differ grossly from those of regular
speech, which makes existing speech features less effective for recognizing
impaired speech. To address this gap, features of impaired speech based on
prosody, pronunciation and voice quality are analyzed to identify the
significant speech features related to intelligibility deficits. In this
research, we investigate how speech impairments due to cerebral palsy and
hearing impairment relate to prosody, pronunciation and voice quality. We then
identify the relationship between the speech features and speech
intelligibility classification, and the speech features that are significant
in improving the discriminative ability of an automatic speech intelligibility
detection system. The findings show that prosody, pronunciation and voice
quality features are statistically significant speech features for improving
the detection of impaired speech. Voice quality is identified as the best
speech feature, with the most discriminative power in detecting the
intelligibility of impaired speech.
Keywords: Automatic speech intelligibility detection; speech
detection; speech features; speech impairments
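The abstract reports that features are statistically significant in separating the impaired and control groups, but it does not name the test used. As an illustration only, the following minimal Python sketch shows one way such a per-feature comparison could be made: an exact two-sided permutation test on the difference of group means. This is not the authors' procedure. The function names and the feature values are hypothetical and chosen purely for the example.

```python
from itertools import combinations


def mean_difference(group_a, group_b):
    """Difference between the arithmetic means of two groups."""
    return sum(group_a) / len(group_a) - sum(group_b) / len(group_b)


def exact_permutation_p_value(impaired, control):
    """Two-sided exact permutation test on the difference of group means.

    Every possible relabelling of the pooled values into an "impaired"
    group of the original size is enumerated, so this is only practical
    for small samples.
    """
    pooled = list(impaired) + list(control)
    n_impaired = len(impaired)
    observed = abs(mean_difference(impaired, control))

    at_least_as_extreme = 0
    total = 0
    for chosen in combinations(range(len(pooled)), n_impaired):
        chosen_set = set(chosen)
        relabelled_impaired = [pooled[i] for i in chosen]
        relabelled_control = [
            pooled[i] for i in range(len(pooled)) if i not in chosen_set
        ]
        difference = abs(mean_difference(relabelled_impaired, relabelled_control))
        # Small tolerance guards against floating-point rounding in the comparison.
        if difference >= observed - 1e-12:
            at_least_as_extreme += 1
        total += 1
    return at_least_as_extreme / total


if __name__ == "__main__":
    # Hypothetical values of a single voice-quality feature (not from the paper).
    impaired_values = [2.1, 2.4, 1.9, 2.8, 2.6]
    control_values = [0.9, 1.1, 0.8, 1.2, 1.0]
    p_value = exact_permutation_p_value(impaired_values, control_values)
    print(f"exact permutation p-value: {p_value:.4f}")
```

A permutation test is used here because it needs only the standard library and makes no distributional assumption. A feature whose p-value falls below a chosen threshold would be treated as discriminating between the two groups in this sketch.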
ABSTRAK
Selecting relevant features for discriminating speech in a detection-based
ASR system is important because it contributes to improved detector
performance. In the context of speech impairments, speech errors can be
discriminated from regular speech by using appropriate discriminative speech
features with high discriminative ability between the impaired group and the
control group. However, the identification of suitable discriminative speech
features for error detection in impaired speech has not been well studied in
the literature. The characteristics of impaired speech are very different from
those of regular speech, which makes existing speech features less effective
in recognizing impaired speech. To address this gap, features of impaired
speech based on prosody, pronunciation and voice quality are analyzed to
identify the significant speech features related to intelligibility deficits.
In this research, we study the relationship between speech impairments due to
cerebral palsy and hearing impairment and prosody, pronunciation and voice
quality. Next, we identify the relationship between the speech features and
speech intelligibility classification, and the speech features that are
significant in improving the discriminative ability of an automatic speech
intelligibility detection system. The results show that prosody, pronunciation
and voice features are statistically significant speech features for improving
the detection of impaired speech. Voice quality is identified as the best
speech feature, with more discriminative power in detecting the
intelligibility of impaired speech.
Keywords: Speech features; speech impairments; automatic speech
intelligibility detection; speech detection
*Corresponding author; email: fadhilah.rosdi@ukm.edu.my