Sains Malaysiana 49(9)(2020): 2113-2118
http://dx.doi.org/10.17576/jsm-2020-4909-09
Query Translation for Multilingual Content with Semantic Technique
(Terjemahan Pertanyaan untuk Kandungan Pelbagai Bahasa dengan Teknik Semantik)
NORITA MD NORWAWI*, SUNDRESAN A/L PERUMAL, EMRAN HUDA & WAKA JENG
Faculty of Science and Technology, Universiti Sains Islam Malaysia, 71800 Nilai, Negeri Sembilan Darul Khusus, Malaysia
Received: 23 January 2020/Accepted: 1 April 2020
ABSTRACT
Cross-lingual information retrieval (CLIR) allows a user to pose a query in a language different from that of the target resources. Translation is therefore the key element in query processing. There are three translation approaches: query translation, document translation, and hybrid query-document translation. Query translation, however, is very challenging because of the polysemy problem: differences in the linguistic nature of the languages lead to ambiguity of meaning, so the user's true intention may be misinterpreted. This paper presents a semantic technique for query translation in a multilingual knowledge repository to improve query processing. Offline-translated documents, or parallel corpora, in English, Arabic, and Malay (including Jawi text) were used as the data. A set of keywords related to prophetic food was pre-identified by experts. These keywords were annotated with the relevant Quranic verses, Hadith texts, manuscript text images, and scientific articles determined by the experts. Synonyms and context-based translations were annotated together with each keyword. A query performs a three-way pattern match against the keyword index list, which links to the relevant documents. A one-stop knowledge repository on prophetic food was developed as a proof of concept, using sources from al-Quran, Hadith, classical manuscripts, and scientific articles verified by experts to ensure the authenticity and integrity of the content.
Keywords: Cross lingual information retrieval; one stop knowledge repository; prophetic food; query translation; semantic technique
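The keyword-based matching described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not spell out the three ways of matching, so the sketch assumes they are a match against the keyword itself, its annotated synonyms, and its annotated context-based translations. The `KeywordEntry` schema, the `match_query` function, and the document identifiers are all hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class KeywordEntry:
    """One expert-curated keyword with its annotations (hypothetical schema)."""
    keyword: str
    synonyms: Set[str] = field(default_factory=set)
    translations: Set[str] = field(default_factory=set)
    documents: List[str] = field(default_factory=list)


def normalize(text: str) -> str:
    """Case-fold and collapse whitespace so surface variants compare equal."""
    return " ".join(text.casefold().split())


def match_query(query: str, index: List[KeywordEntry]) -> List[Tuple[str, str, List[str]]]:
    """Return (keyword, match type, linked documents) for every matching entry.

    Assumed three-way match: keyword, then synonyms, then translations.
    """
    q = normalize(query)
    hits = []
    for entry in index:
        if q == normalize(entry.keyword):
            how = "keyword"
        elif q in {normalize(s) for s in entry.synonyms}:
            how = "synonym"
        elif q in {normalize(t) for t in entry.translations}:
            how = "translation"
        else:
            continue
        hits.append((entry.keyword, how, entry.documents))
    return hits


# Illustrative index; the document identifiers are placeholders, not real records.
INDEX = [
    KeywordEntry(
        keyword="dates",
        synonyms={"date fruit"},
        translations={"kurma", "تمر"},
        documents=["doc-quran-001", "doc-article-007"],
    ),
    KeywordEntry(
        keyword="honey",
        translations={"madu", "عسل"},
        documents=["doc-hadith-003"],
    ),
]

for q in ("Kurma", "date  fruit", "عسل", "olive"):
    print(q, "->", match_query(q, INDEX))
```

Because every surface form resolves to the same curated keyword, a Malay or Arabic query reaches the same annotated documents as its English equivalent without a machine-translation step, which is where polysemy would otherwise cause errors.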
ABSTRAK
Dapatan semula maklumat silang bahasa (CLIR) membolehkan pertanyaan pengguna diajukan dalam bahasa yang berbeza daripada bahasa bahan sumber sasaran. Oleh itu, terjemahan menjadi kunci utama dalam pemprosesan pertanyaan. Terdapat 3 jenis pendekatan terjemahan: terjemahan pertanyaan, dokumen atau pertanyaan-dokumen hibrid. Walau bagaimanapun, terjemahan pertanyaan adalah mencabar berpunca daripada masalah polisemi. Gaya linguistik pelbagai bahasa yang berbeza menimbulkan kesamaran makna yang menyebabkan hasrat sebenar pengguna boleh disalah tafsir. Kajian ini membentangkan teknik semantik terjemahan pertanyaan repositori pelbagai bahasa untuk menambahbaik pemprosesan pertanyaan. Dokumen sumber yang diterjemahkan secara manual atau corpora selari dalam Bahasa Inggeris, Arab dan Melayu termasuk teks Jawi digunakan sebagai data kajian. Set kata kunci telah dikenal pasti oleh pakar bidang berkaitan dengan makanan sunnah. Kata kunci ini dianotasikan dengan ayat-ayat Al-Quran teks Hadith, teks dan imej manuskrip dan artikel saintifik yang berkaitan oleh pakar bidang berkenaan. Perkataan sinonim dan terjemahan secara konteks dianotasikan juga kepada kata kunci berkaitan. Setiap pertanyaan akan menggunakan 3 kaedah pemadanan ke atas senarai indeks kata kunci yang akan menghubungkan kepada dokumen yang relevan. Repositori pengetahuan sehenti berkaitan makanan sunnah dibangunkan sebagai bukti konsep menggunakan sumber daripada Al-Quran, Hadith, manuskrip klasik dan artikel saintifik yang disahkan oleh pakar bidang untuk menjamin kesahihan dan integriti.
Kata kunci: Dapatan semula maklumat silang bahasa; makanan sunnah; repositori pengetahuan sehenti; teknik semantik; terjemahan pertanyaan
*Corresponding author; email: norita@usim.edu.my