Formants and Prosodic Features' Effects on Arabic Speaker Identification Accuracy in Noisy Environments

Khadidja Nesrine Boubakeur
Mohamed Debyeche

Abstract

This study examines the use of formants and prosodic features, specifically pitch and intensity, for speaker identification in noisy environments. To make the acoustic models more robust to variations in the speech signal under noise, Mel-frequency cepstral coefficients (MFCC) were added to these features. An automatic speaker recognition system based on hidden Markov models (HMM) was built. Combining formants and prosodic features with MFCC improved speaker recognition accuracy, particularly in high-noise conditions, by up to 10% compared with the system based on MFCC alone. The results show that using multiple feature types substantially improves the performance of automatic speaker recognition in the presence of noise.
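The feature combination described in the abstract can be pictured as stacking several per-frame streams into one observation vector per frame. The following is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how the streams were merged, so simple frame-level concatenation is assumed. The function names, the 25 ms / 10 ms framing, the 13 MFCC and 3 formant dimensions, the autocorrelation pitch estimate and the log-energy stand-in for intensity are all illustrative choices, and the MFCC and formant matrices are random placeholders rather than real extractions.

```python
import numpy as np

def split_frames(signal, frame_len, hop):
    """Slice a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])

def log_energy(frames):
    """Short-time log energy, used here as a simple stand-in for intensity."""
    return np.log(np.sum(frames ** 2, axis=1) + 1e-10)

def autocorr_pitch(frames, sr, fmin=60.0, fmax=400.0):
    """Rough F0 per frame from the highest autocorrelation value in [fmin, fmax].

    No voicing decision is made, so unvoiced or silent frames get an
    arbitrary value; a real system would need to handle them.
    """
    lag_min, lag_max = int(sr / fmax), int(sr / fmin)
    f0 = np.empty(len(frames))
    for i, frame in enumerate(frames):
        frame = frame - frame.mean()
        # Keep only non-negative lags of the autocorrelation.
        ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
        f0[i] = sr / lag
    return f0

def combine_streams(mfcc, formants, pitch, intensity):
    """Concatenate per-frame streams into one observation vector per frame."""
    n = min(len(mfcc), len(formants), len(pitch), len(intensity))
    return np.hstack([mfcc[:n], formants[:n], pitch[:n, None], intensity[:n, None]])

# Toy input: one second of a 200 Hz tone sampled at 16 kHz (assumed values).
sr = 16000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 200.0 * t)
frames = split_frames(signal, frame_len=400, hop=160)  # 25 ms window, 10 ms hop

rng = np.random.default_rng(0)
mfcc = rng.normal(size=(len(frames), 13))     # placeholder, not real MFCCs
formants = rng.normal(size=(len(frames), 3))  # placeholder, not real formants

observations = combine_streams(mfcc, formants, autocorr_pitch(frames, sr), log_energy(frames))
print(observations.shape)
```

Each row of `observations` is one frame's combined vector, the kind of sequence an HMM-based recognizer would be trained and scored on. In practice the streams have very different scales, so some normalization before modeling would likely be needed.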

How to Cite
Boubakeur, K. N., & Debyeche, M. (2024). Formants and Prosodic Features' Effects on Arabic Speaker Identification Accuracy in Noisy Environments. AL-Lisaniyyat, 30(2), 40-52. Retrieved from https://crstdla.dz/ojs/index.php/allj/article/view/734
Section
Articles
