Formants and Prosodic Features' Effects on Arabic Speaker Identification Accuracy in Noisy Environments

Khadidja Nesrine Boubakeur
Mohamed Debyeche

Abstract

This study investigates the use of formants and prosodic features,
specifically pitch and intensity, for speaker identification under
real-world conditions. To make the acoustic models more robust to
speech signal variations in noisy environments, these features are
combined with Mel-Frequency Cepstral Coefficients (MFCCs). A speaker
identification system based on Hidden Markov Models (HMMs) is
implemented in text-independent mode. Combining formants and prosodic
features with cepstral features improves identification accuracy by up
to 10% compared with an MFCC-based system, particularly in high-noise
environments. The results show that multivariate feature vectors
significantly improve the performance of an identification system in
the presence of noise.
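
As an illustration of the feature combination described in the abstract, the following is a minimal sketch of frame-level fusion: per-frame MFCCs, pitch, intensity, and formants are concatenated into one multivariate vector per frame and then normalized. It is not the authors' implementation. The abstract does not state the feature dimensions or any normalization step, so the 12 MFCCs, 3 formants, the z-score normalization, and the function names `fuse_features` and `normalize` are all assumptions, and the input arrays are random placeholders rather than real speech features.

```python
import numpy as np


def fuse_features(mfcc, pitch, intensity, formants):
    """Concatenate per-frame feature streams into one vector per frame.

    mfcc:      (n_frames, n_mfcc) cepstral coefficients
    pitch:     (n_frames,) fundamental frequency in Hz
    intensity: (n_frames,) intensity in dB
    formants:  (n_frames, n_formants) formant frequencies in Hz
    All streams are assumed to be aligned on the same frame grid.
    """
    mfcc = np.asarray(mfcc, dtype=float)
    pitch = np.asarray(pitch, dtype=float).reshape(-1, 1)
    intensity = np.asarray(intensity, dtype=float).reshape(-1, 1)
    formants = np.asarray(formants, dtype=float)

    n_frames = mfcc.shape[0]
    for stream in (pitch, intensity, formants):
        if stream.shape[0] != n_frames:
            raise ValueError("all streams must have the same number of frames")

    # Column-wise stacking: each row is one frame's combined feature vector.
    return np.hstack([mfcc, pitch, intensity, formants])


def normalize(features, eps=1e-8):
    """Z-score each feature dimension so that streams with very different
    scales (Hz, dB, cepstral units) contribute comparably. Assumed step."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    return (features - mean) / (std + eps)


# Placeholder data standing in for features extracted from one utterance.
rng = np.random.default_rng(0)
n_frames = 50
mfcc = rng.normal(size=(n_frames, 12))               # assumed 12 MFCCs
pitch = rng.uniform(80.0, 250.0, size=n_frames)      # Hz
intensity = rng.uniform(40.0, 80.0, size=n_frames)   # dB
formants = rng.uniform(300.0, 3500.0, size=(n_frames, 3))  # assumed F1-F3

fused = fuse_features(mfcc, pitch, intensity, formants)
normalized = normalize(fused)
print(fused.shape)
```

With these assumed dimensions, each frame yields a 17-dimensional vector (12 + 1 + 1 + 3). In a system like the one described, such vectors would be the observations on which each speaker's HMM is trained and scored.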

How to Cite
Boubakeur, K. N., & Debyeche, M. (2024). Formants and Prosodic Features’ Effects on Arabic Speaker Identification Accuracy in Noisy Environments. AL-Lisaniyyat, 30(2), 40-52. Retrieved from https://crstdla.dz/ojs/index.php/allj/article/view/734
