Formants and Prosodic Features' Effects on Arabic Speaker Identification Accuracy in Noisy Environments


Khadidja Nesrine Boubakeur
Mohamed Debyeche

Abstract

This work examines the use of formants and prosodic features, namely fundamental
frequency (pitch) and intensity, for speaker identification in noisy environments.
To make the acoustic models more robust to variations of the speech signal in noise,
Mel-frequency cepstral coefficients (MFCC) are added to these features. An automatic
speaker identification system based on hidden Markov models (HMM) is implemented.
Combining formants and prosodic features with cepstral features improves the
robustness of identification systems, particularly in very noisy environments, with
an improvement of 10% over an MFCC-based system. The results show that multivariate
feature vectors improve the performance of an identification system in the presence
of noise compared with an MFCC-based system.
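
The combination of cepstral, formant, and prosodic features described above can be illustrated with a minimal sketch. The abstract does not say how the streams are combined, so frame-level concatenation after per-stream normalization is an assumption here, as are the function names `normalize` and `fuse`, the toy feature dimensions, and the toy values. Feature extraction and HMM training are outside the scope of this sketch.

```python
from statistics import mean, pstdev


def normalize(stream):
    """Z-score each column of a list of per-frame feature vectors."""
    columns = list(zip(*stream))
    mus = [mean(c) for c in columns]
    # A constant column has zero deviation; divide by 1.0 instead.
    sigmas = [pstdev(c) or 1.0 for c in columns]
    return [[(x - m) / s for x, m, s in zip(frame, mus, sigmas)]
            for frame in stream]


def fuse(mfcc, formants, pitch, intensity):
    """Concatenate frame-aligned streams into one vector per frame.

    Assumption: all streams were extracted with the same frame rate,
    so index t refers to the same instant in every stream.
    """
    n = len(mfcc)
    if not (len(formants) == len(pitch) == len(intensity) == n):
        raise ValueError("streams must have the same number of frames")
    prosody = [[f0, level] for f0, level in zip(pitch, intensity)]
    streams = [normalize(s) for s in (mfcc, formants, prosody)]
    return [m + f + p for m, f, p in zip(*streams)]


# Toy example: 3 frames, 2 cepstral coefficients (real systems use more),
# 3 formant frequencies in Hz, pitch in Hz, intensity in dB.
fused = fuse(
    mfcc=[[1.0, 2.0], [2.0, 0.0], [3.0, 1.0]],
    formants=[[500.0, 1500.0, 2500.0],
              [520.0, 1400.0, 2600.0],
              [480.0, 1600.0, 2400.0]],
    pitch=[120.0, 125.0, 118.0],
    intensity=[60.0, 60.0, 60.0],
)
print(len(fused), len(fused[0]))  # 3 7
```

Normalizing each stream before concatenation keeps formant frequencies (hundreds to thousands of Hz) from dominating the much smaller cepstral values. Whether the authors applied such a step is not stated.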


How to cite
Boubakeur, K. N., & Debyeche, M. (2024). Formants and Prosodic Features' Effects on Arabic Speaker Identification Accuracy in Noisy Environments. AL-Lisaniyyat, 30(2), 40–52. Retrieved from https://crstdla.dz/ojs/index.php/allj/article/view/734
Section
Articles

References

Al-Karawi, K. A., & Mohammed, D. Y. (2021). Improving short utterance speaker
verification by combining MFCC and entropy in noisy conditions. Multimedia Tools
and Applications, 80(14), 22231–22249.
Amrous, A. I., & Debyeche, M. (2012). Robust Arabic multi-stream speech recognition
system in noisy environment. In Image and Signal Processing: 5th International
Conference, ICISP 2012, Agadir, Morocco, June 28–30, 2012. Proceedings 5 (pp. 571–
578). Springer.
Amrous, A. I., Debyeche, M., & Amrouche, A. (2011). Prosodic features and formant
contribution for Arabic speech recognition in noisy environments. In Advances in
Intelligent and Soft Computing (pp. 465–474).
Amrouche, A., Abed, A., & Falek, L. (2019). Arabic speech synthesis system based on
HMM. In 2019 6th International Conference on Electrical and Electronics Engineering
(ICEEE) (pp. 73–78). IEEE.
Arinaitwe, P., Murungi, E., Ogenyi, F. C., Asiimwe, R., & Buhari, M. D. (2024). Review
of techniques used in speech signal processing. Deleted Journal, 3(1), 63–70.
Boersma, P. (2006). Praat: Doing phonetics by computer (version 4.4.24). Retrieved
from http://www.praat.org.
Boubakeur, K. N., Debyeche, M., Amrouche, A., & Bentrcia, Y. (2022). Prosodic
modeling-based speaker identification. In 2022 2nd International Conference on New
Technologies of Information and Communication (NTIC) (pp. 1–6). Mila, Algeria.
Cui, B.-G., & Chen, X. (2010). An improved hidden Markov model for literature
metadata extraction. In Advanced Intelligent Computing Theories and Applications: 6th
International Conference on Intelligent Computing, ICIC 2010, Changsha, China,
August 18–21, 2010. Proceedings 6 (pp. 205–212). Springer.
Doddington, G. R. (1985). Speaker recognition—Identifying people by their voices.
Proceedings of the IEEE, 73(11), 1651–1664.
Droua-Hamdani, G. (2020). Formant frequency analysis of MSA vowels in six Algerian
regions. In Lecture Notes in Computer Science (pp. 128–135).
Falek, L., Amrouche, A., Fergani, L., Teffahi, H., & Djeradi, A. (2011). Formantic
analysis of speech signal by wavelet transform. In 2011 Proceedings of the World
Congress on Engineering, WCE 2011 (Vol. 2, pp. 1572–1576).
Fairclough, L., Brown, G., & Kirchhuebel, C. (2023). Reviewing the performance of
formants for forensic voice comparison: A meta-analysis of forensic speech science
research. In R. Skarnitzl & J. Volín (Eds.), Proceedings of the 20th International
Congress of Phonetic Sciences (pp. 3834–3838).
Ferrer, L., Scheffer, N., & Shriberg, E. (2010). A comparison of approaches for
modeling prosodic features in speaker recognition. In 2010 IEEE International
Conference on Acoustics, Speech, and Signal Processing (pp. 4414–4417). IEEE.
Huang, X., Acero, A., Hon, H.-W., & Reddy, R. (2001). Spoken language processing:
A guide to theory, algorithm, and system development. Prentice Hall PTR.
Ji, M., Wang, F., Wan, J. N., & Liu, Y. (2015). Literature review on hidden Markov
model-based sequential data clustering. Applied Mechanics and Materials, 713, 1750–
1756.
Kreiman, J., & Sidtis, D. (2011). Foundations of Voice Studies: An Interdisciplinary
Approach to Voice Production and Perception.
Leu, F.-Y., & Lin, G.-L. (2017). An MFCC-based speaker identification system. In 2017
IEEE 31st International Conference on Advanced Information Networking and
Applications (AINA) (pp. 1055–1062). Taipei, Taiwan.
Mary, L., & Yegnanarayana, B. (2006). Prosodic features for speaker verification. In
Ninth International Conference on Spoken Language Processing.
McDougall, K. (2006). Dynamic features of speech and the characterization of speakers:
Towards a new approach using formant frequencies. International Journal of Speech
Language and the Law, 13(1), 89–126.
Rabiner, L., & Juang, B.-H. (1993). Fundamentals of Speech Recognition. Prentice-Hall, Inc.
Reynolds, D. A., & Rose, R. C. (1995). Robust text-independent speaker identification
using Gaussian mixture speaker models. IEEE Transactions on Speech and Audio
Processing, 3(1), 72–83.
Singh, N., & Khan, R. (2015). Extraction and representation of prosodic features for
automatic speaker recognition technology. In Fifth International Conference on AITMC
(AIM-2015), Proceedings of Advanced in Engineering and Technology (pp. 1–7).
McGraw Hill Education.
Singh, N., Khan, R., & Shree, R. (2012). MFCC and prosodic feature extraction
techniques: A comparative study. International Journal of Computer Applications,
54(1).
Tiwari, V. (2010). MFCC and its applications in speaker recognition. International
Journal on Emerging Technologies, 19–22.
Varga, A., & Steeneken, H. J. (1993). Assessment for automatic speech recognition: II.
Noisex-92: A database and an experiment to study the effect of additive noise on speech
recognition systems. Speech Communication, 12(3), 247–251.
Young, S., Odell, J., et al. (2002). The HTK Book Version 3.3. Speech group,
Engineering Department, Cambridge University Press.