Improving Performance of HMM-based ASR for GSM-EFR Speech Coding

Lallouani Bouchakour; Mohamed Debyeche

doi:10.61850/allj.v20i1.499


Published: Jun 28, 2014


Keywords:

speech coding, GSM, EFR, CHMM, ASR, ARADIGIT, MFCC, bit-stream


Abstract

The Global System for Mobile Communications (GSM) environment poses three main problems for Automatic Speech Recognition (ASR) systems: noisy scenarios, source coding distortion and transmission errors. The second, source coding distortion, must be addressed explicitly. In this paper, we investigate different feature extraction techniques for GSM EFR (Enhanced Full Rate) coding with the aim of improving ASR performance in the GSM domain. Specifically, we suggest extracting the recognition feature vectors directly from the encoded speech bit-stream instead of decoding it and then extracting the feature vectors from the decoded speech. The speaker-independent recognition experiments were based on the Continuous Hidden Markov Model (CHMM). The performance of the proposed speech recognition technique was assessed on a transcoded version of the ARADIGIT database, downsampled to 8 kHz. Different experiments were carried out to explore feature calculation directly from the GSM EFR encoded parameters and to measure the degradation introduced by different aspects of the coder. The ARADIGIT database, which consists of 60 speakers (31 male and 29 female) pronouncing the ten Arabic digits, was built to conduct the necessary experiments. The proposed methods achieved higher recognition accuracy than conventional methods based on Mel-Frequency Cepstral Coefficients (MFCC). This paper presents two configurations for extracting feature parameters for speech recognition over mobile communication: the decoded speech-based technique and the bit-stream-based technique.
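The abstract does not spell out how the encoded parameters are turned into recognition features. One common route for codecs that transmit linear-prediction information is to convert the LPC coefficients into cepstral coefficients with the standard recursion. The sketch below illustrates that recursion only; it is not the authors' implementation. It assumes the LPC coefficients have already been recovered from the bit-stream and that the predictor polynomial is written as A(z) = 1 + a_1 z^-1 + ... + a_p z^-p. The function name `lpc_to_cepstrum` and its parameters are hypothetical.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Convert LPC coefficients to LPC cepstral coefficients.

    Assumption (not from the paper): `a` holds [a_1, ..., a_p] of
    A(z) = 1 + sum_k a_k z^-k, and the all-pole model is 1 / A(z).
    Returns [c_1, ..., c_n_ceps]; the gain term c_0 is omitted.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] is left unused
    for n in range(1, n_ceps + 1):
        # Sum over earlier cepstral terms, limited to the p available a_k.
        acc = 0.0
        for k in range(max(1, n - p), n):
            acc += k * c[k] * a[n - k - 1]
        a_n = a[n - 1] if n <= p else 0.0
        c[n] = -a_n - acc / n
    return c[1:]


# Single-pole check: for A(z) = 1 - 0.5 z^-1 the cepstrum is 0.5**n / n.
print(lpc_to_cepstrum([-0.5], 4))
```

For a single pole at 0.5 the recursion reproduces the closed-form values 0.5^n / n, which is a convenient sanity check before applying it to real codec frames.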


How to Cite

Bouchakour, L., & Debyeche, M. (2014). Improving Performance of HMM-based ASR for GSM-EFR Speech Coding. AL-Lisaniyyat, 20(1), 19-26. https://doi.org/10.61850/allj.v20i1.499

Issue

Vol. 20 No. 1 (2014): v20i12014

Section

Articles

In accordance with its open access publishing policy, AL-Lisaniyyat acknowledges and guarantees authors the full and exclusive ownership of copyright and intellectual property rights related to their scholarly contributions.

The publication of an article in the journal does not result in any transfer, assignment, or limitation of these rights. Authors retain full rights over their works, without the requirement to obtain prior written authorization from the journal.

