Improving Performance of HMM-Based ASR for GSM-EFR Speech Coding
Abstract
The Global System for Mobile Communications (GSM) environment poses three main problems for Automatic Speech Recognition (ASR) systems: noisy scenarios, source coding distortion, and transmission errors. The second, source coding distortion, must be explicitly addressed. In this paper, we investigate different feature extraction techniques for GSM EFR (Enhanced Full Rate) coding with the aim of improving the performance of ASR in the GSM domain. Specifically, we suggest extracting the recognition feature vectors directly from the encoded speech bit-stream instead of decoding it and subsequently extracting the feature vectors from the decoded speech. The speaker-independent recognition experiments were based on the Continuous Hidden Markov Model (CHMM). The performance of the proposed speech recognition technique was assessed on a transcoded version of the ARADIGIT database, using its 8 kHz downsampled version. Different experiments were carried out in order to explore feature calculation directly from the GSM EFR encoded parameters and to measure the degradation introduced by different aspects of the coder. The ARADIGIT database, which consists of 60 speakers (31 male and 29 female) pronouncing the ten Arabic digits, was built in order to conduct the necessary experiments. As a result, the proposed methods achieved higher recognition accuracy than the conventional methods employing Mel-Frequency Cepstral Coefficients (MFCC). This paper presents two configurations used for extracting feature parameters for speech recognition over mobile communication: the decoded speech-based technique and the bitstream-based technique.
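To illustrate what a bitstream-based front end might look like, the following is a minimal sketch, not the authors' implementation. It assumes that the coder's spectral envelope is available as a set of dequantized line spectral frequencies (LSFs) in radians, converts them to LPC coefficients, and then derives LPC cepstral coefficients with the standard recursion. The function names (`lsf_to_lpc`, `lpc_to_cepstrum`) and the choice of LPC cepstra as the recognition feature are assumptions made for illustration; the abstract does not state which transformation was actually used.

```python
import math


def poly_mul(x, y):
    """Multiply two polynomials given as coefficient lists in powers of z^-1."""
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out


def lsf_to_lpc(lsf):
    """Convert sorted LSFs (radians, 0 < w < pi, even order p) to LPC coefficients.

    Returns [1, a_1, ..., a_p] for A(z) = 1 + sum_k a_k z^-k.
    Assumption: the LSFs have already been dequantized from the bit-stream.
    """
    p = len(lsf)
    if p % 2 != 0:
        raise ValueError("this sketch handles even LPC orders only")
    sym = [1.0, 1.0]    # symmetric polynomial P(z), trivial root at z = -1
    anti = [1.0, -1.0]  # antisymmetric polynomial Q(z), trivial root at z = +1
    for i, w in enumerate(lsf):
        factor = [1.0, -2.0 * math.cos(w), 1.0]
        if i % 2 == 0:
            sym = poly_mul(sym, factor)    # 1st, 3rd, ... LSFs belong to P(z)
        else:
            anti = poly_mul(anti, factor)  # 2nd, 4th, ... LSFs belong to Q(z)
    # A(z) = (P(z) + Q(z)) / 2; the z^-(p+1) terms cancel, so keep orders 0..p.
    return [0.5 * (sym[k] + anti[k]) for k in range(p + 1)]


def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c_1..c_n_ceps of the all-pole model 1/A(z), via the usual recursion."""
    p = len(a) - 1
    c = [0.0] * (n_ceps + 1)  # c[0] (gain term) is not computed here
    for n in range(1, n_ceps + 1):
        acc = sum(k * c[k] * a[n - k] for k in range(max(1, n - p), n))
        c[n] = -(a[n] if n <= p else 0.0) - acc / n
    return c[1:]


if __name__ == "__main__":
    # Hypothetical 10th-order LSF vector, evenly spaced for demonstration only.
    demo_lsf = [math.pi * (i + 1) / 11 for i in range(10)]
    features = lpc_to_cepstrum(lsf_to_lpc(demo_lsf), 12)
    print(len(features))
```

In the decoded speech-based configuration, by contrast, the same kind of feature vector would be computed from the waveform reconstructed by the decoder, so any distortion introduced by decoding is included in the features.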
Article Details
Lallouani Bouchakour, Mohamed Debyeche.