Homogeneity Test Based Voice Activity Detection
Abstract
In this paper, a new approach to voice activity detection (VAD) is proposed. The technique is based on a homogeneity test of two autoregressive (AR) processes, each modeling a speech window, and involves measuring a defined distance. The homogeneity test is formulated as a hypothesis test whose threshold is derived analytically from a user-defined false-alarm probability. Results on the Aurora database show the effectiveness of the proposed technique compared with other methods and standards.
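The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the distance or the analytical threshold, so the sketch substitutes a Gaussian likelihood-ratio distance between two AR models fitted by the Levinson-Durbin recursion, and it sets the threshold empirically by Monte Carlo simulation on noise-only window pairs rather than analytically. All function names, the AR order of 4, and the window length of 256 samples are assumptions made for illustration.

```python
import math
import random


def autocorr(x, order):
    """Unnormalised autocorrelation sums r[0..order] of a mean-removed window."""
    m = sum(x) / len(x)
    x = [v - m for v in x]
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]


def residual_energy(r):
    """Levinson-Durbin recursion; returns the final prediction-error energy.
    Assumes r[0] > 0 (the window is not constant)."""
    order = len(r) - 1
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i + 1):
            a[j] = prev[j] + k * prev[i - j]
        err *= 1.0 - k * k
    return err


def homogeneity_distance(w1, w2, order=4):
    """Assumed distance: log-likelihood ratio of one pooled AR model
    against two separate AR models (near zero when the windows are alike)."""
    n1, n2 = len(w1), len(w2)
    r1, r2 = autocorr(w1, order), autocorr(w2, order)
    pooled = [u + v for u, v in zip(r1, r2)]
    e1, e2, ep = residual_energy(r1), residual_energy(r2), residual_energy(pooled)
    return ((n1 + n2) * math.log(ep / (n1 + n2))
            - n1 * math.log(e1 / n1) - n2 * math.log(e2 / n2))


def empirical_threshold(pfa, trials=200, n=256, order=4, seed=0):
    """Stand-in for the analytical threshold: upper quantile of the distance
    over pairs of white-noise windows (the 'homogeneous' hypothesis)."""
    rng = random.Random(seed)
    d = sorted(
        homogeneity_distance([rng.gauss(0, 1) for _ in range(n)],
                             [rng.gauss(0, 1) for _ in range(n)], order)
        for _ in range(trials))
    return d[min(trials - 1, math.ceil((1.0 - pfa) * trials))]


def ar2_window(rng, n, a1=1.6, a2=-0.9):
    """Synthetic resonant AR(2) window used here as a crude 'voiced' signal."""
    y = [0.0, 0.0]
    for _ in range(n):
        y.append(a1 * y[-1] + a2 * y[-2] + rng.gauss(0, 1))
    return y[2:]


if __name__ == "__main__":
    rng = random.Random(1)
    thr = empirical_threshold(pfa=0.05)
    noise = [rng.gauss(0, 1) for _ in range(256)]
    voiced = ar2_window(rng, 256)
    d = homogeneity_distance(noise, voiced)
    print("distance:", d, "threshold:", thr, "speech detected:", d > thr)
```

In this sketch a window is flagged as speech when its distance to a noise reference window exceeds the threshold. Lowering `pfa` raises the threshold, trading missed detections for fewer false alarms.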
Article Details
How to Cite
Rekik, O., & Djeddou, M. (2014). Homogeneity Test Based Voice Activity Detection. AL-Lisaniyyat, 20(1), 77-85. https://doi.org/10.61850/allj.v20i1.506
Section
Articles