Textual Data Selection For Language Modelling In The Scope Of Automatic Speech Recognition

Freha Mezzoudj
David Langlois
Denis Jouvet

Abstract

The language model is an important module in many applications that produce natural language text, in particular speech recognition. Training a language model requires large amounts of textual data that matches the target domain. Selection of target-domain (or in-domain) data has been investigated in the past. For example, [1] proposed a criterion based on the difference of cross-entropy between models representing in-domain and non-domain-specific data. However, those evaluations used only two data sources: one corresponding to the in-domain data, and another providing the generic data from which sentences are selected. For broadcast news and TV show transcription systems, language models are built by interpolating several language models estimated from various data sources. This paper investigates the data selection process in this context of building interpolated language models for speech transcription. Results show that, in the selection process, the choice of the language models used to represent the in-domain and non-domain-specific data is critical. Moreover, it is better to apply data selection only to some of the data sources. With this approach, the selection process leads to an improvement of 8.3 in perplexity and 0.2% in word error rate on the French broadcast transcription task.
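
To make the selection criterion above concrete, the short Python sketch below scores each sentence of a generic corpus by the difference between its cross-entropy under a model of the in-domain data and under a model of the non-domain-specific data, and keeps only the lowest-scoring sentences. This is a minimal illustration, not the authors' implementation: the helper names (train_unigram, cross_entropy, select), the add-one-smoothed unigram models, the threshold of 0.0 and the toy corpora are all assumptions made for readability, whereas the paper works with full language models estimated from the actual broadcast data sources.

    import math
    from collections import Counter

    def train_unigram(sentences):
        # Add-one-smoothed unigram log-probabilities (an illustrative stand-in for a full n-gram LM).
        counts = Counter(tok for sent in sentences for tok in sent)
        total = sum(counts.values())
        vocab = len(counts) + 1  # reserve one count of probability mass for unseen tokens
        return lambda tok: math.log((counts.get(tok, 0) + 1) / (total + vocab))

    def cross_entropy(sentence, logprob):
        # Per-word cross-entropy of a tokenized sentence under the given model.
        return -sum(logprob(tok) for tok in sentence) / max(len(sentence), 1)

    def select(candidates, in_lm, out_lm, threshold=0.0):
        # Cross-entropy difference selection: keep sentences with H_in(s) - H_out(s) below the threshold.
        return [s for s in candidates
                if cross_entropy(s, in_lm) - cross_entropy(s, out_lm) < threshold]

    # Toy usage with made-up tokenized corpora.
    in_domain = [["the", "news", "anchor", "reports"], ["breaking", "news", "tonight"]]
    generic = [["the", "cat", "sat"], ["news", "update", "tonight"], ["stock", "prices", "fall"]]
    print(select(generic, train_unigram(in_domain), train_unigram(generic)))

In the interpolated-model setting studied in the paper, such a filter would be applied only to the data sources for which it helps, and the resulting source-specific language models would then be interpolated as usual.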

Article Details

How to Cite
Mezzoudj, F., Langlois, D., & Jouvet, D. (2016). Textual Data Selection For Language Modelling In The Scope Of Automatic Speech Recognition. AL-Lisaniyyat, 22(2), 28-33. https://doi.org/10.61850/allj.v22i2.370
Section
Articles

References

P. Swietojanski, A. Ghoshal, and S. Renals, "Convolutional neural networks for distant speech recognition," IEEE Signal Processing Letters, vol. 21, no. 9, pp. 1120-1124, Sept. 2014.
N. Jaitly, P. Nguyen, A. Senior, and V. Vanhoucke, "Application of pretrained deep neural networks to large vocabulary speech recognition," in Proceedings of Interspeech, 2012.
E. S. Ristad and P. N. Yianilos, "Learning string-edit distance," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 5, pp. 522-532, 1998.
A. C. Morris, V. Maier, and P. Green, "From WER and RIL to MER and WIL: improved evaluation measures for connected speech recognition," in Interspeech, 2004.
I. A. McCowan, D. Moore, J. Dines, D. Gatica-Perez, M. Flynn, P. Wellner, and H. Bourlard, "On the use of information retrieval measures for speech recognition evaluation," IDIAP, Tech. Rep., 2004.
I. A. McCowan, D. Moore, J. Dines, D. Gatica-Perez, M. Flynn, P. Wellner, and H. Bourlard, "On the use of information retrieval measures for speech recognition: evaluating automated speech recognition devices and the consequences of using probabilistic string edit distance as input," 3rd year project, Sheffield University, 2002.
J. Hoffman, Papoulis, A., Probability, Random Variables and Stochastic Processes, 1967.
H. Nanjo and T. Kawahara, "A new ASR evaluation measure and minimum Bayes-risk decoding for open-domain speech understanding," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2005, pp. 1053-1056.
B. Favre, K. Cheung, S. Kazemian, A. Lee, Y. Liu, C. Munteanu, A. Nenkova, D. Ochei, G. Penn, S. Tratz et al., "Automatic human utility evaluation of ASR systems: does WER really predict performance?" in Interspeech, 2013, pp. 3463-3467.
H. Jiang, "Confidence measures for speech recognition: A survey," Speech Communication, vol. 45, no. 4, pp. 455-470, 2005.
L. Zhou, Y. Shi, J. Feng, and A. Sears, "Data mining for detecting errors in dictation speech recognition," IEEE Transactions on Speech and Audio Processing, vol. 13, no. 5, pp. 681-688, 2005.
A. Allauzen, "Error detection in confusion networks," in Interspeech, 2007, pp. 1749-1752.
T. Pellegrini and I. Trancoso, "Error detection in broadcast news ASR using Markov chains," in Human Language Technology. Challenges for Computer Science and Linguistics, Springer, 2011, pp. 59-69.
W. Chen, S. Ananthakrishnan, R. Kumar, R. Prasad, and P. Natarajan, "ASR error detection in a conversational spoken language translation system," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2013, pp. 7418-7422.
T. Pellegrini and I. Trancoso, "Improving ASR error detection with non-decoder based features," in Interspeech, 2010, pp. 1950-1953.
W. A. Ainsworth and S. Pratt, "Feedback strategies for error correction in speech recognition systems," International Journal of Man-Machine Studies, vol. 36, no. 6, pp. 833-842, 1992.
A. Murray, C. Frankish, and D. Jones, "Data-entry by voice: Facilitating correction of misrecognitions," in Interactive Speech Technology, Taylor and Francis, Inc., 1993, pp. 137-144.
B. Suhm, B. Myers, and A. Waibel, "Multimodal error correction for speech user interfaces," ACM Transactions on Computer-Human Interaction (TOCHI), vol. 8, no. 1, pp. 60-98, 2001.
J. Feng and A. Sears, "Using confidence scores to improve hands-free speech based navigation in continuous dictation systems," ACM Transactions on Computer-Human Interaction (TOCHI), vol. 11, no. 4, pp. 329-356, 2004.
D. Yu, M.-Y. Hwang, P. Mau, A. Acero, and L. Deng, "Unsupervised learning from users' error correction in speech dictation," in Interspeech, 2004.
Y. Shi and L. Zhou, "Supporting dictation speech recognition error correction: the impact of external information," Behaviour & Information Technology, vol. 30, no. 6, pp. 761-774, 2011.
A. Sarma and D. D. Palmer, "Context-based speech recognition error detection and correction," in Proceedings of HLT-NAACL, 2004, pp. 85-88.
Y. Bassil and P. Semaan, "ASR context-sensitive error correction based on Microsoft N-Gram dataset," arXiv preprint arXiv:1203.5262, 2012.