Regional Corpus Of Modern Standard Arabic

Ahmed Abdelali
Jim Cowie

Abstract

Until recently, only two Arabic corpora were commonly available to researchers: the Agence France-Presse (AFP) Arabic Newswire from the Linguistic Data Consortium (LDC) and the Al-Harm' newspaper collection from the European Language Resources Distribution Agency (ELDA). The availability of a suitable corpus is key to much objective research in language engineering and in other fields related to natural language. This paper presents experimental results from comparing corpora of Modern Standard Arabic (MSA) collected from samples of newspapers published online in different Arabic countries. The results show significant differences in vocabulary and style across regions. Comprehensive studies of these differences will allow a better understanding of the language and have implications for a range of computational and linguistic research. Developing adequate resources is more crucial than ever to carry this work further.

How to Cite
Abdelali, A., & Cowie, J. (2011). Regional Corpus Of Modern Standard Arabic. AL-Lisaniyyat, 17(2), 1-10. https://doi.org/10.61850/allj.v17i2.451