A Cross-language Information Retrieval System Based On Linguistic And Statistical Approaches

Nasreddine Semmar
Faiza Elkateb-Gara

Abstract

As the number of non-English documents available on the World Wide Web and in corporate repositories increases, the ability to search and view documents quickly and effectively across language boundaries will continue to grow in importance. Cross-language information retrieval techniques give searchers access to a wider range of material without requiring specialized knowledge of the content or of the languages in the database. In this paper, we present a cross-language information retrieval system based on a deep linguistic analysis of documents and queries, combined with a statistical model that assigns each word in the database a weight according to its discriminating power. A comparison tool evaluates all possible intersections between queries and documents and ranks the documents by relevance.
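The weighting and comparison steps described in the abstract can be illustrated with a small sketch. It is not the authors' implementation and rests on two assumptions. First, it uses inverse document frequency (idf) as a stand-in for "discriminating power", so that words occurring in fewer documents receive higher weights. Second, it reads "evaluate all possible intersections" as grouping documents by the exact set of query words they share, then ranking the groups by the summed weight of that set. The sketch also assumes that the query has already been linguistically analysed and translated into the document language, so that queries and documents are plain lists of normalized words. The function names `idf_weights` and `rank_by_intersection` are hypothetical.

```python
import math
from collections import defaultdict


def idf_weights(docs):
    """Assign each word an idf weight (assumed proxy for discriminating power).

    docs maps a document id to its list of normalized words. A word found in
    every document gets weight 0; rarer words get larger weights.
    """
    n = len(docs)
    doc_freq = defaultdict(int)
    for words in docs.values():
        for word in set(words):
            doc_freq[word] += 1
    return {word: math.log(n / df) for word, df in doc_freq.items()}


def rank_by_intersection(query, docs, weights):
    """Group documents by the set of query words they share, best groups first.

    Each group's score is the sum of the weights of its shared words.
    Documents that share no word with the query are left out.
    Returns a list of (sorted shared words, sorted document ids) pairs.
    """
    query_words = set(query)
    groups = defaultdict(list)
    for doc_id, words in docs.items():
        shared = frozenset(query_words & set(words))
        if shared:
            groups[shared].append(doc_id)

    def score(item):
        shared, _ = item
        return sum(weights.get(word, 0.0) for word in shared)

    ordered = sorted(groups.items(), key=score, reverse=True)
    return [(sorted(shared), sorted(ids)) for shared, ids in ordered]


if __name__ == "__main__":
    # Toy collection of already-analysed documents (hypothetical data).
    docs = {
        "d1": ["water", "pollution", "river"],
        "d2": ["water", "supply"],
        "d3": ["pollution", "air"],
        "d4": ["football"],
    }
    weights = idf_weights(docs)
    for shared, ids in rank_by_intersection(["water", "pollution"], docs, weights):
        print(shared, ids)
```

On this toy collection, the document containing both query words (`d1`) forms the top-ranked group. Documents that match only one query word follow, and `d4`, which shares no query word, is not returned. Grouping by intersection, rather than scoring each document independently, keeps documents that match the same combination of query words together. This makes it easy to present them to the user as a single class.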

How to Cite
Semmar, N., & Elkateb-Gara, F. (2013). A Cross-language Information Retrieval System Based On Linguistic And Statistical Approaches. AL-Lisaniyyat, 19(2), 1-10. https://doi.org/10.61850/allj.v19i2.479