An Extensible Schema For Building Large Weakly-labeled Semantic Corpora

S. Matthew English

doi:10.61850/allj.v22i2.368

Published: May 30, 2016

Keywords:

Wikidata - Semantic corpora

S. Matthew English

University of Hong Kong

Abstract

In NLP, data drives research, as evidenced by how often foundational database-engineering work such as the Penn Treebank has served as a basis for experimentation. Traditionally, large-scale corpora annotated by experts are expensive and slow to produce. This has pushed researchers toward automated methods that generate labeled data from available resources such as Freebase, DBpedia, and the infoboxes found on Wikipedia pages. These knowledge bases have been, or are being, integrated into Wikidata, an initiative to consolidate such disparate data repositories into an organized, machine-readable format. This resource is an important research tool. In this article, we review our experience using Wikidata to build a large corpus annotated under distant supervision. In addition, we make the materials and the code used to generate our annotations freely available to all interested parties.

How to cite

English, S. (2016). An Extensible Schema For Building Large Weakly-labeled Semantic Corpora. AL-Lisaniyyat, 22(2), 18-22. https://doi.org/10.61850/allj.v22i2.368

Issue

Vol. 22 No. 2 (2016)

Section

Articles

In accordance with its open-access publication policy, the journal AL-Lisaniyyat recognizes and guarantees authors full and exclusive ownership of the copyright and intellectual property rights in their scientific contributions.

Publication of an article in the journal entails no transfer, assignment, or limitation of these rights. Authors retain the rights to their work, and no prior written authorization from the journal is required.

References

Abad, Azad, and Alessandro Moschitti. 2014. Creating a standard for evaluating distant supervision for relation extraction. Italian Conference on Computational Linguistics CLiC-it. 1.
Intxaurrondo, Ander, Mihai Surdeanu, Oier López de Lacalle, and Eneko Agirre. 2013. Removing noisy mentions for distant supervision. Procesamiento del Lenguaje Natural 51, 41-48.
Hoffmann, Raphael, Congle Zhang, Xiao Ling, Luke Zettlemoyer, and Daniel S. Weld. 2011. Knowledge-based weak supervision for information extraction of overlapping relations. Association for Computational Linguistics. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. 541-550.
Manning, Christopher D., Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP Natural Language Processing Toolkit. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations. 55-60.
Marcus, Mitchell P., Mary Ann Marcinkiewicz, and Beatrice Santorini. 1993. Building a large annotated corpus of English: The Penn Treebank. Cambridge University Press, Cambridge, UK. Computational Linguistics 19.2, 313-330.
Mintz, Mike, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. Distant supervision for relation extraction without labeled data. Association for Computational Linguistics. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP. Volume 2.
Riedel, Sebastian, Limin Yao, and Andrew McCallum. 2010. Modeling relations and their mentions without labeled text. Springer Berlin Heidelberg. Machine Learning and Knowledge Discovery in Databases. 148-163.
Riedel, Sebastian, Limin Yao, Andrew McCallum, and Benjamin M. Marlin. 2013. In NAACL-HLT. Linguistic Data Consortium, Philadelphia. 74-84.
Sandhaus, Evan. 2008. The New York Times Annotated Corpus. Linguistic Data Consortium, Philadelphia.
Schoenmackers, Stefan, Oren Etzioni, Daniel S. Weld, and Jesse Davis. 2012. Learning first-order Horn clauses from web text. Association for Computational Linguistics. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. 1088-1098.
Surdeanu, Mihai, Julie Tibshirani, Ramesh Nallapati, and Christopher D. Manning. 2012. Multi-instance multi-label learning for relation extraction. Association for Computational Linguistics. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. 455-465.
Vrandečić, Denny, and Markus Krötzsch. 2014. Wikidata: A free collaborative knowledgebase. Communications of the ACM 57, no. 10. 78-85.
Erxleben, Fredo, Michael Günther, Markus Krötzsch, Julian Mendez, and Denny Vrandečić. 2014. Introducing Wikidata to the linked data web. Springer International Publishing. In The Semantic Web - ISWC 2014. 50-65.
