ACL 2012: 3rd workshop on the People’s Web meets NLP: Collaboratively Constructed Semantic Resources and their Applications to NLP

Downloads & Info

Introduction 04.07.2012 (681 K) Slides of the workshop introduction

panel discussion 04.07.2012 (641 K) Slides of the workshop panel discussion

CFP-workshop-acl2012 v7 (177 K) Call for Papers

Workshop program


Opening Remarks


Sentiment Analysis Using a Novel Human Computation Game [Slides]
Claudiu Cristian Musat, Alireza Ghasemi, Boi Faltings
Ecole Polytechnique Fédérale de Lausanne


A Serious Game for Building a Portuguese Lexical-Semantic Network [Slides]
Mathieu Mangeot1 and Carlos Ramisch2
1LIG-GETALP and Université de Savoie (France), 2LIG-GETALP (France) and UFRGS (Brazil)


Coffee break


Collaboratively Building Language Resources while Localising the Web [Slides]
Asanka Wasala1, Reinhard Schäler1, Ruvan Weerasinghe2, Chris Exton1
1Centre for Next Generation Localisation/Localisation Research Centre, CSIS Dept, University of Limerick, Ireland, 2Language Technology Research Laboratory, University of Colombo School of Computing, 35, Reid Avenue, Colombo 00700, Sri Lanka


Invited talk: Sourcing from the Madding Crowd [Slides]
James Pustejovsky


Lunch break


Resolving Task Specification and Path Inconsistency in Taxonomy Construction
Hui Yang
Georgetown University


EAGER: Extending Automatically Gazetteers for Entity Recognition [Slides]
Omer Farukhan Gunes1, Tim Furche1, Christian Schallhart1, Jens Lehmann2, Axel-Cyrille Ngonga Ngomo2
1University of Oxford, 2University of Leipzig


Extracting Context-Rich Entailment Rules from Wikipedia Revision History [Slides]
Elena Cabrio1, Bernardo Magnini2, Angelina Ivanova3
1INRIA, 2FBK, 3University of Oslo


Coffee break


Panel discussion: Collaboratively Looking Ahead: How to Make Sustainable Goods out of Collaboratively Constructed Semantic Resources?
Ido Dagan [Slides], Sandra Kübler, and Simone Paolo Ponzetto [Slides]



Information on registration is provided on the ACL 2012 website.


Invited Speaker

James Pustejovsky, Brandeis University

Title: Sourcing from the Madding Crowd

Abstract: Classic linguistic datasets used for training machine learning algorithms have traditionally been constructed through manual annotation efforts, and the most successful ones have employed a version of the MATTER development cycle (Model, Annotate, Train, Test, Evaluate, and Revise). MATTER has helped transform the capabilities of natural language resources and technologies by integrating the training and testing of data-sensitive algorithms directly into the development model. The use of crowd sourcing for developing annotated datasets is both tempting and challenging: while affordable for scaling of resources, it is difficult or impossible to translate all annotation tasks to HIT-like formats.

One approach we have developed which seems promising is embedding a crowd sourcing task as part of a semi-supervised learning strategy for word sense disambiguation. Building on Rumshisky’s (2009) “Pairwise Similarity”, we use MTurkers to construct soft clusters, from which we are able to create classifiers for disambiguating word senses. I will also discuss the use of this kind of clustering to reduce the high dimensionality of annotation tag values in the development of an annotation specification language, i.e., ISO-Space. The payoff is similar to that seen in many SSL problems: clustering over the unlabeled data reveals features that were not apparent or characterizable by a human model. These are then used for subsequent human annotation for the construction of ISO-SpaceBank.

Short bio: James Pustejovsky is the TJX/Feldberg Chair in Computer Science at Brandeis University. He is a leading expert on lexical semantics, and also temporal and spatial reasoning, event semantics, and language annotation. His books include The Generative Lexicon (MIT 1995); with Bran Boguraev, Lexical Semantics: The Problem of Polysemy (OUP 1997); with Carol Tenny, Events as Grammatical Objects (CSLI 2000); co-author of Interpreting Motion (with I. Mani) (OUP 2012); co-editor of The Language of Time (OUP 2005); Natural Language Annotation for Machine Learning (with Amber Stubbs) (O’Reilly 2012); Generative Lexicon Theory: A Guide (with Elisabetta Jezek) (OUP forthcoming); and Coercion and Compositionality (MIT forthcoming). He was the chief editor of TimeML and is co-developer of the ISO-Space annotation scheme.



Recent recognition of Collaboratively Constructed Semantic Resources (CSRs) such as Wikipedia [1], Wiktionary [2], Linked Open Data [3], and other resources developed using crowdsourcing such as Games with a Purpose [4] and Mechanical Turk [5] has substantially contributed to the research in natural language processing (NLP).

Researchers have started to use such resources to substitute for or supplement conventional lexical semantic resources such as WordNet or linguistically annotated corpora in different NLP tasks. Another research direction is to utilize NLP techniques to enhance the collaboration process and its outcome, which improves the overall quality of the CSRs [6,7]. Overall, the emergence of CSRs has generated new challenges for the research field, which are to be addressed in this workshop.

The preceding “The People’s Web meets NLP” workshops at ACL-IJCNLP 2009 and COLING 2010 have successfully gathered researchers from different areas, and enabled an interdisciplinary exchange of research outcomes and ideas. Such collaboration has contributed to the creation of valuable semantic resources and tools based on CSRs, such as word sense alignments between WordNet, Wikipedia, and Wiktionary [8,9,10], folksonomy and named entity ontologies [11,12], multiword terms [13], ontological resources [14,15], annotated corpora [16], and Wikipedia and Wiktionary APIs.

The obvious next step in this area is to intensify research that demonstrates the effectiveness of the resources mined from CSRs, as listed above, in a variety of NLP tasks. This is why the 3rd workshop “The People’s Web meets NLP” will especially welcome submissions that utilize resources and tools for CSRs. We invite both long and short papers and especially encourage submissions that show the benefit of CSRs in diverse NLP tasks, for example word sense disambiguation [17] and semantic role labeling [18], in addition to further exploration of various aspects of CSRs. We also welcome tutorial-like submissions on using the software for CSRs to facilitate their wide adoption by the NLP community.


Specific topics include but are not limited to:

  • Using collaboratively constructed resources and the information mined from them for NLP tasks (cf. Section “References”), such as word sense disambiguation, semantic role labeling, information retrieval, text categorization, information extraction, question answering, etc.;
  • Mining social and collaborative content for constructing structured lexical semantic resources, annotated corpora and the corresponding tools;
  • Analyzing the structure of collaboratively constructed resources related to their use in NLP;
  • Computational linguistics studies of collaboratively constructed resources, such as wiki-based platforms or folksonomies;
  • Structural and semantic interoperability of collaboratively constructed resources with conventional semantic resources and between themselves;
  • Mining multilingual information from collaboratively constructed resources;
  • Using special features of collaboratively constructed resources to create novel resource types, for example revision-based corpora, simplified versions of resources, etc.;
  • Quality and reliability of collaboratively constructed lexical semantic resources and annotated corpora;
  • Hands-on practical knowledge on utilization of CSR APIs and tools or designing crowdsourcing procedures for high quality outcomes.

Though the workshop welcomes any CSR-related topics, preference will be given to submissions on the application of CSRs to NLP tasks, which is the special interest of this workshop edition. We therefore encourage the participation of researchers with various backgrounds: from computational linguistics (e.g. parsing and discourse analysis) to NLP applications and other areas that might benefit from collaboratively constructed semantic resources. Provided that we receive a sufficient number of tutorial-like submissions, a dedicated presentation session for those will be scheduled.

Important dates

April 8, 2012: Paper submission deadline (full and short)
May 9, 2012: Notification of acceptance
May 18, 2012: Camera-ready version due
July 13, 2012





Submission Information

Full paper submissions should follow the two-column format of ACL 2012 proceedings without exceeding eight (8) pages of content plus two (2) extra pages for references. Short paper submissions should also follow the two-column format of ACL 2012 proceedings, and should not exceed four (4) pages of content and two (2) additional pages of references. We strongly recommend the use of ACL LaTeX style files or Microsoft Word style files tailored for this year's conference, which are available on the conference website and also in the table below. All submissions must conform to the official ACL 2012 style guidelines announced on the conference website and must be submitted electronically in PDF format.

ACL 2012 Style Files (direct links to template files on the ACL 2012 conference website)




MS Word




As the reviewing will be blind, the paper must not include the authors' names and affiliations. Furthermore, self-references that reveal the author's identity, e.g., "We previously showed (Smith, 1991) ...", must be avoided. Instead, use citations such as "Smith previously showed (Smith, 1991) ...". Papers that do not conform to these requirements will be rejected without review.  

Submissions will be made electronically using the submission software. All accepted papers will be presented orally and published in the workshop proceedings.


Organizers

Iryna Gurevych


Ubiquitous Knowledge Processing Lab, TU Darmstadt

Nicoletta Calzolari Zamorani

Istituto di Linguistica Computazionale, CNR

Jungi Kim

Ubiquitous Knowledge Processing Lab, TU Darmstadt


Program Committee

Andras Csomai

Google Inc.

Andreas Hotho


Julius-Maximilians-Universität Würzburg

Anette Frank

Heidelberg University

Benno Stein

Bauhaus University Weimar

Christian M. Meyer

Technische Universität Darmstadt

David Milne

University of Waikato

Delphine Bernhard

University of Strasbourg

Diana McCarthy

Lexical Computing Ltd, UK

Donald Metzler

Information Sciences Institute, University of Southern California

Emily Pitler

University of Pennsylvania

Ernesto William De Luca

Technische Universität Berlin

Florian Laws

University of Stuttgart

Gerard de Melo

UC Berkeley

German Rigau

University of the Basque Country

Graeme Hirst

University of Toronto

Günter Neumann

DFKI Saarbrücken

Ido Dagan

Bar Ilan University

John McCrae

University of Bielefeld

Jong-Hyeok Lee

Pohang University of Science and Technology

Judith Eckle-Kohler

Technische Universität Darmstadt

Magnus Sahlgren

Swedish Institute of Computer Science

Manfred Stede

Universität Potsdam

Massimo Poesio

University of Essex

Omar Alonso

Microsoft Bing

Paul Buitelaar

DERI, National University of Ireland, Galway

Rene Witte

Concordia University Montréal

Roxana Girju

University of Illinois at Urbana-Champaign

Saif Mohammad

National Research Council Canada

Shuming Shi

Microsoft Research

Sören Auer

Leipzig University

Tat-Seng Chua

National University of Singapore

Tonio Wandmacher

SYSTRAN, Paris, France

Zornitsa Kozareva

Information Sciences Institute, University of Southern California



References

  1. Olena Medelyan, David Milne, Catherine Legg and Ian H. Witten. Mining meaning from Wikipedia. In: International Journal of Human-Computer Studies. 67(9), 2009.
  2. Torsten Zesch, Christof Müller and Iryna Gurevych. Extracting Lexical Semantic Knowledge from Wikipedia and Wiktionary. In: Proceedings of the Conference on Language Resources and Evaluation, 2008.
  3. Yuan Ni, Lei Zhang, Zhaoming Qiu, and Chen Wang. Enhancing the open-domain classification of named entity using linked open data. In: Proceedings of the 9th international semantic web conference (ISWC'10), 566-581, 2010.
  4. Luis von Ahn and Laura Dabbish. General Techniques for Designing Games with a Purpose. Communications of the ACM, 2008.
  5. Rion Snow, Brendan O’Connor, Daniel Jurafsky and Andrew Y. Ng. Cheap and Fast---But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks. Proceedings of EMNLP. 2008.
  6. Rada Mihalcea and Andras Csomai. Wikify!: Linking Documents to Encyclopedic Knowledge. In: Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, CIKM 2007.
  7. Daniel S. Weld, Fei Wu, Eytan Adar, Saleema Amershi, James Fogarty, Raphael Hoffmann, Kayur Patel and Michael Skinner. Intelligence in Wikipedia. In: Proceedings of the Twenty-Third Conference on Artificial Intelligence (AAAI-08), 2008.
  8. Elisabeth Niemann and Iryna Gurevych. The People’s Web meets Linguistic Knowledge: Automatic Sense Alignment of Wikipedia and WordNet. In: Proceedings of the International Conference on Computational Semantics (IWCS), pp. 205-214, 2011.
  9. Christian M. Meyer and Iryna Gurevych. What Psycholinguists Know About Chemistry: Aligning Wiktionary and WordNet for Increased Domain Coverage. In: Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP), 2011.
  10. Roberto Navigli and Simone Paolo Ponzetto. BabelNet: Building a very large multilingual semantic network. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL), 2010.
  11. Noriko Tomuro and Andriy Shepitsen. Construction of Disambiguated Folksonomy Ontologies Using Wikipedia. In: Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources, 2009.
  12. Yumi Shibaki, Masaaki Nagata and Kazuhide Yamamoto. Constructing Large-Scale Person Ontology from Wikipedia. In: Proceedings of the 2nd Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources, 2010.
  13. Silvana Hartmann, Gyuri Szarvas and Iryna Gurevych. Mining Multiword Terms from Wikipedia. In M.T. Pazienza & A. Stellato (Eds.): Semi-Automatic Ontology Development: Processes and Resources, 2011.
  14. Christian M. Meyer and Iryna Gurevych. OntoWiktionary – Constructing an Ontology from the Collaborative Online Dictionary Wiktionary. In M. T. Pazienza and A. Stellato (Eds.): Semi-Automatic Ontology Development: Processes and Resources, 2011.
  15. Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn, and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network. In: Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC), 2010.
  16. Jon Chamberlain, Udo Kruschwitz and Massimo Poesio. Constructing an Anaphorically Annotated Corpus with Non-Experts: Assessing the Quality of Collaborative Annotations. In: Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources, 2009.
  17. Simone Paolo Ponzetto and Roberto Navigli. Knowledge-rich Word Sense Disambiguation rivaling supervised systems. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL), 2010.
  18. Ana-Maria Giuglea and Alessandro Moschitti. Semantic role labeling via FrameNet, VerbNet and PropBank. In: Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics (ACL), 2006.


For further information about the workshop, please contact Jungi Kim.
