ACL 2012: 3rd workshop on the People’s Web meets NLP: Collaboratively Constructed Semantic Resources and their Applications to NLP

Downloads & Info

    Workshop program

    09:15–09:30 Opening Remarks
    09:35–10:05 Sentiment Analysis Using a Novel Human Computation Game [Slides]
    Claudiu Cristian Musat,  Alireza Ghasemi,  Boi Faltings
    Ecole Polytechnique Fédérale de Lausanne
    10:10–10:30 A Serious Game for Building a Portuguese Lexical-Semantic Network [Slides]
    Mathieu Mangeot¹ and Carlos Ramisch²
    ¹LIG-GETALP and Université de Savoie (France), ²LIG-GETALP (France) and UFRGS (Brazil)
    10:30–11:00 Coffee break
    11:00–11:20 Collaboratively Building Language Resources while Localising the Web [Slides]
    Asanka Wasala¹, Reinhard Schäler¹, Ruvan Weerasinghe², Chris Exton¹
    ¹Centre for Next Generation Localisation/Localisation Research Centre, CSIS Dept, University of Limerick, Ireland, ²Language Technology Research Laboratory, University of Colombo School of Computing, 35, Reid Avenue, Colombo 00700, Sri Lanka
    11:25–12:30 Invited talk: Sourcing from the Madding Crowd [Slides]
    James Pustejovsky
    12:30–14:00 Lunch break
    14:00–14:30 Resolving Task Specification and Path Inconsistency in Taxonomy Construction
    Hui Yang
    Georgetown University
    14:35–14:55 EAGER: Extending Automatically Gazetteers for Entity Recognition [Slides]
    Omer Farukhan Gunes¹, Tim Furche¹, Christian Schallhart¹, Jens Lehmann², Axel-Cyrille Ngonga Ngomo²
    ¹University of Oxford, ²University of Leipzig
    15:00–15:30 Extracting Context-Rich Entailment Rules from Wikipedia Revision History [Slides]
    Elena Cabrio¹, Bernardo Magnini², Angelina Ivanova³
    ¹INRIA, ²FBK, ³University of Oslo
    15:30–16:00 Coffee break
    16:00–17:00 Panel discussion: Collaboratively Looking Ahead: How to Make Sustainable Goods out of Collaboratively Constructed Semantic Resources?
    Ido Dagan [Slides], Sandra Kübler, and Simone Paolo Ponzetto [Slides]



    Information on registration is provided at the ACL 2012 website.


    Invited Speaker

    James Pustejovsky, Brandeis University

    Title: Sourcing from the Madding Crowd

    Abstract: Classic linguistic datasets used for training machine learning algorithms have traditionally been constructed through manual annotation efforts, and the most successful ones have employed a version of the MATTER development cycle (Model, Annotate, Train, Test, Evaluate, and Revise). MATTER has helped transform the capabilities of natural language resources and technologies by integrating the training and testing of data-sensitive algorithms directly into the development model. The use of crowd sourcing for developing annotated datasets is both tempting and challenging: while affordable for scaling of resources, it is difficult or impossible to translate all annotation tasks to HIT-like formats.

    One approach we have developed which seems promising is embedding a crowd sourcing task as part of a semi-supervised learning strategy for word sense disambiguation. Building on Rumshisky’s (2009) “Pairwise Similarity”, we use MTurkers to construct soft clusters, from which we are able to create classifiers for disambiguating word senses. I will also discuss the use of this kind of clustering to reduce the high dimensionality of annotation tag values in the development of an annotation specification language, i.e., ISO-Space. The payoff is similar to that seen in many SSL problems: clustering over the unlabeled data reveals features that were not apparent or characterizable by a human model. These are then used for subsequent human annotation for the construction of ISO-SpaceBank.

    Short bio: James Pustejovsky is the TJX/Feldberg Chair in Computer Science at Brandeis University. He is a leading expert on lexical semantics, and also temporal and spatial reasoning, event semantics, and language annotation. His books include The Generative Lexicon (MIT 1995); with Bran Boguraev, Lexical Semantics: The Problem of Polysemy (OUP 1997); with Carol Tenny, Events as Grammatical Objects (CSLI 2000); co-author of Interpreting Motion (with I. Mani) (OUP 2012); co-editor of The Language of Time (OUP 2005); Natural Language Annotation for Machine Learning (with Amber Stubbs) (O’Reilly 2012); Generative Lexicon Theory: A Guide (with Elisabetta Jezek) (OUP forthcoming); and Coercion and Compositionality (MIT forthcoming). He was the chief editor of TimeML and is co-developer of the ISO-Space annotation scheme.



    Recent recognition of Collaboratively Constructed Semantic Resources (CSRs) such as Wikipedia [1], Wiktionary [2], Linked Open Data [3], and other resources developed using crowdsourcing such as Games with a Purpose [4] and Mechanical Turk [5] has substantially contributed to the research in natural language processing (NLP).

    Researchers have started to use such resources to substitute for or supplement conventional lexical semantic resources such as WordNet or linguistically annotated corpora in different NLP tasks. Another research direction is to utilize NLP techniques to enhance the collaboration process and its outcome, which improves the overall quality of the CSRs [6,7]. Overall, the emergence of CSRs has generated new challenges for the research field, which are to be addressed in the proposed workshop.

    The preceding “The People’s Web meets NLP” workshops at ACL-IJCNLP 2009 and COLING 2010 have successfully gathered researchers from different areas, and enabled an interdisciplinary exchange of research outcomes and ideas. Such collaboration has contributed to the creation of valuable semantic resources and tools based on CSRs, such as word sense alignments between WordNet, Wikipedia, and Wiktionary [8,9,10], folksonomy and named entity ontologies [11,12], multiword terms [13], ontological resources [14,15], annotated corpora [16], and Wikipedia and Wiktionary APIs.

    The obvious next step in this area is to intensify research that demonstrates the effectiveness of the resources mined from CSRs, as listed above, in a variety of NLP tasks. This is why the 3rd workshop “The People’s Web meets NLP” will especially welcome submissions that utilize resources and tools for CSRs. We invite both long and short papers and especially encourage submissions that show the benefit of CSRs in diverse NLP tasks, for example word sense disambiguation [17] and semantic role labeling [18], in addition to further exploration of various aspects of CSRs. We also welcome tutorial-like submissions on using the software for CSRs to facilitate their wide adoption by the NLP community.


    Specific topics include but are not limited to:

    • Using collaboratively constructed resources and the information mined from them for NLP tasks (cf. Section “References”), such as word sense disambiguation, semantic role labeling, information retrieval, text categorization, information extraction, question answering, etc.;
    • Mining social and collaborative content for constructing structured lexical semantic resources, annotated corpora and the corresponding tools;
    • Analyzing the structure of collaboratively constructed resources related to their use in NLP;
    • Computational linguistics studies of collaboratively constructed resources, such as wiki-based platforms or folksonomies;
    • Structural and semantic interoperability of collaboratively constructed resources with conventional semantic resources and between themselves;
    • Mining multilingual information from collaboratively constructed resources;
    • Using special features of collaboratively constructed resources to create novel resource types, for example revision-based corpora, simplified versions of resources, etc.;
    • Quality and reliability of collaboratively constructed lexical semantic resources and annotated corpora;
    • Hands-on practical knowledge on utilization of CSR APIs and tools or designing crowdsourcing procedures for high quality outcomes.

    Though the workshop welcomes any CSR-related topics, preference will be given to submissions on the application of CSRs to NLP tasks, which is the special interest of this workshop edition. We therefore encourage the participation of researchers with various backgrounds: from computational linguistics (e.g. parsing and discourse analysis) to NLP applications and other areas that might benefit from collaboratively constructed semantic resources. Provided that we receive a sufficient number of tutorial-like submissions, a dedicated presentation session for those will be scheduled.

    Important dates

    April  8, 2012    Paper submission deadline (full and short)
    May  9, 2012      Notification of acceptance
    May 18, 2012      Camera-ready version due
    July 13, 2012     Workshop

    Submission Information

    Full paper submissions should follow the two-column format of ACL 2012 proceedings without exceeding eight (8) pages of content plus two (2) extra pages for references. Short paper submissions should also follow the two-column format of ACL 2012 proceedings, and should not exceed four (4) pages of content and two (2) additional pages of references. We strongly recommend the use of ACL LaTeX style files or Microsoft Word Style files tailored for this year's conference, which are available on the conference website and also in the table below. All submissions must conform to the official ACL 2012 style guidelines announced on the conference website and must be submitted electronically in PDF format.

     ACL 2012 Style Files (direct links to template files on ACL 2012 conference website)

    LaTeX      acl2012.tex    acl2012.sty    acl2012.pdf    acl.bst
    MS Word    acl2012.doc    acl2012.dot    acl2012.pdf


    As the reviewing will be blind, the paper must not include the authors' names and affiliations. Furthermore, self-references that reveal the author's identity, e.g., "We previously showed (Smith, 1991) ...", must be avoided. Instead, use citations such as "Smith previously showed (Smith, 1991) ...". Papers that do not conform to these requirements will be rejected without review.  

    Submission will be electronic via the submission software. All accepted papers will be presented orally and published in the workshop proceedings.


    Iryna Gurevych    Ubiquitous Knowledge Processing Lab, TU Darmstadt
    Nicoletta Calzolari Zamorani    Istituto di Linguistica Computazionale, CNR
    Jungi Kim    Ubiquitous Knowledge Processing Lab, TU Darmstadt


    Program Committee

    Andras Csomai    Google Inc.
    Andreas Hotho    Julius-Maximilians-Universität Würzburg
    Anette Frank    Heidelberg University
    Benno Stein    Bauhaus University Weimar
    Christian M. Meyer    Technische Universität Darmstadt
    David Milne    University of Waikato
    Delphine Bernhard    University of Strasbourg
    Diana McCarthy    Lexical Computing Ltd, UK
    Donald Metzler    Information Sciences Institute, University of Southern California
    Emily Pitler    University of Pennsylvania
    Ernesto William De Luca    Technische Universität Berlin
    Florian Laws    University of Stuttgart
    Gerard de Melo    UC Berkeley
    German Rigau    University of the Basque Country
    Graeme Hirst    University of Toronto
    Günter Neumann    DFKI Saarbrücken
    Ido Dagan    Bar Ilan University
    John McCrae    University of Bielefeld
    Jong-Hyeok Lee    Pohang University of Science and Technology
    Judith Eckle-Kohler    Technische Universität Darmstadt
    Magnus Sahlgren    Swedish Institute of Computer Science
    Manfred Stede    Universität Potsdam
    Massimo Poesio    University of Essex
    Omar Alonso    Microsoft Bing
    Paul Buitelaar    DERI, National University of Ireland, Galway
    Rene Witte    Concordia University Montréal
    Roxana Girju    University of Illinois at Urbana-Champaign
    Saif Mohammad    National Research Council Canada
    Shuming Shi    Microsoft Research
    Sören Auer    Leipzig University
    Tat-Seng Chua    National University of Singapore
    Tonio Wandmacher    SYSTRAN, Paris, France
    Zornitsa Kozareva    Information Sciences Institute, University of Southern California



    1. Olena Medelyan, David Milne, Catherine Legg and Ian H. Witten. Mining meaning from Wikipedia. In: International Journal of Human-Computer Studies. 67(9), 2009.
    2. Torsten Zesch, Christof Müller and Iryna Gurevych. Extracting Lexical Semantic Knowledge from Wikipedia and Wiktionary. In: Proceedings of the Conference on Language Resources and Evaluation, 2008.
    3. Yuan Ni, Lei Zhang, Zhaoming Qiu, and Chen Wang. Enhancing the open-domain classification of named entity using linked open data. In: Proceedings of the 9th international semantic web conference (ISWC'10), 566-581, 2010.
    4. Luis von Ahn and Laura Dabbish. General Techniques for Designing Games with a Purpose. Communications of the ACM, 2008.
    5. Rion Snow, Brendan O’Connor, Daniel Jurafsky and Andrew Y. Ng. Cheap and Fast---But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks. Proceedings of EMNLP. 2008.
    6. Rada Mihalcea and Andras Csomai. Wikify!: Linking Documents to Encyclopedic Knowledge. In: Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, CIKM 2007.
    7. Daniel S. Weld, Fei Wu, Eytan Adar, Saleema Amershi, James Fogarty, Raphael Hoffmann, Kayur Patel and Michael Skinner. Intelligence in Wikipedia. In: Proceedings of the Twenty-Third Conference on Artificial Intelligence (AAAI-08), 2008.
    8. Elisabeth Niemann and Iryna Gurevych. The People’s Web meets Linguistic Knowledge: Automatic Sense Alignment of Wikipedia and WordNet. In: Proceedings of the International Conference on Computational Semantics (IWCS), pp. 205-214, 2011.
    9. Christian M. Meyer and Iryna Gurevych. What Psycholinguists Know About Chemistry: Aligning Wiktionary and WordNet for Increased Domain Coverage. In: Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP), 2011.
    10. Roberto Navigli and Simone Paolo Ponzetto. BabelNet: Building a very large multilingual semantic network. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL), 2010.
    11. Noriko Tomuro and Andriy Shepitsen. Construction of Disambiguated Folksonomy Ontologies Using Wikipedia. In: Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources, 2009.
    12. Yumi Shibaki, Masaaki Nagata and Kazuhide Yamamoto. Constructing Large-Scale Person Ontology from Wikipedia. In: Proceedings of the 2nd Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources, 2010.
    13. Silvana Hartmann, Gyuri Szarvas and Iryna Gurevych. Mining Multiword Terms from Wikipedia. In M.T. Pazienza & A. Stellato (Eds.): Semi-Automatic Ontology Development: Processes and Resources, 2011.
    14. Christian M. Meyer and Iryna Gurevych. OntoWiktionary — Constructing an Ontology from the Collaborative Online Dictionary Wiktionary. In M. T. Pazienza and A. Stellato (Eds.): Semi-Automatic Ontology Development: Processes and Resources, 2011.
    15. Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn, and Anas Elghafari. WikiNet: A very large scale multi-lingual concept network. In: Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC), 2010.
    16. Jon Chamberlain, Udo Kruschwitz and Massimo Poesio. Constructing an Anaphorically Annotated Corpus with Non-Experts: Assessing the Quality of Collaborative Annotations. In: Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources, 2009.
    17. Simone Paolo Ponzetto and Roberto Navigli. Knowledge-rich Word Sense Disambiguation rivaling supervised systems. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL), 2010.
    18. Ana-Maria Giuglea and Alessandro Moschitti. Semantic role labeling via FrameNet, VerbNet and PropBank. In: Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics (ACL), 2006.


    For further information about the workshop, please contact Jungi Kim.
