Jeju, Republic of Korea
July 13, 2012
http://www.ukp.tu-darmstadt.de/scientific-community/acl-2012-workshop
Introduction 04.07.2012 (681 K) Slides of the workshop introduction
Panel discussion 04.07.2012 (641 K) Slides of the workshop panel discussion
CFP-workshop-acl2012 v7 (177 K) Call for Papers
09:15–09:30 | Opening Remarks |
09:35–10:05 | Sentiment Analysis Using a Novel Human Computation Game [Slides] |
10:10–10:30 | A Serious Game for Building a Portuguese Lexical-Semantic Network [Slides] |
10:30–11:00 | Coffee break |
11:00–11:20 | Collaboratively Building Language Resources while Localising the Web [Slides] |
11:25–12:30 | Invited talk: Sourcing from the Madding Crowd [Slides] |
12:30–14:00 | Lunch break |
14:00–14:30 | Resolving Task Specification and Path Inconsistency in Taxonomy Construction |
14:35–14:55 | EAGER: Extending Automatically Gazetteers for Entity Recognition [Slides] |
15:00–15:30 | Extracting Context-Rich Entailment Rules from Wikipedia Revision History [Slides] |
15:30–16:00 | Coffee break |
16:00–17:00 | Panel discussion: Collaboratively Looking Ahead: How to Make Sustainable Goods out of Collaboratively Constructed Semantic Resources? |
James Pustejovsky, Brandeis University
Title: Sourcing from the Madding Crowd
Abstract: Classic linguistic datasets used for training machine learning algorithms have traditionally been constructed through manual annotation efforts, and the most successful ones have employed a version of the MATTER development cycle (Model, Annotate, Train, Test, Evaluate, and Revise). MATTER has helped transform the capabilities of natural language resources and technologies by integrating the training and testing of data-sensitive algorithms directly into the development model. The use of crowd sourcing for developing annotated datasets is both tempting and challenging: while affordable for scaling of resources, it is difficult or impossible to translate all annotation tasks to HIT-like formats. One approach we have developed which seems promising is embedding a crowd sourcing task as part of a semi-supervised learning strategy for word sense disambiguation. Building on Rumshisky’s (2009) “Pairwise Similarity”, we use MTurkers to construct soft clusters, from which we are able to create classifiers for disambiguating word senses. I will also discuss the use of this kind of clustering to reduce the high dimensionality of annotation tag values in the development of an annotation specification language, i.e., ISO-Space. The payoff is similar to that seen in many SSL problems: clustering over the unlabeled data reveals features that were not apparent or characterizable by a human model. These are then used for subsequent human annotation for the construction of ISO-SpaceBank.
Short bio: James Pustejovsky is the TJX/Feldberg Chair in Computer Science at Brandeis University. He is a leading expert on lexical semantics, and also temporal and spatial reasoning, event semantics, and language annotation. His books include The Generative Lexicon (MIT 1995); with Bran Boguraev, Lexical Semantics: The Problem of Polysemy (OUP 1997); with Carol Tenny, Events as Grammatical Objects (CSLI 2000); co-author of Interpreting Motion (with I. Mani) (OUP 2012); co-editor of The Language of Time (OUP 2005); Natural Language Annotation for Machine Learning (with Amber Stubbs) (O’Reilly 2012); Generative Lexicon Theory: A Guide (with Elisabetta Jezek) (OUP forthcoming); and Coercion and Compositionality (MIT forthcoming). He was the chief editor of TimeML and is co-developer of the ISO-Space annotation scheme.
Recent recognition of Collaboratively Constructed Semantic Resources (CSRs) such as Wikipedia [1], Wiktionary [2], Linked Open Data [3], and other resources developed using crowdsourcing such as Games with a Purpose [4] and Mechanical Turk [5] has substantially contributed to the research in natural language processing (NLP).
Researchers have started to use such resources to substitute for or supplement conventional lexical semantic resources such as WordNet or linguistically annotated corpora in different NLP tasks. Another research direction is to utilize NLP techniques to enhance the collaboration process and its outcome, which improves the overall quality of the CSRs [6,7]. Overall, the emergence of CSRs has generated new challenges for the research field, which are to be addressed in the proposed workshop.
The preceding “The People’s Web meets NLP” workshops at ACL-IJCNLP 2009 and COLING 2010 have successfully gathered researchers from different areas, and enabled an interdisciplinary exchange of research outcomes and ideas. Such collaboration has contributed to the creation of valuable semantic resources and tools based on CSRs, such as word sense alignments between WordNet, Wikipedia, and Wiktionary [8,9,10], folksonomy and named entity ontologies [11,12], multiword terms [13], ontological resources [14,15], annotated corpora [16], and Wikipedia and Wiktionary APIs.
The obvious next step in this area is to intensify research that demonstrates the effectiveness of the resources mined from CSRs, as listed above, in a variety of NLP tasks. This is why the 3rd workshop “The People’s Web meets NLP” will especially welcome submissions that utilize resources and tools for CSRs. We invite both long and short papers and especially encourage submissions that show the benefit of CSRs in diverse NLP tasks, for example word sense disambiguation [17] and semantic role labeling [18], in addition to further exploration of various aspects of CSRs. We also welcome tutorial-like submissions on using the software for CSRs to facilitate their wide adoption by the NLP community.
Specific topics include but are not limited to:
Though the workshop welcomes any CSR-related topics, preference will be given to submissions on the application of CSRs to NLP tasks, which is the special interest of this workshop edition. We therefore encourage the participation of researchers with various backgrounds: from computational linguistics (e.g. parsing and discourse analysis) to NLP applications and other areas that might benefit from collaboratively constructed semantic resources. Provided that we receive a sufficient number of tutorial-like submissions, a dedicated presentation session for those will be scheduled.
April 8, 2012 | Paper submission deadline (full and short) |
May 9, 2012 | Notification of acceptance | |
May 18, 2012 | Camera-ready version due | |
July 13, 2012 | Workshop | |
Full paper submissions should follow the two-column format of the ACL 2012 proceedings and must not exceed eight (8) pages of content plus two (2) extra pages for references. Short paper submissions should also follow the two-column format of the ACL 2012 proceedings and must not exceed four (4) pages of content plus two (2) additional pages for references. We strongly recommend the use of the ACL LaTeX style files or Microsoft Word style files tailored for this year's conference, which are available on the conference website (http://www.acl2012.org/call/sub01.asp) and also in the table below. All submissions must conform to the official ACL 2012 style guidelines announced on the conference website and must be submitted electronically in PDF format.
ACL 2012 Style Files (direct links to template files on the ACL 2012 conference website)
LaTeX | MS Word |
As the reviewing will be blind, the paper must not include the authors' names and affiliations. Furthermore, self-references that reveal the author's identity, e.g., "We previously showed (Smith, 1991) ...", must be avoided. Instead, use citations such as "Smith previously showed (Smith, 1991) ...". Papers that do not conform to these requirements will be rejected without review.
Submissions will be made electronically via the submission software (https://www.softconf.com/acl2012/people-web-2012/). All accepted papers will be presented orally and published in the workshop proceedings.
Iryna Gurevych | Ubiquitous Knowledge Processing Lab, TU Darmstadt |
Nicoletta Calzolari Zamorani | Istituto di Linguistica Computazionale, CNR | |
Jungi Kim | Ubiquitous Knowledge Processing Lab, TU Darmstadt |
Andras Csomai | Google Inc. | |
Andreas Hotho | Julius-Maximilians-Universität Würzburg |
Anette Frank | Heidelberg University | |
Benno Stein | Bauhaus University Weimar | |
Christian M. Meyer | Technische Universität Darmstadt | |
David Milne | University of Waikato | |
Delphine Bernhard | University of Strasbourg | |
Diana McCarthy | Lexical Computing Ltd, UK | |
Donald Metzler | Information Sciences Institute, University of Southern California | |
Emily Pitler | University of Pennsylvania | |
Ernesto William De Luca | Technische Universität Berlin | |
Florian Laws | University of Stuttgart | |
Gerard de Melo | UC Berkeley | |
German Rigau | University of the Basque Country | |
Graeme Hirst | University of Toronto | |
Günter Neumann | DFKI Saarbrücken | |
Ido Dagan | Bar Ilan University | |
John McCrae | University of Bielefeld | |
Jong-Hyeok Lee | Pohang University of Science and Technology | |
Judith Eckle-Kohler | Technische Universität Darmstadt | |
Magnus Sahlgren | Swedish Institute of Computer Science | |
Manfred Stede | Universität Potsdam | |
Massimo Poesio | University of Essex | |
Omar Alonso | Microsoft Bing | |
Paul Buitelaar | DERI, National University of Ireland, Galway | |
Rene Witte | Concordia University Montréal | |
Roxana Girju | University of Illinois at Urbana-Champaign | |
Saif Mohammad | National Research Council Canada | |
Shuming Shi | Microsoft Research | |
Sören Auer | Leipzig University | |
Tat-Seng Chua | National University of Singapore | |
Tonio Wandmacher | SYSTRAN, Paris, France | |
Zornitsa Kozareva | Information Sciences Institute, University of Southern California |
For further information about the workshop, please contact Jungi Kim.