Full papers

Yumi Shibaki, Masaaki Nagata and Kazuhide Yamamoto
Constructing Large-Scale Person Ontology from Wikipedia

Fabio Massimo Zanzotto and Marco Pennacchiotti
Expanding textual entailment corpora from Wikipedia using co-training

Luisa Bentivogli, Claudio Giuliano, Pamela Forner, Alessandro Marchetti, Emanuele Pianta and Kateryna Tymoshenko
Extending English ACE 2005 Corpus Annotation with Ground-truth Links to Wikipedia

M. Atif Qureshi, Arjumand Younus, Muhammad Saeed and Nasir Touheed
Identifying and Ranking Topic Clusters in the Blogosphere

Stephan Gouws, G-J van Rooyen and Herman A. Engelbrecht
Measuring Conceptual Similarity by Spreading Activation over Wikipedia's Hyperlink Structure

Ji Fang, Bob Price and Lotti Price
Pruning Non-Informative Text Through Non-Expert Annotations to Improve Sentiment Classification

Benjamin Mark Pateman
Using the Wikipedia Link Structure to Correct the Wikipedia Link Structure

Short papers

Masao Utiyama, Takeshi Abekawa, Eiichiro Sumita and Kyo Kageura
Helping Volunteer Translators, Fostering Language Resources

Invited Talks

Speaker: Tat-Seng Chua, National University of Singapore

Title: "Extracting Knowledge from Community Question-Answering Sites"


Community question-answering (QA) services, like Yahoo! Answers, contain a huge amount of information in the form of QA pairs accumulated over many years. The information covers a wide variety of topics, on questions that are of great interest to, and frequently asked by, users. To make this information accessible to general users, research has been carried out to help users find similar questions with readily available answers. However, a better approach is to organize all relevant QA pairs around a given topic into a knowledge structure that helps users understand the overall topic. To accomplish this, our research leverages an appropriate topic prototype hierarchy, automatically acquired from the Web or Wikipedia, to guide the organization of the unstructured user-generated content in community QA sites. More specifically, we propose a prototype-hierarchy based clustering algorithm that utilizes the category structure and article contents of Wikipedia, as well as the distribution of relevant QA pairs around the topic, based on a multi-criterion optimization function. This talk discusses our research on transforming unstructured community QA resources into a knowledge structure.

Short bio

Chua Tat-Seng is the KITHC Chair Professor at the School of Computing, National University of Singapore (NUS). He was the Acting and Founding Dean of the School of Computing during 1998-2000. He joined NUS in 1983, and spent three years as a research staff member at the Institute of Systems Science (now I2R) in the late 1980s. Dr Chua's main research interest is in multimedia information retrieval, in particular the analysis, retrieval and question-answering (QA) of text and image/video information. He is currently working on several multi-million-dollar projects: interactive media search, local contextual search, and real-time live media search. His group participates regularly in TREC-QA and TRECVID video retrieval evaluations. Dr Chua has organized, and served as a program committee member of, numerous international conferences in the areas of computer graphics, multimedia and text processing. He was conference co-chair of ACM Multimedia 2005, CIVR (Conference on Image and Video Retrieval) 2005, and ACM SIGIR 2008. He serves on the editorial boards of ACM Transactions on Information Systems (ACM), Foundations and Trends in Information Retrieval (NOW), The Visual Computer (Springer Verlag), and Multimedia Tools and Applications (Kluwer). He is a member of the steering committees of the CIVR, Computer Graphics International, and Multimedia Modeling conference series, and a member of the International Review Panels of two large-scale research projects in Europe.


The workshop builds upon the success of the first ACL “The People’s Web meets NLP” Workshop in 2009 that attracted 21 submissions. Accepted submissions included papers on Wikipedia [1], Wiktionary [2], Mechanical Turk [3], and game-based construction of semantic resources [4]. This clearly demonstrates a substantial and growing interest of the NLP community in collaboratively constructed semantic resources (CSRs), also evidenced by the increasing number of publications in this area and the EMNLP 2009 Web 2.0 track. In many works, CSRs have been used to overcome the knowledge acquisition bottleneck and coverage problems pertinent to conventional lexical semantic resources. The greatest popularity in this respect can so far certainly be attributed to Wikipedia [1]. However, other resources, such as folksonomies or the multilingual collaboratively constructed dictionary Wiktionary, have also shown great potential. Thus, the scope of the workshop deliberately includes any collaboratively constructed resource, not only Wikipedia.


Effective deployment of CSRs to enhance NLP introduces a pressing need to address a set of fundamental challenges, e.g. the interoperability with existing resources, or the quality of the extracted lexical semantic knowledge. Interoperability between resources is crucial as no single resource provides perfect coverage. The quality of CSRs is a fundamental issue, as they lack editorial control and entries are often incomplete. Thus, techniques for link prediction [5] or information extraction [6] have been proposed to guide the "crowds" while constructing resources of better quality.


The workshop will bring together researchers from different worlds: for example, those using collaboratively constructed resources as sources of lexical semantic information for NLP purposes such as information retrieval, named entity recognition, or keyword extraction, and those using NLP techniques to improve the resources or to extract and analyze different types of lexical semantic information from them. We especially welcome contributions of an interdisciplinary nature, e.g. those applying discourse analysis techniques from computational linguistics to the content of CSRs to better understand their properties.


Specific topics include but are not limited to:


  • Computational linguistics studies of collaboratively constructed resources, such as wiki-based platforms, folksonomies, Twitter, or social networks;

  • Using collaboratively constructed resources for NLP purposes such as information retrieval, text categorization, information extraction, etc.;

  • Using special features of collaboratively constructed resources to create novel resource types, for example revision-based corpora, simplified versions of resources, etc.;

  • Analyzing the structure of collaboratively constructed resources related to their use in NLP;

  • Interoperability of collaboratively constructed resources with conventional lexical semantic resources and between themselves;

  • Mining social and collaborative content for constructing structured semantic resources and the corresponding tools;

  • Mining multilingual information from collaboratively constructed resources;

  • Quality and reliability of collaboratively constructed semantic resources.



We especially encourage short papers describing publicly available tools for accessing or analyzing collaboratively constructed resources that can serve as a multiplier in the NLP community.


The workshop is intended to be highly interdisciplinary. Thus, we encourage the participation of researchers working on computational linguistics aspects (e.g. parsing or discourse analysis) or NLP applications (e.g. information retrieval, information extraction, question answering, and knowledge representation) as well as researchers from other areas who might benefit from collaboratively constructed semantic resources.


Substantially extended versions of the best papers from the workshop can be submitted to a planned Special Issue in one of the major computational linguistics journals. The revised papers will have to undergo a separate reviewing process required for journal publications.



Workshop Chairs

Iryna Gurevych and Torsten Zesch

Ubiquitous Knowledge Processing Lab

Technische Universität Darmstadt

Program Committee

Andras Csomai Google Inc.
Anette Frank Heidelberg University
Benno Stein Bauhaus University Weimar
Bernardo Magnini ITC-irst Trento
Christiane Fellbaum Princeton University
Dan Moldovan University of Texas at Dallas
Delphine Bernhard LIMSI-CNRS, Orsay
Diana McCarthy Lexical Computing Ltd
Elke Teich Technische Universität Darmstadt
Emily Pitler University of Pennsylvania
Eneko Agirre University of the Basque Country
Erhard Hinrichs Eberhard Karls Universität Tübingen
Ernesto De Luca Technische Universität Berlin
Florian Laws University of Stuttgart
Gerard de Melo MPI Saarbrücken
German Rigau University of the Basque Country
Graeme Hirst University of Toronto
Günter Neumann DFKI Saarbrücken
György Szarvas Technische Universität Darmstadt
Hans-Peter Zorn European Media Lab, Heidelberg
José Iria University of Sheffield
Laurent Romary LORIA, Nancy
Magnus Sahlgren Swedish Institute of Computer Science
Manfred Stede Potsdam University
Omar Alonso Microsoft
Pablo Castells Universidad Autónoma de Madrid
Paul Buitelaar DERI, National University of Ireland, Galway
Philipp Cimiano Delft University of Technology
Razvan Bunescu University of Texas at Austin
Rene Witte Concordia University Montréal
Roxana Girju University of Illinois at Urbana-Champaign
Saif Mohammad University of Maryland
Samer Hassan University of North Texas
Sören Auer Leipzig University
Tonio Wandmacher CEA, Paris

Important Dates

Submission deadline (full and short): extended to June 6, 2010
Notification of acceptance of papers: June 30, 2010
Camera-ready copy of papers due: July 10, 2010
COLING 2010 Workshop: Aug 28, 2010


Full paper submissions should follow the two-column format of the COLING 2010 proceedings without exceeding eight (8) pages of content plus one extra page for references.

Short paper submissions should also follow the two-column format of the COLING 2010 proceedings, and should not exceed four (4) pages, including references. We strongly recommend the use of LaTeX style files or Microsoft Word Style files tailored for this year's conference. The official style files for COLING 2010 are available at:

As the reviewing will be blind, the paper must not include the authors' names and affiliations. Furthermore, self-references that reveal the author's identity, e.g., "We previously showed (Smith, 1991) ...", must be avoided. Instead, use citations such as "Smith previously showed (Smith, 1991) ...". Papers that do not conform to these requirements will be rejected without review.

All accepted papers will be presented orally and published in the workshop proceedings.

Submission will be electronic. The only accepted format for submitted papers is Adobe PDF. The deadline for all papers, originally May 30, 2010 (GMT-12), has been extended to June 6, 2010.

Electronic submission site:



Information on registration is provided on the COLING homepage.


For further information about the workshop, please contact Torsten Zesch.



