Wikulu - Self-Organizing Wikis

Project Goals

Web-based collaboration systems known as wikis, such as Wikipedia and corporate wikis, have grown tremendously in importance in recent years. Because a new wiki is very easy to use, its content grows quickly. A common drawback, however, is that usability decreases as the amount of content increases. The Wikulu - Self-Organizing Wikis project at the UKP Lab employs the latest Natural Language Processing (NLP) technologies to manage unstructured information, i.e. to structure the content in corporate wikis. The objective of the project is thus to implement intelligent approaches that assist the user while creating, editing, or searching content. Wikulu should relieve the user of manual information management, leaving more room for productive work. Why is it called Wikulu? Kukulu is Hawaiian for "to organize"!

Project Publications

Approaches to Automatic Text Structuring

Author Nicolai Erbs
Date September 2015
Kind PhD thesis
Research Areas Ubiquitous Knowledge Processing, UKP_s_JWPL, UKP_s_DKPro_Similarity, UKP_s_DKPro_Core, UKP_p_WIKULU, UKP_p_WIWEB, UKP_p_openwindow, UKP_p_DKPro, UKP_a_NLP4Wikis, UKP_a_ENLP
Abstract Structured text helps readers to better understand the content of documents. In classic newspaper texts or books, some structure already exists. In the Web 2.0, the amount of textual data, especially user-generated data, has increased dramatically. As a result, there exists a large amount of textual data which lacks structure, thus making it more difficult to understand. In this thesis, we will explore techniques for automatic text structuring to help readers to fulfill their information needs. Useful techniques for automatic text structuring are keyphrase identification, table-of-contents generation, and link identification. We improve state of the art results for approaches to text structuring on several benchmark datasets. In addition, we present new representative datasets for users’ everyday tasks. We evaluate the quality of text structuring approaches with regard to these scenarios and discover that the quality of approaches highly depends on the dataset on which they are applied. In the first chapter of this thesis, we establish the theoretical foundations regarding text structuring. We describe our findings from a user survey regarding web usage from which we derive three typical scenarios of Internet users. We then proceed to the three main contributions of this thesis. We evaluate approaches to keyphrase identification both by extracting and assigning keyphrases for English and German datasets. We find that unsupervised keyphrase extraction yields stable results, but for datasets with predefined keyphrases, additional filtering of keyphrases and assignment approaches yields even higher results. We present a decompounding extension, which further improves results for datasets with shorter texts.
We construct hierarchical table-of-contents of documents for three English datasets and discover that the results for hierarchy identification are sufficient for an automatic system, but for segment title generation, user interaction based on suggestions is required. We investigate approaches to link identification, including the subtasks of identifying the mention (anchor) of the link and linking the mention to an entity (target). Approaches that make use of the Wikipedia link structure perform best, as long as there is sufficient training data available. For identifying links to sense inventories other than Wikipedia, approaches that do not make use of the link structure outperform the approaches using existing links. We further analyze the effect of senses on computing similarities. In contrast to entity linking, where most entities can be discriminated by their name, we consider cases where multiple entities with the same name exist. We discover that similarity depends on the selected sense inventory. To foster future evaluation of natural language processing components for text structuring, we present two prototypes of text structuring systems, which integrate techniques for automatic text structuring in a wiki setting and in an e-learning setting with eBooks.
Website http://tuprints.ulb.tu-darmstadt.de/4959/
Full paper (pdf)

Important Copyright Notice:

The documents contained in these directories are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.


We are always looking for students who are interested in Wikulu and want to help us with our programming and research tasks. Please contact us if you want to know more!

Related Projects

The Wikulu project applies cutting-edge fundamental NLP technologies developed at the UKP Lab to solve real-life knowledge management problems. It builds upon several successful ongoing projects at the UKP Lab, such as:

  • WiWeb funded by the Förderinitiative Interdisziplinäre Forschung: Utilizing Web Knowledge: Language Technologies and Psychological Processes
  • SIR 1+2 funded by the German Research Foundation (DFG): Extracting structured lexical semantic knowledge from wiki-based web 2.0 sources such as Wikipedia and Wiktionary and integrating contextually-aware semantic relatedness into information retrieval and keyphrase extraction
  • DKPro funded by the UIMA 2007 Innovation Award and by two UIMA 2008 Innovation Awards from IBM: Integrating NLP components into a repository of semantic information management software based on IBM’s industrial-strength Unstructured Information Management Architecture (UIMA) framework


The Wikulu - Self-Organizing Wikis project is funded by the Klaus Tschira Foundation.

