The self-organizing abilities of the wiki are enabled by a set of semantic information management technologies, which are the subject of fundamental research at UKP Lab.
Traditional keyphrase extraction approaches are based on term frequencies. We extend this analysis with Natural Language Processing techniques that identify multi-word candidate keyphrases rather than individual words. In a second step, we select the keyphrases that best capture the meaning of the document, based on a semantic representation of the text derived from the semantic relations in Wikipedia and Wiktionary.
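To illustrate the difference between single words and multi-word candidates, here is a minimal sketch of the frequency-based first step. It assumes a simple stand-in for the NLP pipeline: candidates are maximal runs of non-stopwords between punctuation marks, ranked by frequency. The stopword list and the functions `candidate_phrases` and `top_keyphrases` are hypothetical, and the Wikipedia/Wiktionary-based semantic selection step is not reproduced.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real system would use a full list
# (and linguistic analysis such as part-of-speech patterns) instead.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "and", "or", "to", "is", "are",
             "for", "with", "by", "as", "that", "this", "it", "be", "from"}

def candidate_phrases(text):
    """Return multi-word candidates: maximal runs of non-stopwords.

    Punctuation ends a candidate just as a stopword does.
    """
    phrases = []
    for fragment in re.split(r"[.,;:!?()\n]+", text.lower()):
        current = []
        for tok in re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", fragment):
            if tok in STOPWORDS:
                if current:
                    phrases.append(" ".join(current))
                current = []
            else:
                current.append(tok)
        if current:
            phrases.append(" ".join(current))
    return phrases

def top_keyphrases(text, k=3):
    """Rank candidates by frequency; ties favour longer phrases, then alphabetical order."""
    counts = Counter(candidate_phrases(text))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], -len(kv[0].split()), kv[0]))
    return [phrase for phrase, _ in ranked[:k]]
```

For the text "The keyphrase extraction of the wiki. Keyphrase extraction for the wiki is based on term frequencies.", `top_keyphrases` returns `["keyphrase extraction", "wiki", "term frequencies"]`: the two-word phrase is counted as one unit instead of as two separate words.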
We are currently investigating how keyphrase extraction algorithms and discourse analysis techniques can automatically establish useful links between unconnected pieces of information. Through keyphrase extraction, the semantic representation of the wiki can be processed for link detection and reduced to a network of tags for graphical presentation.
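The two uses of keyphrases mentioned above, link detection and a tag network, can be sketched as follows. This assumes that keyphrases per page are already available and that a link is suggested whenever a page's keyphrase matches another page's title (case-insensitively); that matching rule, the discourse analysis part, and the names `suggest_links` and `tag_network` are hypothetical illustrations, not the method used in the wiki.

```python
from collections import Counter
from itertools import combinations

def suggest_links(keyphrases, existing_links=frozenset()):
    """Propose (source, target) links when a source page's keyphrase names another page.

    keyphrases: dict mapping page title -> set of keyphrases for that page.
    existing_links: (source, target) pairs that are already linked and should be skipped.
    """
    titles = {title.lower(): title for title in keyphrases}
    suggestions = set()
    for source, phrases in keyphrases.items():
        for phrase in phrases:
            target = titles.get(phrase.lower())
            if target and target != source and (source, target) not in existing_links:
                suggestions.add((source, target))
    return suggestions

def tag_network(keyphrases):
    """Weighted tag graph: the weight of (a, b) is the number of pages on which both tags occur."""
    edges = Counter()
    for phrases in keyphrases.values():
        for a, b in combinations(sorted(set(phrases)), 2):
            edges[(a, b)] += 1
    return edges
```

With pages `{"Wiki": {"keyphrase extraction", "link detection", "term frequencies"}, "Keyphrase Extraction": {"term frequencies", "link detection"}, "Link Detection": {"keyphrase extraction", "term frequencies"}}`, the page "Wiki" receives a suggested link to "Keyphrase Extraction", and the tags "keyphrase extraction" and "term frequencies" are connected with weight 2 because they co-occur on two pages.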
Conventional information retrieval systems suffer from the so-called vocabulary gap: the terms users search for differ from the terms the documents use to express the same concept. UKP Lab has developed algorithms that use knowledge extracted from Web 2.0 knowledge repositories such as Wikipedia and Wiktionary to close the vocabulary gap and thereby increase the effectiveness of information retrieval.
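One common way to bridge the vocabulary gap is query expansion. The sketch below assumes a small hypothetical synonym table (`SYNONYMS`) standing in for relations mined from Wiktionary or Wikipedia, and ranks documents by simple term overlap. It shows the effect of expansion only; it is not UKP Lab's algorithm, which may rely on semantic relatedness rather than plain synonym lists.

```python
# Hypothetical synonym table; a stand-in for knowledge extracted from Wiktionary/Wikipedia.
SYNONYMS = {
    "car": {"automobile", "vehicle"},
    "buy": {"purchase"},
}

def expand_query(query):
    """Add known synonyms of every query term to the set of search terms."""
    terms = set(query.lower().split())
    expanded = set(terms)
    for term in terms:
        expanded |= SYNONYMS.get(term, set())
    return expanded

def search(query, documents, expand=True):
    """Rank document ids by the number of (possibly expanded) query terms they contain."""
    terms = expand_query(query) if expand else set(query.lower().split())
    hits = []
    for doc_id, text in documents.items():
        score = len(terms & set(text.lower().split()))
        if score:
            hits.append((score, doc_id))
    return [doc_id for _, doc_id in sorted(hits, key=lambda h: (-h[0], h[1]))]
```

For the documents `{"d1": "how to purchase an automobile", "d2": "vehicle maintenance tips", "d3": "cooking pasta"}`, the query "buy car" finds nothing without expansion, because no document uses the words "buy" or "car"; with expansion it returns `["d1", "d2"]`.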
UKP Lab works on graph-based techniques that map the unstructured content of a wiki to its semantic structure. A distinguishing feature of our approach is that the graphs are both constructed and weighted using measures of semantic relatedness based on knowledge from collaboratively constructed resources such as Wikipedia and Wiktionary. In addition, the graphs are pruned according to inter-document relations.
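The general idea of a relatedness-weighted graph can be sketched as follows. The relatedness scores are a hypothetical lookup table standing in for a Wikipedia/Wiktionary-based measure, pruning is simplified to dropping edges below a threshold (the inter-document pruning described above is not reproduced), and nodes are ranked with weighted PageRank, a TextRank-style choice made here for illustration only.

```python
from itertools import combinations

# Hypothetical relatedness scores in [0, 1]; a stand-in for a semantic relatedness measure.
RELATEDNESS = {
    frozenset({"wiki", "hypertext"}): 0.8,
    frozenset({"wiki", "collaboration"}): 0.7,
    frozenset({"hypertext", "link"}): 0.9,
    frozenset({"wiki", "link"}): 0.6,
    frozenset({"collaboration", "link"}): 0.1,
}

def relatedness(a, b):
    return RELATEDNESS.get(frozenset({a, b}), 0.0)

def build_graph(terms, threshold=0.3):
    """Undirected graph as an adjacency dict; edges are weighted by relatedness
    and edges weaker than `threshold` are pruned."""
    graph = {t: {} for t in terms}
    for a, b in combinations(terms, 2):
        w = relatedness(a, b)
        if w >= threshold:
            graph[a][b] = w
            graph[b][a] = w
    return graph

def weighted_pagerank(graph, damping=0.85, iterations=50):
    """Power iteration for weighted PageRank on an undirected graph.

    Each neighbour passes on rank in proportion to the edge weight divided by its
    total edge weight. Isolated nodes only receive the teleport term in this sketch.
    """
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        rank = {
            node: (1 - damping) / n + damping * sum(
                rank[nb] * w / sum(graph[nb].values()) for nb, w in graph[node].items()
            )
            for node in graph
        }
    return rank
```

For the terms `["wiki", "hypertext", "collaboration", "link"]`, the weak "collaboration"/"link" edge is pruned, and "wiki" ends up as the most central node because it has the strongest relatedness to the other concepts.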
Summarization extracts a short list of the most important sentences in an article. Wikulu presents these sentences to help readers grasp the main concepts faster.
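A simple form of extractive summarization scores each sentence by how frequent its content words are in the whole article and keeps the top sentences in their original order. The sketch below uses that frequency heuristic as an assumption; it is not necessarily how Wikulu selects sentences, and the stopword list and the function `summarize` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, minimal stopword list for the illustration.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "and", "or", "to", "is", "are",
             "was", "for", "with", "by", "as", "that", "this", "it"}

def summarize(text, k=2):
    """Return the k highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    def content_words(sentence):
        return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]

    # Document-wide frequency of each content word.
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        # Average document frequency of the sentence's content words.
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))[:k]
    return [sentences[i] for i in sorted(top)]
```

For the article "Wikis store knowledge. Wikis grow quickly and wikis need structure. The weather was nice. Structure helps readers find knowledge.", the two sentences about wikis are selected, while the off-topic sentence about the weather scores lowest.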
In the long run, combining information retrieval, keyphrase extraction, topic segmentation and link detection will enable additional, more complex user interfaces based on advanced question answering and text summarization techniques.