Current Projects

Automated Exercise Generation for Language Learners

In an increasingly globalized labor market, knowledge of one or more foreign languages is more relevant than ever before. New technologies from the field of natural language processing can support self-directed learning, as they offer tools for assessing text difficulty and enable the automated generation of suitable exercises.

DFG GRK 1994 Research Training Group AIPHES ("Adaptive Information Preparation from Heterogeneous Sources")

AIPHES develops new methods to deal with information overload by condensing multiple documents into a single summary. We develop adaptive methods to create summaries of any type from multiple sources and across different genres. To do so, we combine different methodological backgrounds (computational linguistics, computer science, machine learning) to approach the task of extracting, summarizing and evaluating textual information from different sources.

Argumentative Writing Support

Formulating persuasive and well-formed arguments is a challenging task and a crucial aspect of writing skills acquisition. However, current writing support is limited to feedback about grammar or spelling, and there is no system that provides formative feedback about argumentative writing. In this project, we aim to research novel methods for assisting authors in writing persuasive arguments with respect to the following questions: Is my argument well structured and comprehensible? Are the given reasons relevant to my claim? Does my argument include sufficient support to be persuasive?

Centre for the Digital Foundation of Research in the Humanities, Social, and Educational Sciences (CEDIFOR)

CEDIFOR intends to help bridge the gap between research in the Humanities and computer-based methods, and to help researchers master the characteristic problems in this process. It is a Digital Humanities Centre providing methodological expertise for advising researchers from the Humanities, Social, and Educational Sciences on adopting computer-based methods in their research. This concerns the planning and operational stages of projects as well as the long-term provision of result data.

Construction of Research Infrastructures for eHumanities (DARIAH-DE)

The mission of DARIAH-EU is to enhance and support digitally-enabled research across the arts and humanities. DARIAH aims to develop and maintain an infrastructure in support of research practices based on information and communication technology, so-called virtual research environments. The UKP Lab will provide illustrative prototypes and demonstrators, specified in collaboration with researchers in the humanities, that build upon the general infrastructure and best practices developed by DARIAH.

CLARIN-D: Implementation of a web-based annotation platform for linguistic annotations

We develop a web-based tool that runs in a web browser without any installation effort. We support annotations on several linguistic layers within the same user interface. Further, we provide an interface to crowdsourcing platforms in order to scale simple annotation tasks to a large number of annotators. The annotation platform will be connected to the CLARIN-D infrastructure so that it is interoperable with the processing pipelines in WebLicht. The development of the tool is supported by a concurrent second curation project, which defines ‘best practices’ for linguistic annotation on several linguistic layers for different annotator status groups.

Darmstadt Knowledge Processing (DKPro) Repository

The DKPro Repository consists of a growing number of scalable, robust and flexible UIMA components for various kinds of NLP tasks, such as tokenization, sentence splitting, PoS tagging, negation detection, lexical chaining, and word pair extraction.

Feel free to download our DKPro Flyer.

Educational Web 2.0 (EduWeb)

In the EduWeb project, we seek to implement our vision of technology-enhanced education for the 21st century. A vast amount of content is produced by many people every day, but despite their interconnection through the World Wide Web, their efforts are often isolated from each other. To overcome this problem, the UKP Lab will provide and explore new algorithms to simplify tedious, recurring tasks and to improve coordination with the community.

Feature-based Visualization and Analysis of Natural Language Documents (VisADoc)

This project, implemented in cooperation with the University of Konstanz, aims to investigate novel textual features for modeling content-related text properties, to develop an interactive feature engineering approach for complex user-defined semantic properties, and to develop visual analysis tools that support the exploration of large document collections with respect to a certain text property.

IT Forensics (as part of CASED)

This project develops tools to process the natural language in collections of Web 2.0 documents for the identification of fraud and crime. CASED brings together researchers from diverse backgrounds to collaborate on advanced security research. The UKP Lab operates the Forensic Linguistics project of CASED, with the goals of creating tools to aid the investigation of crimes on the Web, finding relevant documents using semantic search, identifying relevant pieces of information (persons, places, times), and analyzing the relations between them.

Feel free to download our Forensic Linguistics Flyer.

Information Consolidation: A New Paradigm in Knowledge Search (DIP project)

The DIP project, an international cooperation with Bar-Ilan University and the Israel Institute of Technology, aims at the next big step in information access technology. The goal is to support users in identifying and assimilating the large set of relevant statements found within the multitude of documents typically retrieved by current search technologies. Novel methods for statement extraction, information consolidation, and inferring relations represent the core research areas within this project.

Integrating Collaborative and Linguistic Resources for Word Sense Disambiguation and Semantic Role Labeling (InCoRe)

In the InCoRe project, we address the lack of coverage typically associated with lexical semantic resources. The first major goal of this project is the integration of various expert-built and collaboratively created lexical semantic resources into a large-scale resource of unprecedented coverage and quality. The second major goal of InCoRe is to scale natural language processing technologies that utilize lexical semantic resources, specifically word sense disambiguation and semantic role labeling, to real-life applications based on the developed resource.

Knowledge Discovery in Scientific Literature

The main topic of this PhD program is knowledge discovery in the vast amount of scientific literature ubiquitously available on the Web and in historical texts. This research employs methods of intelligent identification and analysis of structures in scientific texts on all scales, enabling completely new, previously unforeseen forms of access to scientific information.

OpenMinTeD

OpenMinTeD aspires to enable the creation of an infrastructure that fosters and facilitates the discovery and use of text mining technologies and interoperable services. It examines several use cases identified by experts from different scientific areas, ranging from generic scholarly communication to literature related to life sciences, food and agriculture, and social sciences and humanities.

Personality Profiling in Books

For e-book recommendation systems, it can be very helpful to know the answers to high-level content questions that readers may have, for example "What is the main hero like?", "Is the story complicated?" or "Is the book suitable for children?". The idea of this project is to leverage real-world knowledge resources to help a machine learning system estimate answers to such questions. To reach this goal, the initial research focus lies in identifying suitable approaches for integrating semantic knowledge into text classification algorithms.

Processing of Audiovisual Content: Integration of Automatic and Manual Analysis

A large quantity of modern educational content is audiovisual. The amount of this type of content is increasing rapidly due to the use of consumer electronics for audiovisual content. However, an issue arises when audiovisual content has to be analysed manually by humanities researchers: manual analysis is a very hard and tedious task. The goals of this project are to create frameworks that facilitate the integration of manual and automatic analysis of audiovisual content and to investigate which machine learning methods can automatically classify educational audiovisual content.

QA-EduInf: Community-based Question Answering for Educational Information

The project aims at using natural language processing techniques to analyze educational information and answer user questions on various educational topics. Since a large portion of users' questions have already been asked by other people in community question answering forums and answered by educational experts or crowds, we use the available question and answer archives to answer these questions and minimize human effort in searching through educational information. The project consists of different components including question classification, question and answer retrieval, answer quality assessment, and summarization.

Structuring Story-Chains

Nearly everyone is struggling to keep up with the ever-growing amount of information, making information overload a major problem in today's society. The news domain is no exception. Since current search engines retrieve information based on keywords and sort the results by their relevance to the entered search query, the large number of returned articles makes it hard to understand the evolution of an event. In this project, we aim to develop novel methods for structuring news stories in a more coherent way by attempting to discover and model causal connections between articles, to present complex news stories in a simpler way, and to reduce information overload.

UBY – Large-scale Sense-linked Lexical-semantic Resource

UBY is a large-scale lexical-semantic resource for natural language processing (NLP) based on the ISO standard Lexical Markup Framework (LMF). Most UBY-related software is developed as open source on Google Code. UBY combines a wide range of information from expert-constructed and collaboratively constructed resources for English and German.

Utilizing Web Knowledge: Language Technologies and Psychological Processes

The project examines the usefulness of selected, innovative language technologies according to psychological processes and models. This research project will provide important groundwork by bringing together scientists from computer science, industrial science, and psychology.

Welt der Kinder

The digital humanities project “Welt der Kinder” started in May 2014 and is designed as a test model for similar future projects. Through very close cooperation between historians, information scientists, and computer scientists, it aims to gain new insights into the period from 1850 to 1918, a time in which an accelerated production of knowledge was dominated simultaneously by globalization and nationalisation.

This poster gives an overview of the project context, its goals, and its methods.
