Current Projects

Semantic Information Management

Semantic Information Retrieval (SIR-3)

This project systematically investigates the semantic and lexical relationships between words and concepts and their usefulness in information retrieval (IR) tasks. The current phase (III) of the project focuses on the development of large-scale, word-sense-disambiguated multilingual lexical semantic resources and of novel semantics-based approaches to cross-lingual IR (CLIR).

IT Forensics (as part of CASED)

This project develops tools to process the natural language in collections of Web 2.0 documents for the identification of fraud and crime. CASED brings together researchers from diverse backgrounds to collaborate on advanced security research. The UKP Lab operates the Forensic Linguistics project of CASED, with the goal of creating tools that aid the investigation of crimes on the Web: finding relevant documents using semantic search, identifying relevant pieces of information (persons, places, times), and analyzing the relations between them.

Feel free to download our Forensic Linguistics Flyer.
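As a rough illustration of the "identifying relevant pieces of information" step, the following Python sketch extracts person, location, and organization mentions with NLTK's stock models. It is only a generic example of such an entity-extraction pass, not the project's own forensic tooling.

```python
import nltk

# One-time model downloads needed for this sketch.
for pkg in ("punkt", "averaged_perceptron_tagger", "maxent_ne_chunker", "words"):
    nltk.download(pkg, quiet=True)

def extract_entities(text):
    """Return (entity string, entity type) pairs found in the text."""
    tokens = nltk.word_tokenize(text)
    tagged = nltk.pos_tag(tokens)
    tree = nltk.ne_chunk(tagged)
    entities = []
    for node in tree:
        if hasattr(node, "label"):  # chunked subtrees carry an entity label
            entities.append((" ".join(tok for tok, _ in node.leaves()), node.label()))
    return entities

print(extract_entities("Alice Miller met Bob in Frankfurt on Monday."))
```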

Language Technology for eHumanities

Construction of Research Infrastructures for eHumanities (DARIAH-DE)

The mission of DARIAH-EU is to enhance and support digitally enabled research across the arts and humanities. DARIAH aims to develop and maintain an infrastructure in support of research practices based on information and communication technology, so-called virtual research environments. The UKP Lab will provide illustrative prototypes and demonstrators, specified in collaboration with researchers in the humanities, that build upon the general infrastructure and best practices developed by DARIAH.

Loewe Research Center Digital Humanities: Text as an Instance

Descriptions of natural language grammars tend to focus on the canonical constructions of a language, yet actual usage also displays constructions that are marked in various ways and thus deviate from the canonical form. The project aims to validate the hypothesis that natural language grammars constitute systems of constructions centered on a set of canonical constructions of a particular language, complemented by a set of peripheral non-canonical constructions. A contrastive investigation of non-canonical grammatical constructions in English and German is performed using corpus-based methods.

Loewe Research Center Digital Humanities: Text as a Process

In this project, we aim to gain insights into the collaboration, production, and reception processes of collaboratively created Web 2.0 texts. We aim to analyze how collaboratively created texts change over time, to discover quality measures, and to identify successful collaboration patterns. While we focus on Wikipedia as one of the most popular collaboration platforms, our research results can be generalized to other areas of collaboration in the Web 2.0 and will foster research both in NLP and in the humanities.

CLARIN-D: Implementation of a web-based annotation platform for linguistic annotations

We develop a web-based tool that runs in a web browser without further installation effort. We support annotations on several linguistic layers within the same user interface. Further, we provide an interface to crowdsourcing platforms in order to scale simple annotation tasks to a large number of annotators. The annotation platform will be connected to the CLARIN-D infrastructure to be interoperable with the processing pipelines in WebLicht. The development of the tool is supported by a second, concurrent curation project, which defines best practices for linguistic annotation on several linguistic layers for different groups of annotators.
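The following minimal Python sketch (our own illustrative data model, not the platform's) shows the basic idea behind keeping annotations from several linguistic layers over the same document, addressed by character offsets.

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    begin: int   # character offset where the annotation starts
    end: int     # character offset where it ends (exclusive)
    label: str   # e.g. a PoS tag, an entity type, a chunk label

@dataclass
class Document:
    text: str
    layers: dict = field(default_factory=dict)  # layer name -> list of Spans

    def annotate(self, layer, begin, end, label):
        self.layers.setdefault(layer, []).append(Span(begin, end, label))

doc = Document("UKP Lab is in Darmstadt.")
doc.annotate("pos", 0, 3, "NNP")             # token-level layer
doc.annotate("named_entity", 14, 23, "LOC")  # entity layer over the same text
print(doc.layers)
```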

Educational NLP

Educational Web 2.0 (EduWeb)

In the EduWeb project, we seek to implement our vision of technology-enhanced education for the 21st century. Vast amounts of content are produced by many people every day, but despite being interconnected through the World Wide Web, their efforts often remain isolated from each other. To overcome this problem, the UKP Lab will provide and explore new algorithms that simplify tedious, recurring tasks and improve coordination within the community.

Integrating Collaborative and Linguistic Resources for Word Sense Disambiguation and Semantic Role Labeling (InCoRe)

In the InCoRe project, we address the lack of coverage typically associated with lexical semantic resources. The major goal of this project is the integration of various expert-built and collaboratively created lexical semantic resources into a large-scale resource of unprecedented coverage and quality. The second major goal of InCoRe is to scale natural language processing technologies that utilize lexical semantic resources, specifically word sense disambiguation and semantic role labeling, to real-life applications based on the developed resource.
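One common building block of such resource integration is aligning senses across resources by the similarity of their glosses. The following Python sketch illustrates this with a simple Lesk-style word-overlap measure; the resources, glosses, and threshold are invented for illustration only.

```python
STOPWORDS = {"a", "an", "the", "of", "to", "in", "or", "and", "for"}

def gloss_overlap(gloss_a, gloss_b):
    """Count content words shared by two sense glosses."""
    words_a = {w for w in gloss_a.lower().split() if w not in STOPWORDS}
    words_b = {w for w in gloss_b.lower().split() if w not in STOPWORDS}
    return len(words_a & words_b)

def align(senses_a, senses_b, threshold=2):
    """Pair each sense of resource A with its best-matching sense in B."""
    alignment = []
    for id_a, gloss_a in senses_a.items():
        best_id, best_gloss = max(senses_b.items(),
                                  key=lambda kv: gloss_overlap(gloss_a, kv[1]))
        if gloss_overlap(gloss_a, best_gloss) >= threshold:
            alignment.append((id_a, best_id))
    return alignment

wordnet_like = {"bank#1": "a financial institution that accepts deposits",
                "bank#2": "sloping land beside a body of water"}
wiki_like = {"Bank_(finance)": "an institution that accepts deposits and makes loans",
             "Bank_(geography)": "the land alongside a river or body of water"}
print(align(wordnet_like, wiki_like))
```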

Feature-based Visualization and Analysis of Natural Language Documents (VisADoc)

This project, carried out in cooperation with the University of Konstanz, aims to investigate novel textual features for modeling content-related text properties, to develop an interactive feature engineering approach for complex user-defined semantic properties, and to develop visual analysis tools that support the exploration of large document collections with respect to a given text property.
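As a toy illustration of what textual features can mean in this context, the sketch below computes a few simple surface features of a document. The feature set is our own example, not the project's.

```python
import re

def features(text):
    """Compute a few simple surface features of a document."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"\w+", text.lower())
    return {
        "num_sentences": len(sentences),
        "avg_sentence_length": len(tokens) / max(len(sentences), 1),
        "type_token_ratio": len(set(tokens)) / max(len(tokens), 1),
    }

print(features("Short sentence. A slightly longer second sentence follows here."))
```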

NLP and Wikis

Wikulu – Self-Organizing Wikis

Wikulu assists users in creating, editing, and searching wiki content. The self-organizing abilities of the wiki are enabled by natural language processing algorithms such as keyphrase extraction, document summarization, document clustering, and graph-based term weighting.

Feel free to download our Wikulu Flyer.
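As an illustration of graph-based term weighting as mentioned above, the following Python sketch builds a word co-occurrence graph and ranks terms with PageRank (TextRank-style). It shows the general technique, not Wikulu's implementation.

```python
import re
import networkx as nx

def rank_terms(text, window=3):
    """Rank words by PageRank over a co-occurrence graph."""
    tokens = re.findall(r"[a-z]+", text.lower())
    graph = nx.Graph()
    # Connect words that co-occur within a sliding window.
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[i] != tokens[j]:
                graph.add_edge(tokens[i], tokens[j])
    return sorted(nx.pagerank(graph).items(), key=lambda kv: -kv[1])

text = ("Wikis collect knowledge, but wiki pages need structure. "
        "Keyphrase extraction and summarization help readers find knowledge.")
print(rank_terms(text)[:5])
```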

Utilizing Web Knowledge: Language Technologies and Psychological Processes

The project examines the usefulness of selected, innovative language technologies with respect to psychological processes and models. This research project will provide important groundwork by bringing together scientists from computer science, industrial science, and psychology.

Statistical Semantics

Loewe Research Center Digital Humanities: Text as Product

This project examines the correspondence of linguistic concepts and automatically extracted topic models. For our analysis, we annotate a text corpus with lexical cohesion relations and automatically acquire topics. We then use LDA topic models to predict lexical cohesion, using the topic membership of lexical items and significance scores between lexical items to inform an automatic system for lexical chain annotation. Besides aiming at a state-of-the-art system for lexical chain identification, we analyze the semiotic interpretability of stochastic methods.
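The following Python sketch (toy corpus and parameters, not the project's setup) illustrates the core idea: derive the topic membership of lexical items from an LDA model and use the similarity of their topic distributions as a signal for lexical cohesion.

```python
import numpy as np
from gensim import corpora, models

# Tiny toy corpus of word lists; a real experiment would use an annotated corpus.
docs = [["river", "bank", "water", "flood"],
        ["bank", "money", "loan", "interest"],
        ["water", "rain", "flood", "river"],
        ["loan", "money", "credit", "interest"]]

dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]
lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2,
                      passes=50, random_state=1)

def topic_membership(word):
    """Normalized per-topic probability mass of a lexical item."""
    column = lda.get_topics()[:, dictionary.token2id[word]]
    return column / column.sum()

def cohesion(word_a, word_b):
    """Cosine similarity of two words' topic distributions."""
    a, b = topic_membership(word_a), topic_membership(word_b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Topically related pairs typically score higher than unrelated ones.
print(cohesion("river", "flood"), cohesion("river", "loan"))
```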

Joint Project of the UKP Lab

Darmstadt Knowledge Processing (DKPro) Repository

The DKPro Repository consists of a growing number of scalable, robust, and flexible UIMA components for various NLP tasks such as tokenization, sentence splitting, PoS tagging, negation detection, lexical chaining, and word pair extraction.

Feel free to download our DKPro Flyer.
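As a language-agnostic toy illustration of the component-pipeline idea behind such UIMA component collections (this is not DKPro's actual Java/UIMA API), the sketch below chains simple components that each add an annotation layer to a shared document.

```python
class Document:
    def __init__(self, text):
        self.text = text
        self.annotations = {}  # layer name -> annotations produced by a component

def tokenizer(doc):
    # Whitespace tokenization stands in for a real tokenizer component.
    doc.annotations["tokens"] = doc.text.split()

def sentence_splitter(doc):
    # Naive splitting on full stops stands in for a real sentence splitter.
    doc.annotations["sentences"] = [s.strip() for s in doc.text.split(".") if s.strip()]

def run_pipeline(doc, components):
    # Each component reads the shared document and adds its own annotation layer.
    for component in components:
        component(doc)
    return doc.annotations

doc = Document("DKPro components are reusable. They can be chained into pipelines.")
print(run_pipeline(doc, [tokenizer, sentence_splitter]))
```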
