Unstructured Information Management (Project, WS 2010/11)

This page is outdated. Please visit the page for summer 2011.

Description

While a significant amount of knowledge is already available in structured form, in databases or as part of the semantic web, most knowledge is still recorded in unstructured form as natural language artifacts such as text documents and audio or video recordings. The Unstructured Information Management Architecture (UIMA) framework, originally developed by IBM, offers a platform for imposing structure on unstructured data and thus facilitates the extraction of knowledge from unstructured sources.

The topics addressed by this project change from term to term and are drawn from the areas of natural language processing, information extraction, information retrieval, and semantic knowledge processing. Typical tasks include:

  • Extract text from unstructured sources
  • Index the extracted text and search on it
  • Come up with some searches and manually make a list of relevant results to use as a basis for evaluation
  • Use various techniques from simple dictionaries to semantic resources to improve results
  • Visualize results
  • Evaluate performance 
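
The indexing, search, and evaluation steps listed above can be illustrated with a minimal sketch in plain Java. It does not use UIMA or DKPro; the class name TinySearchDemo, the sample documents, and the manually chosen set of relevant results are hypothetical and serve only to show the idea of an inverted index, a boolean AND search, and precision/recall against a hand-made gold standard.

```java
import java.util.*;

public class TinySearchDemo {

    /** Builds an inverted index: lower-cased token -> ids of the documents containing it. */
    static Map<String, Set<Integer>> buildIndex(List<String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            // Very simple tokenization: split on anything that is not a word character.
            for (String token : docs.get(id).toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    index.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
                }
            }
        }
        return index;
    }

    /** Returns the ids of documents that contain every query term (boolean AND). */
    static Set<Integer> search(Map<String, Set<Integer>> index, String query) {
        Set<Integer> hits = null;
        for (String term : query.toLowerCase().split("\\W+")) {
            if (term.isEmpty()) {
                continue;
            }
            Set<Integer> postings = index.getOrDefault(term, Collections.<Integer>emptySet());
            if (hits == null) {
                hits = new TreeSet<>(postings);
            } else {
                hits.retainAll(postings);
            }
        }
        return hits == null ? new TreeSet<>() : hits;
    }

    /** Fraction of retrieved documents that are relevant. */
    static double precision(Set<Integer> retrieved, Set<Integer> relevant) {
        if (retrieved.isEmpty()) {
            return 0.0;
        }
        Set<Integer> truePositives = new HashSet<>(retrieved);
        truePositives.retainAll(relevant);
        return (double) truePositives.size() / retrieved.size();
    }

    /** Fraction of relevant documents that were retrieved. */
    static double recall(Set<Integer> retrieved, Set<Integer> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        Set<Integer> truePositives = new HashSet<>(retrieved);
        truePositives.retainAll(relevant);
        return (double) truePositives.size() / relevant.size();
    }

    public static void main(String[] args) {
        // Hypothetical sample documents standing in for extracted text.
        List<String> docs = Arrays.asList(
                "UIMA imposes structure on unstructured text",
                "Databases store structured knowledge",
                "Audio and video recordings are unstructured data");

        Map<String, Set<Integer>> index = buildIndex(docs);
        Set<Integer> hits = search(index, "unstructured");

        // Hypothetical manual relevance judgment: only document 0 is relevant.
        Set<Integer> relevant = Collections.singleton(0);

        System.out.println("hits      = " + hits);
        System.out.println("precision = " + precision(hits, relevant));
        System.out.println("recall    = " + recall(hits, relevant));
    }
}
```

A real system would replace the whitespace tokenizer with proper linguistic preprocessing and the exact-match lookup with dictionary or semantic expansion, and then compare the resulting precision and recall values against this simple baseline.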

The Darmstadt Knowledge Processing Software Repository (DKPro), provided by UKP, offers a set of ready-to-use Java libraries for analysis and indexing. The project will be implemented on top of the Apache Unstructured Information Management Architecture (UIMA) framework.

Registration

If you plan to participate in this course, please register. (Registration is closed.)

Time and Location

Introductory sessions will be held on the Thursdays of the first three weeks (21.10., 28.10., 4.11.) from 15:30 to 18:30. Each session consists of a lecture part in S2|02 A126, immediately followed by an exercise part in S2|02 C003 (the small pool room behind the large pool room in the center of Piloty's basement).

Brief regular status meetings are (tentatively) planned for Thursdays between 15:30 and 18:30. Actual times may vary depending on the number of participants.

Material

The course management system is used as the primary communication platform for the project and also contains any related material.

Objectives

  • Understand and apply methods of natural language processing (NLP)
  • Understand and apply methods of information retrieval (IR)
  • Comparatively evaluate different approaches
  • Employ UIMA to implement complex natural language processing systems

Prerequisites

  • Knowledge of Java programming
  • Principles of algorithms and data structures

Lecturers

Literature
