Semantic Information Management

Semantic Information Management (SIM) uses semantic processing techniques to add structure to unstructured information, enabling more accurate information search and retrieval with both high precision and high recall.

Information comes in many forms and formats, including business documents, web pages, user manuals, FAQs, and software documentation. As the volume of information keeps growing, finding the right piece of it becomes increasingly difficult.
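
Precision and recall, the two quality criteria named above, have standard definitions in information retrieval: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. A minimal sketch in Python; the document IDs are made up for illustration.

```python
def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) of a retrieved result list against the relevant documents."""
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)  # relevant documents that were actually retrieved
    # Precision: share of retrieved documents that are relevant.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    # Recall: share of relevant documents that were retrieved.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical search result: 4 documents retrieved, 3 of them relevant,
    # while 5 relevant documents exist in the collection.
    p, r = precision_recall(["d1", "d2", "d3", "d7"], {"d1", "d2", "d3", "d4", "d5"})
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.60
```

Returning more documents tends to raise recall at the cost of precision; semantic structure aims to improve both at once.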

Projects in Semantic Information Management

  • Theseus-TEXO: Community-enabled semantic service retrieval for the future Internet of services.
  • Theseus-MEDICO: Semantically enhanced intelligent image retrieval in the medical domain.
  • SIR – Semantic Information Retrieval: Enhancing conventional IR by integrating lexical-semantic knowledge.
  • DKPro: Ready-to-use robust NLP components based on IBM's UIMA Framework.
  • SIGMUND: Semantics- and Emotion-Based Conversation Management in Customer Support.
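
SIR is described above only as enhancing conventional IR with lexical-semantic knowledge. One common way to do this is query expansion: related terms from a lexical resource are added to the query with a lower weight, so documents that use a synonym are still found. The sketch below illustrates that general idea only and is not SIR's actual method; the toy `LEXICON`, the weight of 0.5, and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy lexicon; a real system would draw related terms from
# lexical-semantic resources such as a thesaurus, Wikipedia, or Wiktionary.
LEXICON: dict[str, list[str]] = {
    "car": ["automobile", "vehicle"],
    "doctor": ["physician"],
}


def expand_query(terms: list[str], related_weight: float = 0.5) -> dict[str, float]:
    """Give original query terms weight 1.0 and related terms a lower (assumed) weight."""
    weights: dict[str, float] = {}
    for term in terms:
        weights[term] = 1.0
        for related in LEXICON.get(term, []):
            # Never downgrade a term that is also part of the original query.
            weights[related] = max(weights.get(related, 0.0), related_weight)
    return weights


def score(document: str, weights: dict[str, float]) -> float:
    """Sum the weights of the (expanded) query terms occurring in the document."""
    counts = Counter(document.lower().split())
    return sum(weight * counts[term] for term, weight in weights.items())


def search(query: str, documents: dict[str, str]) -> list[tuple[str, float]]:
    """Rank documents by expanded-query score, dropping documents with no match."""
    weights = expand_query(query.lower().split())
    scored = [(doc_id, score(text, weights)) for doc_id, text in documents.items()]
    return sorted((item for item in scored if item[1] > 0), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "a": "the physician examined the patient",
        "b": "a new car model",
    }
    # A plain keyword match for "doctor" finds nothing; the expanded query finds document "a".
    print(search("doctor", docs))  # [('a', 0.5)]
```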

Semantic Information Management in the Context of Educational Information

The project uses natural language processing techniques to analyze educational information on the Web. Its main advantage is that automatic techniques minimize the human effort needed to search through educational information. The analyzed data can support various applications, such as answering users' questions in the educational domain, discovering and tracking people's opinions and arguments about educational policies, and summarizing educational information on different topics. The main techniques used to achieve this goal are text mining, information retrieval, text summarization, opinion mining, and argument analysis.

Selected sub-projects:

  • Question Answering for Educational Information

    The basic idea of this project is to answer user questions on various educational topics. Since a large portion of users' questions have already been asked by other people and answered by educational experts or by the crowd, we use the available question-and-answer archives to answer them.
    The document processing component collects and analyzes all question-answer pairs available in FAQs and social QA forums. The analyzed data is then used during online search. When a user issues a new query, the system uses paraphrasing techniques to find similar questions that are already in the archive and returns their answers for the newly posted question. In addition, the system uses information retrieval methods to search the answers directly. This answer retrieval step covers texts that answer the input question even though their original questions are not similar to it. The output of the system is a summarized set of answers to the user's question, together with their sources and quality scores.

    Figure: Semantic Information Management (diagram)

  • Education Monitoring in the Internet: Identifying and Tracking of Controversies on Educational Topics

    In recent years, a huge amount of user-generated content has emerged on Internet platforms such as Twitter, Facebook, and YouTube. It offers not only a new quantity of data but also a new quality: opportunities to participate for people who, in earlier media channels, appeared only as recipients. In this project, we focus on this new quality by identifying and tracking controversies on educational topics in Internet-based public social media. We concentrate on highly relevant topics in educational research that are visible to the public and in politics, such as the PISA studies, educational diversity, and the national report on education.
    To this end, we research techniques to identify controversies on educational topics in social media and to analyze people's opinions on these topics, as well as their central arguments and discourse structures. The discovered controversies are then tracked over time and space, and the processed content is visualized for the users. The analysis enabled by this project will be an important enabler of evidence-informed policy making in the educational domain.

  • Retrieving and Summarizing Educational Document Collections Using Reinforcement Learning

    The German Education Server at DIPF offers a large collection of links to websites and documents on various educational topics. Curating and extending this content requires considerable work by the editorial staff of the German Education Server. To support this effort, the project explores methods for automating steps of the editorial work by means of machine learning, in particular reinforcement learning.

    Given a few reference documents, topic-specific links are first crawled automatically from the web, and the most relevant ones are selected. To present the content on the website, each link is automatically assigned keyphrases that describe its content to the reader. Finally, we automatically create abstracts for each link as well as for each topic as a whole. Overall, the project aims to provide educational document databases that require minimal editorial effort to build and maintain.

    - Topical Crawling of Websites
    - Automated Extraction of Keywords & Summaries
    - Presentation on the German Education Server
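
The selection and keyphrase steps of the German Education Server sub-project can be illustrated with a plain TF-IDF baseline. Note that this sketch does not use reinforcement learning, which is the project's stated focus; it only shows the simpler idea of ranking crawled pages by their similarity to a few reference documents and tagging each kept page with its highest-weighted words. The function names, the example.org URLs, and the parameters `top_k` and `n_keyphrases` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def idf_table(docs: list[list[str]]) -> dict[str, float]:
    """Smoothed inverse document frequency over all reference and candidate texts."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    return {term: count * idf.get(term, 0.0) for term, count in Counter(tokens).items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def select_and_tag(references: list[str], candidates: dict[str, str],
                   top_k: int = 2, n_keyphrases: int = 3) -> list[tuple[str, float, list[str]]]:
    """Keep the top_k candidates most similar to the references and tag each with keyphrases."""
    ref_tokens = [tokenize(text) for text in references]
    cand_tokens = {url: tokenize(text) for url, text in candidates.items()}
    idf = idf_table(ref_tokens + list(cand_tokens.values()))
    # Topic profile: one TF-IDF vector built from all reference documents pooled together.
    profile = tfidf([tok for doc in ref_tokens for tok in doc], idf)
    ranked = sorted(((cosine(tfidf(toks, idf), profile), url) for url, toks in cand_tokens.items()),
                    reverse=True)
    selected = []
    for score, url in ranked[:top_k]:
        vector = tfidf(cand_tokens[url], idf)
        # Keyphrases here are simply the highest-weighted single words of the page.
        keyphrases = sorted(vector, key=vector.get, reverse=True)[:n_keyphrases]
        selected.append((url, score, keyphrases))
    return selected


if __name__ == "__main__":
    references = ["inclusive education in german schools",
                  "inclusion of pupils with special needs in schools"]
    candidates = {
        "https://example.org/inclusion": "inclusive schools for pupils with special needs",
        "https://example.org/football": "football results of the weekend",
    }
    for url, score, keyphrases in select_and_tag(references, candidates, top_k=1):
        print(url, round(score, 2), keyphrases)
```

A production system would also remove stop words and extract multi-word keyphrases; the crawling itself and the abstract generation are not shown.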

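The two retrieval paths of the question answering sub-project (matching a new question against archived questions, and searching the archived answers directly) can be sketched as follows. This is a minimal illustration, not the project's implementation: bag-of-words cosine similarity is a crude stand-in for real paraphrasing techniques, the summarization and quality-scoring steps are omitted, and the names `QAPair` and `answer_question`, the `threshold` of 0.3, and the sample archive are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class QAPair:
    question: str
    answer: str
    source: str  # hypothetical field, e.g. the FAQ page or forum thread the pair came from


def bag_of_words(text: str) -> Counter:
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def answer_question(query: str, archive: list[QAPair],
                    threshold: float = 0.3, top_k: int = 3) -> list[tuple[str, str, float]]:
    """Return up to top_k (answer, source, score) triples, best first."""
    q = bag_of_words(query)
    ranked = []
    for pair in archive:
        # Path 1: similar archived question (crude stand-in for paraphrase detection).
        question_score = cosine(q, bag_of_words(pair.question))
        # Path 2: the answer text matches the query even if its question is worded differently.
        answer_score = cosine(q, bag_of_words(pair.answer))
        best = max(question_score, answer_score)
        if best >= threshold:
            ranked.append((pair.answer, pair.source, best))
    ranked.sort(key=lambda item: item[2], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    archive = [
        QAPair("How do I apply for a student loan?",
               "Submit the application form to the student services office.", "faq"),
        QAPair("When does the school year start?",
               "In most states the school year starts in August or September.", "forum"),
    ]
    for text, source, score in answer_question("How can I apply for a student loan?", archive):
        print(f"{score:.2f} [{source}] {text}")
```

Taking the maximum of the two scores lets an archived answer surface through either path, which mirrors the motivation given for the answer retrieval step.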
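Tracking controversies over time, as described for the Education Monitoring sub-project, presupposes some per-period measure of how divided opinions are. The sketch below uses one simple, hypothetical measure, 1 - |pro - con| / (pro + con), computed per topic and month from stance labels that are assumed to come from an opinion-mining classifier (not shown). It is not the project's method; the `Post` type, the scoring formula, and the sample posts are illustrative assumptions, and a region field could be added in the same way to track controversies over space.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Post(NamedTuple):
    topic: str
    month: str   # "YYYY-MM"
    stance: int  # +1 in favour, -1 against, 0 neutral; assumed output of an opinion-mining classifier


def controversy_by_month(posts: Iterable[Post]) -> dict[str, dict[str, float]]:
    """Score each topic per month as 1 - |pro - con| / (pro + con).

    1.0 means opinions are evenly split; 0.0 means all opinionated posts agree.
    Neutral posts are ignored.
    """
    counts: dict[tuple[str, str], list[int]] = defaultdict(lambda: [0, 0])
    for post in posts:
        if post.stance > 0:
            counts[(post.topic, post.month)][0] += 1
        elif post.stance < 0:
            counts[(post.topic, post.month)][1] += 1
    scores: dict[str, dict[str, float]] = defaultdict(dict)
    # Only posts with a non-zero stance create entries, so pro + con is at least 1.
    for (topic, month), (pro, con) in sorted(counts.items()):
        scores[topic][month] = 1 - abs(pro - con) / (pro + con)
    return dict(scores)


if __name__ == "__main__":
    posts = [  # made-up sample data
        Post("PISA", "2024-01", +1), Post("PISA", "2024-01", +1),
        Post("PISA", "2024-02", +1), Post("PISA", "2024-02", -1),
    ]
    print(controversy_by_month(posts))  # {'PISA': {'2024-01': 0.0, '2024-02': 1.0}}
```
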
Selected publications

Iryna Gurevych, Christof Müller, Torsten Zesch:
What to be? Electronic Career Guidance Based on Semantic Relatedness. In: Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL 2007). pp. 1032–1039, Association for Computational Linguistics, 2007.

Torsten Zesch, Christof Müller, Iryna Gurevych:
Extracting Lexical Semantic Knowledge from Wikipedia and Wiktionary. In: Proceedings of the Conference on Language Resources and Evaluation (LREC), 2008.

Christof Müller, Iryna Gurevych, Max Mühlhäuser:
Integrating Semantic Knowledge into Text Similarity and Information Retrieval. In: Proceedings of the First IEEE International Conference on Semantic Computing (ICSC). pp. 257–264, IEEE Press, New York, NY, 2007.