Semantic Information Retrieval (SIR)

3rd funding period

Motivation

The Semantic Information Retrieval (SIR) project systematically investigates the semantic and lexical relationships between words and concepts and their usefulness in the information retrieval (IR) process.

Using a variety of lexical and semantic knowledge sources such as WordNet, GermaNet, and Wikipedia, the first and second phases of the project have investigated semantic relatedness measures in monolingual settings.

The third phase of the SIR project builds upon the successful outcomes of the previous phases, with a focus on the following aspects:

  1. Large-scale word sense disambiguated multilingual lexical semantic resource
  2. Novel semantic approaches to cross-lingual IR (CLIR)

In general, IR approaches that rely on keyword matching suffer from the “term mismatch” or “vocabulary gap” problem, because a mere lookup of surface word forms fails to capture the meaning expressed in a user’s query. The task becomes even more difficult when a cross-lingual information need is involved (e.g., searching for English documents with a query in German).

Goals

The goals of this project are as follows:

  1. Large-scale word sense disambiguated multilingual lexical semantic resource

    • Merge collaboratively created resources and existing lexical semantic resources into a single large-scale translation resource

  2. Novel semantic approaches to cross-lingual IR (CLIR)
    • Improve on state-of-the-art CLIR performance
    • Participate in international benchmarking competitions

     

Methods

Our approach to achieving the above-mentioned goals is as follows:

  1. Large-scale word sense disambiguated multilingual lexical semantic resource

    • Extract word sense disambiguated translations from large-scale collaboratively created resources such as Wikipedia and Wiktionary
    • Automatically align these translations with existing knowledge bases such as WordNet, GermaNet, and EuroWordNet at the word sense level

  2. Novel semantic approaches to cross-lingual IR (CLIR)
    • Train statistical translation models from the word sense disambiguated translation resource
    • Develop new cross-language Explicit Semantic Analysis (ESA) approaches and integrate them into CLIR models
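
The cross-language ESA idea named in the last item can be sketched roughly as follows. This is a toy illustration only, not the project's implementation: the three aligned concept texts, the function names esa_vector, cosine, and rank, and the simplified TF-IDF weighting are all assumptions made for the example. Each text is mapped to a vector over a shared set of concepts using the concept descriptions in its own language, so that a German query and English documents become comparable without translating the query.

```python
import math
from collections import Counter

# Hypothetical toy concept space: each concept has an aligned English and
# German description (standing in for language-linked Wikipedia articles).
CONCEPTS = {
    "Bank (finance)": {"en": "bank money loan credit account",
                       "de": "bank geld kredit konto darlehen"},
    "Computer": {"en": "computer software program data",
                 "de": "computer software programm daten rechner"},
    "River": {"en": "river water bank shore flow",
              "de": "fluss wasser ufer strom"},
}
CONCEPT_ORDER = sorted(CONCEPTS)  # fixed dimension order shared by all languages


def esa_vector(text, lang):
    """Map a text to a concept vector using the concept texts of its language."""
    tokens = Counter(text.lower().split())
    n = len(CONCEPTS)
    vector = []
    for name in CONCEPT_ORDER:
        article = Counter(CONCEPTS[name][lang].split())
        score = 0.0
        for token, tf in tokens.items():
            if token in article:
                # Number of concept texts in this language containing the token.
                df = sum(1 for c in CONCEPTS.values() if token in c[lang].split())
                score += tf * article[token] * math.log(1 + n / df)
        vector.append(score)
    return vector


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, query_lang, docs, doc_lang):
    """Rank documents by concept-space similarity to a query in another language."""
    q = esa_vector(query, query_lang)
    scored = [(cosine(q, esa_vector(d, doc_lang)), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


docs = ["loan and credit from the bank",
        "water flow in the river",
        "software program"]
results = rank("geld kredit", "de", docs, "en")
```

In this sketch the German query shares no surface word with the English document about loans, yet both activate the same concept dimension, which is how the concept space sidesteps the vocabulary gap. A realistic system would use a large set of language-linked articles and a proper weighting scheme in place of these toy choices.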

Partners

The Division of Computational Linguistics at the University of Tübingen is a co-applicant of the SIR project. Its research focus is to further extend GermaNet with (i) additional synsets and their definitions, and (ii) additional Interlingual Index (ILI) entries for interconnecting senses in multilingual knowledge bases.

People

Project Publications

The People’s Web Meets NLP: Collaboratively Constructed Language Resources

Iryna Gurevych, Jungi Kim
In: Theory and Applications of Natural Language Processing, E. Hovy, M. Johnson and G. Hirst (eds.), 2013
Springer
[Online-Edition: www.ukp.tu-darmstadt.de/scientific-community/edited-book-the-peoples-web-meets-nlp]
[Book]

UKP at CrossLink2: CJK-to-English Subtasks

Jungi Kim, Iryna Gurevych
In: Proceedings of the 10th NTCIR Conference on Evaluation of Information Access Technologies, p. 57-61, June 2013
National Institute of Informatics
[Online-Edition: http://research.nii.ac.jp/ntcir/ntcir-10/index.html]
[Inproceedings]

Learning Semantics with Deep Belief Network for Cross-Language Information Retrieval

Jungi Kim, Jinseok Nam, Iryna Gurevych
In: Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), p. 579-588, 2012
[Inproceedings]

UKP at CrossLink: Anchor Text Translation for Cross-lingual Link Discovery

Jungi Kim, Iryna Gurevych
In: Proceedings of the 9th NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering, and Cross-Lingual Information Access, p. 487-494, December 2011
[Inproceedings]