Information Consolidation: A New Paradigm in Knowledge Search (DIP project)

Motivation

Although existing search engines are effective at identifying relevant documents among billions of non-relevant ones, they remain weak at isolating the facts of interest to users within these documents, let alone organizing and presenting this knowledge intuitively and concisely. Searchers have to laboriously skim through all retrieved documents and collect the statements that are relevant to their information needs.

For example, a public decision maker in the domain of education may want to learn about positive and negative experiences with a particular policy across countries, its impact on various populations, etc. So far, such information has had to be consolidated by field experts, which is costly and time-consuming.

Goals

This project targets the next big step in information access technology by:
  • Automatically identifying relevant statements
  • Consolidating the information and inferring relations between the statements
  • Enabling users to explore the consolidated information

Figure 1: An example of how atomic statements are created from their original documents

Methods

The project follows an iterative methodology that encompasses the following:

  • Corpus - a large, partially annotated data set on educational topics, acquired using focused crawling and web-based annotation tools
  • Linguistic annotation on various levels (syntax, semantic roles, word senses, named entities, co-reference resolution, truth values, and other domain-specific ones) using state-of-the-art automatic annotation methods
  • Extracting atomic statements - by adapting and extending open information extraction techniques
  • Reflecting relationships between statements - by applying textual entailment and semantic similarity methods
  • Knowledge exploration - effective and efficient user interfaces for interactively displaying statements relevant to user queries
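
The consolidation step in the list above can be illustrated with a minimal sketch. It is not the project's implementation: the `Statement` class, the `jaccard` and `consolidate` functions, the threshold, and the sample statements are all hypothetical, and plain token overlap stands in for the textual entailment and semantic similarity methods that the project actually applies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """Hypothetical atomic statement: one predicate with its two arguments."""
    subject: str
    predicate: str
    obj: str
    source: str  # identifier of the document the statement was extracted from

    def tokens(self) -> frozenset:
        # Lower-cased bag of words over all three fields.
        text = f"{self.subject} {self.predicate} {self.obj}"
        return frozenset(text.lower().split())


def jaccard(a: Statement, b: Statement) -> float:
    """Token-overlap similarity, a crude stand-in for semantic similarity."""
    ta, tb = a.tokens(), b.tokens()
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def consolidate(statements, threshold=0.5):
    """Greedily group statements that are similar to a group's first member."""
    groups = []
    for st in statements:
        for group in groups:
            if jaccard(st, group[0]) >= threshold:
                group.append(st)
                break
        else:
            # No existing group is similar enough, so start a new one.
            groups.append([st])
    return groups


# Invented sample data, for illustration only.
statements = [
    Statement("tuition fees", "reduced", "university enrolment", "doc1"),
    Statement("tuition fees", "reduced", "enrolment", "doc2"),
    Statement("smaller classes", "improved", "test scores", "doc3"),
]
groups = consolidate(statements)
for group in groups:
    print([s.source for s in group])
```

With these sample statements, the two tuition-fee statements from different documents end up in one group, while the class-size statement forms its own group. A realistic system would replace `jaccard` with an entailment or similarity model and would also record the direction of the relation between statements.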

Figure 2: Architecture overview

Results

In the project, the following corpora were created:

Team

Former staff:

Student theses

Master theses:

Bachelor theses:

  • Michelle Peters. Broad-coverage distantly supervised verb sense disambiguation. 2016
    Supervised by: Dr. Judith Eckle-Kohler and Prof. Iryna Gurevych

Publications

Neural Disambiguation of Causal Lexical Markers Based on Context

Eugenio Martínez Cámara, Vered Shwartz, Iryna Gurevych, Ido Dagan
In: Proceedings of the 12th International Conference on Computational Semantics (IWCS 2017), Volume 2: Short papers, p. (to appear), September 2017
Association for Computational Linguistics
[Online-Edition: http://aclweb.org/anthology/W17-6927]
[Inproceedings]

Integrating Deep Linguistic Features in Factuality Prediction over Unified Datasets

Gabriel Stanovsky, Judith Eckle-Kohler, Yevgeniy Puzikov, Ido Dagan, Iryna Gurevych
In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), Volume 2: Short Papers, p. 352-357, August 2017
Association for Computational Linguistics
[Online-Edition: https://github.com/gabrielStanovsky/unified-factuality]
[Inproceedings]

LSDSem 2017: Exploring Data Generation Methods for the Story Cloze Test

Michael Bugert, Yevgeniy Puzikov, Andreas Rücklé, Judith Eckle-Kohler, Teresa Martin, Eugenio Martínez Cámara, Daniil Sorokin, Maxime Peyrard, Iryna Gurevych
In: Proceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics (LSDSem, held in conjunction with EACL2017), p. 56-61, April 2017
Association for Computational Linguistics
[Online-Edition: https://github.com/UKPLab/lsdsem2017-story-cloze]
[Inproceedings]

A Consolidated Open Knowledge Representation for Multiple Texts

Rachel Wities, Vered Shwartz, Gabriel Stanovsky, Meni Adler, Ori Shapira, Shyam Upadhyay, Dan Roth, Eugenio Martínez Cámara, Iryna Gurevych, Ido Dagan
In: Proceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics, p. 12-24, April 2017
Association for Computational Linguistics
[Inproceedings]

Funding

This project is funded by:

  • Funder: Deutsche Forschungsgemeinschaft (German Research Foundation)
  • Programme: DIP Programme; 17th round of the German-Israeli Project Cooperation
  • Grant code: GU 798/17-1 and DA 1600/1-1
  • More information: funder web page of the project