DIP 2016 Corpus: Focused Retrieval over the Web

The corpus was introduced in the SIGIR 2016 article "New Collection Announcement: Focused Retrieval Over the Web":

Ivan Habernal, Maria Sukhareva, Fiana Raiber, Anna Shtok, Oren Kurland, Hadar Ronen, Judit Bar-Ilan, and Iryna Gurevych. New Collection Announcement: Focused Retrieval Over the Web. In: SIGIR '16, Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 701-704, ACM, July 2016.

The corpus is available here:



There are two folders available:

  • Step10AggregatedCleanGoldData

    • Contains intermediate data: the original plain text, votes from Amazon Mechanical Turk workers, additional instructions for labeling relevant/irrelevant sentences, etc.

  • DIP2016Corpus

    • The final clean exported corpus


  • The annotations are licensed under CC-BY 4.0.
  • The original content from ClueWeb12 keeps its original license.
  • Please cite the SIGIR 2016 article if you use the data in any of your work.

Processing software
