Overview

The datasets on this page were obtained by asking human subjects to assign a similarity or relatedness judgment to a number of German word pairs. The datasets have been used to test the performance of semantic similarity/relatedness measures.
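One common way to run such a test is to correlate a measure's scores with the human judgments. The sketch below computes Spearman's rank correlation in plain Python. This page does not say which correlation coefficient was used, so the choice of Spearman is an assumption, and the function names and the score lists are invented for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if either list is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank-transformed values."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Invented numbers, NOT taken from the datasets.
human = [3.5, 0.5, 2.0, 1.0]   # mean human judgments for four word pairs
system = [0.9, 0.1, 0.6, 0.3]  # scores from a hypothetical measure
rho = spearman(human, system)
```

Because only the ordering of the scores matters for a rank correlation, the measure's scores do not need to be rescaled to the judgment scale first.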

All subjects in our experiments were native speakers of German. A judgment of 0 means “completely dissimilar/unrelated”, while a judgment of 4 means “fully similar/related”.

In the comma-separated dataset files, each word pair is on a single line followed by the mean judgment score and the standard deviation.
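A minimal sketch of reading this format in Python follows. It assumes four fields per line in the order word 1, word 2, mean, standard deviation. The exact column layout, the names `JudgedPair` and `read_pairs`, and the sample rows are assumptions made for illustration and are not taken from the datasets.

```python
import csv
import io
from typing import NamedTuple


class JudgedPair(NamedTuple):
    word1: str
    word2: str
    mean: float    # mean human judgment
    stddev: float  # standard deviation of the judgments


def read_pairs(lines):
    """Parse lines of the assumed form: word1,word2,mean,stddev."""
    pairs = []
    for row in csv.reader(lines):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        w1, w2, mean, sd = (field.strip() for field in row)
        try:
            pairs.append(JudgedPair(w1, w2, float(mean), float(sd)))
        except ValueError:
            continue  # skip lines whose scores are not numeric, e.g. a header
    return pairs


# Placeholder rows with invented values, NOT real dataset entries.
sample = io.StringIO("wortA,wortB,3.5,0.5\nwortC,wortD,0.25,0.4\n")
pairs = read_pairs(sample)
```

For a real file, the same function can be given a file object opened with `encoding="utf-8"`, since the downloads are UTF-8 encoded.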

Datasets

Gur65 dataset

This dataset contains 65 word pairs along with their similarity scores assigned on a discrete 0-4 scale by 24 subjects.
The inter-annotator agreement is 0.81.
This dataset is a German translation of the Rubenstein/Goodenough dataset [1]. The judgment values were not adopted from their work, but newly annotated.
The dataset is described in

Using the Structure of a Conceptual Network in Computing Semantic Relatedness. In: Proceedings of the 2nd International Joint Conference on Natural Language Processing (IJCNLP’2005), Jeju Island, Republic of Korea, October 11-13 (to appear), 2005.

Gur350 dataset

This dataset contains 350 word pairs along with their relatedness scores assigned on a discrete 0-4 scale by 8 subjects.
The inter-annotator agreement is 0.69.

ZG222 dataset

This dataset contains 222 word pairs along with their relatedness scores assigned on a discrete 0-4 scale by 21 subjects.
The inter-annotator agreement is 0.49.
The dataset is described in

Automatically creating datasets for measures of semantic relatedness. In: COLING/ACL 2006 Workshop on Linguistic Distances. pp. 16-24, 2006.

Download

Download the datasets (zipped files; UTF-8 encoded).


[1] Rubenstein, H. & Goodenough, J. B. Contextual Correlates of Synonymy. Communications of the ACM, 1965, 8, 627-633.
