Overview
The datasets on this page were obtained by asking human subjects to assign a similarity or relatedness judgment to a number of German word pairs. The datasets have been used to test the performance of semantic similarity/relatedness measures.
All subjects in our experiments were native speakers of German. A judgment of 0 means “fully dissimilar/unrelated”, while a judgment of 4 means “fully similar/related”.
In the comma-separated dataset files, each word pair is on a single line followed by the mean judgment score and the standard deviation.
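As a rough illustration of the line layout described above, the following Python sketch parses such lines into records. It assumes a column order of word 1, word 2, mean, standard deviation with a comma as delimiter. The names `ScoredPair` and `parse_pairs` are hypothetical, and the sample rows are invented for the example, not taken from the datasets. If the actual files use a different delimiter or include a header row, the `delimiter` argument and the skip logic would need adjusting.

```python
import csv
import io
from typing import List, NamedTuple


class ScoredPair(NamedTuple):
    """One dataset line: a word pair with its mean judgment and standard deviation."""
    word1: str
    word2: str
    mean: float
    stdev: float


def parse_pairs(text: str, delimiter: str = ",") -> List[ScoredPair]:
    """Parse dataset text into ScoredPair records.

    Assumed layout (not confirmed by the dataset page): word1, word2, mean, stdev.
    """
    pairs = []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if len(row) < 4:
            continue  # skip blank or incomplete lines
        try:
            mean, stdev = float(row[2]), float(row[3])
        except ValueError:
            continue  # skip a header row or other non-numeric line
        pairs.append(ScoredPair(row[0].strip(), row[1].strip(), mean, stdev))
    return pairs


# Invented sample rows, for illustration only.
sample = "Auto,Wagen,3.5,0.6\nMittag,Schnur,0.1,0.3\n"
for pair in parse_pairs(sample):
    print(pair.word1, pair.word2, pair.mean, pair.stdev)
```

Keeping the mean and the standard deviation as floats makes it straightforward to compare the human scores with the output of a similarity or relatedness measure later on.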
Datasets
Gur65 dataset
This dataset contains 65 word pairs along with their similarity scores assigned on a discrete 0-4 scale by 24 subjects.
The inter-annotator agreement is 0.81.
This dataset is a German translation of the Rubenstein/Goodenough dataset [1]. The judgment values were not adopted from their work, but newly annotated.
The dataset is described in
Gur350 dataset
This dataset contains 350 word pairs along with their relatedness scores assigned on a discrete 0-4 scale by 8 subjects.
The inter-annotator agreement is 0.69.
ZG222 dataset
This dataset contains 222 word pairs along with their relatedness scores assigned on a discrete 0-4 scale by 21 subjects.
The inter-annotator agreement is 0.49.
The dataset is described in:
Automatically creating datasets for measures of semantic relatedness. In: COLING/ACL 2006 Workshop on Linguistic Distances. pp. 16-24, 2006.
Download
Download the datasets (zipped .csv files; latin1 encoded).
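Because the files are latin1 encoded, German umlauts and ß will be garbled if the files are read as UTF-8. The sketch below shows one possible way to decode every .csv member of a downloaded archive with Python's standard `zipfile` module. The function name `read_latin1_members` and the archive name in the usage comment are hypothetical, since the page does not state the file names.

```python
import zipfile


def read_latin1_members(zip_source):
    """Return {member name: decoded text} for every .csv file in the archive.

    zip_source may be a file path or a binary file-like object.
    """
    texts = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".csv"):
                # Decode explicitly as latin1 so umlauts and ß survive.
                texts[name] = archive.read(name).decode("latin-1")
    return texts


# Usage (the archive name is an assumption):
# texts = read_latin1_members("datasets.zip")
```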
[1] Rubenstein, H. & Goodenough, J. B. Contextual Correlates of Synonymy. Communications of the ACM, 1965, 8, 627-633.






