All resources in UBY are represented according to a uniform and comprehensive LMF lexicon model, UBY-LMF. UBY-LMF captures lexical information at a fine-grained level by employing a large number of Data Categories from ISOCat. UBY-LMF, and thus also UBY, is designed to be directly extensible by new resources and languages.
UBY-LMF enables structural and semantic interoperability (with respect to linguistic terminology) across resources down to a fine-grained level of semantic and syntactic information by employing a large number of Data Categories from ISOCat (see public Data Category Selection Uby 2012).
Download the UBY-LMF DTD 1.0 and the corresponding UML diagram here.
The UBY-LMF DTD contains links to ISOCat Data Categories. Note that many attributes or attribute values in UBY-LMF link to ISOCat Data Categories with a different (so-called admitted) name. This is explicitly supported by ISOCat, because each Data Category definition may optionally contain Data Element Name Sections in order to record other names for the DC as used in different sources, such as a given database, format or application.
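The admitted-name mechanism described above can be illustrated with a minimal Python sketch. It is not taken from the UBY-LMF DTD or from ISOCat: the class `DataCategory`, the helpers `build_index` and `resolve`, the identifiers `DC-EXAMPLE-1` and `DC-EXAMPLE-2`, and the sample names are all hypothetical and stand in for real Data Category entries. The sketch only shows how a canonical name and its admitted names might resolve to the same Data Category.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataCategory:
    """Hypothetical stand-in for one ISOCat Data Category entry."""
    pid: str                      # placeholder identifier, not a real ISOCat PID
    canonical_name: str           # the Data Category's own name
    admitted_names: frozenset = field(default_factory=frozenset)  # other recorded names


def build_index(categories):
    """Map every canonical and admitted name to its Data Category."""
    index = {}
    for dc in categories:
        for name in {dc.canonical_name, *dc.admitted_names}:
            index[name] = dc
    return index


# Illustrative entries only; names and identifiers are assumptions.
CATEGORIES = [
    DataCategory("DC-EXAMPLE-1", "partOfSpeech", frozenset({"pos"})),
    DataCategory("DC-EXAMPLE-2", "grammaticalNumber", frozenset({"number"})),
]

INDEX = build_index(CATEGORIES)


def resolve(name):
    """Return the placeholder identifier for a name, or None if it is unknown."""
    dc = INDEX.get(name)
    return dc.pid if dc else None
```

In this sketch, `resolve("pos")` and `resolve("partOfSpeech")` return the same placeholder identifier, which mirrors how an attribute in a lexicon model can use a different name and still link to the same Data Category.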
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
You are free to share (copy, distribute and transmit) the work, to develop your own extensions (adapt, remix) of the work, and to make commercial use of the work under the condition that you give attribution along the following lines:
Iryna Gurevych, Judith Eckle-Kohler, Silvana Hartmann, Michael Matuschek, Christian M. Meyer, and Christian Wirth:
UBY – A Large-Scale Unified Lexical-Semantic Resource Based on LMF, in: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL), pp. 580-590, April 2012. Avignon, France.
PDF | BibTeX | Proceedings | Supplementary Data
If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.
A detailed description of UBY-LMF is given in this paper:
Judith Eckle-Kohler, Iryna Gurevych, Silvana Hartmann, Michael Matuschek, and Christian M. Meyer:
UBY-LMF – A Uniform Model for Standardizing Heterogeneous Lexical-Semantic Resources in ISO-LMF, in: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC), (to appear), May 2012. Istanbul, Turkey.
PDF | BibTeX | Proceedings
Lexical syntax is highly language-specific. Currently, the syntax part of UBY-LMF, modeling subcategorization and other lexical-syntactic properties, covers two languages: English and German. Details on the standardized format for subcategorization frames can be found here:
Judith Eckle-Kohler and Iryna Gurevych:
Subcat-LMF – Fleshing out a standardized format for subcategorization frame interoperability, in: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL), pp. 550-560, April 2012. Avignon, France.
PDF | BibTeX | Proceedings | Supplementary Data and Tools