All resources in UBY are represented according to a uniform and comprehensive LMF lexicon model, UBY-LMF. UBY-LMF captures lexical information at a fine-grained level by employing a large number of Data Categories from ISOcat. UBY-LMF, and thus also UBY, is designed to be directly extensible by new resources and languages. 

UBY-LMF enables structural and semantic interoperability (with respect to linguistic terminology) across resources, down to fine-grained semantic and syntactic information. It achieves this by employing a large number of Data Categories from ISOcat (see also the public Data Category Selection Uby 2012).

Download the current version of the UBY-LMF DTD here:

The UBY-LMF DTD contains references to ISOcat Data Categories. Note that many attributes and attribute values in UBY-LMF link to ISOcat Data Categories under a different (so-called admitted) name. ISOcat explicitly supports this: each Data Category (DC) definition may optionally contain Data Element Name Sections that record other names for the DC as used in different sources, such as a given database, format or application.
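The idea of admitted names can be sketched in a few lines of Python. This is only an illustration of the concept, not part of UBY-LMF or ISOcat: the class `DataCategory`, the functions `build_index` and `resolve`, the identifiers `DC-0001` and `DC-0002`, and all names listed are hypothetical and do not correspond to real ISOcat entries.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataCategory:
    """One data category with a canonical name and optional admitted names."""
    identifier: str   # hypothetical identifier, NOT a real ISOcat identifier
    name: str         # canonical name of the data category
    admitted_names: frozenset = field(default_factory=frozenset)

    def all_names(self):
        """Return the canonical name together with every admitted name."""
        return {self.name} | set(self.admitted_names)


def build_index(categories):
    """Map every canonical or admitted name to its data category."""
    index = {}
    for dc in categories:
        for name in dc.all_names():
            if name in index and index[name] != dc:
                raise ValueError(f"name {name!r} is claimed by two categories")
            index[name] = dc
    return index


# Hypothetical entries, for illustration only.
CATEGORIES = [
    DataCategory("DC-0001", "partOfSpeech", frozenset({"pos", "wordClass"})),
    DataCategory("DC-0002", "grammaticalNumber", frozenset({"number"})),
]

INDEX = build_index(CATEGORIES)


def resolve(name):
    """Return the data category that a (possibly admitted) name refers to, or None."""
    return INDEX.get(name)
```

With such an index, two resources that use different attribute names can still be recognized as referring to the same data category, which is the kind of interoperability that admitted names are meant to support.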

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

You are free to share (copy, distribute and transmit) the work, to develop your own extensions (adapt, remix) of the work, and to make commercial use of the work, under the condition that you give attribution along the following lines:

Iryna Gurevych, Judith Eckle-Kohler, Silvana Hartmann, Michael Matuschek, Christian M. Meyer, and Christian Wirth:
UBY - A Large-Scale Unified Lexical-Semantic Resource Based on LMF, in: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL), pp. 580-590, April 2012. Avignon, France.
PDF | BibTeX | Proceedings | Supplementary Data

If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.


A detailed description of UBY-LMF is given in this paper:

Judith Eckle-Kohler, Iryna Gurevych, Silvana Hartmann, Michael Matuschek, and Christian M. Meyer:
UBY-LMF - A Uniform Model for Standardizing Heterogeneous Lexical-Semantic Resources in ISO-LMF, in: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC), (to appear), May 2012. Istanbul, Turkey.
PDF | BibTeX | Proceedings

Lexical syntax is highly language-specific. Currently, the syntax part of UBY-LMF, modeling subcategorization and other lexical-syntactic properties, covers two languages: English and German. Details on the standardized format for subcategorization frames can be found here:

Judith Eckle-Kohler and Iryna Gurevych:
Subcat-LMF - Fleshing out a standardized format for subcategorization frame interoperability, in: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL), pp. 550-560, April 2012. Avignon, France.
PDF | BibTeX | Proceedings | Supplementary Data and Tools

