S. Hartmann, G. Szarvas, and I. Gurevych (2011):
Mining Multiword Terms from Wikipedia, in M.T. Pazienza & A. Stellato (Eds.): Semi-Automatic Ontology Development: Processes and Resources, pp. 226-258, Hershey, PA, USA: IGI Global.
Resource Download
WikiMwe is based on Wikipedia and is therefore available under the Creative Commons Attribution/Share-Alike License (CC-BY-SA).
Description
WikiMwe is a large resource of English multiword expressions mined from Wikipedia. It contains over 350,000 multiword units of size 2-4, including
Each entry includes part-of-speech (POS) and frequency information and a pointwise mutual information (PMI) score. Additionally, we provide definitional and category information for many entries, to facilitate the use of the resource in theoretical (semantic similarity, domain disambiguation) and applied (terminology extraction) NLP research.
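To illustrate what a PMI score expresses, here is a minimal Python sketch of the standard PMI formula for a two-word unit. It is not the extraction code used to build WikiMwe: the function name `pmi`, the base-2 logarithm, and the counts are assumptions made for illustration, and how the scores for three- and four-word units are computed is not covered here.

```python
import math


def pmi(pair_count, first_count, second_count, total):
    """Pointwise mutual information (in bits) of a two-word unit.

    All arguments are raw corpus counts; `total` is the number of
    tokens (or token pairs) used to turn counts into probabilities.
    """
    p_pair = pair_count / total
    p_first = first_count / total
    p_second = second_count / total
    # PMI compares the observed co-occurrence probability with the
    # probability expected if the two words were independent.
    return math.log2(p_pair / (p_first * p_second))


# Hypothetical counts, invented for illustration only.
score = pmi(pair_count=50, first_count=200, second_count=100, total=1_000_000)
print(f"PMI = {score:.2f} bits")
```

A score well above zero means the words co-occur far more often than chance would predict, which is what makes a candidate a plausible multiword expression. A score near zero means the words behave as if independent.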
Example Entry
Coming Soon
We are currently working on WikiMwe resources for other languages (starting with German) and on the development of a software package for the language-independent extraction of multiword expressions from Wikipedia. We will make these resources available in the future.
Please contact me if you have any questions regarding the resource: Silvana Hartmann.