Word Sense Disambiguation

Word sense disambiguation (WSD) is an open problem in natural language processing concerned with determining which sense (i.e., meaning) of a word is used in a particular context. This UKPedia entry provides links to important WSD-related publications, software, corpora, and other resources.
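
To make the task concrete, here is a minimal sketch of a gloss-overlap heuristic in the spirit of simplified Lesk. It picks the sense whose definition shares the most words with the context. The two-sense inventory for "bass", its glosses, the stopword list, and the function names are all invented for illustration; they are not taken from WordNet or from any of the resources listed on this page.

```python
import re

# Hypothetical two-sense inventory for "bass"; the glosses are invented for
# illustration and do not come from WordNet or any other real resource.
SENSES = {
    "bass/fish": "a freshwater or sea fish caught by anglers in a lake or river",
    "bass/music": "the lowest part in music, or an instrument such as a guitar that plays low notes",
}

# Small ad hoc stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "the", "in", "of", "or", "by", "that", "such", "as",
             "he", "she", "on", "his", "her"}


def tokens(text):
    """Lowercase the text and return its set of non-stopword word tokens."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def disambiguate(context, senses=SENSES):
    """Return the sense label whose gloss shares the most tokens with the context."""
    context_tokens = tokens(context)
    return max(senses, key=lambda label: len(tokens(senses[label]) & context_tokens))


print(disambiguate("He caught a huge bass in the lake"))
print(disambiguate("She plays bass guitar in a band"))
```

With this toy inventory, the first sentence is assigned the fish sense and the second the music sense. Real systems replace the hand-written glosses with a sense inventory and corpora such as those listed below.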

Introductory material, overviews, and surveys

  •  Word sense disambiguation (Wikipedia)
  •  Word sense disambiguation (Scholarpedia)
  •  Word sense disambiguation (ACLWiki)
  • Eneko Agirre and Philip Edmonds, editors.  Word Sense Disambiguation: Algorithms and Applications, volume 33 of Text, Speech, and Language Technology. Springer, 2006. ISBN 978-1-4020-6870-6.
  •  Advances in Word Sense Disambiguation tutorial by Rada Mihalcea and Ted Pedersen (2005)
  • Roberto Navigli.  Word sense disambiguation: A survey. ACM Computing Surveys, 41:10:1–10:69, February 2009. ISSN 0360-0300.
  • Nancy Ide and Jean Véronis.  Introduction to the special issue on word sense disambiguation: The state of the art. Computational Linguistics, 24(1):1–40, 1998. ISSN 0891-2017.
  • K. C. Litkowski. Computational lexicons and dictionaries. In Keith Brown, editor, Encyclopedia of Language and Linguistics, pages 753–761. Elsevier Science, Oxford, second edition, 2005. ISBN 978-0-08-044299-0.
  • Philip Edmonds. Lexical disambiguation. In Keith Brown, editor, Encyclopedia of Language and Linguistics, pages 607–623. Elsevier Science, Oxford, second edition, 2005. ISBN 978-0-08-044299-0.
  • David Jurafsky and James H. Martin. Speech and Language Processing, chapter Computational Lexical Semantics. Prentice Hall, second edition, 2008. ISBN 978-0131873216.
  • Christopher D. Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing, chapter Word Sense Disambiguation, pages 229–264. The MIT Press, 1999. ISBN 978-0262133609.
  • David Yarowsky. Word sense disambiguation. In Nitin Indurkhya and Fred J. Damerau, editors, Handbook of Natural Language Processing, pages 315–338. Chapman and Hall/CRC, second edition, 2010. ISBN 978-1420085921.

Sense inventories and other lexical resources

  •  DANTE
    A lexical database for English
  •  GCIDE_XML
    The GNU version of the Collaborative International Dictionary of English (CIDE), presented in XML
  •  HECTOR
    A 35-word English dictionary used for Senseval-1
  • Longman Dictionary of Contemporary English (LDOCE).  Burnt Mill, Essex: Longman, 1978
    This proprietary dictionary saw considerable use by the WSD research community before less restrictively licensed resources became available.
  • Roget's International Thesaurus.  New York: Harper Collins, 1992
    This proprietary thesaurus saw considerable use by the WSD research community before less restrictively licensed resources became available.
  •  The Open Roget's Project
    A free implementation of the 1911 Roget's Thesaurus.
  • Wordnets and associated resources
    •  WordNet
      A lexical database for English
    •  Wordnets in the world
      A list of wordnets for various languages
    •  eXtended WordNet
      A version of WordNet where the glosses are syntactically parsed, transformed into logic forms, and content words are semantically disambiguated
    •  Inter-version WordNet mappings
      Mappings between synset offsets in various WordNet versions
    •  MCR (Multilingual Central Repository)
      An integration of five local wordnets, the EuroWordNet Top Concept ontology, MultiWordNet Domains, and hundreds of thousands of new semantic relations and properties automatically acquired from corpora.

Annotated corpora

  •  Alan Smeaton and Ian Quigley's image captions
    8816 WordNet 1.5-annotated instances of 2304 lemmas in 2714 image captions
  •  DSO Corpus of Sense-Tagged English
    Sense-tagged word occurrences for 121 nouns and 70 verbs occurring in the Brown Corpus and Wall Street Journal corpus
  •  HECTOR (Senseval-1)
    Separate training and test corpora with 35 word types annotated with their HECTOR senses.  See also Ted Pedersen's conversions.
  • interest
    Wall Street Journal articles with 2369 instances of "interest" annotated with their LDOCE senses.  See Ted Pedersen's conversions.
  • line, hard, serve
    Wall Street Journal articles with over 12,000 instances of "line", "hard", and "serve" tagged with a subset of their WordNet 1.5 senses.  See Ted Pedersen's conversions.
  •  Open Mind Word Expert sense-tagged data
    Various data sets for English, Romanian, and Hindi
  •  Rada Mihalcea's Senseval-2 and Senseval-3 conversions into SemCor format
    Senseval-2 and Senseval-3 English all-words data converted into SemCor format
  •  SemCor
    Brown Corpus texts annotated with WordNet 1.6 senses and automatically mapped to WordNet 1.7, WordNet 1.7.1, WordNet 2.0, WordNet 2.1, and WordNet 3.0
  •  SEMiSUSANNE
    33 sense-tagged and structurally annotated documents from the Brown Corpus
  •  Sensecorpus
    Automatically extracted examples for all WordNet 1.6 noun senses and topic signatures built based on those examples
  •  Senseval-2
    Three all-words sense-annotated Penn Treebank II articles comprising in total some 5000 words of running text, plus some Penn Treebank II Wall Street Journal and British National Corpus text where 75 to 300 instances of a total of 73 nouns, adjectives, and verbs have been annotated with their WordNet 1.7 senses.  See also Ted Pedersen's and Rada Mihalcea's conversions.
  •  Ted Pedersen's Sense-tagged Text
    Versions of the Senseval-1, Senseval-2, line, hard, serve, and interest data which have been converted to a common format (Senseval-2), POS tagged, and parsed.
  •  TWA sense-tagged data
    Sense-tagged data for six words with two-way ambiguities (bass, crane, motion, palm, plant, tank)
  •  WordNet Gloss Disambiguation Project
    A corpus of WordNet 3.0 glosses with word forms disambiguated to their WordNet 3.0 senses

Software

  •  CuiTools
    A complete word sense disambiguation system that assigns senses to biomedical text based on the UMLS
  • DKPro
    A collection of software components for natural language processing based on the Apache UIMA framework.
  •  GWSD: Unsupervised Graph-based Word Sense Disambiguation
    A system for unsupervised all-words graph-based word sense disambiguation
  •  LingPipe
    A Java natural language processing toolkit.  A tutorial on using LingPipe for word sense disambiguation is available.
  •  Natural Language Toolkit (NLTK)
    Python modules for NLP, including a module for reading Senseval-2 data
  •  SenseClusters
    A package of (mostly) Perl programs that allows a user to cluster similar contexts together using unsupervised knowledge-lean methods.
  •  SenseLearner
    An all-words word sense disambiguation tool
  •  SenseTools
    A suite of tools that allow for easy creation of supervised word sense disambiguation systems
  •  Senseval-2 data format converters
    Tools to convert between the following formats: Senseval-1, Senseval-2, Senseval-2 with conflated words, Headless Senseval-2, WePS, English Giga Word, plain text, National Library of Medicine Test Collection, Open Mind Data
  •  WordNet::SenseRelate
    Perl tools which use measures of semantic similarity and relatedness to perform word sense disambiguation
  •  WSD Gate
    A word sense disambiguation toolkit using GATE and WEKA
  •  WSD Shell
    An improved version of the Duluth-Shell which was used as a driver for the Duluth Senseval-2 and Senseval-3 systems