Statistical Semantics

The Statistical Semantics research group examines statistical methods that reflect natural-language semantics. Specifically, we compute semantic similarities and semantic relations between lexical items by analyzing large text collections. These relations are used in applications such as semantic indexing, paraphrasing, and the identification of lexical chains.
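As an illustration of how similarity between lexical items can be derived from text alone, the following minimal sketch builds co-occurrence vectors and compares them with cosine similarity. The toy corpus, the window size, and the function names `cooccurrence_vectors` and `cosine` are assumptions made for this example; it is not a description of the group's actual tooling.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[w] for w, count in u.items() if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus; a real setting would use a large text collection.
corpus = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat chases the mouse".split(),
    "the dog chases the ball".split(),
]

vectors = cooccurrence_vectors(corpus)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_milk = cosine(vectors["cat"], vectors["milk"])
print(sim_cat_dog, sim_cat_milk)
```

In this toy corpus, "cat" and "dog" share most of their contexts and therefore score higher than "cat" and "milk", which merely occur near each other.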

Structure Discovery

The focus of this group is on unsupervised, knowledge-free methods such as clustering of lexical graphs or topic models. These methods, which neither presuppose training data nor assume the existence of knowledge resources, identify regularities in large text collections and mark them back into the data, following the structure discovery paradigm. This markup, which is entirely data-driven and therefore independent of domain and language, is then used as features in supervised machine learning settings, so that the usefulness of structure discovery processes is assessed in an application-based manner.
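The idea of discovering structure and marking it back into the data can be sketched as follows. This example uses connected components as a deliberately simple stand-in for graph clustering; it is not the clustering method used by the group, and the edge list and the names `connected_components` and `annotate` are hypothetical.

```python
from collections import defaultdict


def connected_components(edges):
    """Cluster a word graph: each connected component becomes one cluster id."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    cluster_of = {}
    next_id = 0
    # Visit nodes in sorted order so that cluster ids are deterministic.
    for start in sorted(neighbours):
        if start in cluster_of:
            continue
        cluster_of[start] = next_id
        stack = [start]
        while stack:
            node = stack.pop()
            for other in neighbours[node]:
                if other not in cluster_of:
                    cluster_of[other] = next_id
                    stack.append(other)
        next_id += 1
    return cluster_of


def annotate(tokens, cluster_of):
    """Mark cluster ids back into the text; unknown words get None."""
    return [(token, cluster_of.get(token)) for token in tokens]


# Hypothetical edges, e.g. word pairs whose similarity exceeded a threshold.
edges = [("cat", "dog"), ("dog", "mouse"), ("milk", "water")]
cluster_of = connected_components(edges)
annotated = annotate("the cat drinks milk".split(), cluster_of)
print(annotated)
```

The resulting (token, cluster id) pairs could then serve as features for a supervised learner, which is where the usefulness of the discovered structure would be measured.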

Current Projects

  • Text as Product, part of the LOEWE Research Center for Digital Humanities: Machine learning methods are often used in automatic document classification and information retrieval. This subproject investigates the use of topic models in the automatic analysis of corpora. Topic models are generative probabilistic models that identify the main topics of a document collection, which is analogous to the analysis of cohesion and coherence in single documents. As part of this subproject, a syntactically and semantically annotated corpus is extended with lexical cohesion information. Statistical models such as topic models are then applied to examine how useful statistical semantics approaches are in relation to the existing linguistic features.

Crowdsourcing

To obtain the markup necessary to train supervised language technology components, the group advocates the use of crowdsourcing techniques. Here, unskilled workers are paid small sums to perform small annotation tasks. The advantage is the virtually unlimited number of annotators, which makes the creation of training data quick and scalable. Quality is ensured by redundancy, that is, by collecting several judgments per item, and by using qualification tests or test items. A major challenge lies in breaking down the complex annotation tasks that are needed into simple subtasks suitable for the crowd.
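The two quality mechanisms mentioned above, redundancy and test items, can be combined in a simple aggregation step. The sketch below is one possible scheme under stated assumptions: the accuracy threshold, the data layout of `(worker, item, label)` triples, and the names `trusted_workers` and `aggregate` are all hypothetical.

```python
from collections import Counter, defaultdict


def trusted_workers(judgments, gold, min_accuracy=0.7):
    """Return workers whose accuracy on the gold test items meets the threshold.

    Workers who answered no test item are not trusted in this sketch.
    """
    seen = Counter()
    correct = Counter()
    for worker, item, label in judgments:
        if item in gold:
            seen[worker] += 1
            if label == gold[item]:
                correct[worker] += 1
    return {w for w in seen if correct[w] / seen[w] >= min_accuracy}


def aggregate(judgments, gold, min_accuracy=0.7):
    """Majority vote over redundant judgments from trusted workers only."""
    keep = trusted_workers(judgments, gold, min_accuracy)
    votes = defaultdict(Counter)
    for worker, item, label in judgments:
        if worker in keep and item not in gold:
            votes[item][label] += 1
    # On a tie, the label that was counted first wins.
    return {item: counts.most_common(1)[0][0] for item, counts in votes.items()}


# Hypothetical data: g1 is a test item with a known answer.
gold = {"g1": "pos"}
judgments = [
    ("w1", "g1", "pos"), ("w2", "g1", "pos"),
    ("w3", "g1", "pos"), ("w4", "g1", "neg"),
    ("w1", "t1", "pos"), ("w2", "t1", "pos"),
    ("w3", "t1", "neg"), ("w4", "t1", "neg"),
    ("w1", "t2", "neg"), ("w2", "t2", "neg"),
    ("w3", "t2", "neg"), ("w4", "t2", "pos"),
]

labels = aggregate(judgments, gold)
print(labels)
```

Here worker w4 fails the test item and is excluded, so item t1, which would otherwise be tied, resolves to the label chosen by the majority of the remaining workers.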

Selected Publications

Supervised All-Words Lexical Substitution using Delexicalized Features
György Szarvas and Chris Biemann and Iryna Gurevych
In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2013), June 2013.

Three Knowledge-Free Methods for Automatic Lexical Chain Extraction
Steffen Remus and Chris Biemann
In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2013), 2013.

Text Segmentation with Topic Models
Martin Riedl and Chris Biemann
In: Journal for Language Technology and Computational Linguistics (JLCL), vol. 27, no. 1, p. 47--70, August 2012.

TopicTiling: A Text Segmentation Algorithm based on LDA
Martin Riedl and Chris Biemann
In: Student Research Workshop of the 50th Meeting of the Association for Computational Linguistics, p. 37--42, July 2012.

How Text Segmentation Algorithms Gain from Topic Models
Martin Riedl and Chris Biemann
In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2012), p. 553--557, June 2012.

Sweeping through the Topic Space: Bad luck? Roll again!
Martin Riedl and Chris Biemann
In: ROBUS-UNSUP 2012: Joint Workshop on Unsupervised and Semi-Supervised Learning in NLP held in conjunction with EACL 2012, p. 19--27, April 2012.

Remembering words in context as predicted by an Associative Read-Out Model
Markus J. Hofmann and Lars Kuchinke and Chris Biemann and Sascha Tamm and Arthur M. Jacobs
In: Frontiers in Psychology, vol. 2, no. 252, p. 1--11, September 2011.
http://www.frontiersin.org/language_sciences/10.3389/fpsyg.2011.00252/abstract.


People
