The Statistical Semantics research group examines statistical methods that reflect natural-language semantics. Specifically, we compute semantic similarities and semantic relations between lexical items by analyzing large text collections. These relations are used in applications such as semantic indexing, paraphrasing, and the identification of lexical chains.
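As a rough illustration of computing semantic similarity from text alone, the sketch below builds co-occurrence vectors for words and compares them with cosine similarity. It is a minimal example under stated assumptions, not the group's actual method: the function names, the window size of 2, and the four-sentence toy corpus are all hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Toy corpus (an assumption for illustration; real use needs large text collections).
corpus = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat chases the mouse".split(),
    "the dog chases the ball".split(),
]
vectors = cooccurrence_vectors(corpus)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_milk = cosine(vectors["cat"], vectors["milk"])
print(f"cat~dog: {sim_cat_dog:.3f}  cat~milk: {sim_cat_milk:.3f}")
```

In this toy corpus, "cat" and "dog" occur in similar contexts and therefore score higher than "cat" and "milk", which merely occur together. Raw counts are used here for brevity; weighting the counts with a significance measure is a common refinement.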
The group focuses on unsupervised and knowledge-free methods, e.g., clustering of lexical graphs or topic models. These methods, which presuppose neither training data nor knowledge resources, identify regularities in large text collections and annotate the data with them, following the structure discovery paradigm. This markup, which is entirely data-driven and therefore independent of domain and language, then serves as features for supervised machine learning applications; the usefulness of structure discovery processes is thus assessed in an application-based manner.
To obtain the markup needed to train supervised language technology components, the group advocates crowdsourcing. Here, unskilled workers are paid small sums to perform small annotation tasks. The advantage is a virtually unlimited number of annotators, which makes the creation of training data quick and scalable. Quality is ensured by redundancy and by qualification tests or test items. A major challenge lies in reformulating the complex annotation tasks that are needed as simple subtasks suitable for the crowd.
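The two quality mechanisms mentioned above, test items and redundancy, can be sketched as follows. This is a minimal illustration under assumptions, not the group's pipeline: the function names, the `(worker, item, label)` tuple layout, the 0.75 accuracy threshold, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict


def trusted_workers(judgments, gold, min_accuracy=0.75):
    """Return the workers whose accuracy on test items reaches min_accuracy.

    Workers who answered no test item are not trusted (an assumption of this sketch).
    """
    correct, seen = Counter(), Counter()
    for worker, item, label in judgments:
        if item in gold:
            seen[worker] += 1
            correct[worker] += int(label == gold[item])
    return {w for w in seen if correct[w] / seen[w] >= min_accuracy}


def aggregate(judgments, gold, min_accuracy=0.75):
    """Majority-vote label per non-test item, counting trusted workers only."""
    keep = trusted_workers(judgments, gold, min_accuracy)
    votes = defaultdict(Counter)
    for worker, item, label in judgments:
        if worker in keep and item not in gold:
            votes[item][label] += 1
    # On a tie, most_common keeps the label that was counted first.
    return {item: counts.most_common(1)[0][0] for item, counts in votes.items()}


# Hypothetical data: "t1" is a test item with a known answer, "x1" and "x2" are real items.
gold = {"t1": "A"}
judgments = [
    ("w1", "t1", "A"), ("w2", "t1", "A"), ("w3", "t1", "B"), ("w4", "t1", "A"),
    ("w1", "x1", "A"), ("w2", "x1", "A"), ("w3", "x1", "B"), ("w4", "x1", "B"),
    ("w1", "x2", "B"), ("w2", "x2", "B"), ("w3", "x2", "A"), ("w4", "x2", "A"),
]
labels = aggregate(judgments, gold)
print(labels)
```

In the sample data, worker `w3` fails the test item and is excluded, which turns two otherwise tied votes into clear majorities. Real setups often weight votes by worker accuracy instead of applying a hard cutoff.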
| Supervised All-Words Lexical Substitution using Delexicalized Features |
| György Szarvas and Chris Biemann and Iryna Gurevych In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2013), June 2013. |
| Three Knowledge-Free Methods for Automatic Lexical Chain Extraction |
| Steffen Remus and Chris Biemann In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2013), 2013. |
| Text Segmentation with Topic Models |
| Martin Riedl and Chris Biemann In: Journal for Language Technology and Computational Linguistics (JLCL), vol. 27, no. 1, p. 47--70, August 2012. |
| TopicTiling: A Text Segmentation Algorithm based on LDA |
| Martin Riedl and Chris Biemann In: Student Research Workshop of the 50th Meeting of the Association for Computational Linguistics, p. 37--42, July 2012. |
| How Text Segmentation Algorithms Gain from Topic Models |
| Martin Riedl and Chris Biemann In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2012), p. 553--557, June 2012. |
| Sweeping through the Topic Space: Bad luck? Roll again! |
| Martin Riedl and Chris Biemann In: ROBUS-UNSUP 2012: Joint Workshop on Unsupervised and Semi-Supervised Learning in NLP held in conjunction with EACL 2012, p. 19--27, April 2012. |
| Remembering words in context as predicted by an Associative Read-Out Model |
| Markus J. Hofmann and Lars Kuchinke and Chris Biemann and Sascha Tamm and Arthur M. Jacobs In: Frontiers in Psychology, vol. 2, no. 252, p. 1--11, September 2011. http://www.frontiersin.org/language_sciences/10.3389/fpsyg.2011.00252/abstract. |