The Statistical Semantics research group examines statistical methods that reflect natural-language semantics. Specifically, we compute semantic similarities and semantic relations between lexical items by analyzing large text collections. These relations are used in applications such as semantic indexing, paraphrasing, and the identification of lexical chains.
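As a rough illustration of computing semantic similarity from text alone, the following minimal sketch counts the context words of each term and compares terms by cosine similarity. The toy corpus, the window size, and the function names `context_vectors` and `cosine` are assumptions made for this example; it is not the group's actual method, which operates on far larger collections and uses more refined significance measures.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, window=2):
    """For every word, count the words occurring within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus; real experiments use large text collections.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the cat",
]

vectors = context_vectors(corpus)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_milk = cosine(vectors["cat"], vectors["milk"])
```

In this toy setting, "cat" and "dog" share most of their contexts and therefore score higher than "cat" and "milk", which only co-occur. No training data or lexical resource is involved at any point.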
The group focuses on unsupervised, knowledge-free methods such as clustering of lexical graphs and topic models. These methods, which presuppose neither training data nor knowledge resources, identify regularities in large text collections and mark them back into the data, following the structure discovery paradigm. This markup, which is entirely data-driven and therefore independent of domain and language, then serves as features in supervised machine learning settings; the usefulness of structure discovery processes is thus assessed in an application-based manner.
To obtain the markup necessary to train supervised language technology components, the group advocates the use of crowdsourcing techniques. Here, unskilled workers are paid small sums to perform small annotation tasks. The advantage is the virtually unlimited number of annotators, which makes the creation of training data quick and scalable. Quality is ensured through redundancy and through qualification tests or test items. A major challenge lies in reformulating the complex annotation tasks that are needed as simple subtasks suitable for the crowd.
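The quality control described above (redundancy plus test items) can be sketched as follows. This is a minimal illustration under stated assumptions, not the group's actual pipeline: the judgment format `(worker, item, label)`, the accuracy threshold of 0.7, and the function names `trusted_workers` and `majority_vote` are all hypothetical.

```python
from collections import Counter, defaultdict


def trusted_workers(judgments, gold, min_accuracy=0.7):
    """Return workers whose accuracy on gold test items meets the threshold.

    Workers who answered no test item are not returned, because their
    reliability cannot be estimated.
    """
    correct, seen = Counter(), Counter()
    for worker, item, label in judgments:
        if item in gold:
            seen[worker] += 1
            correct[worker] += int(label == gold[item])
    return {w for w in seen if correct[w] / seen[w] >= min_accuracy}


def majority_vote(judgments, gold, min_accuracy=0.7):
    """Aggregate redundant labels from trusted workers by majority vote."""
    workers = trusted_workers(judgments, gold, min_accuracy)
    votes = defaultdict(Counter)
    for worker, item, label in judgments:
        # Test items only measure reliability; they are not labeled again.
        if worker in workers and item not in gold:
            votes[item][label] += 1
    # On a tie, the label that was counted first wins.
    return {item: counts.most_common(1)[0][0] for item, counts in votes.items()}


# Hypothetical judgments: (worker id, item id, label).
judgments = [
    ("w1", "test1", "yes"), ("w2", "test1", "yes"), ("w3", "test1", "no"),
    ("w1", "item1", "yes"), ("w2", "item1", "yes"), ("w3", "item1", "no"),
    ("w1", "item2", "no"), ("w2", "item2", "no"), ("w3", "item2", "yes"),
]
gold = {"test1": "yes"}  # test item with a known answer

labels = majority_vote(judgments, gold)
```

Here worker `w3` fails the test item and is excluded, so the remaining redundant judgments decide each label. In practice, the threshold and the number of judgments per item trade annotation cost against label quality.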
| Supervised All-Words Lexical Substitution using Delexicalized Features |
| György Szarvas and Chris Biemann and Iryna Gurevych In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2013), June 2013. |
| Exploring Cities in Crime: Significant Concordance and Co-occurrence in Quantitative Literary Analysis |
| Janneke Rauscher, Leonard Swiezinski, Martin Riedl, Chris Biemann In: Proceedings of the Workshop on Computational Linguistics for Literature, June 2013. |
| Three Knowledge-Free Methods for Automatic Lexical Chain Extraction |
| Steffen Remus and Chris Biemann In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2013), 2013. |
| Text: Now in 2D! A Framework for Lexical Expansion with Contextual Similarity |
| Chris Biemann, Martin Riedl In: Journal of Language Modelling, vol. 1, no. 1, 2013. |
| Text Segmentation with Topic Models |
| Martin Riedl and Chris Biemann In: Journal for Language Technology and Computational Linguistics (JLCL), vol. 27, no. 1, pp. 47–70, August 2012. |
| TopicTiling: A Text Segmentation Algorithm based on LDA |
| Martin Riedl and Chris Biemann In: Student Research Workshop of the 50th Meeting of the Association for Computational Linguistics, pp. 37–42, July 2012. |
| How Text Segmentation Algorithms Gain from Topic Models |
| Martin Riedl and Chris Biemann In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2012), pp. 553–557, June 2012. |