Feature-based Visualization and Analysis of Natural Language Documents (VisADoc)


The amount of digital text data, for example text created by Web users, has grown rapidly in recent years, leading to severe information overload. Search engines help users find relevant documents, but they do not provide advanced tools for analyzing and understanding the dimensions of text that are relevant to the users' needs.

The major challenge is the gap between automatically computable text features and the above-mentioned needs. This gap has to be bridged to facilitate the user's interaction with documents, e.g. understanding why two documents are similar, how the documents within an automatically computed cluster are related, or determining the relevant aspects of text quality and age suitability. The VisADoc project aims to develop new visual analytics techniques for closing this gap.


  • Investigation of novel textual features for modeling content-related text properties
  • Development of an interactive feature engineering approach for complex user-defined semantic properties
  • Development of visual analysis tools that support the exploration of large document collections with respect to a certain text property


We analyze text according to different aspects, which are determined through automatically computed features and through an interactive, visually supported feature engineering approach that allows exploration and evaluation of user-defined text properties in large document collections. These features are then used for advanced text analysis, with the aim of improving its effectiveness and accuracy.

To this end, we investigate novel textual features for modeling content-related text properties. A tight integration of automatic text analysis with multidimensional text and feature visualization is crucial to the proposed interactive process. The research is embedded in an end-to-end framework that supports defining text measures according to users' interests.
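
As a rough illustration of what an automatically computable text feature can look like, the following Python sketch computes three simple surface features and ranks them by how much they differ between two texts. It is not the project's method: the feature set, the function names `text_features` and `feature_differences`, and the tokenization rules are assumptions chosen only to keep the example short.

```python
import re


def text_features(text):
    """Compute a few simple surface features for one text.

    The features are illustrative assumptions, not the VisADoc feature set.
    """
    # Naive sentence split on end-of-sentence punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Naive word tokenization: runs of letters and apostrophes, lowercased.
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not sentences or not words:
        return {
            "avg_sentence_length": 0.0,
            "avg_word_length": 0.0,
            "type_token_ratio": 0.0,
        }
    return {
        "avg_sentence_length": len(words) / len(sentences),
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "type_token_ratio": len(set(words)) / len(words),
    }


def feature_differences(text_a, text_b):
    """Return (feature, absolute difference) pairs, largest difference first."""
    fa, fb = text_features(text_a), text_features(text_b)
    diffs = {name: abs(fa[name] - fb[name]) for name in fa}
    return sorted(diffs.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    simple = "The cat sat. The dog ran."
    complex_ = "Notwithstanding considerable hesitation, the committee approved it."
    for name, diff in feature_differences(simple, complex_):
        print(f"{name}: {diff:.2f}")
```

The raw differences are on different scales, so a real analysis would normalize each feature over the whole collection before comparing them. Even so, a ranked list of this kind is one simple way to start explaining why two documents are considered similar or dissimilar.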

Below are several examples of our visual semantic exploration of children's books:

Flow of Harry's emotions in the chapters of the book Harry Potter and the Sorcerer's Stone:

Distribution of Harry's activities (as verb semantic classes) in the book:

Analysis of readability difficulty per paragraph in each chapter of Harry Potter and the Sorcerer's Stone:

Position of activities (verb classes) represented as dense vectors (embeddings) in the semantic space:


Below, we make openly available some of the resources produced through this project.





Source code:




Other NLP resources:



This project is established in cooperation with the University of Konstanz.


Project Publications


Still not there? Comparing Traditional Sequence-to-Sequence Models to Encoder-Decoder Neural Networks on Monotone String Translation Tasks

Carsten Schnober, Steffen Eger, Erik-Lân Do Dinh, Iryna Gurevych
In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, p. 1703--1714, December 2016
The COLING 2016 Organizing Committee
[Online-Edition: https://github.com/UKPLab/coling2016-pcrf-seq2seq]

Analysing Biases in Human Perception of User Age and Gender from Text

Lucie Flekova, Jordan Carpenter, Salvatore Giorgi, Lyle Ungar, Daniel Preoţiuc-Pietro
In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016), Volume 1: Long Papers, p. 843-854, August 2016
Association for Computational Linguistics
[Online-Edition: http://www.businessinsider.de/study-finds-words-associated-with-stereotypes-of-gender-age-politics-2016-11]

A User Interface for the Exploration of Manually and Automatically Coded Scientific Reasoning and Argumentation

Patrick Lerner, Andras Csanadi, Johannes Daxenberger, Lucie Flekova, Christian Ghanem, Ingo Kollar, Frank Fischer, Iryna Gurevych
In: Proceedings of the International Conference of the Learning Sciences (ICLS) 2016, p. 938-941, June 2016
International Society of the Learning Sciences
[Online-Edition: https://reason.ukp.informatik.tu-darmstadt.de:9443/]

Automatische Textanalysen in der Geschichtswissenschaft – Auswertung, Interpretation und Relevanz

Maik Fiedler, Andreas Weiß, Ben Heuwing, Carsten Schnober
In: DHd 2016, p. 126--129, March 2016
nisaba verlag
[Online-Edition: http://dhd2016.de/]

Constructive Feedback, Thinking Process and Cooperation: Assessing the Quality of Classroom Interaction

Tahir Sousa, Lucie Flekova, Margot Mieskes, Iryna Gurevych
In: Proceedings of INTERSPEECH Conference, p. 2739-2743, September 2015
[Online-Edition: https://github.com/UKPLab/jlcl2015-pythagoras]

Analysing Domain Suitability of a Sentiment Lexicon by Identifying Distributionally Bipolar Words

Lucie Flekova, Eugen Ruppert, Daniel Preoţiuc-Pietro
In: Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, p. 77-84, September 2015
Association for Computational Linguistics
[Online-Edition: https://www.ukp.tu-darmstadt.de/data/sentiment-analysis/inverted-polarity-bigrams/]

Document-level school lesson quality classification based on German transcripts

Lucie Flekova, Tahir Sousa, Margot Mieskes, Iryna Gurevych
In: Journal for Language Technology and Computational Linguistics, Vol. 30, p. 99-124, 2015
[Online-Edition: https://github.com/UKPLab/jlcl2015-pythagoras]

What Makes a Good Biography? Multidimensional Quality Analysis Based on Wikipedia Article Feedback Data

Lucie Flekova, Oliver Ferschke, Iryna Gurevych
In: Proceedings of the 23rd International World Wide Web Conference (WWW 2014), p. 855-866, April 2014
International World Wide Web Conferences Steering Committee
[Online-Edition: https://www.ukp.tu-darmstadt.de/data/quality-assessment/wikipedia-article-feedback/]

Can We Hide in the Web? Large Scale Simultaneous Age and Gender Author Profiling in Social Media - Notebook for PAN at CLEF 2013

Lucie Flekova, Iryna Gurevych
In: CLEF 2013 Labs and Workshops - Online Working Notes, September 2013
[Online-Edition: http://www.clef2013.org/index.php?page=Pages/proceedings.php]