Feature-based Visualization and Analysis of Natural Language Documents (VisADoc)

Motivation

The amount of digital text data, e.g. created by Web users, has grown rapidly in recent years, leading to heavy information overload. Search engines help users find relevant documents, but they do not provide advanced tools for analyzing and understanding the dimensions of text that are relevant to the users' needs.

The major challenge is the gap between automatically computable text features and the above-mentioned needs. This gap has to be bridged to facilitate the user's interaction with documents, e.g. understanding why two documents are similar, how the documents within an automatically computed cluster are related, or which aspects of text quality and age suitability are relevant. The VisADoc project aims to develop new visual analytics techniques for closing this gap.

Goals

  • Investigation of novel textual features for modeling content-related text properties
  • Development of an interactive feature engineering approach for complex user-defined semantic properties
  • Development of visual analysis tools that support the exploration of large document collections with respect to a certain text property

Methods

We analyze text along several aspects, which are determined through automatically computed features and through an interactive, visually supported feature engineering approach that allows users to explore and evaluate user-defined text properties in large document collections. These features are then used for advanced text analysis, improving its accuracy.

To this end, we investigate novel textual features for modeling content-related text properties. A tight integration of automatic text analysis with multidimensional text and feature visualization is crucial to the proposed interactive process. The research is embedded in an end-to-end framework that supports defining text measures according to users' interests.
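As a rough illustration of what "automatically computed features" can mean at the simplest level, the following sketch computes a few surface features per paragraph. It is not the project's feature set: the function name `paragraph_features`, the chosen features, and the naive sentence and word splitting are all assumptions made for this example.

```python
import re


def paragraph_features(text):
    """Compute simple surface features for each paragraph of `text`.

    Hypothetical helper for illustration only; not part of the VisADoc code.
    """
    features = []
    # Assumption: paragraphs are separated by one or more blank lines.
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
        # Naive tokenization: runs of letters and apostrophes count as words.
        words = re.findall(r"[A-Za-z']+", paragraph)
        if not words:
            continue  # skip paragraphs without any word tokens
        features.append({
            "n_sentences": len(sentences),
            "avg_sentence_length": len(words) / len(sentences),
            "avg_word_length": sum(len(w) for w in words) / len(words),
        })
    return features


sample = "The cat sat. It purred.\n\nOwls deliver letters at breakfast."
for index, row in enumerate(paragraph_features(sample), start=1):
    print(index, row)
```

Feature rows of this kind, one per paragraph, are the sort of input that a per-paragraph or per-chapter visualization could be built on; more expressive features would replace the surface counts used here.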

Below are several examples of our visual semantic exploration of children's books:

Flow of Harry's emotions across the chapters of the book Harry Potter and the Sorcerer's Stone:

Distribution of Harry's activities (as verb semantic classes) in the book:

Analysis of readability difficulty per paragraph in each chapter of Harry Potter and the Sorcerer's Stone:

Position of activities (verb classes) represented as dense vectors (embeddings) in the semantic space:
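
For readers unfamiliar with embedding spaces, closeness between two dense vectors is commonly measured with cosine similarity. The sketch below shows the idea on made-up three-dimensional toy vectors; the class labels, the vector values, and the helper names `cosine_similarity` and `nearest` are assumptions for illustration and are not taken from the project's data or code.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up toy vectors (NOT project data); real embeddings have far more dimensions.
toy_embeddings = {
    "motion": [1.0, 0.0, 0.0],
    "communication": [0.0, 1.0, 0.0],
    "perception": [0.0, 0.8, 0.6],
}


def nearest(label, embeddings):
    """Return the name of the other class whose vector is most similar to `label`."""
    query = embeddings[label]
    scored = [
        (cosine_similarity(query, vector), name)
        for name, vector in embeddings.items()
        if name != label
    ]
    return max(scored)[1]


print(nearest("perception", toy_embeddings))
```

With these toy values, "perception" lies closer to "communication" than to "motion", which is the kind of neighborhood structure a semantic-space plot makes visible.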

Resources

Below, we make some of the resources produced in this project openly available.

Datasets:

Source code:

Other NLP resources:

Various:

Partners

This project is carried out in cooperation with the University of Konstanz.

People

Project Publications

Still not there? Comparing Traditional Sequence-to-Sequence Models to Encoder-Decoder Neural Networks on Monotone String Translation Tasks

Carsten Schnober, Steffen Eger, Erik-Lân Do Dinh, Iryna Gurevych
In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, p. 1703-1714, December 2016
The COLING 2016 Organizing Committee
[Online-Edition: https://github.com/UKPLab/coling2016-pcrf-seq2seq]
[Inproceedings]

Supersense Embeddings: A Unified Model for Supersense Interpretation, Prediction and Utilization

Lucie Flekova, Iryna Gurevych
In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016), Vol. 1: Long Papers, p. 2029-2041, August 2016
Association for Computational Linguistics
[Online-Edition: https://github.com/UKPLab/acl2016-supersense-embeddings]
[Inproceedings]

Exploring Stylistic Variation with Age and Income on Twitter

Lucie Flekova, Daniel Preoţiuc-Pietro, Lyle Ungar
In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016), Vol. 2: Short Papers, p. 313-319, August 2016
Association for Computational Linguistics
[Inproceedings]

Analysing Biases in Human Perception of User Age and Gender from Text

Lucie Flekova, Jordan Carpenter, Salvatore Giorgi, Lyle Ungar, Daniel Preoţiuc-Pietro
In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016), Vol. 1: Long Papers, p. 843-854, August 2016
Association for Computational Linguistics
[Online-Edition: http://www.businessinsider.de/study-finds-words-associated-with-stereotypes-of-gender-age-politics-2016-11]
[Inproceedings]

A User Interface for the Exploration of Manually and Automatically Coded Scientific Reasoning and Argumentation

Patrick Lerner, Andras Csanadi, Johannes Daxenberger, Lucie Flekova, Christian Ghanem, Ingo Kollar, Frank Fischer, Iryna Gurevych
In: Proceedings of the International Conference of the Learning Sciences (ICLS) 2016, p. 938-941, June 2016
International Society of the Learning Sciences
[Online-Edition: https://reason.ukp.informatik.tu-darmstadt.de:9443/]
[Inproceedings]

Automatische Textanalysen in der Geschichtswissenschaft – Auswertung, Interpretation und Relevanz

Maik Fiedler, Andreas Weiß, Ben Heuwing, Carsten Schnober
In: DHd 2016, p. 126-129, March 2016
nisaba verlag
[Online-Edition: http://dhd2016.de/]
[Inproceedings]

Personality Profiling of Fictional Characters using Sense-Level Links between Lexical Resources

Lucie Flekova, Iryna Gurevych
In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP), p. 1805-1816, September 2015
Association for Computational Linguistics
[Online-Edition: https://www.ukp.tu-darmstadt.de/data/personality-profiling/]
[Inproceedings]

Constructive Feedback, Thinking Process and Cooperation: Assessing the Quality of Classroom Interaction

Tahir Sousa, Lucie Flekova, Margot Mieskes, Iryna Gurevych
In: Proceedings of INTERSPEECH Conference, p. 2739-2743, September 2015
[Online-Edition: https://github.com/UKPLab/jlcl2015-pythagoras]
[Inproceedings]

Analysing Domain Suitability of a Sentiment Lexicon by Identifying Distributionally Bipolar Words

Lucie Flekova, Eugen Ruppert, Daniel Preoţiuc-Pietro
In: Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, p. 77-84, September 2015
Association for Computational Linguistics
[Online-Edition: https://www.ukp.tu-darmstadt.de/data/sentiment-analysis/inverted-polarity-bigrams/]
[Inproceedings]

Feature-Based Visual Exploration of Text Classification

Florian Stoffel, Lucie Flekova, Daniela Oelke, Iryna Gurevych, Daniel Keim
In: Proceedings of the Symposium on Visualization in Data Science (VDS) at IEEE VIS 2015, 2015
IEEE
[Inproceedings]