Educational Monitoring on the web – Identifying and Following Educationally Relevant Controversies


The project aims to implement a monitoring system for public opinion and argumentation on the most important controversial, educationally relevant topics discussed on the internet.


The project focuses on a new quality dimension: educationally relevant controversies are identified on the internet, e.g., in the press and in social media, then further processed and provided within a newly developed tool. The central objective is to give educational researchers a systematic presentation of educationally relevant opinions observable in web-based sources, highlighting central arguments and trends and allowing arguments to be traced in space and time back to the contexts in which they arose. Users will thus be able to identify educationally relevant controversies and the discourse structures of central arguments promptly and in an exploratory fashion, integrate them into research processes, and track their development across space and time.


We approach the problem of argumentation mining from an information-seeking perspective. The key sources are discussions (debates) about controversies (contentions) on a particular topic of interest to the user. The scope is not limited to a particular media type: sources range from editorials in online newspapers to user-generated discourse in social media, such as blogs and forum posts, covering different aspects of the issues. The main task is to identify and extract the core argumentation and present this new knowledge to users. By utilizing argumentation mining methods, users can be provided with the most relevant information (arguments) regarding the controversy under investigation.


We conducted several extensive, independent annotation studies on controversial topics related to education. One distinguishing feature of educational topics is their breadth: they attract researchers, practitioners, parents, and policy-makers alike.

  • We used the Claim-Premises scheme to annotate a dataset of 80 web documents on six current topics related to the German educational system.

  • We annotated 990 English comments to articles and forum posts for argumentativeness (persuasiveness). We then applied an extended Toulmin scheme to 294 argumentative English comments to articles and forum posts and to 57 English newspaper editorials and blog posts. The topics include mainstreaming, single-sex schools, and homeschooling, among others.


Classification of persuasive comments

We treat the problem of distinguishing between persuasive and unpersuasive documents as a binary machine learning task. For these experiments, we use all 990 English comments to articles and forum posts. Using an SVM-SMO classifier and a wide range of features (lexical and surface features, POS features, sentiment features, topic model features, and deep learning features), we achieved a macro F1 score of 0.69.
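The macro F1 score reported above is the unweighted mean of the per-class F1 scores, so both classes count equally regardless of their size. The following is a minimal sketch of that metric only, not of the classifier or features used in the project; the function name `macro_f1` and the toy labels are illustrative assumptions.

```python
from typing import Sequence


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Return the unweighted mean of per-class F1 scores."""
    labels = sorted(set(gold) | set(pred))
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels (hypothetical): P = persuasive, N = not persuasive.
gold = ["P", "P", "N", "N"]
pred = ["P", "N", "N", "N"]
score = macro_f1(gold, pred)
```

In this toy case the per-class F1 scores are 2/3 for `P` and 4/5 for `N`, so the macro average is 11/15.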

Identification of argument components

Here we focus on the automatic identification of argument components in discourse. Our approach is based on supervised and semi-supervised machine learning methods applied to the gold data annotated with the Toulmin model. We cast the problem as sequence labeling. Using SVM-HMM and a variety of features (structural, morphological, and syntactic features; topic and sentiment features; semantic, coreference, and discourse features; and deep learning features), we achieve a macro F1 score of 0.23 over all 11 classes.

  • Reference: Argumentation Mining in User-Generated Web Discourse by Ivan Habernal and Iryna Gurevych. Computational Linguistics, 2016. In press.
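To illustrate what casting component identification as sequence labeling can look like, the sketch below converts annotated spans into one label per token using a begin/inside/outside (BIO) encoding. The five component names and the resulting label inventory are assumptions made for this example (five components with a begin and an inside tag each, plus an outside tag, would give 11 classes); the text above does not spell out the label set, and `bio_labels` is a hypothetical helper.

```python
from typing import List, Tuple

# Assumed component inventory for illustration only.
COMPONENTS = ["backing", "claim", "premise", "rebuttal", "refutation"]

# One "O" label plus a begin (B) and inside (I) label per component.
LABEL_SET = ["O"] + [f"{c}-{t}" for c in COMPONENTS for t in ("B", "I")]


def bio_labels(n_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Map (start, end, component) spans to token labels; end is exclusive."""
    labels = ["O"] * n_tokens
    for start, end, component in spans:
        labels[start] = f"{component}-B"
        for i in range(start + 1, end):
            labels[i] = f"{component}-I"
    return labels


tokens = "Homeschooling should be banned because kids need peers".split()
labels = bio_labels(len(tokens), [(0, 4, "claim"), (4, 8, "premise")])
```

A sequence labeler then predicts one such label per token, and contiguous B/I runs are read back as component spans.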

Semi-Supervised methods for argumentation mining

Current approaches to the automatic analysis of argumentation usually follow the fully supervised machine learning paradigm and rely on manually annotated datasets. To overcome the limited scope and size of the existing annotated corpora, we exploit debate portals: semi-structured discussion websites where members pose contentious questions to the community and allow others to pick a side and provide their opinions and arguments.
We proposed novel features that exploit clustering of unlabeled data from debate portals based on word embedding representations. Using these features, we significantly outperformed several baselines in cross-validation, cross-domain, and cross-register evaluation scenarios.

  • Reference: Exploiting Debate Portals for Semi-supervised Argumentation Mining in User-Generated Web Discourse by Ivan Habernal and Iryna Gurevych. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP), p. 2127-2137, Association for Computational Linguistics, September 2015.
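As a rough illustration of how clusters over embeddings of unlabeled text can be turned into features, the sketch below assigns each known token to its nearest cluster centroid and returns the normalized cluster histogram. This is only one plausible reading, not the feature design from the referenced paper; the toy embeddings, the centroids, and the function names are all hypothetical. In practice the centroids would come from clustering the unlabeled debate portal data.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def nearest_centroid(vec: Vector, centroids: List[Vector]) -> int:
    """Return the index of the centroid closest to vec (Euclidean distance)."""
    dists = [math.dist(vec, c) for c in centroids]
    return dists.index(min(dists))


def cluster_features(
    tokens: List[str],
    embeddings: Dict[str, Vector],
    centroids: List[Vector],
) -> List[float]:
    """Normalized histogram of cluster assignments; unknown tokens are skipped."""
    counts = [0] * len(centroids)
    for tok in tokens:
        vec = embeddings.get(tok)
        if vec is not None:
            counts[nearest_centroid(vec, centroids)] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


# Toy two-dimensional embeddings and centroids (hypothetical values).
embeddings = {"school": (1.0, 0.0), "teacher": (0.9, 0.1), "because": (0.0, 1.0)}
centroids = [(1.0, 0.0), (0.0, 1.0)]
features = cluster_features(["school", "teacher", "because", "unknown"], embeddings, centroids)
```

The resulting vector can be appended to the other features of a labeled instance, which is how unlabeled data can inform a supervised model.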

Discriminating claims and premises in German argumentative discourse

We investigated the role of a large set of discourse markers in argumentative discourse, based on a German dataset annotated with arguments (the Claim-Premise scheme; a collection of 89 documents on 7 different controversial topics in the educational domain), and identified semantic groups of discourse markers that are indicative of either claims or premises. These semantic groups also shed light on semantic aspects of claims and premises. Our classification model reaches an accuracy of 0.71 and shows that discourse markers are important features for discriminating claims from premises.

  • Reference: On the Role of Discourse Markers for Discriminating Claims and Premises in Argumentative Discourse by Judith Eckle-Kohler, Roland Kluge, and Iryna Gurevych. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP), p. 2249-2255, Association for Computational Linguistics, September 2015.
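A simple way to use semantic groups of discourse markers as features is to count, per sentence, how many markers from each group occur. The sketch below shows that idea under loud assumptions: the two groups, their names, and the German markers listed are illustrative choices for this example, not the marker inventory or grouping identified in the study.

```python
import re
from typing import Dict

# Hypothetical marker groups for illustration only.
MARKER_GROUPS = {
    "cause": {"weil", "denn"},
    "consequence": {"daher", "deshalb", "also"},
}


def marker_group_counts(sentence: str) -> Dict[str, int]:
    """Count occurrences of markers from each group in a sentence."""
    tokens = re.findall(r"\w+", sentence.lower())
    counts = {group: 0 for group in MARKER_GROUPS}
    for tok in tokens:
        for group, markers in MARKER_GROUPS.items():
            if tok in markers:
                counts[group] += 1
    return counts


counts = marker_group_counts("Deshalb sollte man das ändern, weil es nicht funktioniert.")
```

Such counts could serve as input features to a claim/premise classifier alongside other features.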

We also experimented with a flat scheme for annotating arguments in the same dataset, namely flat annotations of arguments by polarity (Pro/Contra), by argumentative type (Qualitative/Quantitative), and by reference (Referenced/Unreferenced), and reached an accuracy of 0.66.

  • Reference: Vovk, Artem. 2013. Discovery and Analysis of Public Opinions on Controversial Topics in the Educational Domain, Master Thesis, Ubiquitous Knowledge Processing Lab, TU Darmstadt.

