Open Mining INfrastructure for TExt and Data (OpenMinTeD)


Recent years have witnessed an upsurge in the quantity of digital research data, offering new insights and opportunities for improved understanding. Text and data mining is emerging as a powerful tool for harnessing structured and unstructured content and data, analysing them at multiple levels and in several dimensions to discover hidden and new knowledge. Text mining solutions, however, are not easy to discover and use, nor can end users easily combine them.

OpenMinTeD aspires to enable the creation of an infrastructure that fosters and facilitates the discovery and use of text mining technologies and interoperable services. It examines several use cases identified by experts from different scientific areas, ranging from generic scholarly communication to literature related to life sciences, food and agriculture, and social sciences and humanities.

OpenMinTeD text mining tools, services and associated resources will run on the cloud, requiring an in-depth optimization of service deployment and execution via scalable VM-based service distribution and use of distributed storage.

The project runs for 36 months, from June 2015 to May 2018.


Through its infrastructural foresight activities, OpenMinTeD’s vision is to make operational a virtuous cycle in which:

  • primary content is accessible through standardised programmatic interfaces and access rules,
  • by well-documented and easily discoverable text mining services and workflows which process, analyse and annotate text to
  • identify patterns and extract new meaningful actionable knowledge, which will be used for
  • structuring, indexing and searching content, and, in tandem,
  • act as a new knowledge resource useful for drawing new relations between content items and firing a new mining cycle.

UKP Lab leads WP 5 "Interoperability framework", task 5.2 "Infrastructure interoperability specifications" and the use-case task 9.4 "Social Sciences", and is further involved in WP 6 "Platform design" and WP 7 "Platform integration".

Target groups

  • End users who will consume text mining (TM) services
      • Researchers, database curators, …
      • Novice: use services to advance their science
      • Advanced: include TM services into more complex research workflows (SMEs).
  • Content and service providers that will provide their content and/or TM services for consumption
      • Publishers, libraries, scientific databases, …
      • TM research communities
      • SMEs




A Legal Perspective on Training Models for Natural Language Processing

Richard Eckart de Castilho, Giulia Dore, Penny Labropoulou, Tom Margoni, Iryna Gurevych
In: Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018), to appear, May 2018
European Language Resources Association (ELRA)

An Arranged Marriage: Integrating DKPro Core in the Language Analysis Portal

Milen Kouylekov, Emanuele Lapponi, Stephan Oepen, Richard Eckart de Castilho
In: Proceedings of the CLARIN Annual Conference 2017, online, September 2017

Representation and Interchange of Linguistic Annotation. An In-Depth, Side-by-Side Comparison of Three Designs

Richard Eckart de Castilho, Nancy Ide, Emanuele Lapponi, Stephan Oepen, Keith Suderman, Erik Velldal, Marc Verhagen
In: Proceedings of the 11th Linguistic Annotation Workshop (LAW XI) at EACL 2017, p. 67–75, April 2017
Association for Computational Linguistics

Automatic Analysis of Flaws in Pre-Trained NLP Models

Richard Eckart de Castilho
In: Proceedings of the Third International Workshop on Worldwide Language Service Infrastructure and Second Workshop on Open Infrastructures and Analysis Frameworks for Human Language Technologies (WLSI3nOIAF2) at COLING 2016, p. 19–27, December 2016

Text mining resources for the life sciences

Piotr Przybyła, Matthew Shardlow, Sophie Aubin, Robert Bossy, Richard Eckart de Castilho, Stelios Piperidis, John McNaught, Sophia Ananiadou
In: Database, Vol. 2016, p. 1–30, November 2016


Funded by the EC under the H2020 Framework Programme for Research and Innovation.

Grant Agreement No. 654021, H2020-EINFRA-2014-2.
