IT Forensics (as part of CASED)


Police and other authorities are challenged by the new forms of communication on the Web 2.0, which are increasingly used to prepare, organize, or commit crimes such as:

  • Sexual harassment of children
  • Distribution of illegal and dangerous materials
  • Planning of unauthorized demonstrations, acts of terror, etc.
  • Announcement of rampages and suicides
  • Weapon, drug, or human trafficking 

To make information on the Web manageable for manual inspection, we research methods for processing natural language documents. Our goals are to:


  • Create tools which aid in investigating crimes on the Web
  • Find relevant documents using a semantic search
  • Identify relevant information bits (persons, places, times)
  • Analyze the relations between them


Research on methods for analyzing material on the Web can be split into three steps:
1. Data Acquisition: Crawling or creation of development data using the Web

  • Definition of relevant scenarios and data sources with support from the authorities
  • ISPs, social network providers, etc. will assist in providing interfaces, metadata, etc.
  • Cleaning and preprocessing, e.g. treatment of typos and slang
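
The cleaning step could look roughly like the following minimal sketch. It is an illustration only, not the project's actual pipeline: the `SLANG` lexicon and the `normalize` function are hypothetical names, and a real system would need a curated slang resource and proper spelling correction.

```python
import re

# Hypothetical slang/abbreviation lexicon (an assumption for illustration).
SLANG = {"u": "you", "r": "are", "2moro": "tomorrow", "pls": "please"}

def normalize(text: str) -> list[str]:
    """Lowercase, squeeze elongated characters, and expand known slang tokens."""
    text = text.lower()
    # Collapse runs of three or more identical characters to two ("sooooo" -> "soo").
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)
    # Keep simple alphanumeric tokens; punctuation is dropped.
    tokens = re.findall(r"[a-z0-9']+", text)
    return [SLANG.get(tok, tok) for tok in tokens]
```

Calling `normalize("R u there 2moro???")` would expand the abbreviations and discard the punctuation; squeezed forms such as "soo" would still need a spelling-correction step.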

2. Data Analysis: Development and application of state-of-the-art Natural Language Processing (NLP) techniques. Example use: identifying key persons in an extremist forum and analyzing their relationships and the content of their posts.

  • Semantically enriched document retrieval
  • Keyphrase Extraction
  • Topic Clustering
  • Named Entity Recognition / Disambiguation
  • Relationship Extraction
  • Automatic Summarization
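
As a rough illustration of the example use above (finding key persons in a forum), the sketch below ranks users by how many distinct other users replied to them. This simple in-degree count is a stand-in chosen for brevity, not the method used in the project; the function name `rank_key_persons` and the `(replier, addressee)` input format are assumptions.

```python
from collections import Counter

def rank_key_persons(replies):
    """Rank users by the number of distinct other users who replied to them.

    `replies` is an iterable of (replier, addressee) pairs (assumed format).
    """
    repliers = {}
    for source, target in replies:
        if source != target:  # ignore self-replies
            repliers.setdefault(target, set()).add(source)
    counts = Counter({user: len(sources) for user, sources in repliers.items()})
    return counts.most_common()
```

In practice, the reply pairs would first have to be extracted from the forum data, and richer measures that take the content of the posts into account would likely be needed.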

3. Presentation of Results: Development of user interfaces for:

  • Visualizing and highlighting relevant results
  • Interactive exploration of the result space
  • Assistance for transferring results into evidence usable in court

Academic Partners


Former staff:

  • Michael Matuschek, TU Darmstadt

Project Publications



Needle in a Haystack: Reducing the Costs of Annotating Rare-Class Instances in Imbalanced Datasets

Emily Jamison, Iryna Gurevych
In: Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing, p. 244--253, December 2014
Department of Linguistics, Chulalongkorn University

Headerless, Quoteless, but not Hopeless? Using Pairwise Email Classification to Disentangle Email Threads

Emily Jamison, Iryna Gurevych
In: Proceedings of the 9th Conference on Recent Advances in Natural Language Processing (RANLP 2013), p. 327--335, September 2013

Supervised All-Words Lexical Substitution using Delexicalized Features

György Szarvas, Chris Biemann, Iryna Gurevych
In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2013), p. 1131--1141, June 2013
Association for Computational Linguistics