IT Forensics (as part of CASED)

Motivation

Police and other authorities are challenged by the new forms of communication on the Web 2.0, which are increasingly used to prepare, organize, or commit crimes such as:

  • Sexual harassment of children
  • Distribution of illegal and dangerous materials
  • Planning of unauthorized demonstrations, acts of terror, etc.
  • Announcement of rampages and suicides
  • Weapon, drug, or human trafficking 

To make information on the Web manageable for manual inspection, we aim to research methods for processing natural language documents.

Goals

  • Create tools which aid in investigating crimes on the Web
  • Find relevant documents using a semantic search
  • Identify relevant pieces of information (persons, places, times)
  • Analyze the relations between them

Methods

Research into methods for analyzing material on the Web can be split into three steps:

1. Data Acquisition: Crawling or creation of development data using the Web

  • Definition of relevant scenarios and data sources with support from the authorities
  • ISPs, social network providers etc. will assist in providing interfaces, metadata etc.
  • Cleaning and preprocessing, e.g. treatment of typos, slang... 
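
The cleaning step above could look roughly like the following minimal sketch. It is an illustration only, not the project's actual pipeline: the `SLANG` lexicon and the `normalize` function are hypothetical names, and a real lexicon would have to be curated for each scenario and language.

```python
import re

# Hypothetical slang/abbreviation lexicon (assumption, not from the project).
SLANG = {"u": "you", "r": "are", "2moro": "tomorrow", "pls": "please"}

def normalize(text: str) -> list[str]:
    """Lowercase, drop URLs, squeeze elongated characters, expand slang."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)    # remove URLs
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)   # "soooo" -> "soo"
    tokens = re.findall(r"[a-z0-9']+", text)     # crude tokenization
    return [SLANG.get(tok, tok) for tok in tokens]

print(normalize("Meet u 2moro pls!!! http://example.com"))
```

A dictionary lookup like this only covers known variants; genuine typo correction would need an edit-distance or statistical model on top.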

2. Data Analysis: Development and application of state-of-the-art Natural Language Processing (NLP) techniques. Example use: identifying key persons in an extremist forum and analyzing their relationships and the content of their posts.

  • Semantically enriched document retrieval
  • Keyphrase Extraction
  • Topic Clustering
  • Named Entity Recognition / Disambiguation
  • Relationship Extraction
  • Automatic Summarization
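
As a rough illustration of the example use named above (finding key persons in a forum), the sketch below ranks users by how many replies they receive. This is a deliberately simple stand-in for relationship analysis, not the method used in the project: the `posts` data, `reply_edges`, and `key_persons` are all hypothetical, and it assumes reply structure is already available as metadata.

```python
from collections import Counter

# Hypothetical forum posts as (author, replied_to) pairs;
# None marks a post that starts a thread.
posts = [
    ("alice", None),
    ("bob", "alice"),
    ("carol", "alice"),
    ("alice", "bob"),
    ("dave", "alice"),
    ("carol", "bob"),
]

def reply_edges(posts):
    """Count directed reply edges (author -> replied_to), ignoring self-replies."""
    return Counter((a, b) for a, b in posts if b is not None and a != b)

def key_persons(posts, top_n=3):
    """Rank users by the number of replies they receive (weighted in-degree)."""
    received = Counter()
    for (_, target), count in reply_edges(posts).items():
        received[target] += count
    return received.most_common(top_n)

print(key_persons(posts, top_n=2))
```

In practice the edges would more likely come from Named Entity Recognition and Relationship Extraction over the post text than from reply metadata alone.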

3. Presentation of Results: Development of user interfaces for:

  • Visualizing and highlighting relevant results
  • Interactive exploration of the result space
  • Assistance for transferring results into evidence usable in court

Academic Partners

People

Former staff:

  • Michael Matuschek, TU Darmstadt

Project Publications

Adjacency Pair Recognition in Wikipedia Discussions using Lexical Pairs

Emily Jamison, Iryna Gurevych
In: Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing, p. 479-488, December 2014
Department of Linguistics, Chulalongkorn University
[Online-Edition: http://www.arts.chula.ac.th/~ling/paclic28/]
[Inproceedings]

Needle in a Haystack: Reducing the Costs of Annotating Rare-Class Instances in Imbalanced Datasets

Emily Jamison, Iryna Gurevych
In: Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing, p. 244-253, December 2014
Department of Linguistics, Chulalongkorn University
[Online-Edition: http://www.arts.chula.ac.th/~ling/paclic28/]
[Inproceedings]

Headerless, Quoteless, but not Hopeless? Using Pairwise Email Classification to Disentangle Email Threads

Emily Jamison, Iryna Gurevych
In: Proceedings of the 9th Conference on Recent Advances in Natural Language Processing (RANLP 2013), p. 327-335, September 2013
INCOMA Ltd.
[Inproceedings]

Supervised All-Words Lexical Substitution using Delexicalized Features

György Szarvas, Chris Biemann, Iryna Gurevych
In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2013), p. 1131-1141, June 2013
Association for Computational Linguistics
[Inproceedings]

Text Reuse Detection Using a Composition of Text Similarity Measures

Daniel Bär, Torsten Zesch, Iryna Gurevych
In: Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), p. 167-184, December 2012
[Inproceedings]

Combining query translation techniques to improve cross-language information retrieval

Benjamin Herbert, György Szarvas, Iryna Gurevych
In: Proceedings of the 33rd European Conference on Information Retrieval, Vol. 6611, p. 712-715, 2011
Springer
[InCollection]