Semantic Information Retrieval (SIR)

Integrating semantic relatedness into information retrieval to overcome the problem of term mismatch between queries and documents.

Feel free to download our SIR Flyer

Motivation

A frequent problem in information retrieval (IR) is the gap between the vocabulary used to formulate the user's information need (topic) and the vocabulary used in the documents of the collection being queried. An example of this problem is the domain of electronic career guidance, where an IR system helps young people decide which profession to choose by automatically computing a ranked list of professions according to the user's interests. The IR system compares a short essay written by the user with descriptions of professions written by domain experts. Typically, people seeking career advice describe their professional preferences in different words than those used in the professionally prepared descriptions. Therefore, lexical semantic knowledge and soft matching, i.e. matching semantically related terms, should be especially beneficial to such a system.

Goals

Improve the performance of IR on domain-specific document collections:

  • increase recall (by closing the vocabulary gap)
  • increase precision (especially for the first 10 ranks)
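
To make the two goals concrete, the following is a minimal sketch of how recall and precision at the first k ranks are commonly computed. The function names and the toy data are illustrative assumptions and are not taken from the SIR project.

```python
def precision_at_k(ranked, relevant, k=10):
    """Fraction of the first k ranked items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall(ranked, relevant):
    """Fraction of all relevant items that appear anywhere in the ranking."""
    if not relevant:
        return 0.0
    return len(set(ranked) & set(relevant)) / len(relevant)


# Toy data (hypothetical): a ranked list of profession IDs and the set of
# professions judged relevant for one user.
RANKED = ["a", "b", "c", "d"]
RELEVANT = {"a", "c", "e"}

if __name__ == "__main__":
    print(precision_at_k(RANKED, RELEVANT, k=2))
    print(recall(RANKED, RELEVANT))
```

Closing the vocabulary gap mainly targets recall, since relevant documents that share no literal terms with the query can still be retrieved. Precision at the first ranks measures whether those additional matches push relevant documents to the top.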


Methods

  • Integrating semantic relatedness into IR models
  • Combining linguistic knowledge sources, e.g. German wordnet, and Web 2.0 knowledge sources, e.g. Wikipedia, to achieve broad coverage
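
The idea of soft matching can be sketched as follows. This is a simplified illustration and not the retrieval model used in the project. The relatedness table, the threshold, and all function names are hypothetical; a real system would derive relatedness scores from resources such as a wordnet or Wikipedia.

```python
# Hypothetical relatedness scores in [0, 1] for unordered term pairs.
RELATEDNESS = {
    frozenset(("cooking", "chef")): 0.8,
    frozenset(("cooking", "kitchen")): 0.7,
    frozenset(("computers", "software")): 0.8,
}


def relatedness(a, b):
    """Return 1.0 for identical terms, else the table score (0.0 if unknown)."""
    if a == b:
        return 1.0
    return RELATEDNESS.get(frozenset((a, b)), 0.0)


def soft_score(query_terms, doc_terms, threshold=0.5):
    """Average, over query terms, of the best relatedness to any document term.

    Matches below the threshold are ignored so that weakly related
    terms do not add noise.
    """
    if not query_terms:
        return 0.0
    total = 0.0
    for q in query_terms:
        best = max((relatedness(q, d) for d in doc_terms), default=0.0)
        if best >= threshold:
            total += best
    return total / len(query_terms)


def rank(query_terms, docs):
    """Rank documents (name -> term list) by descending soft score."""
    scored = [(name, soft_score(query_terms, terms)) for name, terms in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy example (hypothetical): a user essay and two profession descriptions.
DOCS = {
    "cook": ["chef", "kitchen", "food"],
    "programmer": ["software", "code"],
}
QUERY = ["cooking", "food"]

if __name__ == "__main__":
    for name, score in rank(QUERY, DOCS):
        print(name, round(score, 2))
```

With exact matching only, the query term "cooking" would not contribute to the score of the "cook" description. With soft matching it contributes through its relatedness to "chef", which is the effect the methods above aim for.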


Publications

Semantically Enhanced Term Frequency
Christof Müller and Iryna Gurevych:
In: Proceedings of the 32nd European Conference on Information Retrieval Research, p. (to appear), March 2010.

Wisdom of Crowds versus Wisdom of Linguists - Measuring the Semantic Relatedness of Words
Torsten Zesch and Iryna Gurevych:
In: Journal of Natural Language Engineering. to appear, vol. 16, 2010.

Approximate Matching for Evaluating Keyphrase Extraction
Torsten Zesch and Iryna Gurevych:
In: Proceedings of the 7th International Conference on Recent Advances in Natural Language Processing (electronic proceedings), p. 484--489, September 2009.

A Study on the Semantic Relatedness of Query and Document Terms in Information Retrieval
Christof Müller and Iryna Gurevych:
In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, p. 1338--1347, August 2009.

Semantic relations in a bilingual corpus of different registers
Oliver Čulo and Kerstin Kunz and Torsten Zesch:
In: Deutsche Gesellschaft für Sprachwissenschaft (DGfS) Workshop on Corpus, Colligation, Register Variation, March 2009.

Extracting Professional Preferences of Users from Natural Language Essays
Cigdem Toprak and Christof Müller and Iryna Gurevych:
In: Wolfgang Hoeppner: Tagungsband des GSCL Symposiums "Sprachtechnologie und eHumanities", p. 103-110, Abteilung für Informatik und Angewandte Kognitionswissenschaft Fakultät für Ingenieurwissenschaften Universität Duisburg-Essen, February 2009. ISSN 1863-8554.

Using Wikipedia and Wiktionary in Domain-Specific Information Retrieval
Christof Müller and Iryna Gurevych:
In: Carol Peters and Danilo Giampiccolo and Nicola Ferro and Vivien Petras and Julio Gonzalo and Anselmo Penas and Thomas Deselaers and Thomas Mandl and Gareth Jones and Mikko Kurimo: Evaluating Systems for Multilingual and Multimodal Information Access -- 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, Aarhus, Denmark, September 17-19, 2008, Revised Selected Papers, Lecture Notes in Computer Science, vol. 5706, p. 219-226, Springer-Verlag GmbH, 2009.

Das World Wide Web als computerlinguistische Ressource
Iryna Gurevych:
In: Ralf Klabunde and Kai-Uwe Carstensen and Christian Ebert and Cornelia Endriss and Hagen Langer and Susanne Jekat: Computerlinguistik und Sprachtechnologie - Eine Einführung, p. (to appear), Springer Verlag, January 2009.

Putting the „Wisdom‐of‐Crowds“ to Use in NLP: Collaboratively Constructed Semantic Resources on the Web
Iryna Gurevych:
In: NSF sponsored symposium “Semantic Knowledge Discovery, Organization and Use”, November 2008.
http://nlp.cs.nyu.edu/sk-symposium/.

Graph-Theoretic Analysis of Collaborative Knowledge Bases in Natural Language Processing
Konstantina Garoufi and Torsten Zesch and Iryna Gurevych:
In: Proceedings of the Poster Session of the 7th International Semantic Web Conference, October 2008.

Representational Interoperability of Linguistic and Collaborative Knowledge Bases
Konstantina Garoufi and Torsten Zesch and Iryna Gurevych:
In: Proceedings of the KONVENS Workshop on Lexical-Semantic and Ontological Resources -- Maintenance, Representation, and Standards, October 2008.

Using Tag Semantic Network for Keyphrase Extraction in Blogs
Lizhen Qu and Christof Müller and Iryna Gurevych:
In: ACM 17th Conference on Information and Knowledge Management, p. 1381-1382, October 2008.

Using Similarity Measures for Context-Aware User Interfaces
Melanie Hartmann and Torsten Zesch and Max Mühlhäuser and Iryna Gurevych:
In: Proceedings of the 2nd IEEE International Conference on Semantic Computing, p. 190-197, August 2008.

Using Wikipedia and Wiktionary in Domain-Specific Information Retrieval
Christof Müller and Iryna Gurevych:
In: Francesca Borri and Alessandro Nardi and Carol Peters: Working Notes for the CLEF 2008 Workshop, September 2008.

Using Wiktionary for Computing Semantic Relatedness
Torsten Zesch and Christof Müller and Iryna Gurevych:
In: Proceedings of the 23rd AAAI Conference on Artificial Intelligence, p. 861-867, July 2008.

Closing the Vocabulary Gap for Computing Text Similarity and Information Retrieval
Christof Müller and Iryna Gurevych and Max Mühlhäuser:
In: International Journal of Semantic Computing, vol. 2, no. 2, p. 253-272, June 2008.

Extracting Lexical Semantic Knowledge from Wikipedia and Wiktionary
Torsten Zesch and Christof Müller and Iryna Gurevych:
In: Proceedings of the 6th International Conference on Language Resources and Evaluation, May 2008.

Flexible UIMA Components for Information Retrieval Research
Christof Müller and Torsten Zesch and Mark-Christoph Müller and Delphine Bernhard and Kateryna Ignatova and Iryna Gurevych and Max Mühlhäuser:
In: Proceedings of the LREC 2008 Workshop 'Towards Enhanced Interoperability for Large HLT Systems: UIMA for NLP', p. 24-27, May 2008.

What to be? - Electronic Career Guidance Based on Semantic Relatedness
Iryna Gurevych, Christof Müller, Torsten Zesch:
In: Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, p. 1032--1039, Association for Computational Linguistics, June 2007.
http://www.aclweb.org/anthology/P/P07/P07-1130.

Cross-lingual Distributional Profiles of Concepts for Measuring Semantic Distance
Saif Mohammad and Iryna Gurevych and Graeme Hirst and Torsten Zesch:
In: Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), p. 571--580, June 2007.
http://www.aclweb.org/anthology/D/D07/D07-1060.

Darmstadt Knowledge Processing Repository Based on UIMA
Iryna Gurevych, Max Mühlhäuser, Christof Müller, Jürgen Steimle, Markus Weimer, Torsten Zesch:
In: Proceedings of the First Workshop on Unstructured Information Management Architecture at Biannual Conference of the Society for Computational Linguistics and Language Technology, April 2007.

Teaching "Unstructured Information Management: Theory and Applications" to Computational Linguistics Students
Iryna Gurevych, Christof Müller, Torsten Zesch:
In: Proceedings of the First Workshop on Unstructured Information Management Architecture at Biannual Conference of the Society for Computational Linguistics and Language Technology, April 2007.

Integrating Semantic Knowledge into Text Similarity and Information Retrieval
Christof Müller, Iryna Gurevych, Max Mühlhäuser:
In: Proceedings of the First IEEE International Conference on Semantic Computing (ICSC), p. 257-264, 2007.

Analysis of the Wikipedia Category Graph for NLP Applications
Torsten Zesch and Iryna Gurevych:
In: Proceedings of the TextGraphs-2 Workshop (NAACL-HLT 2007), p. 1--8, April 2007.

Comparing Wikipedia and German Wordnet by Evaluating Semantic Relatedness on Multiple Datasets
Torsten Zesch and Iryna Gurevych and Max Mühlhäuser:
In: Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT), p. 205--208, April 2007.

Analyzing and Accessing Wikipedia as a Lexical Semantic Resource
Torsten Zesch and Iryna Gurevych and Max Mühlhäuser:
In: Data Structures for Linguistic Resources and Applications, p. 197--205, Gunter Narr, Tübingen, April 2007.

Automatically creating datasets for measures of semantic relatedness
Torsten Zesch and Iryna Gurevych:
In: COLING/ACL 2006 Workshop on Linguistic Distances, p. 16--24, July 2006.

Exploring the Potential of Semantic Relatedness in Information Retrieval
Christof Müller, Iryna Gurevych:
In: LWA 2006 Lernen - Wissensentdeckung - Adaptivität, 9.-11.10.2006 in Hildesheim, vol. Hildesheimer Informatikberichte, p. 126-131, Universität Hildesheim, October 2006.

 

Teaching

In 2006 the SIR project team offered a Seminar on Unstructured Information Management at the University of Tübingen.

Partners

The Division of Computational Linguistics at the University of Tübingen is a co-applicant of the SIR project. Its research focuses on the further development of the GermaNet ontology using the BERUFEnet corpus.

In cooperation with the German Federal Agency for Employment (Bundesagentur für Arbeit), we employ semantic information retrieval algorithms to realize electronic career guidance. Given a natural language essay written by the person seeking advice, the system finds relevant professions based on their natural language descriptions.

Funding

This project is funded by the Deutsche Forschungsgemeinschaft (German Research Foundation).
