JWPL

Lately, Wikipedia has been recognized as a promising lexical semantic resource. If Wikipedia is to be used for large-scale NLP tasks, efficient programmatic access to the knowledge therein is required.

JWPL (Java Wikipedia Library) is an open-source, Java-based application programming interface that provides access to all information contained in Wikipedia. This high-performance API offers structured access to information nuggets such as redirects, categories, articles, and link structure. It is described in our LREC 2008 paper.

JWPL contains a MediaWiki markup parser that can be used to further analyze the contents of a Wikipedia page. The parser can also be used stand-alone on other texts written in MediaWiki markup.
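To give a rough idea of the kind of structure such a parser recovers from markup, here is a minimal, self-contained sketch that pulls internal link targets out of a MediaWiki string with a regular expression. It is not the JWPL parser and does not use the JWPL API; the class name WikiLinkSketch and the method extractLinkTargets are hypothetical, and the pattern only handles the simple forms [[Target]] and [[Target|label]].

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a hypothetical helper, not part of JWPL.
class WikiLinkSketch {

    // Matches [[Target]] and [[Target|label]]; group 1 captures the target.
    private static final Pattern INTERNAL_LINK =
            Pattern.compile("\\[\\[([^\\]|]+)(?:\\|[^\\]]*)?\\]\\]");

    // Returns the link targets in the order they appear in the markup.
    static List<String> extractLinkTargets(String markup) {
        List<String> targets = new ArrayList<>();
        Matcher matcher = INTERNAL_LINK.matcher(markup);
        while (matcher.find()) {
            targets.add(matcher.group(1).trim());
        }
        return targets;
    }

    public static void main(String[] args) {
        String markup =
                "'''JWPL''' gives access to [[Wikipedia]] and its [[Help:Category|categories]].";
        System.out.println(extractLinkTargets(markup));
    }
}
```

A real markup parser also has to deal with templates, nested constructs, tables, and namespaces, which is why a dedicated parser is preferable to ad hoc patterns like this one.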

Further, JWPL contains the tool JWPLDataMachine, which creates JWPL dumps from the publicly available dumps at download.wikimedia.org.

In addition, JWPL now contains the Wikipedia Revision Toolkit, which consists of two tools: the TimeMachine and the RevisionMachine. The TimeMachine can reconstruct a snapshot of Wikipedia as of a specific date, or create multiple snapshots over a time span. The RevisionMachine offers efficient access to the edit history of Wikipedia articles and stores the revisions in a dedicated storage format that reduces the required storage space by 98%. The toolkit is described in our ACL system demonstration paper.
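The idea behind a date-based snapshot can be sketched for a single article: among all of its revisions, pick the latest one whose timestamp is not after the snapshot date. The following self-contained example illustrates only that selection step, under the assumption that revisions are held in memory in a sorted map. The class SnapshotSketch and its methods are hypothetical and are not the TimeMachine implementation or its API.

```java
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Illustrative only: a hypothetical in-memory model, not part of JWPL.
class SnapshotSketch {

    // Revisions of one article, sorted by timestamp in ascending order.
    private final TreeMap<Instant, String> revisions = new TreeMap<>();

    void addRevision(Instant timestamp, String text) {
        revisions.put(timestamp, text);
    }

    // Returns the text that was current at the snapshot date, i.e. the
    // latest revision at or before it, or empty if none existed yet.
    Optional<String> textAt(Instant snapshotDate) {
        Map.Entry<Instant, String> entry = revisions.floorEntry(snapshotDate);
        return entry == null ? Optional.empty() : Optional.of(entry.getValue());
    }

    public static void main(String[] args) {
        SnapshotSketch article = new SnapshotSketch();
        article.addRevision(Instant.parse("2009-01-10T00:00:00Z"), "first version");
        article.addRevision(Instant.parse("2010-06-01T00:00:00Z"), "second version");

        Instant snapshotDate = Instant.parse("2010-01-01T00:00:00Z");
        System.out.println(article.textAt(snapshotDate).orElse("(article did not exist yet)"));
    }
}
```

Repeating the lookup for a series of dates would give one text per snapshot in a time span; how the toolkit itself performs this over full Wikipedia dumps is described in the paper cited below.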

JWPL on Google Code: jwpl.googlecode.com

 

If you use JWPL Core (API, DataMachine) in scientific work, please cite:

@INPROCEEDINGS{ZeschMuellerGurevych2008,
   author = {Torsten Zesch and Christof M{\"u}ller and Iryna Gurevych},
   title = {{Extracting Lexical Semantic Knowledge from Wikipedia and Wiktionary}},
   booktitle = {Proceedings of the Conference on Language Resources and Evaluation (LREC)},
   year = {2008}
}

If you use the Wikipedia Revision Toolkit in scientific work, please cite:

@INPROCEEDINGS{FerschkeZeschGurevych2011,
   author = {Oliver Ferschke and Torsten Zesch and Iryna Gurevych},
   title = {Wikipedia Revision Toolkit: Efficiently Accessing Wikipedia's Edit History},
   booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics:
      Human Language Technologies. System Demonstrations},
   year = {2011},
}
