WebAnno

 

WebAnno is a general-purpose, web-based annotation tool for a wide range of linguistic annotations. WebAnno offers annotation project management, freely configurable tagsets, and the management of users in different roles. WebAnno uses technology from the brat rapid annotation tool for visualizing and editing annotations in a web browser. It supports annotation and visualization of arbitrarily large documents, pluggable import/export filters, the curation of annotations across various users, and farming out annotations to a crowdsourcing platform.

Currently, WebAnno supports part-of-speech (POS), named entity, dependency parsing, and coreference resolution annotations. The architecture allows additional modes of visualization and editing to be added when new kinds of annotations are to be supported.

Compared to previous annotation tools, WebAnno adds value in two ways. First, its web-based interface is accessible to skilled as well as unskilled annotators, which unlocks a potentially very large workforce. Second, its support for quality control, annotator management, and curation lowers the entry barrier for new annotation projects.

We created WebAnno to fulfill the following requirements:

  • Flexibility: Support for several annotation layers, several import and export formats, and extensibility to other frontends.
  • Web-based: Increased availability, distributed work, no installation effort.
  • Open Source: Usability of our tool in future projects without restrictions, under the Apache 2.0 license.
  • Quality and User Management: Integrated user roles (administrator, annotator, and curator), support for several users, inter-annotator agreement measurement, data curation, and progress monitoring.
  • Interface to Crowdsourcing: unlocking a very large distributed workforce.
  • Pre-annotated and un-annotated documents: supporting new annotations, as well as manual corrections of automatic annotations.
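
The inter-annotator agreement measurement listed above can be illustrated with Cohen's kappa, a common chance-corrected agreement measure for two annotators. This is a minimal sketch only: the text does not state which measure WebAnno computes, and the function name cohen_kappa and the sample POS tags are assumptions made for illustration, not part of WebAnno.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items.

    Hypothetical helper for illustration; not WebAnno's implementation.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: share of items that received the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[tag] * freq_b[tag] for tag in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up POS tags from two annotators for the same four tokens.
annotator_1 = ["NN", "VB", "NN", "DT"]
annotator_2 = ["NN", "VB", "DT", "DT"]
kappa = cohen_kappa(annotator_1, annotator_2)  # 7/11, about 0.636
```

A value of 1 means perfect agreement and a value near 0 means agreement no better than chance, which is why such a measure is more informative for quality control than the raw share of matching labels.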

Downloads

The source code is provided under the Apache Software License (ASL) version 2.

Publications

Natural Language Processing: Integration of Automatic and Manual Analysis

Author Richard Eckart de Castilho
Date 2014
Kind PhD thesis
Location Darmstadt
Keywords NLP infrastructure, automatic annotation, manual annotation, software engineering
Key TUD-CS-2014-0872
Research Areas Ubiquitous Knowledge Processing, UKP_reviewed, UKP_s_CSniper, UKP_s_DKPro_Core, UKP_s_DKPro_Lab, UKP_s_WebAnno, UKP_p_TextAsInstance, UKP_p_DKPro, UKP_a_LangTech4eHum
Abstract There is a current trend to combine natural language analysis with research questions from the humanities. This requires an integration of automatic analysis with manual analysis, e.g. to develop a theory behind the analysis, to test the theory against a corpus, to generate training data for automatic analysis based on machine learning algorithms, and to evaluate the quality of the results from automatic analysis. Manual analysis is traditionally the domain of linguists, philosophers, and researchers from other humanities disciplines, who are often not expert programmers. Automatic analysis, on the other hand, is traditionally done by expert programmers, such as computer scientists and more recently computational linguists. It is important to bring these communities, their tools, and data closer together, to produce analysis of a higher quality with less effort. However, promising cooperations involving manual and automatic analysis, e.g. for the purpose of analyzing a large corpus, are hindered by many problems:

  • No comprehensive set of interoperable automatic analysis components is available.
  • Assembling automatic analysis components into workflows is too complex.
  • Automatic analysis tools, exploration tools, and annotation editors are not interoperable.
  • Workflows are not portable between computers.
  • Workflows are not easily deployable to a compute cluster.
  • There are no adequate tools for the selective annotation of large corpora.
  • In automatic analysis, annotation type systems are predefined, but manual annotation requires customizability.
  • Implementing new interoperable automatic analysis components is too complex.
  • Workflows and components are not sufficiently debuggable and refactorable.
  • Workflows that change dynamically via parametrization are not readily supported.
  • The user has no control over workflows that rely on expert skills from a different domain, undocumented knowledge, or third-party infrastructures, e.g. web services.

In cooperation with researchers from the humanities, we develop innovative technical solutions and designs to facilitate the use of automatic analysis and to promote the integration of manual and automatic analysis. To address these issues, we set foundations in four areas:

  • Usability is improved by reducing the complexity of the APIs for building workflows and creating custom components, improving the handling of resources required by such components, and setting up auto-configuration mechanisms.
  • Reproducibility is improved through a concept for self-contained, portable analysis components and workflows combined with a declarative modeling approach for dynamic parametrized workflows, that facilitates avoiding unnecessary auxiliary manual steps in automatic workflows.
  • Flexibility is achieved by providing an extensive collection of interoperable automatic analysis components. We also compare annotation type systems used by different automatic analysis components to locate design patterns that allow for customization when used in manual analysis tasks.
  • Interactivity is achieved through a novel "annotation-by-query" process combining corpus search with annotation in a multi-user scenario. The process is supported by a web-based tool.

We demonstrate the adequacy of our concepts through examples which represent whole classes of research problems. Additionally, we integrated all our concepts into existing open-source projects, or we implemented and published them within new open-source projects.

Important Copyright Notice:

The documents contained in these directories are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.