
DFKI/OCAS 2008 corpus

This page provides a subset of the OCAS 2008 corpus. The data can be downloaded here as a zip file.

The archive is structured as follows:

  • annotations: RDF annotations of ontology instances and facts that were manually annotated in the text.
  • ontology: The RDFS schema and the RDF instance base. This folder also contains a Protege 3.2 project file.
  • rdf: RDF annotations of ontology instances and facts that were automatically inferred from the manual annotations and the ontology.
  • txt: The text documents. These documents were originally published by BBC and ABC. Please observe the copyright notice at the end of each text file.
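To check that a downloaded copy matches the layout above, the following minimal Python sketch counts the files in each of the four folders. The folder names come from the list above; the function names are illustrative, and whether the archive wraps its contents in a single root folder is not stated here, so the code accepts either layout.

```python
import zipfile
from collections import defaultdict

# Folder names as listed above; anything else is grouped under "other".
EXPECTED_FOLDERS = ("annotations", "ontology", "rdf", "txt")


def group_by_folder(member_names):
    """Group archive member paths by the expected folder they sit in."""
    groups = defaultdict(list)
    for name in member_names:
        if name.endswith("/"):  # skip pure directory entries
            continue
        parts = name.split("/")
        # Assumption: the archive may or may not have one wrapping root
        # folder, so take the first directory component that matches.
        folder = next((p for p in parts[:-1] if p in EXPECTED_FOLDERS), "other")
        groups[folder].append(name)
    return dict(groups)


def summarize_archive(zip_path):
    """Return {folder: number of files} for the archive at zip_path."""
    with zipfile.ZipFile(zip_path) as archive:
        groups = group_by_folder(archive.namelist())
    return {folder: len(files) for folder, files in groups.items()}
```

Calling `summarize_archive("ocas2008.zip")` would return one count per folder found; the file name `ocas2008.zip` is a placeholder for whatever name the downloaded archive actually has.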

Please cite the following publication when using this data set.

Grothkast, Alexander; Adrian, Benjamin; Schumacher, Kinga; Dengel, Andreas. OCAS: Ontology-Based Corpus and Annotation Scheme. In: Sebastian Blohm, Ulf Brefeld, Felix Jungermann, Roman Yangarber (Eds.), Proceedings of the High-level Information Extraction Workshop 2008.

Abstract: This paper presents strategies and lessons learned from the creation of a corpus. It suggests a gold standard for evaluating ontology-based information extraction (OBIE) systems. This OBIE gold stan...


Last modified on 03/18/09 09:47:28