Version 4 (modified by horak, 17 years ago)
This page describes the criteria for creating an evaluation corpus for a document- and ontology-based information system.
Possible domains
Requirements
These are the basic requirements:
- document corpus
  - single domain
  - different lengths (pages)
  - different types (news ticker, article, book, website)
  - at least 100 documents
  - different creation dates (time-aware)
- thesaurus
  - may link to WordNet
  - synonyms
  - acronyms
- ontology
  - single namespace
  - annotations (synsets, ...)
  - domain ontology
    - describes the domain of the document corpus
    - contains a taxonomy of classes
    - contains a taxonomy of possible relations between classes
      - inverse relations are needed
    - OWL as the language
    - named graphs as the reification technique
    - allows the creation of complex yet self-explanatory queries
  - instance base
    - contains the annotations of the document corpus
    - high density of relations between instances
    - high and uniform coverage of classes and relations
    - each document is an instance
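Several of these requirements are mechanically checkable once a candidate corpus exists. The following is a minimal sketch, assuming a hypothetical in-memory model (`Document`, `Ontology`, `InstanceBase` and all function names are illustrative, not part of any existing tool). Only the threshold of 100 documents comes from the list above. "Different" is interpreted here as "at least two distinct values", and relation density as "triples per instance"; both are assumptions. In the actual OWL ontology, the pairing of a relation with its inverse would be declared with `owl:inverseOf`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date

# Hypothetical in-memory model of the evaluation corpus; all names are illustrative.

@dataclass
class Document:
    doc_id: str
    doc_type: str  # e.g. "news ticker", "article", "book", "website"
    pages: int
    created: date

@dataclass
class Ontology:
    classes: set[str]
    relations: dict[str, tuple[str, str]]  # relation -> (domain class, range class)
    inverses: dict[str, str] = field(default_factory=dict)  # relation -> its inverse

@dataclass
class InstanceBase:
    types: dict[str, str]  # instance id -> class name
    triples: list[tuple[str, str, str]] = field(default_factory=list)  # (subject, relation, object)

def check_corpus(docs: list[Document], min_docs: int = 100) -> list[str]:
    """Report violations of the document corpus requirements (assumed: >= 2 distinct values)."""
    problems = []
    if len(docs) < min_docs:
        problems.append(f"only {len(docs)} documents, need at least {min_docs}")
    if len({d.doc_type for d in docs}) < 2:
        problems.append("all documents have the same type")
    if len({d.pages for d in docs}) < 2:
        problems.append("all documents have the same length")
    if len({d.created for d in docs}) < 2:
        problems.append("all documents have the same creation date")
    return problems

def check_inverses(onto: Ontology) -> list[str]:
    """Every relation needs a declared inverse whose domain and range are swapped."""
    problems = []
    for rel, (dom, rng) in onto.relations.items():
        inv = onto.inverses.get(rel)
        if inv is None:
            problems.append(f"{rel}: no inverse declared")
        elif onto.relations.get(inv) != (rng, dom):
            problems.append(f"{rel}: inverse {inv} does not swap domain and range")
    return problems

def relation_density(kb: InstanceBase) -> float:
    """Average number of relation triples per instance (0.0 for an empty base)."""
    return len(kb.triples) / len(kb.types) if kb.types else 0.0

def coverage(kb: InstanceBase, onto: Ontology) -> tuple[dict[str, int], dict[str, int]]:
    """Instances per class and triples per relation; zero counts mark uncovered parts."""
    class_counts = {c: 0 for c in onto.classes}
    for cls in kb.types.values():
        if cls in class_counts:
            class_counts[cls] += 1
    rel_counts = {r: 0 for r in onto.relations}
    for _, rel, _ in kb.triples:
        if rel in rel_counts:
            rel_counts[rel] += 1
    return class_counts, rel_counts

def documents_without_instance(docs: list[Document], kb: InstanceBase) -> list[str]:
    """Each document must itself appear as an instance in the instance base."""
    return [d.doc_id for d in docs if d.doc_id not in kb.types]

if __name__ == "__main__":
    onto = Ontology(
        classes={"Document", "Person"},
        relations={"mentions": ("Document", "Person"), "mentionedIn": ("Person", "Document")},
        inverses={"mentions": "mentionedIn", "mentionedIn": "mentions"},
    )
    docs = [Document("d1", "article", 3, date(2008, 1, 5)),
            Document("d2", "book", 200, date(2007, 6, 1))]
    print(check_corpus(docs))    # ['only 2 documents, need at least 100']
    print(check_inverses(onto))  # []
```

The zero entries returned by `coverage` make gaps visible directly. Judging whether the coverage is also *uniform* (for example, by comparing the minimum and maximum counts) is left open, because the requirement does not define a measure.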