This page describes the criteria for creating an evaluation corpus for a document- and ontology-based information system.

== Possible domains ==
 * [wiki:Evaluation/Corpus/OlympicGames2004]

== Requirements ==
The basic requirements are:
 * '''document corpus'''
   * single domain
   * different lengths (number of pages)
   * different types (news ticker, article, book, website)
   * at least 100 documents
   * different creation dates (time-aware)
 * '''thesaurus'''
   * may link to WordNet
   * synonyms
   * acronyms
 * '''ontology'''
   * single namespace
   * annotations (synsets, ...)
 * '''domain ontology'''
   * describes the domain of the '''document corpus'''
   * contains a taxonomy of classes
   * contains a taxonomy of possible relations between classes
   * inverse relations are needed
   * OWL as the ontology language
   * named graphs as the reification technique
   * allows the creation of complex but self-explanatory queries
 * '''instance base'''
   * contains the annotations of the document corpus
   * high density of relations between instances
   * high and uniform coverage of classes and relations
   * each document is an instance