The DOGMA
Project at V.U.B. STAR Lab: linking ontologies, database semantics,
information systems, corporate knowledge management, agent technology, internet
databases, the semantic web, and quantum physics. Well, maybe not quantum
physics. Not yet, anyway.
©Robert Meersman

In the beginning there was database

Databases have
become the hugely successful tools they are mostly because they implement
so-called "data independence", which for the sake of simplicity one
may call the ability to specify and manage data structures outside
application programs, and consequently to allow management of the (usually
large) "populations" of those data structures by specialized, highly
efficient software tools. Research and practice in databases have resulted over
time in techniques and methodologies for representing information in such data
structures that have become quite sophisticated, a fact sometimes overlooked by
other research communities. Object-oriented, object-relational and often even
"plain" relational database management systems (DBMS) come equipped
with a variety of syntactical constructs that permit database- and conceptual
schemas to represent objects, subtype taxonomies, and some integrity
constraints, derivation rules, etc.

Domain knowledge vs. application knowledge

A number of methodologies have been developed to assist
in the creation of such conceptual database schemas (or data models,
as they are often imprecisely called), such as EER [15], ORM, and UML,
each supported by a variety of so-called CASE [1]
tools. It is important (even essential for our understanding
of the relevance of ontologies) to realize that DBMSs and associated
CASE environments are geared to providing software solutions for a particular
application instance (e.g. an airline reservation system) rather
than for domains (e.g. air travel), although of course it will
in general be hard to define this distinction formally, as domains may
be quite small and specialized while application suites may cover a
wide range of interrelated objects and functions. In this respect it
is worth noting already that so-called ontologies (which
in a way should correspond to domain-layer knowledge; see below) play
a different role in the scheme of things than application-specific data
models, an observation not helped by the close resemblance of some ontology
specification languages in the literature to latter-day conceptual database
schema languages [1].

The semantics of database

Database semantics is the rich research area
covering all aspects of the relationship between the implemented database
system and the portion of reality it is supposed to render. Intuitively but
wholly informally, a higher quality of this rendering (i.e. the composition
of database schema, the database itself, and the application programs making it
available to users) is associated with more semantics, i.e. more of the
meaning of the domain of reality is represented in the database system.
Various formalisms exist to make this notion more precise, and this exactly is
the place where ontologies will enter the picture. The most common classic
formalism, also the most amenable to the use of ontologies, is so-called declarative
or Tarski semantics as may be found in various places in the database
and AI literature, as in Reiter's seminal paper [2] or in the book by Genesereth
& Nilsson[3]. Essentially it replaces reality (the domain) by a conceptualization,
a mathematical object that typically consists of very elementary constructs
such as a set of objects and a set of (mathematical) relations. Semantics is
then formally defined simply as an interpretation mapping from the
system (or rather from the language describing a system instance in some
syntax) to this conceptualization.

The role of agreement

The elementariness of the conceptualization
constructs is essential, first of all to facilitate agreement about them but
also to achieve a semantics which is maximally independent of the chosen
database schema language and of the represented domain. It is indeed
fundamental to realize that all declarative semantics constitutes a form of agreement
(since at the very least users, domain experts and designers have to agree on a
chosen conceptualization). Since database systems are software solutions for a
particular application, such agreement has to be based on a common perception
of this application's domain. Databases typically do not provide schemas for
entire domains, i.e. they do not in general model reality itself. But if one
wants to achieve cooperation, interoperation or just communication between
database systems, some form of agreement naturally has to be established and
formalized about the underlying domain (reality). Suitably standardized (and
large) ontologies may provide a means for this.

At long last, ontologies

Starting from the almost classical definition of an ontology
(here used as a countable noun) by Thomas Gruber[11] as the specification of a
conceptualization, it therefore becomes straightforward to see pure
ontologies rather as mathematical objects, namely as the domain of the semantic
interpretation function under consideration. (Naturally, we shall ultimately have
to devise a suitable and convenient computer representation for them, but this
is an independent issue.) As argued above, it is therefore important to make
the elements of an ontology as simple as possible, even at the price of not
modeling a lot of the domain's constraints, rules and other defining
properties. We conjecture that most often these properties will anyhow turn out
to be application-specific and should therefore rather be represented in
an appropriate layer surrounding the ontology. To make the distinction
explicit, we define an ontology base (or ontobase) as a large
set W of lexons, these being 4-tuples of the form <g t1 r t2>,
where g represents a context, t1 and t2 are terms, and r is a role.
The term t1 is called the headword of the lexon. The precise
definitions are left for a forthcoming more complete paper, but the intuition
should be fairly obvious. Some details may already be found in [4].
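
As a minimal sketch of the lexon structure just described, and not the DOGMA implementation itself, a lexon can be modeled as an immutable 4-tuple and an ontology base as a plain set of such tuples. The field names, the "air travel" context and the sample roles and terms are illustrative assumptions; the precise definitions are deferred to the paper announced above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lexon:
    """A lexon <g t1 r t2>. Field names are illustrative assumptions."""
    context: str   # g: the context in which the fact is plausible
    headword: str  # t1: the headword term
    role: str      # r: the role linking t1 to t2
    term: str      # t2: the second term


def headwords(ontobase):
    """Return the set of headwords occurring in an ontology base."""
    return {lexon.headword for lexon in ontobase}


# An ontology base modeled as a set of lexons (hypothetical sample facts).
ontobase = {
    Lexon("air travel", "Flight", "departs_from", "Airport"),
    Lexon("air travel", "Flight", "operated_by", "Airline"),
    Lexon("air travel", "Passenger", "books", "Flight"),
}

# Adding a lexon that is already present leaves the set unchanged.
ontobase.add(Lexon("air travel", "Passenger", "books", "Flight"))
```

Because lexons carry no rules or constraints, equality and hashing on the four fields are all that the set needs.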

Syntax, semantics, pragmatics for ontologies

In our formalism, an ontology is not a data
model; in fact the declarative semantics interpretation function maps data
models, together with the data collections described by them, into the ontology
(-base). The pragmatics of an ontology base is that it constitutes a set (in an
ideal situation, the set) of plausible elementary facts that within a
given context (e.g. a set of applications) may hold in the domain under study,
implying that no valid application (i.e. database system instance) should be
inconsistent with them, i.e. the interpretation of such an application should satisfy
the ontology base in some well-defined sense. A promising initial formalization
of this concept may perhaps be derived from the work of Guarino[5]. Note that
we deliberately exclude derivation rules, constraints and the like from the
ontology base, thereby in some cases sacrificing the relative compactness
of an intensional representation for a more extensional one, but one that,
we claim, is easier to agree on.

Enter Dogma

Evidently the construction (or should we say growing) of standardizable,
hence reusable and dependable, computerized ontology bases will be no
mean feat. In the DOGMA [2] project at V.U.B. STAR Lab we are setting up
an ontology server in order to assist the gathering and incremental growth
of sets of lexons. The use of the extremely simple structure of lexons
should, we conjecture, achieve at least a degree of scalability not possible
with the more complex representations hitherto used in the literature and practice.
This extensional approach will naturally lead to very large sets of
lexons, thereby moving the issue to matters of ontology organization,
rather than of representational power or sophistication (but to coin
a phrase, size itself is not a scalability issue, complexity is).
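
One way to picture "ontology organization" for a large, flat set of lexons is a simple index keyed on context and headword. The sketch below is only an assumption about how such organization might look; the DOGMA server's actual organization is not described here, and the `build_index` helper and sample lexons are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Lexon(NamedTuple):
    """A lexon <g t1 r t2>. Field names are illustrative assumptions."""
    context: str
    headword: str
    role: str
    term: str


def build_index(lexons):
    """Group lexons by (context, headword) so a lookup need not scan the whole set."""
    index = defaultdict(set)
    for lexon in lexons:
        index[(lexon.context, lexon.headword)].add(lexon)
    return index


# Hypothetical sample lexons drawn from two contexts.
lexons = [
    Lexon("air travel", "Flight", "departs_from", "Airport"),
    Lexon("air travel", "Flight", "operated_by", "Airline"),
    Lexon("rail travel", "Train", "departs_from", "Station"),
]
index = build_index(lexons)
```

The index grows with the number of lexons, but each entry stays as simple as the lexons themselves, which is the point of the remark on size versus complexity.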
One important source of lexons encoding domain-specific knowledge, as
opposed to generic ones occurring in general-purpose lexicons such
as WordNet, will be relational database schemas, yielding
an activity best described as ontology mining. By way of an
admittedly oversimplified example, a lexon mined
from a relational table R(A1, ..., An) could
be <g R r Ai>, where g
is the application context and r is a suitable role played
by attribute Ai in table R (more exactly, Ai
and R would map to ontology terms denoting the entities that participate
in a binary relationship
expressed by r). Interesting research issues about contextual layers
within ontologies arise here as one e.g. needs to separate local jargon
from common knowledge. Other important sources are numerous
existing thesauri and glossaries, for instance the elaborate SAP®
Glossary (for the crucial business process domain) in which each entry
however needs to be individually analyzed to extract its knowledge
structure (within DOGMA this is currently attempted experimentally
using a version of ORM[6]). The advantages for a more comprehensive
and consistent corporate knowledge management using such ontobases
should, however, already be quite obvious in spite of their simple
basic structure and organization.

Dogma impact and ongoing research

As the rationale behind the title and the description
of DOGMA hopefully indicate by now, the potential applications of ontologies
and of the results of the DOGMA project are legion. Within STAR Lab,
DOGMA research is already being applied in two European 5th
Framework projects, HyperMuseum
and NAMIC.
Further applications are new research projects in the area of digital
libraries such as IrisWeb.
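
The simplistic mining rule given earlier, in which a relational table R(A1, ..., An) yields one lexon <g R r Ai> per attribute, can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the experiments: the `mine_table` helper, the default role "has" and the sample table are hypothetical, and a real mining step would still have to map table and attribute names to ontology terms and choose a suitable role.

```python
from typing import NamedTuple


class Lexon(NamedTuple):
    """A lexon <g t1 r t2>. Field names are illustrative assumptions."""
    context: str
    headword: str
    role: str
    term: str


def mine_table(context, table, attributes, role="has"):
    """Emit one lexon <g R r Ai> for each attribute Ai of table R.

    The single default role is a placeholder; choosing a meaningful role
    per attribute is exactly the part that needs analysis.
    """
    return [Lexon(context, table, role, attribute) for attribute in attributes]


# Hypothetical table Flight(flight_no, origin, destination) in an
# airline reservation application context.
mined = mine_table(
    "airline reservation", "Flight", ["flight_no", "origin", "destination"]
)
```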
Some ontology mining experiments (on existing databases, web pages,
and glossaries) are being conducted by students. For the
DOGMA Server, several deep theoretical problems need to be resolved,
such as finding the (best)
algorithms for adding new ontologies (i.e. sets of lexons). Earlier
work (not directly related to ontologies) by e.g. Weber [7] and Winslett [8]
on updating logical databases may prove relevant, as well as Gärdenfors'
work [9] on belief revision or Vermeir's [10] on ordered logic; this
is of course merely a consequence of the fact that any computer representation
of an ontology is likely to behave as a collection of facts or predicates.

Why Agents

Well, why not. The architecture currently envisaged for the DOGMA Server is that of a mongrelized blackboard system surrounded by (client and system) agents. The agent paradigm was adopted, among other reasons, because it was felt to provide a conceptually pleasing and intuitive framework for software experimentation. Experimenting with new ontology engineering techniques in the context of agent applications is indeed an important secondary goal of DOGMA. Firstly, achieving semantically meaningful communication between agents is a fundamental application for ontologies, and the importance of solutions for this of course keeps increasing as networking becomes all-pervasive. Secondly, we plan to simulate ontology server usage patterns using distributed agent techniques; we expect this will lead to insights regarding best-practice techniques and methodologies for ontology engineering, which in turn would help improve the applicability of DOGMA.

Ontology Applications and the Future

Finally, it is perhaps enlightening to see how
ontologies may in a sense achieve a form of semantics independence for
information systems and knowledge-based systems: just as database schemas achieved
data independence by making the specification and management of stored data
elements external to application programs, ontologies will now allow domain
semantics to be specified and managed outside those programs as well. Exactly how
much knowledge is representable externally in this way will of course depend on
the extent of the ontobase and on the manner in which constraints, rules, and
application code make use of these knowledge elements. Conceivably at one point
it will become economical to actually enforce the building of
information systems, especially those destined for internet use or
interoperation, by prescribing the use of controlled vocabularies which
map explicitly to ontologies. Such vocabularies (including their rules of
semantically correct usage) may even become a strategic resource for an
organization, e.g. as part of a repository for corporate knowledge management.

Student jobs and projects

For students, there are projects, internships and theses available on DOGMA. If you are interested, please consult our proposals.

Current DOGMA work at STAR Lab

If you are interested in the work currently being done at STAR Lab on DOGMA, go to the Dogma Server Research.

References and Literature

[1] ISO TR 9003: Concepts and Terminology of the Conceptual Schema and the Information Base, ISO Technical Report, International Standards Organization, 1990.
[2] Reiter, R. Towards a Logical Reconstruction of Relational Database Theory, in: Readings in AI and Databases, J. Mylopoulos and M.L. Brodie (eds.), Morgan Kaufmann, 1988.
[3] Genesereth, M. and Nilsson, N. Logical Foundations of Artificial Intelligence, Morgan Kaufmann, 1987.
[4] Meersman, R. Semantic Ontology Tools in Information Systems Design, in: Proceedings of the ISMIS'99 Conference, Z. Ras and M. Zemankova (eds.), LNCS, Springer Verlag, 1999.
[5] Guarino, N. Formal Ontology and Information Systems, in: Proceedings of FOIS'98, N. Guarino (ed.), IOS Press, 1998.
[6] Halpin, T. Conceptual Schema and Relational Database Design, 2nd Ed., Prentice Hall, 1996.
[7] Weber, A. Updating Propositional Formulas, in: Proceedings of the 2nd Conference on Expert Database Systems, L. Kerschberg (ed.), Morgan Kaufmann, 1986.
[8] Winslett, M. Updating Logical Databases, Cambridge Tracts in Theoretical Computer Science no. 9, Cambridge University Press, 1990.
[9] Gärdenfors, P. (ed.) Belief Revision, Cambridge Tracts in Theoretical Computer Science no. 29, Cambridge University Press, 1992.
[10] Laenens, E., Sacca, D., Vermeir, D. Extending Logic Programming, in: Proceedings of SIGMOD Conference, ACM Publications, 1990.
[11] Gruber, T. Towards Principles for the Design of Ontologies Used for Knowledge Sharing, IJHCS, 43(5/6): 907-928, 1994.
[12] Sowa, J. Knowledge Representation: Logical, Philosophical and Computational Foundations, PhD thesis, V.U.B., 1999.
[13] Fowler, M. Analysis Patterns: Reusable Object Models, Addison-Wesley, 1997.
[14] Casteleyn, S. Een uitbreiding van het Open Information Model als basis voor het bouwen van ontologies [An extension of the Open Information Model as a basis for building ontologies].

Related Projects and Relevant Links
[Ontolingua] http://www.ksl.stanford.edu/software/ontolingua/
[On-To-Knowledge] http://www.ontoknowledge.org
[Semantic Web] http://www.semanticweb.org/
[RDF] http://www.w3.org/RDF/
[DAML] http://www.daml.org/

[1] Computer Aided Software Engineering, a hot topic during the 80s.
[2] Developing Ontology-Guided Mediation of Agents. Agents are used as the software paradigm; an agent incorporates its own (application) ontology that must lock into the requestor's ontology (usually via a blackboard system for posting requests). Often this will require mediation, occasionally even by human intervention.