The DOGMA Project at V.U.B. STAR Lab: linking ontologies, database semantics, information systems, corporate knowledge management, agent technology, internet databases, the semantic web, and quantum physics. Well, maybe not quantum physics. Not yet, anyway.

©Robert Meersman


Note: Part of this material is based on the text for “Can Ontologies Learn from Database Semantics”, contributed by the author to the forthcoming Dagstuhl Report on the March 2000 Seminar “Semantics for the Web”, D. Fensel, J. Hendler, H. Lieberman, W. Wahlster (eds.). All rights reserved.

In the beginning there was database

Databases have become the hugely successful tools they are mostly because they implement so-called "data independence", which for the sake of simplicity one may call the ability to specify and manage data structures outside application programs, and consequently to allow the (usually large) "populations" of those data structures to be managed by specialized, highly efficient software tools. Research and practice in databases have over time produced quite sophisticated techniques and methodologies for representing information in such data structures, a fact sometimes overlooked by other research communities. Object-oriented, object-relational and often even "plain" relational database management systems (DBMS) come equipped with a variety of syntactic constructs that permit database and conceptual schemas to represent objects, subtype taxonomies, some integrity constraints, derivation rules, etc.


Domain knowledge vs. application knowledge

A number of methodologies have been developed to assist in the creation of such conceptual database schemas (or “data models” as they are often imprecisely called), such as EER [15], ORM, UML, … each supported by a variety of so-called CASE [1] tools. It is important (even essential for our understanding of the relevance of ontologies) to realize that DBMSs and associated CASE environments are geared to providing software solutions for a particular application instance (e.g. an airline reservation system) rather than for domains (e.g. air travel), although of course it will in general be hard to formally define this distinction, as domains may be quite small and specialized while application suites may cover a wide range of interrelated objects and functions. In this respect it is already worth noting that so-called ontologies (which in a way should correspond to domain-level knowledge, see further) play a different role in the scheme of things than application-specific data models, an observation not helped by the close resemblance of some ontology specification languages in the literature to latter-day conceptual database schema languages [1].


The semantics of database

Database semantics is the rich research area covering all aspects of the relationship between an implemented database system and the portion of “reality” it is supposed to render. Intuitively, but wholly informally, a “higher” quality of this rendering (i.e. of the composition of database schema, the database itself, and the application programs making it available to users) is associated with “more” semantics: more of the “meaning” of the domain of reality is represented in the database system. Various formalisms exist to make this notion more precise, and this is exactly the place where ontologies will enter the picture. The most common classic formalism, also the most amenable to the use of ontologies, is so-called declarative or Tarski semantics, as may be found in various places in the database and AI literature, e.g. in Reiter’s seminal paper [2] or in the book by Genesereth & Nilsson [3]. Essentially it replaces “reality” (the domain) by a conceptualization: a mathematical object that typically consists of very elementary constructs such as a set of objects and a set of (mathematical) relations. Semantics is then formally defined simply as an interpretation mapping from the system (or rather from the language describing a system instance in some syntax) to this conceptualization.
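As a toy illustration of this Tarski-style setup (all names below are invented for the example, not taken from the cited literature): a conceptualization is reduced to a set of objects and a set of mathematical relations over them, and an interpretation maps the symbols of the schema language onto these constructs.

```python
# A toy conceptualization: a set of objects and a set of mathematical
# relations over those objects (all names invented for illustration).
objects = {"BA123", "JFK", "LHR"}
relations = {"departs_from": {("BA123", "LHR")}}

# Declarative (Tarski) semantics: an interpretation maps symbols of the
# schema language onto constructs of the conceptualization.
interpretation = {
    "Flight": objects,
    "Flight.origin": relations["departs_from"],
}

# A stored fact is then "true" exactly when its image under the
# interpretation holds in the conceptualization.
def holds(relation_name, pair):
    return pair in relations[relation_name]
```

The point of the sketch is only that "meaning" resides entirely in the mapping to the conceptualization, not in the database symbols themselves.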

The role of agreement

The elementariness of the conceptualization constructs is essential, first of all to facilitate agreement about them but also to achieve a semantics which is maximally independent of the chosen database schema language and of the represented domain. It is indeed fundamental to realize that all declarative semantics constitutes a form of agreement (since at the very least users, domain experts and designers have to agree on a chosen conceptualization). Since database systems are software solutions for a particular application, such agreement has to be based on a common “perception” of this application’s domain. Databases typically do not provide schemas for entire domains, i.e. they do not in general “model reality itself”. But if one wants to achieve cooperation, interoperation or just communication between database systems, some form of agreement naturally has to be established –and formalized– about the underlying domain (“reality”). Suitably standardized (and large) ontologies may provide a means for this.

At long last, ontologies

Starting from the almost classical definition of an ontology (here used as a countable noun) by Thomas Gruber [11] as the specification of a conceptualization, it becomes straightforward to see “pure” ontologies rather as mathematical objects, namely as the domain of the semantic interpretation function under consideration. (Naturally, we shall ultimately have to devise a suitable and convenient computer representation for them, but this is an independent issue.) As argued above, it is therefore important to make the elements of an ontology as simple as possible (even at the price of not modeling many of the domain’s constraints, rules and other “defining” properties). We conjecture that most of these properties will anyhow turn out to be application-specific and therefore should rather be represented in an appropriate layer “surrounding” the ontology. To make the distinction explicit, we define an ontology base (or “ontobase”) as a –large– set W of lexons, these being 4-tuples of the form <g t1 r t2> where g represents a context, t1 and t2 are terms and r is a role. The term t1 is called the headword of the lexon. The precise definitions are left for a forthcoming more complete paper, but the intuition should be fairly obvious. Some details may already be found in [4].
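A minimal sketch of this structure in Python may make the intuition concrete; the field names and example lexons are our own illustrative inventions, not part of the formal definition.

```python
from typing import NamedTuple

class Lexon(NamedTuple):
    """A lexon <g t1 r t2>: within context g, the headword t1
    plays role r with respect to term t2."""
    context: str   # g
    headword: str  # t1
    role: str      # r
    term: str      # t2

# An ontology base ("ontobase") is then simply a -- large -- set W of lexons.
W = {
    Lexon("air travel", "flight", "departs_from", "airport"),
    Lexon("air travel", "passenger", "holds", "ticket"),
}

# Because W is a plain set of elementary 4-tuples, retrieval needs
# nothing more than filtering.
about_flight = {l for l in W if l.headword == "flight"}
```

Note how deliberately little machinery is involved: no constraints, no rules, just tuples in a set.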


Syntax, semantics, pragmatics for ontologies

In our formalism, an ontology is not a data model; in fact the declarative semantics interpretation function maps data models, together with the data collections described by them, into the ontology (base). The pragmatics of an ontology base is that it constitutes a set (in an ideal situation, the set) of “plausible” elementary facts that within a given context (e.g. a set of applications) may hold in the domain under study, implying that no valid application (i.e. database system instance) should be inconsistent with them, i.e. the interpretation of such an application should satisfy the ontology base in some well-defined sense. A promising initial formalization of this concept may perhaps be derived from the work of Guarino [5]. Note that we deliberately exclude derivation rules, constraints and the like from the ontology base, thereby in some cases sacrificing the relative compactness of an intensional representation for a more extensional one –but one, we claim, that is easier to agree on.
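One naive reading of this satisfaction requirement, offered purely as a sketch (a well-defined notion is left to the formalization; all names and example facts below are illustrative), is a simple membership check: every elementary fact asserted by an application must be sanctioned by the ontobase.

```python
# A lexon is represented here simply as a 4-tuple (g, t1, r, t2).
W = {
    ("air travel", "flight", "departs_from", "airport"),
    ("air travel", "passenger", "holds", "ticket"),
}

def satisfies(application_facts, ontobase):
    """Naive satisfaction: every elementary fact of the application
    must appear among the "plausible" facts of the ontobase."""
    return all(fact in ontobase for fact in application_facts)

# A valid application only asserts facts the ontobase deems plausible.
ok = satisfies({("air travel", "flight", "departs_from", "airport")}, W)
bad = satisfies({("air travel", "flight", "eats", "ticket")}, W)
```

A real formalization would of course work on interpretations rather than raw tuples, but the extensional flavor of the check is the point.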



Enter Dogma

Evidently the construction (or should we say “growing”) of standardizable, hence reusable and dependable, computerized ontology bases will be no mean feat. In the DOGMA [2] project at V.U.B. STAR Lab we are setting up an ontology server to assist the gathering and incremental growth of sets of lexons. The extremely simple structure of lexons should –we conjecture– achieve at least a degree of scalability not possible with the more complex representations hitherto used in the literature and in practice. This extensional approach will naturally lead to very large sets of lexons, thereby shifting the issue to matters of ontology organization rather than of representational power or sophistication (but to coin a phrase, size itself is not a scalability issue, complexity is). One important source of lexons coding domain-specific knowledge, as opposed to generic ones occurring in general-purpose lexicons such as WordNet, will be formed by relational database schemas, yielding an activity best described as ontology mining. By way of an –admittedly overly simplistic– example, a lexon mined from a relational table R(A1, …, An) could be <g R r Ai> where g is the application context and r is a suitable role played by attribute Ai in table R (more exactly, Ai and R would map to ontology terms denoting the entities that participate in a binary relationship expressed by r). Interesting research issues about contextual layers within ontologies arise here, as one e.g. needs to separate local jargon from “common knowledge”. Other important sources are the numerous existing thesauri and glossaries, for instance the elaborate SAP® Glossary (for the crucial business process domain), in which each entry however needs to be individually analyzed to extract its knowledge structure (within DOGMA this is currently attempted experimentally using a version of ORM [6]).
The advantages for a more comprehensive and consistent corporate knowledge management using such ontobases should, however, already be quite obvious in spite of their simple basic structure and organization.
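The –admittedly overly simplistic– mining example above can be sketched in a few lines; the context, default role name, and table used here are invented for illustration, and a real mining step would still map table and attribute names onto proper ontology terms.

```python
# Sketch of the "ontology mining" step: from a relational table R(A1, ..., An),
# emit one candidate lexon <g R r Ai> per attribute. The context g and the
# default role name "has" are assumptions for illustration only.
def mine_lexons(table, attributes, context, role="has"):
    """Turn a relational table schema into candidate lexons (g, t1, r, t2)."""
    return [(context, table, role, attr) for attr in attributes]

candidates = mine_lexons("Flight", ["number", "origin", "destination"],
                         context="airline reservations")
# Each candidate still needs a domain expert to refine the role name and to
# separate application jargon from "common knowledge".
```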



Dogma impact and ongoing research

As the rationale behind the title and the description of DOGMA hopefully indicate by now, the potential applications of ontologies and of the results of the DOGMA project are legion. Within STAR Lab, DOGMA research is already being applied in two European 5th Framework projects, HyperMuseum and NAMIC. Further applications are new research projects in the area of digital libraries such as IrisWeb. Some ontology mining experiments (on existing databases, web pages, and glossaries) are being conducted by students. For the DOGMA Server, several deep theoretical problems need to be resolved, such as finding the (best) algorithms for adding new ontologies (i.e. sets of lexons). Earlier work (not directly related to ontologies) by e.g. Weber [7] and Winslett [8] on updating logical databases may prove relevant, as well as Gärdenfors’ work [9] on belief revision or Vermeir’s [10] on ordered logic; this is of course merely a consequence of the fact that any computer representation of an ontology is likely to behave as a collection of facts or predicates.



Why Agents

Well, why not? The architecture currently envisaged for the DOGMA Server is that of a mongrelized blackboard system surrounded by (client and system) agents. The agent paradigm was adopted, among other reasons, because it was felt to provide a conceptually pleasing and intuitive framework for software experimentation. Experimenting with new “ontology engineering” techniques in the context of agent applications is indeed an important secondary goal of DOGMA. First, achieving semantically meaningful communication between agents is a fundamental application for ontologies, and the importance of solutions for this of course keeps increasing as networking becomes all-pervasive. Second, we plan to simulate ontology server usage patterns using distributed agent techniques; we expect this will lead to insights regarding ‘best practice’ techniques and methodologies for ontology engineering, which in turn would help improve the applicability of DOGMA.



Ontology Applications and the Future

Finally, it is perhaps enlightening to see how ontologies may in a sense achieve a form of “semantics independence” for information- and knowledge-based systems: just as database schemas achieved data independence by making the specification and management of stored data elements external to application programs, ontologies will now allow domain semantics to be specified and managed external to those programs as well. Exactly how much knowledge is representable externally in this way will of course depend on the extent of the ontobase and on the manner in which constraints, rules, and application code make use of these knowledge elements. Conceivably, at some point it will become economical to actually enforce the building of information systems, especially those destined for internet use or interoperation, by prescribing the use of controlled vocabularies which map explicitly to ontologies. Such vocabularies (including their rules of semantically correct usage) may even become a strategic resource for an organization, e.g. as part of a repository for corporate knowledge management.


Student jobs and projects 

For students, there are projects, internships and theses available on DOGMA. If you are interested, please consult our proposals.


Current DOGMA work at STAR Lab

If you are interested in the work currently being done at STAR Lab on DOGMA, see the DOGMA Server Research pages.


References and Literature

[1] ISO TR 9003: Concepts and Terminology of the Conceptual Schema and the Information Base, ISO Technical Report, International Standards Organization, 1990.

[2] Reiter, R. “Towards a Logical Reconstruction of Relational Database Theory”, in: “Readings in AI and Databases”, J. Mylopoulos and M.L. Brodie (eds.), Morgan Kaufman, 1988.

[3] Genesereth, M. and Nilsson, N. “Logical Foundations of Artificial Intelligence”, Morgan Kaufman, 1987.

[4] Meersman, R. “Semantic Ontology Tools in Information Systems Design”, in: Proceedings of the ISMIS’99 Conference, Z. Ras and M. Zemankova (eds.), LNCS, Springer Verlag, 1999.

[5] Guarino, N. “Formal Ontology and Information Systems”, in: Proceedings of FOIS’98, N. Guarino (ed.), IOS Press, 1998.

[6] Halpin, T. “Conceptual Schema and Relational Database Design”, 2nd Ed., Prentice Hall, 1996.

[7] Weber, A. “Updating Propositional Formulas”, in: Proceedings of the 2nd Conference on Expert Database Systems, L. Kerschberg (ed.), Morgan Kaufman, 1986.

[8] Winslett, M. “Updating Logical Databases”, Cambridge Tracts in Theoretical Computer Science no. 9, Cambridge University Press, 1990.

[9] Gärdenfors, P. (ed.) “Belief Revision”, Cambridge Tracts in Theoretical Computer Science no. 29, Cambridge University Press, 1992.

[10] Laenens, E, Sacca, D., Vermeir, D., "Extending logic programming", in: Proceedings of SIGMOD Conference, ACM Publications, 1990.

[11] Gruber, T. “Towards Principles for the Design of Ontologies Used for Knowledge Sharing”, IJHCS, 43(5/6): 907-928, 1994.

[12] Sowa, J. “Knowledge Representation: Logical, Philosophical and Computational Foundations”, Brooks/Cole, 2000.

[13] Fowler, M. “Analysis Patterns –Reusable Object Models”, Addison-Wesley, 1997.

[14] Casteleyn, S. “Een uitbreiding van het Open Information Model als basis voor het bouwen van ontologies” [An extension of the Open Information Model as a basis for building ontologies].

[15] Elmasri, R. and Navathe, S., "Fundamentals of Database Systems", 3rd ed., Addison-Wesley, 1999.


Footnotes



[1] Computer Aided Software Engineering, a hot topic during the 1980s.

[2] Developing Ontology-Guided Mediation of Agents. Agents are used as a software paradigm; an agent incorporates its own (application) ontology that must “lock into” the requestor’s ontology (usually via a blackboard system for posting requests). Often this will require mediation, occasionally even by human intervention.