The need to exchange and search data on the web led to the standardization of a universal language, XML, which allows developers to deliver data from a wide variety of applications. XML provides syntax, not semantics: tags have no predefined meaning, and XML by itself conveys only content and structure, not presentation, behavior, or meaning. An ontology is a rich means of carrying the semantics of XML data: it defines richer relationships between terms and establishes a shared terminology among the members of a community of interest (human or automated agents). Some research efforts aim to build general ontologies from any source in which ontological information can be found, e.g. XML schemas, database schemas, etc.
Of course, one could invest thousands of man-hours to construct an ontology from scratch. However, a vast amount of information is already available (e.g. on the internet), and the challenge is to extract ontological information from these existing sources. Obvious candidates are online lexicons, dictionaries and encyclopedias, but a lot of "hidden" information is available as well. For example, a lot of semantic information is required during the design phase of a database. With the help of sophisticated tools, it could be possible to extract this semantic information from the database schema and construct an ontology from it. Of course, database schemas are only one possible resource; others will also need exploration (e.g. structured websites, natural language texts, ...).
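To make the database example concrete, here is a minimal sketch in Java of how a schema might be turned into simple ontological facts. Everything in it is an assumption made for illustration: the class name SchemaToOntology, the in-memory schema representation (table name, column name, referenced table or null), and the two mapping rules are hypothetical and are not taken from the DOGMA framework or from any existing extraction tool.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SchemaToOntology {

    // Hypothetical mapping rules (an assumption, not the DOGMA method):
    //  - each table name is used as a concept (the subject of its facts)
    //  - a plain column yields a "<concept> has <column>" fact
    //  - a foreign-key column yields a "<concept> refers to <other concept>" fact
    // The schema maps table name -> (column name -> referenced table, or null).
    public static List<String> extract(Map<String, Map<String, String>> schema) {
        List<String> facts = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> table : schema.entrySet()) {
            String concept = table.getKey();
            for (Map.Entry<String, String> column : table.getValue().entrySet()) {
                String referencedTable = column.getValue();
                if (referencedTable == null) {
                    facts.add(concept + " has " + column.getKey());
                } else {
                    facts.add(concept + " refers to " + referencedTable);
                }
            }
        }
        return facts;
    }

    // A made-up two-table schema used only to exercise the sketch.
    public static Map<String, Map<String, String>> exampleSchema() {
        Map<String, String> student = new LinkedHashMap<>();
        student.put("name", null);

        Map<String, String> thesis = new LinkedHashMap<>();
        thesis.put("title", null);
        thesis.put("author_id", "Student"); // foreign key to the Student table

        Map<String, Map<String, String>> schema = new LinkedHashMap<>();
        schema.put("Student", student);
        schema.put("Thesis", thesis);
        return schema;
    }

    public static void main(String[] args) {
        for (String fact : extract(exampleSchema())) {
            System.out.println(fact);
        }
    }
}
```

Run on the made-up schema, the sketch prints "Student has name", "Thesis has title" and "Thesis refers to Student". A real extractor would read the schema from the database catalogue and would need far richer rules (cardinalities, naming heuristics, user feedback), which is where the "semi-automatic" part of the problem lies.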
The thesis would consist of research in the field of ontology mining, ultimately leading to a component within the DOGMA framework that enables (semi-)automatic extraction of ontological information from existing sources. The student is free to negotiate the nature of the existing source(s) with the contact person.
Java / XML / Eclipse 3.0 platform
Java, XML, Information retrieval techniques
Ruben Verlinden (ruben.verlinden at vub.ac.be)
Stijn Christiaens (stijn.christiaens at vub.ac.be)