Lexon Engineering


(Note: This work is part of the PRIME project deliverables written by Tang, Y. and Spyns, P. between 2004 and 2006.)

Under this heading we group the activities that transform the domain conceptualisation, which at this point takes the form of verbalised facts in natural language and is still independent of any ontology language or representation formalism. In the following steps, the informal conceptualisation is progressively transformed into formal statements that fit the VUB DOGMA ontology engineering framework. In our opinion, these steps are also useful when implementing an ontology in RDF or OWL. RDF or OWL Lite statements could be a direct result of the lexon engineering stage, as the format of lexons is very close to that of RDF or OWL Lite statements. As OWL DL statements contain semantic constraints, the application specification step has to be finished before OWL DL statements can be produced.

This stage consists of the formalisation of the verbalised facts followed by quality control checks. The terms and roles of the remaining lexons are linked to unambiguous definitions.

1.1.1.1.1 Create lexons

This activity uses the verbalised facts resulting from the previous step as input. The aim is to extract lexons. The results are represented as binary fact types or lexons in the form <γ, ti, ri-j, rj-i, tj> where the terms ti and tj refer to concepts and the roles ri-j and rj-i refer to the relationships by which these are related. Currently, the context γ refers to the particular section or input document from which the lexon has been extracted[1]. If the results of the previous activities are binary in nature (as stressed in previous sections), this exercise is significantly simplified.
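The five-part lexon form can be sketched as a small data structure. The following is a minimal illustration in Python, not part of the DOGMA tooling: the class name `Lexon`, its field names and the context value `Article17` are assumptions chosen for this example.

```python
from typing import NamedTuple


class Lexon(NamedTuple):
    """A binary fact type <context, term1, role, co_role, term2>."""
    context: str   # the section or document the lexon was extracted from
    term1: str     # first term (t_i)
    role: str      # role read from term1 to term2 (r_i-j)
    co_role: str   # co-role read from term2 to term1 (r_j-i)
    term2: str     # second term (t_j)

    def inverse(self) -> "Lexon":
        """Return the same fact read in the opposite direction."""
        return Lexon(self.context, self.term2, self.co_role, self.role, self.term1)


# "Article17" is an illustrative context label, not taken from the source table.
lexon = Lexon("Article17", "TechnicalMeasure", "Protect", "BeProtected", "PersonalData")
print(lexon)
print(lexon.inverse())
```

Because a lexon carries both a role and a co-role, reading it in the opposite direction only requires swapping the two terms and the two roles, which is what `inverse` does.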

Table 213 Privacy Directive Lexon Table[2]

Appropriate technical and organisational measures must be implemented to protect personal data against …

Article 17 of the Directive 95/46/EC

After Segmentation:

1. Appropriate technical measures must be implemented.

2. Appropriate organisational measures must be implemented.

3. Appropriate technical measures are to protect personal data.

4. Appropriate organisational measures are to protect personal data.

After highlighting:

1. Appropriate technical measures must be implemented.

2. Appropriate organisational measures must be implemented.

3. Appropriate technical measures are to protect personal data.

4. Appropriate organisational measures are to protect personal data.

Create Lexons:

| ID | γ | ti | ri-j | rj-i | tj |
|----|---|----|------|------|----|
| 1 | PersonalDataProtect | TechnicalMeasure | BeImplemented | Implement | (Person/Machine/etc.) |
| 2 | PersonalDataProtect | OrganisationalMeasure | BeImplemented | Implement | (Person/Machine/etc.) |
| 3 | PersonalDataProtect | TechnicalMeasure | Protect | BeProtected | PersonalData |
| 4 | PersonalDataProtect | OrganisationalMeasure | Protect | BeProtected | PersonalData |

1.1.1.1.2 Refine lexons

The newly created lexons should undergo a quality check. We call a lexon ‘good’ when:

  • This lexon is highly reusable
  • This lexon is as simple as possible
  • This lexon represents the correct information
  • This lexon cannot be broken down any more

Note that the creation of elementary sentences should almost automatically lead to good lexons. Nevertheless, some elementary sentences may have to be represented by more than one lexon, and vice versa. This mostly concerns knowledge that is implied, but not explicitly mentioned, by an elementary sentence.

Table 214 shows an example based on the material from Table 213.
Table 214 Lexon refinement example

S4: A data controller collecting data about a data subject. Many citizens desire not to disclose their complete personal health information in an uncontrolled way. Accurate personal health data are crucial for high quality and personalised health care services but can also be misused to deny people services.

Segmentation of S4

S4.1: A data controller collecting data.

Lexon of S4.1

| | γ | ti | ri-j | rj-i | tj |
|---|---|----|------|------|----|
| Original | Setting4.1 NSID:1 | DataController | Collect | beCollected | Data |
| After simplification | Setting4.1 NSID:1 | Controller | Collect | beCollected | Data |
| After simplification | Setting4.1 NSID:1 | Controller | isAbout | appliedTo | Data |

A term or role (or sometimes an expression) used in a specific context and language in principle points to an unambiguous meaning. It can happen that equal or synonymous lexons or triples have been produced in the course of the domain conceptualisation and lexon engineering steps.

As it is senseless to keep such lexons (possibly also across language borders), the synonymous or equal cases are removed in this activity. One refined lexon is elected to represent all the literally equal lexons (all composing words are the same) and synonymous lexons (the composing words are the same or synonymous). Two lexons or triples are equal when their respective terms and roles point to the same concepts (as defined in the previous step). This also holds when the sequence of terms and roles is inverted. In the case of an inverted sequence, the words can be antonyms.

Let’s look at three examples:

1. <bike, follow, be followed by, car> & <bicycle, go after, be followed by, automobile>

This example illustrates that two lexons are synonymous when their composing parts have the same sense. Here, bike is bicycle, follow is to go after, and a car is an automobile.

2. <dog, eat, be eaten, meat> & <meat, be eaten, eat, dog>

This example illustrates that two lexons are equal when their terms and roles are equal (possibly in inverse sequence).

3. <bike, follow, be followed by, car> & <automobile, precede, is preceded by, bicycle>

This example combines the two previous ones: the terms are synonyms in inverted order, while the role names are antonyms.
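The three cases can be checked mechanically once every word is mapped to a concept. The sketch below is only an illustration under stated assumptions: the table `CONCEPT_OF` is a hypothetical, hand-written stand-in for a dictionary or concept server, and the antonym case is modelled by mapping "precede" to the same concept as "be followed by" (and "is preceded by" to the same concept as "follow"). A lexon and its inverse reading are reduced to one canonical form before comparison.

```python
from typing import NamedTuple


class Lexon(NamedTuple):
    term1: str
    role: str
    co_role: str
    term2: str


# Hypothetical word-to-concept table (an assumption made for this sketch).
CONCEPT_OF = {
    "bike": "BICYCLE", "bicycle": "BICYCLE",
    "car": "CAR", "automobile": "CAR",
    "follow": "FOLLOWS", "go after": "FOLLOWS", "is preceded by": "FOLLOWS",
    "be followed": "IS_FOLLOWED_BY", "be followed by": "IS_FOLLOWED_BY",
    "precede": "IS_FOLLOWED_BY",
}


def ground(lexon: Lexon) -> Lexon:
    """Replace each word by its concept label; unknown words stand for themselves."""
    return Lexon(*(CONCEPT_OF.get(word.lower(), word.lower()) for word in lexon))


def canonical(lexon: Lexon) -> Lexon:
    """Pick one fixed reading direction so a lexon and its inverse compare equal."""
    forward = ground(lexon)
    backward = Lexon(forward.term2, forward.co_role, forward.role, forward.term1)
    return min(forward, backward)


def equivalent(a: Lexon, b: Lexon) -> bool:
    return canonical(a) == canonical(b)


example1 = equivalent(Lexon("bike", "follow", "be followed by", "car"),
                      Lexon("bicycle", "go after", "be followed by", "automobile"))
example2 = equivalent(Lexon("dog", "eat", "be eaten", "meat"),
                      Lexon("meat", "be eaten", "eat", "dog"))
example3 = equivalent(Lexon("bike", "follow", "be followed", "car"),
                      Lexon("automobile", "precede", "is preceded by", "bicycle"))
print(example1, example2, example3)
```

Treating an antonymous role pair as the converse reading of the same relation is a design choice of this sketch; a real implementation would take that information from the grounding step.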

Simple sorting tools (e.g., the ‘sort’ and ‘uniq’ Linux shell commands, or spreadsheet functionalities) already provide a basic level of automation. More sophistication can be achieved by implementing calls to on-line dictionaries, by using the WordNet API, or by using the DOGMA concept server[3] functionalities to check for synonymy and antonymy.
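As a rough equivalent of the sort-and-deduplicate approach, the following Python sketch removes literally equal lexons after normalising case and whitespace. The helper name `normalise` is an assumption of this example; the rows are a few term, role, co-role, term tuples as they appear in the e-health lexon table of this document, where some lexons occur more than once.

```python
# A few (term1, role, co-role, term2) rows, including literal duplicates.
rows = [
    ("Data", "Is crucial for", "Need crucially", "Service"),
    ("Service", "Is about", "Is applied to", "Health care"),
    ("Data", "Is crucial for", "Need crucially", "Service"),
    ("Service", "Is about", "Is applied to", "Health care"),
    ("Data controller", "Collect", "Is collected by", "Data"),
]


def normalise(row):
    """Lower-case every cell and collapse runs of whitespace."""
    return tuple(" ".join(cell.lower().split()) for cell in row)


# Sorting a set of normalised rows keeps exactly one copy of each lexon.
unique_rows = sorted(set(normalise(row) for row in rows))

for row in unique_rows:
    print(" | ".join(row))
```

This only catches literal duplicates. Synonymous or inverted lexons still need the dictionary-based checks mentioned above.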

As the core of this and the previous activities concerns the definition of meaning and linking the lexical representations to concepts, it is of primary importance that a sufficient number of stakeholders are gathered so that the refined lexons are based on widely accepted agreements.

In the case of the DOGMA engineering framework, the software allows the lexons to be recreated[4]. Therefore, it is no longer necessary to store the lexons separately. However, for reasons of traceability and later quality checks, it might be worthwhile to store the original lexons anyway.

Here is an example from the E-Health NS - Table 27:

From the NS note, we can extract one lexon: <NSID:1_note, patient, choose, isChosen, GP>. From the Narratological Schema episode E1.1.2, another lexon is extracted: <Narratological SchemaID:1 E1.1.2, user, choose, isChosen, GP>. We can recognise that these two lexons are equal because the user here means the patient[5].

1.1.1.1.3 Ground lexons

Lexon grounding is a conceptual exercise that links the terms and roles that constitute a lexon to existing dictionaries, lexica or standards. If no adequate definitions exist, new definitions should be drafted by hand by the domain experts and other stakeholders, following terminological principles. In this way the vocabulary of the ontology (in the form of terms and roles) is provided with semantics. As a result, synonyms are easily detectable (i.e. they point to the same definition). As a check or additional source, the list of synonyms created by using the abstraction mechanism (see section 2.2.5.2.2), if available, can be used. New labels have to be chosen for a set of synonyms or expressions having the same meaning. These labels are preferably (slightly) different from natural language words to indicate that they operate on the conceptual level rather than the language level. In the VUB DOGMA ontology engineering framework, the definitions, the labels and the synonyms are entered in the concept definition server. Other implementations offering a similar functionality can be envisaged.
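The grounding idea can be sketched as a small lookup structure in which a label and all its synonyms point to one definition. This is an illustration only and does not reflect the interface of the DOGMA concept definition server: the class names `Concept` and `ConceptDictionary`, their methods, and the synonym "personal data" are assumptions of this example. The definition text is the one given for PersonalData in the lexon dictionary of this document.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    label: str                 # conceptual label, e.g. "PersonalData"
    definition: str            # agreed definition of the concept
    synonyms: set = field(default_factory=set)


class ConceptDictionary:
    """Hypothetical stand-in for a concept definition store."""

    def __init__(self):
        self._by_word = {}

    def add(self, concept: Concept) -> None:
        # The label and every synonym all point to the same Concept object.
        for word in {concept.label, *concept.synonyms}:
            self._by_word[word.lower()] = concept

    def ground(self, word: str):
        """Return the concept for a word, or None if it still needs a definition."""
        return self._by_word.get(word.lower())

    def are_synonyms(self, a: str, b: str) -> bool:
        concept_a, concept_b = self.ground(a), self.ground(b)
        return concept_a is not None and concept_a is concept_b


dictionary = ConceptDictionary()
dictionary.add(Concept(
    label="PersonalData",
    definition=("That data relating to a living individual which if in the "
                "possession of a data controller could by itself or with other "
                "data already in the possession of the data controller easily "
                "identify the living individual."),
    synonyms={"personal data"},  # illustrative synonym, an assumption
))

# Report which parts of a lexon still lack a definition.
lexon = ("TechnicalMeasure", "Protect", "BeProtected", "PersonalData")
ungrounded = [word for word in lexon if dictionary.ground(word) is None]
print(ungrounded)
```

Words listed in `ungrounded` are the ones for which the domain experts would still have to find or draft a definition.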

The same example is continued:

Table 215 Lexon Dictionary

| ID | Label | Explanation |
|----|-------|-------------|
| 1 | Technical | description of software and hardware and the standards used |
| 2 | OrganisationalMeasure | Any manoeuvre that fits the organisational strategy made as part of progress toward a goal[6] |
| 3 | PersonalData | That data relating to a living individual which if in the possession of a data controller could by itself or with other data already in the possession of the data controller easily identify the living individual. (from www.nhstayside.scot.nhs.uk/FoISA/Glossary.htm) |

The following table shows several lexons extracted from Table 211.

Table 216 Settings of E-Health Narratological Schema

| Setting | Description |
|---------|-------------|
| S1 | Background on E-Health |
| S2 | The importance of privacy protection for Health data versus the importance of accurate data for health professionals particularly in emergencies. |
| S3 | An ontology which allows personal devices to communicate health data accurately and also securely and to inform the user accurately about data processing. |
| S4 | A data controller collecting data about a data subject. Many citizens desire not to disclose their complete personal health information in an uncontrolled way. Accurate personal health data are crucial for high quality and personalised health care services but can also be misused to deny people services. |

Segmentation of S4

| Setting | Description |
|---------|-------------|
| S4.1 | A data controller collecting data. |
| S4.2 | Many citizens desire not to disclose their complete personal health information in an uncontrolled way. |
| S4.3 | Accurate personal health data are crucial for high quality health care services. |
| S4.4 | Accurate personal health data are crucial for personalised health care services. |
| S4.5 | Accurate personal health data can also be misused to deny people services. |

Table 217 Lexon Table of E-Health Settings

| ID | Context identifier | Term1 | Role | Co-Role | Term2 |
|----|--------------------|-------|------|---------|-------|
| 1 | Settings | NS | Contain | Is the component | background |
| 2 | Settings | Background | Is about | Applied to | E-Health |
| 3 | Settings | Emergency | Need | Is needed | Data |
| 4 | Settings | Privacy protection | protect | Is protected by | Data |
| 5 | Settings | Data | Is about | Applied to | Health |
| 6 | Settings | E-health ontology | Allow | Is allowed | Communication |
| 7 | Settings | Device | Is allowed | | Communication |
| 8 | Settings | User | Get | Is granted to | Communication |
| 9 | Settings | E-health ontology | Inform | Is informed by | Processing |
| 10 | Settings | Processing | Is about | Is applied to | Data |
| 11 | Settings | E-health ontology | Inform | Is informed by | User |
| 12 | Settings | Data controller | Collect | Is collected by | Data |
| 13 | Settings | Citizen | Desire | Is desired | Information disclosure |
| 14 | Settings | Data | Is crucial for | Need crucially | Service |
| 15 | Settings | Service | Is about | Is applied to | Health care |
| 16 | Settings | Data | Is crucial for | Need crucially | Service |
| 17 | Settings | Service | Is about | Is applied to | Health care |
| 18 | Settings | Data | Is misused | | |
| 19 | Settings | People | Ask for | Is asked by | Service |
| 20 | Settings | Service | Is denied to | Is denied | People |




[1] γ should be specific yet general enough to represent all the extracted lexons concerned. If those lexons are captured from different but semantically related documents, γ is chosen to be general enough to represent them all. In that sense, the context represents an actual situation of usage that makes it possible to disambiguate the word senses. Research on this point is still ongoing.

[2] Just an example to show how γ is chosen for lexons

[3] It’s currently not publicly available.

[4] Synonymous and antonymous lexons can be reconstructed through the concept definition server.

[5] Abstraction happens during the whole domain conceptualization activity.

[6] Every label that appears in the lexon table should be found in the lexon dictionary, which is the reason why we include all those ‘privacy irrelevant’ labels here.