Auditing description-logic-based medical terminological systems by detecting equivalent concept definitions
Received 13 June 2006; received in revised form 20 June 2007; accepted 20 June 2007; published online 30 January 2009.
Abstract
Objective: To specify and evaluate a method for auditing medical terminological systems (TSs) based on detecting concepts with equivalent definitions. This method addresses two important problems: redundancy, where the same concept is represented more than once (described by different terms), and underspecification, where different concepts have the same representation and hence appear indistinguishable from each other. Design: The auditing method is applicable for TSs that are or can be represented in a description logic (DL). The method relies on the assumption that concept definitions are non-primitive (i.e. they are regarded as providing necessary and sufficient conditions). Whereas this assumption may not hold for many definitions, it does serve the purpose of detecting sets of logically equivalent concepts by a DL reasoner. Such a set may include the same concept which is defined more than once and/or different concepts that are underspecified as they appear indistinguishable from each other by their represented properties. Analysis of these sets provides insight into the representation quality of concepts and provides hints at improving the TS. Measurements: In our case study the method is applied to the DICE TS, a comprehensive TS in intensive care. It comprises about 2500 concepts and 40 properties and relations. Results: In DICE we found four concepts that were defined twice. Furthermore, 100 sets were found containing more than 300 underspecified concepts. The sizes of these sets ranged from 2 to 13. Analysis revealed that many concepts can be more completely defined, either by adding existing relations, or by the introduction of new relations into the terminological system.
Conclusion: The method proved both usable and valuable for auditing TSs. DL reasoning is fully automated and all equivalent concept definitions are systematically found. The resulting sets of equivalent concepts clearly point out which concept definitions are to be reviewed, as they contain duplicate definitions of a concept, and (inherently or unnecessarily) underspecified concepts. 1. Introduction  Medical terminological systems (TSs) relate concepts in the domain of medicine among themselves and provide their terms and possibly their definitions and codes [1]. For example, in a TS a concept may be defined as “inflammation of the brain parenchyma”, and described by the synonymous terms “encephalitis” and “cephalitis”. TSs provide an invaluable source of (structured) medical knowledge, and have developed from single-purpose systems to systems serving a range of purposes, varying from recording patient information to providing decision support, and supporting epidemiological research and resource management. To address this broadening range of purposes, medical TSs have grown in size and complexity [2]. They evolved from simple taxonomies to semantic networks with (informal and formal) concept definition capability. Especially during the last decade, formal concept definition has gained increased attention. Such definitions are commonly represented using frames and description logics (DLs). Examples of frame-based TSs are the foundational model of anatomy (FMA) [3] and the gene ontology (GO) [4]. Examples of DL-based TSs are SNOMED-CT [5], the National Drug File Reference Terminology (NDF-RT) [6], and the NCI Thesaurus [7]. An important requirement for TSs is that the represented knowledge should be of good quality, especially in terms of its internal consistency and faithfulness to reality.
A formal representation provides explicit semantics of the represented knowledge, thus facilitating the determination of the internal consistency of this knowledge, and helping in checking its faithfulness. The process of quality assessment is called auditing. In the next section we will discuss a number of auditing approaches that have been designed and applied to ontologies in the field of medicine. In these approaches computational methods are used to focus the attention of a modeler to suspicious definitions, after which a modeler analyzes these definitions and eventually modifies the represented knowledge. We focus in this paper on the automatic discovery of equivalently defined concepts, which might correspond to duplicate concept definitions or underspecified concepts. Duplicate concept definitions are undesirable [8], and should not occur in a TS, because they may hamper querying a TS. For example, consider a TS that contains “myocardial infarction” and “heart attack” as separate concepts. One can assume (although this may give rise to a discussion) that these are actually synonyms, intended to refer to the same concept, which are used indiscriminately in everyday practice. If one then queries patients with heart attack, patients registered as having a myocardial infarction will not be returned. Underspecified concepts that appear indistinguishable from each other can be analyzed to determine whether it is possible to sharpen their definitions. Description logics form our representation and reasoning machinery. Although modeling knowledge based on description logic representation is often nontrivial [9] and may be restricted by the expressiveness of the logic, the representation has the major advantage of providing unambiguous definitions of concepts. Our goal is to explore the possibilities of deploying DLs in the audit of concept definitions in a TS. 
The DL family was chosen because its formal representation allows performing tractable automated reasoning on the represented knowledge. The prominent reasoning services are satisfiability testing (i.e. the logical consistency of represented knowledge) and subsumption (i.e. classification of a concept based on its properties). Logical consistency deals with the internal consistency of the represented knowledge, i.e., whether definitions are mutually consistent. It does not address whether represented knowledge is consistent with reality. Applying consistency testing was described in [10]. For the purpose of detecting equivalence we will use the DL reasoning service of subsumption testing. Concepts that are rendered equivalent by a DL reasoner can then be further analyzed. Whereas a DL reasoner will detect equivalence, it will not provide information on whether it is the result of a duplicate definition or of underspecification; hence this needs to be determined by human experts. If the logical equivalence is due to a duplicate definition of a concept, such a definition can be removed. If equivalence is caused by underspecification, it needs to be determined whether a more elaborate specification of at least one of the concepts can be given. We provide an overview of auditing approaches and an introduction to DL in Section 2 and then explain the method used in detail in Section 3. The results of the application of this method in a case study in the intensive care domain are discussed in Section 4. Section 5 summarizes the general results of the application of the method. Finally, conclusions are drawn in Section 6. 2. Background  Modeling large knowledge bases (such as terminological systems) and evaluating their contents are complicated processes. The need arises for systematic, reproducible methods to support these processes.
Modeling and evaluating TSs concern various aspects, ranging from ontological decisions [11] to determining the comprehensiveness of the medical contents of a TS. Ideally, a knowledge base should satisfy four requirements [12]: (1) it should have the necessary knowledge (completeness), (2) the knowledge should be faithful to the real world (correctness), (3) the knowledge should not be self-contradictory (consistency), and (4) the system should have efficient algorithms to perform the inferences needed for the application (competence). Auditing is the process of assessing the fulfillment of (one or more of) these requirements. Satisfying all requirements may be hard, and striving for completeness and correctness may reduce the competence of the system. In this paper, we will not address the issue of competence. Completeness deals with representing knowledge relevant for the (purpose of the) ontology, rather than the impossibility of representing “everything”. After discussing a number of auditing approaches, we will introduce description logics and then address the use of description logics in medicine. 2.1. Auditing approaches During the last decade, various techniques have been applied for auditing medical TSs. In [13], so-called “semantic methods” are applied for the detection of ambiguity, redundancy of concept pairs, inconsistency of parent–child relationships, and lack of semantic links. These methods make use of synonyms and of semantic types that are assigned to concepts. In [14], methods for finding missed synonymy are described, based on lexical techniques and the use of synonymous words and phrases. In [15], a technique is presented to audit concept categorizations (i.e. the assignment of one or more semantic types to a concept), based on expert review of intersections of semantic types. 
In [16], “semantic refinement” is presented, which helps detection of ambiguity, non-uniform classification, classification errors, omissions, redundant classification and missed synonymy. This method also makes use of semantic types. In [17] the use of Protégé Axiom Language (PAL) queries in Protégé is described for the detection of redundantly defined is_a relations. The same environment is used in [18] for the purpose of detecting constraint violations, such as concepts that have multiple preferred terms in a language, whereas exactly one preferred term is required. In [19], a quality control algorithm is mentioned that analyses the distribution of entity-characteristics. This algorithm, which is implemented in a proprietary ontology management tool, LinKFactory®, proposes new entities to be added. Inconsistencies can be then detected by manual inspection of these proposed new entities. In [20], two algorithms (lexical comparison and classification) are combined to detect (among others) improper assignment of relationships, redundant concepts, and omission of relationships. Van Buggenhout and Ceusters [21] describe an information content algorithm, that can also be used for quality control. Recently, auditing on the basis of the principles underlying a formal ontology has been described in [22]. In many of these approaches the modeler eventually does a manual interpretation of parts of the represented knowledge, where the computational methods help focus attention on possible errors or flaws. In our approach the interpretational burden is further shifted towards the method itself by means of automated reasoning. Another difference of our approach is that we apply off-the-shelf reasoning applications that provide sound and complete algorithms. 
Using these applications, potential duplicates and underspecification are automatically detected, whereupon the modeler has to decide whether they constitute actual duplicate definitions or underspecified concepts, and act accordingly. 2.2. Description logics Description logics (DLs) provide fragments of first-order logic for formal definition of concepts. These definitions can specify either only necessary conditions or both necessary and sufficient conditions. Definitions with only necessary conditions are referred to as “primitive definitions” [23, Chapter 9] (by others also called “specialization” or “partial class”). Primitive definitions are indicated by the “subsumed by” symbol: ⊑. Definitions with both necessary and sufficient conditions are referred to as “non-primitive definitions” (by others also called “definition” or “complete class”) and indicated by the “equivalence” symbol: ≡. As an example of a non-primitive definition, axiom 1 in Fig. 1 states that every inflammatory disease is necessarily and sufficiently a disease in which some inflammation is involved. This implies that every disease that involves an inflammation can be inferred to be an inflammatory disease. Axiom 5 in Fig. 1 shows an example of a primitive definition: a ViralHepatitis1 is an inflammatory disease that is located in the liver (and possibly in other body parts). However, it cannot be inferred that every inflammatory disease of the liver is a ViralHepatitis1 (as it might have a non-viral cause, for example, a bacterium). Each DL is characterized by the constructors it allows for. Examples of concept constructors are AND (⊓), OR (⊔), NOT (¬), SOME (∃), ALL (∀), AT-LEAST (≥). The formal, set-theoretic semantics of DLs provide DL statements with an unequivocal meaning, although these statements are restricted by the expressiveness of the underlying DL. The foremost reasoning tasks with DLs are satisfiability testing and subsumption (classification).
Satisfiability testing is checking whether a concept expression does not necessarily denote the empty concept [23]. Subsumption testing amounts to checking whether one concept is more general than another. Subsumption can be inferred by virtue of non-primitive definitions only, as these specify both necessary and sufficient conditions, as explained above. The computational complexity increases with the expressiveness of a DL. Generally, reasoning with inexpressive DLs is tractable, whereas reasoning with very expressive DLs can become intractable, and even undecidable. 2.3. Description logics in medicine It is argued that in the domain of medicine many natural kinds (such as rheumatoid arthritis) exist, for which no necessary and sufficient conditions exist [24]. These natural kinds are recognized rather than inferred. As these natural kinds can only be defined in a primitive manner (as is stated among others by [25]), much of the inferential potential is lost, as no concept can be inferred to be subsumed by a concept with a primitive definition. Apart from natural kinds that result in primitive concept definitions, there may be other reasons why concepts are defined as primitive, for example, the expressive power of DL can be too limited to express the necessary and sufficient conditions. In contemporary terminological systems the majority of concept definitions are primitive. In both SNOMED CT (July 2005 edition) and the NCI Thesaurus (release 05.05d) non-primitive definitions amount to only 11% of the total number of concepts. The large number of primitive concepts reduces the possibility of using standard DL-based reasoning for finding concepts that are defined more than once. Before we present a proposal to overcome this, we will first give an example to demonstrate this. Fig. 1, Fig. 2 provide an example of the role that non-primitive and primitive definitions play in modeling terminological systems. 
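The difference between the two kinds of definition can be made concrete with a small worked example. The axioms below are a hedged reconstruction from the prose description of Fig. 1, not a copy of the figure: the role names involves and location, and the exact form chosen for the Hepatitis1 and Hepatitis2 definitions, are assumptions made for illustration.

```latex
% Assumed reconstruction: role names and the shape of the last two
% axioms are illustrative, not taken from the original figure.
\begin{align*}
\mathit{InflammatoryDisease} &\equiv \mathit{Disease} \sqcap \exists \mathit{involves}.\mathit{Inflammation} \\
\mathit{LiverDisease} &\equiv \mathit{Disease} \sqcap \exists \mathit{location}.\mathit{Liver} \\
\mathit{Hepatitis1} &\equiv \mathit{InflammatoryDisease} \sqcap \exists \mathit{location}.\mathit{Liver} \\
\mathit{Hepatitis2} &\equiv \mathit{LiverDisease} \sqcap \exists \mathit{involves}.\mathit{Inflammation}
\end{align*}
% Unfolding the defined names on the right-hand sides gives, for both concepts,
\begin{align*}
\mathit{Hepatitis1} \equiv \mathit{Disease} \sqcap \exists \mathit{involves}.\mathit{Inflammation}
  \sqcap \exists \mathit{location}.\mathit{Liver} \equiv \mathit{Hepatitis2}
\end{align*}
% Equivalence is mutual subsumption:
\begin{align*}
C \equiv D \quad\Longleftrightarrow\quad C \sqsubseteq D \ \text{ and } \ D \sqsubseteq C
\end{align*}
```

If the last two axioms were stated with ⊑ instead of ≡, they would give only necessary conditions, and neither concept could be inferred to subsume the other.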
Non-primitive definitions facilitate the inference of classification (subsumption) of concepts. These definitions also make it possible to detect equivalent definitions of concepts (by means of detecting mutual subsumption). For example, in Fig. 1, the first definition states that an InflammatoryDisease is a disease that involves an inflammation. As this is a non-primitive definition, these are necessary and sufficient conditions. Hence, any disease that involves an inflammation is inferred to be an InflammatoryDisease. Likewise for definition 2, a disease that is located in the liver is a LiverDisease. Hepatitis1 and Hepatitis2 (definitions 3 and 4) can be inferred to be equivalent by a description logic reasoner, such as Racer, FaCT++ or Pellet. It is then up to the modeler to decide whether Hepatitis1 and Hepatitis2 are actually duplicate definitions of one concept, or different concepts which are equivalently defined. However, when definitions are primitive, as those of ViralHepatitis1 and ViralHepatitis2 (definitions 5 and 6), equivalence will not be inferred, and a possible duplicate definition remains undetected. Another pitfall of primitive definitions is that they can lead to missed classification. For example, given definition 7, ViralHepatitisA would be correctly classified as a child of Hepatitis1 and Hepatitis2, but not as a child of ViralHepatitis1 or ViralHepatitis2, although it should have been. 3. Method  Although it is inevitable to have many primitive definitions in a medical TS, the examples in Section 2 demonstrate the potential of exploiting the inferential powers of DL reasoners in the modeling process by stating the non-primitivity of all relevant concept definitions. Description logics have previously been applied for auditing a TS. The main distinction between previous work and the method we describe and apply is the assumption of non-primitivity for concepts that would normally be defined as primitive.
One needs to determine for which concepts this assumption can be made. Simply defining all concepts as non-primitive will lead to results that are very hard to interpret. Therefore, the method for determining equivalent definitions comprises the following steps, which we describe below: (1) determine concepts of interest, (2) exclude poorly defined concepts, (3) assume non-primitive definitions, (4) infer equivalence, and (5) interpret the results. 3.1. Determining concepts of interest The first step is to determine which concept category is to be audited. Generally, medical TSs can be regarded as consisting of various (more or less explicitly distinguishable) modules [26], [27]. For example, SNOMED CT specifies not only concepts in the category “disease”, but, among others, also the categories “body structure”, “finding”, “organism”, “specimen”, and “substance”. These modules are used for the definition of disease concepts, as is also shown in the examples in Figs. 1 and 3. The need to determine which category will be investigated is driven by the fact that equivalence of concepts can propagate, leading to equivalence of other concepts. An example of this situation is presented in Fig. 3. If we applied the method to diseases as well as microorganisms, virus and bacterium would be rendered equivalent, as their definitions are the same. This in turn would lead to equivalence of ViralPneumonia and BacterialPneumonia, due to their reference to virus and bacterium, respectively. Hence, one either focuses on microorganisms, which will point out the equivalence of virus and bacterium, or on diseases, in which case ViralPneumonia and BacterialPneumonia will correctly be regarded as non-equivalent. 3.2. Exclusion of poorly defined concepts The next step is to find all concepts that are subsumed by one concept and do not specify any difference with their subsumer. In Fig. 3, pneumococcal pneumonia and staphylococcal pneumonia are examples of such concepts.
For these concepts, changing their definitions to non-primitive definitions will provide a trivial equivalence of the concept and their subsumer. As concepts of this form are easily recognizable, they can be studied separately. An analysis on SNOMED CT [28] showed that in some modules (e.g. “organism” and “substance”) there was not a single concept that specifies any difference with its direct subsumer, whereas in other modules the proportion of concepts that explicitly specify differences with their subsumer is up to 86% (e.g. “specimen”). 3.3. Assuming non-primitive definitions We can now redefine all other concepts (i.e. those that are subsumed by more than one concept or show differences with their subsumer(s)) as non-primitive. In Fig. 3, the definitions for pneumonia and pulmonary edema are changed from being considered primitive definitions to become non-primitive definitions. ViralPneumonia and BacterialPneumonia, which already had a non-primitive definition, remain unchanged. 3.4. Inference of equivalence When the TS has been altered according to the steps mentioned above, it can be classified with a DL reasoner. This classification will result in sets of equivalent concepts. These sets can then be further analyzed. Classification of the example system from Fig. 3 will render pneumonia and pulmonary edema equivalent. 3.5. Interpretation of equivalence At this stage it is up to the modeler to analyze the equivalences. This analysis will provide two types of outcomes. First, it will reveal concepts that have duplicate definitions, which were previously undetected due to the fact that definitions were primitive, analogous to the example of ViralHepatitis1 and ViralHepatitis2 in Fig. 1. Second, it will reveal concepts that are different (as in the above example of pneumonia and pulmonary edema), but for which the distinction between them is not represented. 
In the latter case, which we refer to as underspecification, the TS can potentially be enriched by making explicit the implicit knowledge that distinguishes one concept from another. When this distinction cannot be made explicit, it is due to the lack of characteristic features of the concept (i.e. it is a natural kind), or due to limitations of the DL used. 4. Case study  We apply the method described in Section 3 to DICE [29], a medical TS on reasons for admission in intensive care. The DICE knowledge base, developed at the authors’ institution, contains about 2500 concepts. Each concept is described in both Dutch and English by a preferred term, and any synonym(s) for both languages. In addition to reasons for admission, DICE contains concepts regarding anatomy, etiology and morphology. DICE was represented using the KRSS syntax [30]. As an example of this syntax, the definition for Pneumonia in Fig. 3 is represented as: (define-primitive-concept Pneumonia (AND Disease (SOME location Lung))). In non-primitive definitions the term define-concept is used instead of the term define-primitive-concept. 4.1. Determining concepts of interest We have focused our evaluation on the reasons for admission taxonomy, and did not yet analyze the other taxonomies, such as anatomy and etiology. Focusing on the reasons for admission is motivated by the fact that these are the central concepts in DICE, and we want to ensure that these are defined as completely as possible. DICE contains 1456 reasons for admission. 4.2. Exclusion of poorly defined concepts The use of the KRSS syntax made it straightforward to detect all concepts that are subsumed by one concept and do not specify any difference with their subsumer. A text-based search in the KRSS file returns all definitions that contain exactly two concept names (i.e. the concept being defined and its subsumer), and no constructors (e.g. “AND” or “SOME”). These definitions are assumed to be primitive definitions.
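The text-based search described above can be sketched in Python. This is a minimal illustration rather than the procedure actually run on DICE: it assumes one KRSS definition per line, the function name split_definitions is hypothetical, and the sample PneumococcalPneumonia definition (with Pneumonia as its subsumer) is invented for the example. The sketch also rewrites the remaining primitive definitions to define-concept, which corresponds to the step that follows.

```python
import re

# Matches a KRSS primitive definition whose body is a single concept name,
# i.e. exactly two concept names and no constructors such as AND or SOME.
BARE_PRIMITIVE = re.compile(
    r"^\(define-primitive-concept\s+([^\s()]+)\s+([^\s()]+)\s*\)$"
)


def split_definitions(lines):
    """Return (excluded_names, rewritten_lines).

    Assumes one definition per line (an assumption of this sketch).
    """
    excluded, rewritten = [], []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        match = BARE_PRIMITIVE.match(text)
        if match:
            # No stated difference with the subsumer: exclude, keep primitive.
            excluded.append(match.group(1))
            rewritten.append(text)
        elif text.startswith("(define-primitive-concept "):
            # Remaining definitions are assumed to be non-primitive.
            rewritten.append(text.replace(
                "(define-primitive-concept ", "(define-concept ", 1))
        else:
            rewritten.append(text)
    return excluded, rewritten


# Hypothetical input: the second definition is invented for illustration.
krss = [
    "(define-primitive-concept Pneumonia (AND Disease (SOME location Lung)))",
    "(define-primitive-concept PneumococcalPneumonia Pneumonia)",
]
excluded, rewritten = split_definitions(krss)
print(excluded)      # ['PneumococcalPneumonia']
print(rewritten[0])  # (define-concept Pneumonia (AND Disease (SOME location Lung)))
```

A regular expression suffices here only because the excluded definitions contain no nested parentheses; definitions spanning several lines would need a proper s-expression reader.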
One hundred and six concept definitions (7% of all reasons for admission) were found in this step. 4.3. Assuming non-primitive definitions The remaining 1350 concept definitions (93% of all reasons for admission) were assumed to be non-primitive. 4.4. Inference of equivalence RACER was used to classify the resulting terminological system. As a result of this classification it was determined that 1006 concepts (75% of the 1350 non-primitive concept definitions) had a unique definition, and 344 concepts (25%) had definitions that were logically equivalent to those of other concepts. These 344 definitions originated from 121 concept definitions that occurred twice or more, as is shown in Table 1. There were 74 sets of two equivalent definitions, and one set of 13 concepts with an equivalent definition. 4.5. Interpretation of equivalence As explained in Section 3, there can be various explanations for concept equivalence. Equivalent concepts need to be analyzed to determine whether they are actually duplicately defined, or underspecified. Such underspecification can be inevitable when concepts are natural kinds, or when concept properties cannot be expressed due to limitations of the used DL. Avoidable underspecification concerns tacit knowledge, which could be made explicit by enhancing concept definitions, in order to make the definitions more complete. An in-depth description of the outcomes of analysis of equivalent definitions within DICE is beyond the scope of this paper and of limited relevance. We will here discuss only some illustrative outcomes. Four sets of concept definitions were found in DICE that represented duplicates. These were: {hemothorax, hemopleura}, {morbus Plummer, nodular toxic goiter}, {Guillain-Barré syndrome, inflammatory demyelinating polyradiculoneuropathy}, and {pneumonectomy, lung excision}. 
These examples show that lexically very different synonyms easily remain unrecognized as such by modelers and therefore such synonyms can easily introduce duplicate concept definitions. If equivalence does not concern duplicate definitions of the same concept, it reveals concepts that differ in meaning in a way that is not represented in the knowledge base. These concepts need to be analyzed to determine whether it is possible to express the distinction between them. In DICE, 15 concepts were pointed out by human experts as natural kinds, which were to a large extent syndromes and/or eponyms. Examples of these are “adult respiratory distress syndrome”, and “Wolff-Parkinson-White syndrome”. Some concepts revealed underlying semantics that could not be expressed using the representation of DICE. DICE originally had a frame-based representation, and has been migrated to DL to enable performing the experiments described. Five concepts were found that explicitly mentioned negation, which cannot normally be represented using frames. Examples of these are “bleeding” versus “non-bleeding” and “obstructive” versus “non-obstructive”. As this difference could be explicitly represented using a DL that allows for negation, it needs to be determined whether the use of a more expressive DL outweighs any increase in computational complexity (hence the need for balance between the requirements of “completeness” and “competence” as defined in Section 2). The remainder of the concepts that were primitively defined or that were non-uniquely defined, demonstrated underspecification that seemed to be relatively easy to avoid. This means that it is possible and appropriate to extend the definitions by adding conditions. Firstly, there were many concepts that can be refined using roles and role values that are already available in the knowledge base. For example, in DICE a role “etiology” and a concept “meningococcus” do exist. 
However, “meningococcal meningitis” was primitively defined as a “bacterial meningitis”, without defining its etiology. Making this definition (more) complete is straightforward and required, as one wants to be able to infer that meningococcal meningitis is a disease that is caused by meningococci. Secondly, some concepts indicated the need for additional roles and role values that were not readily defined in DICE. For example, hypocalcemia and hypercalcemia can be distinguished by making explicit the “level” involved: “below normal” and “above normal”, respectively. To this end, the role “level” and relevant role values must first be defined in DICE, and then be added to the definitions of hyper- and hypocalcemia. Likewise, to distinguish hypercalcemia and hypermagnesemia the involved chemical elements (respectively, calcium and magnesium) need to be specified, which are not yet defined in DICE. These examples demonstrate that first the knowledge base requires extensions (as chemical elements and levels are currently not defined in the knowledge base), after which the concepts can be more completely defined. 5. Lessons learned  The case study shows that there are five typical causes that result in concepts with equivalent definitions. These situations are summarized below, ordered by the increasing effort needed for overcoming the potential modeling weaknesses that equivalence brought to light. For example, truly duplicate definitions can be relatively easily fixed, whereas there is no easy fix for definitions of concepts that represent natural kinds. (1) Concepts are duplicately defined. If this is the case, all but one of the duplicate concepts can be made obsolete, and the terms attached to the obsolete concepts can be added as synonym terms of the retained concept. For example, the concept hemopleura can be made obsolete; the term “hemopleura” is then added to the concept “hemothorax” as a synonymous term.
(2) Concepts can be distinguished by roles and role values that are readily present in the TS. In this case, the roles and appropriate values can be added to the concept to make the distinction explicit. For example, the “etiology” “meningococcus” should be specified for the concept “meningococcal meningitis”. (3) Concepts can be distinguished by roles or role values, but these are not yet present in the TS. In this case, the appropriate roles and role values must be added to the TS, and related to the concepts. For example, chemical elements (e.g. calcium and magnesium) and a role “involves chemical” can be defined in DICE, and added to the definitions of the concepts to enable distinguishing hypercalcemia from hypermagnesemia. Additionally, one can search the TS for other concepts to which the roles and role values can be added, e.g. hypercalciuria. (4) Concepts can in principle be distinguished but due to limitations of the underlying formalism this cannot be expressed. In this case, a more expressive formalism can be considered, but the practical and computational consequences must be taken into account. Based on the outcome of this analysis, either a more expressive formalism can be introduced, or the limitations, which then have been clearly identified, are to be accepted. For example, the formalism of DICE can be extended with negation to distinguish “bleeding” from “non-bleeding”. (5) Concepts represent natural kinds, hence their definition cannot be expressed in any formal way. In this case, due to a lack of knowledge it is not possible to define necessary and sufficient conditions for a concept without resorting to metaphysics (e.g. introduction of the property of “dogness” to uniquely define a dog). However, an attempt can be made to define all necessary conditions (if any) in order to define the concept as precisely as possible. 6.
Discussion and conclusions  We have specified a method for auditing medical terminological systems based on detecting concepts with equivalent definitions. We applied this method to a terminological system regarding reasons for admission in intensive care. Before drawing conclusions about the applicability of this method, we will discuss the impact of underspecification on practical use of a terminological system. Furthermore, we will discuss generalizability of the results of our case study by comparing the results with an additional but small case study on the Foundational Model of Anatomy [3]. 6.1. Impact on practical use So far we have only touched on the potential problems that duplicate definition and underspecification may cause when using a terminological system in real practice. Now that we have discussed the possibilities of DL-based reasoning we review these problems. In DICE, 344 concepts had a non-unique definition, and 106 concepts had no properties that distinguished them from the subsuming concept. Hence, 31% (450 out of 1456) of the reasons for admission are at risk of reducing the usefulness of a TS. 6.2. Duplicate definition As mentioned in Section 1, cases will be missed when patients are retrieved from a database using a concept that is duplicately defined. In DICE, four pairs (such as “hemopleura” and “hemothorax”) were found that would cause this to happen. This illustrates that duplicate definition does occur in practice, and may remain undetected when concepts have primitive definitions. 6.3. Omission of properties As described above, for most of the concepts that are non-uniquely or primitively defined, properties are omitted (e.g. “meningococcal meningitis”). This will severely hamper reasoning and property-based querying in practice as obviously concepts cannot be retrieved by means of properties that are not specified for that concept. 6.4. 
Primitive definition In Section 2.3 it was explained that inference of subsumption is blocked when concepts are represented as primitive. The example in Fig. 1, Fig. 2 demonstrated how this may lead to concepts that are not properly classified. Moreover, when a TS is used to construct concepts (i.e. post-coordination), these constructed concepts may also not be classified properly (i.e. some classification may be omitted). In the example above, if a user would register a patient with an inflammation that is located in the membranes of the brain and is caused by meningococci, it should be classified as a meningococcal meningitis, but this will not occur if meningococcal meningitis is defined as primitive. The examples above show the importance of unique and complete concepts definitions, reducing the use of primitive definitions as much as possible. 6.5. Small case study on FMA An interesting question is to what extent the findings in DICE are system specific. To this end we carried out a second case study on the FMA.5 FMA, developed by the University of Washington, provides about 69,000 concept definitions, describing anatomical structures, shapes, and other entities, such as coordinates (left, right, etc.). The FMA knowledge base, which is implemented as a frame-based model in Protégé,6 has been migrated to DL, where specified slot-fillers in the frame-based representation were interpreted as existentially quantified roles (i.e. using the constructor). The resulting TS was represented using KRSS syntax. After applying the first three steps of the method, the DL-based representation of all of FMA contained about 50% primitive and 50% non-primitive concept definitions. Due to its large size we were not able to classify the full TS with RACER. We hence limited the case study to “Organs”, which comprises a convenient subset that is representative for the FMA. Of the 3826 concept definitions, 2659 (69%) were defined as non-primitive, and 1167 (31%) as primitive. 
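
The migration step described above, in which each specified slot-filler becomes an existentially quantified role, can be sketched as follows. This is a minimal illustration under stated assumptions and not the conversion tool used in the study: the `Frame` class, the `to_krss` function, and the example frame (including the role name `has-laterality`) are hypothetical, and the output only imitates KRSS-style `define-concept`, `define-primitive-concept`, and `some` forms.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """A frame: a named concept with parents and specified slot-fillers."""
    name: str
    parents: list[str]
    slots: dict[str, list[str]] = field(default_factory=dict)


def to_krss(frame: Frame, primitive: bool = False) -> str:
    """Render a frame as a KRSS-style concept definition (hypothetical helper).

    Each specified slot-filler becomes an existential restriction
    (some role filler); parents and restrictions are joined with `and`.
    """
    conjuncts = list(frame.parents)
    for role, fillers in sorted(frame.slots.items()):
        for filler in sorted(fillers):
            conjuncts.append(f"(some {role} {filler})")
    if not conjuncts:
        body = "top"  # no parents and no slots: only the universal concept
    elif len(conjuncts) == 1:
        body = conjuncts[0]
    else:
        body = "(and " + " ".join(conjuncts) + ")"
    keyword = "define-primitive-concept" if primitive else "define-concept"
    return f"({keyword} {frame.name} {body})"


# Hypothetical frame; the names are illustrative and not taken from FMA.
frame = Frame(
    name="Left_phrenic_nerve",
    parents=["Phrenic_nerve"],
    slots={"has-laterality": ["Left"]},
)
print(to_krss(frame))
# (define-concept Left_phrenic_nerve (and Phrenic_nerve (some has-laterality Left)))
```

Passing `primitive=True` emits the primitive form instead, which corresponds to the choice the method makes when it deliberately treats definitions as non-primitive before classification.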
Classification with RACER resulted in 2165 concepts (81% of the 2659 non-primitive concept definitions) with a unique definition, and 494 concept definitions (19%) that were equivalent to other definitions. The distribution of numbers of sets for various sizes is shown in Table 2 and was comparable to the distribution found in DICE, as shown in Table 1. Twenty-eight sets contained concepts that referred to laterality (e.g. left phrenic nerve and right phrenic nerve), without explicit reference to laterality in the definition. In general, for many of the equivalent concepts, the related terms denoted positional information, e.g. distal/middle/proximal or posterior/anterior, but this was not represented in the definition. Hence, making concept definitions more complete using readily available concepts and roles seemed possible in FMA. For example, “Synovial tendon sheath of flexor hallucis longus” and “Synovial tendon sheath of tibialis anterior” can be distinguished from each other by explicitly relating them to “flexor hallucis longus” and “tibialis anterior”, respectively. This case study demonstrates that opportunities for further improvement can also be found in FMA, and probably in other large frame- or DL-based systems.

6.6. Conclusions

We have applied the inferential powers of DL reasoners to detect concepts that are equivalently defined within a knowledge base. To find such concepts, we have considered definitions to be non-primitive. Classifying the resulting knowledge base with a description logic reasoner generates sets of equivalently defined concepts. In the literature it is hypothesized that underspecification is a common phenomenon in medical terminological systems. Our two case studies confirm this hypothesis. The vast majority of concept definitions that turn out to be equivalent can be improved by adding necessary conditions to the definition. Further analysis is needed to determine whether this leads to sufficient conditions as well.
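
The core idea, treating every definition as non-primitive and then collecting the concepts whose definitions coincide, can be illustrated with a deliberately simplified sketch. It compares definitions syntactically (same parents and same role-filler pairs) rather than invoking a DL reasoner such as RACER, so it only approximates what classification finds; the function name `equivalent_sets` and the toy definitions are assumptions made for illustration, not data from DICE or FMA.

```python
from collections import defaultdict


def equivalent_sets(definitions):
    """Group concepts whose definitions are syntactically identical.

    `definitions` maps a concept name to (parents, restrictions), where
    restrictions is an iterable of (role, filler) pairs. Every definition
    is treated as non-primitive, i.e. as necessary and sufficient.
    """
    groups = defaultdict(list)
    for name, (parents, restrictions) in definitions.items():
        signature = (frozenset(parents), frozenset(restrictions))
        groups[signature].append(name)
    # Only sets with two or more members point at a duplicate definition
    # or at underspecification.
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)


# Toy definitions (hypothetical): laterality is not yet represented.
definitions = {
    "Left_phrenic_nerve": (["Phrenic_nerve"], []),
    "Right_phrenic_nerve": (["Phrenic_nerve"], []),
    "Vagus_nerve": (["Cranial_nerve"], []),
}
print(equivalent_sets(definitions))
# [['Left_phrenic_nerve', 'Right_phrenic_nerve']]

# Adding a distinguishing role makes both definitions unique.
definitions["Left_phrenic_nerve"] = (["Phrenic_nerve"], [("has-laterality", "Left")])
definitions["Right_phrenic_nerve"] = (["Phrenic_nerve"], [("has-laterality", "Right")])
print(equivalent_sets(definitions))
# []
```

A reasoner does more than this sketch: it would also recognise definitions that are logically but not syntactically identical, and it would report a concept defined by a single parent and no further conditions as equivalent to that parent.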
For some equivalent concepts there seemed to be no possibility of improving the definition, either because these concepts represented natural kinds or because the distinction could not be expressed due to limitations of the underlying representation.

Overall, it can be concluded that the application of the method described in this paper contributes to pointing out which concepts suffer from underspecification or duplicate definition. It needs to be determined whether this method can lead to a significant decrease in the number of primitive concepts in knowledge bases, thus increasing the power of knowledge-based inference. Although the method has been applied to only two knowledge bases in the field of medicine, it is likely that it is applicable to other domains as well.

Summary points

What was already known?
• Description logic can be used to determine equivalence.
• Equivalence depends on non-primitive definitions, which provide necessary and sufficient conditions.
• In medicine, many concepts lack definitions with necessary and sufficient conditions.

What has the study added?
• Assuming definitions to be non-primitive reveals equivalent concepts.
• Equivalent concepts reveal either duplicate definitions or underspecification.
• Underspecification can often be reduced by providing a more complete definition.

Acknowledgements

This work has been partially funded by the Netherlands Organization for Scientific Research (NWO) program Information & Communication Technology in Healthcare (ICZ) for the project entitled Terminology and Semantics: Making semantics explicit, number 014-18-014. A preliminary version of this work has been presented at the 2004 International Workshop on Description Logics (DL2004), in Whistler, Canada [31]. Comments on the manuscript provided by Prof. Arie Hasman as well as by two anonymous reviewers and the editor are highly appreciated.

References

[1] de Keizer NF, Abu-Hanna A, Zwetsloot-Schonk JH. Understanding terminological systems. I.
Terminology and typology. Methods Inform. Med. 2000;39(1):16–21.
[2] Rossi Mori A, Consorti F, Galeazzi E. Standards to support development of terminological systems for healthcare telematics. Methods Inform. Med. 1998;37(4–5):551–563.
[3] Rosse C, Mejino JLV. A reference ontology for biomedical informatics: the foundational model of anatomy. J. Biomed. Inform. 2003;36(6):478–500.
[4] Harris MA, Clark J, Ireland A. The gene ontology (GO) database and informatics resource. Nucleic Acids Res. 2004;32(Database issue):D258–D261.
[5] Spackman KA. SNOMED CT milestones: endorsements are added to already-impressive standards credentials. Healthcare Inform. 2004;21(9).
[6] Brown SH, Elkin PL, Rosenbloom ST. VA national drug file reference terminology: a cross-institutional content coverage study. In: Fieschi M, Coiera E, Li J, editors. Proceedings from Medinfo 2004, vol. 11. San Francisco, CA, USA. Amsterdam, The Netherlands: IOS Press; 2004. p. 477–481.
[7] Hartel FW, de Coronado S, Dionne R, Fragoso G, Golbeck J. Modeling a description logic vocabulary for cancer research. J. Biomed. Inform. 2005;38(2):114–129.
[8] Cimino JJ. Desiderata for controlled medical vocabularies in the twenty-first century. Methods Inform. Med. 1998;37(4–5):394–403.
[9] Horridge M, Rector AL, Drummond N. OWL pizzas: common errors and common patterns from practical experience of teaching OWL-DL. In: Motta E, Shadbolt N, Stutt A, Gibbins N, editors. European Knowledge Acquisition Workshop. Northamptonshire, UK: Springer Verlag; 2004. p. 63–81.
[10] Cornet R, Abu-Hanna A. Description logic-based methods for auditing frame-based medical terminological systems. Artif. Intell. Med. 2005;34(3):201–217.
[11] Guarino N, Welty CA. Evaluating ontological decisions with OntoClean. Commun. ACM. 2002;45(2):61–65.
[12] Devanbu PT, Jones MA. The use of description logics in KBSE systems: experience report. In: Fadini B, Osterweil L, van Lamsweerde A, editors. Proceedings of the 16th International Conference on Software Engineering. Sorrento, Italy. Los Alamitos, CA, USA: IEEE Computer Society Press; 1994. p. 23–35.
[13] Cimino JJ. Auditing the unified medical language system with semantic methods. J. Am. Med. Inform. Assoc. 1998;5(1):41–51.
[14] Hole WT, Srinivasan S. Discovering missed synonymy in a large concept-oriented metathesaurus. In: Marc Overhage J, editor. Proceedings of the 2000 AMIA Annual Symposium. Los Angeles, CA, USA. Philadelphia, PA, USA: Hanley and Belfus Inc.; 2000. p. 354–358.
[15] Gu H, Perl Y, Elhanan G. Auditing concept categorizations in the UMLS. Artif. Intell. Med. 2004;31(1):29–44.
[16] Geller J, Gu H, Perl Y, Halper M. Semantic refinement and error correction in large terminological knowledge bases. Data Knowledge Eng. 2003;45(1):1–32.
[17] Yeh I, Karp PD, Noy NF, Altman RB. Knowledge acquisition, consistency checking and concurrency control for gene ontology (GO). Bioinformatics. 2003;19(2):241–248.
[18] Abu-Hanna A, Cornet R, de Keizer N, Crubezy M, Tu S. Protégé as a vehicle for developing medical terminological systems. Int. J. Hum. Comput. Stud. 2005;62(5):639–663.
[19] Ceusters W, Smith B, Kumar A, Dhaen C. Mistakes in medical ontologies: where do they come from and how can they be detected? Stud. Health Technol. Inform. 2004;102:145–163.
[20] Ceusters W, Smith B, Kumar A, Dhaen C. Ontology-based error detection in SNOMED-CT®. In: Fieschi M, Coiera E, Li J, editors. Proceedings from Medinfo 2004, vol. 11. San Francisco, CA, USA. Amsterdam, The Netherlands: IOS Press; 2004. p. 482–486.
[21] Van Buggenhout C, Ceusters W. A novel view on information content of concepts in a large ontology and a view on the structure and the quality of the ontology. Int. J. Med. Inform. 2005;74(2–4):125–132.
[22] Simon J, Dos Santos M, Fielding J, Smith B. Formal ontology for natural language processing and the integration of biomedical databases. Int. J. Med. Inform. 2006;75(3–4):224–231.
[23] Baader F, Calvanese D, McGuinness DL, Nardi D, Patel-Schneider PF. The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge: Cambridge University Press; 2003.
[24] Rector AL. Coordinating taxonomies: key to reusable concept representations. In: Barahona P, Stefanelli M, Wyatt J, editors. Proceedings of the Fifth Conference on Artificial Intelligence in Medicine in Europe, AIME, Pavia, Italy. Lecture Notes in Artificial Intelligence. Springer-Verlag; 1995. p. 17–28.
[25] Doyle J, Patil R. Two theses of knowledge representation: Language restrictions, taxonomic classifications, and the utility of representation services. Artif. Intell. 1991;48(3):261–298.
[26] Rector AL. Modularisation of domain ontologies implemented in description logics and related formalisms including OWL. In: Gennari J, Porter B, editors. International Conference on Knowledge Capture. Sanibel, FL, USA. New York, USA: ACM Press; 2003. p. 121–128.
[27] Gu H, Perl Y, Geller J, Halper M, Singh M. A methodology for partitioning a vocabulary hierarchy into trees. Artif. Intell. Med. 1999;15(1):77–98.
[28] Bodenreider O, Smith B, Kumar A, Burgun A. Investigating subsumption in DL-based terminologies: a case study in SNOMED CT. In: Hahn U, editor. KR 2004 Workshop on Formal Biomedical Knowledge Representation (KR-MED 2004). Whistler, BC, Canada. AMIA; 2004. p. 12–20.
[29] de Keizer NF, Abu-Hanna A, Cornet R, Zwetsloot-Schonk JH, Stoutenbeek CP. Analysis and design of an ontology for intensive care diagnoses. Methods Inform. Med. 1999;38(2):102–112.
[30] Patel-Schneider PF, Swartout B. Description-logic knowledge representation system specification from the KRSS group of the ARPA knowledge sharing effort. Technical Report, KRSS Group of the ARPA Knowledge Sharing Effort; November 1, 1993.
[31] Cornet R, Abu-Hanna A. Using non-primitive concept definitions for improving DL-based knowledge bases. In: Haarslev V, Möller R, editors. Proceedings of the 2004 International Workshop on Description Logics, DL2004, vol. 104. Whistler, Canada. Aachen, Germany: CEUR-WS; 2004. p. 138–147.
[32] Noy NF, Grosso WE, Musen MA. Knowledge-acquisition interfaces for domain experts: an empirical evaluation of Protégé-2000. In: Chang S-K, editor. Proceedings of the 12th International Conference on Software Engineering and Knowledge Engineering (SEKE2000), SMI Technical Report SMI-2000-0825. Chicago, IL; 2000. p. 177–186.

Academic Medical Center, Universiteit van Amsterdam, Department of Medical Informatics, PO Box 22700, 1100 DE Amsterdam, The Netherlands
Corresponding author. Tel.: +31 20 5665188; fax: +31 20 6919840.
PII: S1386-5056(07)00123-2 doi:10.1016/j.ijmedinf.2007.06.008 © 2007 Elsevier Ireland Ltd. All rights reserved.