


Ontology Alignment

  • Linking two ontologies by detecting semantic correspondences between their representational units

  • Types of correspondences: equivalence, subsumption, others

  • Purpose of ontology alignment:

    • Creating interoperability between semantically annotated data
    • Enriching semantics
    • Cross-validation of ontologies
  • Requirements of ontology alignment:

    • comparable scope
    • comparable context
    • comparable semantic foundations


Outline

  • Introduction

    • BioTop
    • UMLS SN
  • Methodology

    • UMLS SN: formal redefinition
    • Interactive Mapping
  • Assessment

    • Ontology Cross-Validation
    • NE co-occurrence validation
    • UMLS SN cluster consistency
  • Conclusion



BioTop – a Life Science Upper Ontology

  • Recent development (starting 2006, Freiburg & Jena)

  • Goal: to provide formal definitions of upper-level types and relations for the biomedical domain

  • Uses description logics (OWL-DL)

    • 339 classes, 60 relation types
    • 373 subclass axioms
    • 80 equivalent class axioms, 66 disjoint class axioms
  • Compatible with BFO and DOLCE lite

  • links to OBO ontologies

  • downloadable from: http://purl.org/biotop





UMLS Semantic Network (SN)

  • Tree of 135 semantic types (e.g. Tissue, Diagnostic_Procedure)

  • 53 associative relationships (e.g., treats, location_of)

  • 612 relational assertions (triples) sanctioning the domain and range of relations, e.g. {Tissue; location_of; Diagnostic_Procedure}

  • mainly unchanged in the last 20 years
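
The role of the triples can be sketched in a few lines of Python. This is only an illustration: the two-branch hierarchy is a simplified fragment, the function names are hypothetical, and letting descendants inherit a triple from their ancestors is an assumption of this sketch.

```python
# Simplified fragment of the type tree: child type -> parent type.
# The full SN has 135 types; this excerpt is for illustration only.
PARENT = {
    "Tissue": "Fully_Formed_Anatomical_Structure",
    "Fully_Formed_Anatomical_Structure": "Anatomical_Structure",
    "Diagnostic_Procedure": "Health_Care_Activity",
}

# Relational assertions as (domain type, relation, range type).
# Only the triple quoted on the slide is included.
TRIPLES = {("Tissue", "location_of", "Diagnostic_Procedure")}


def ancestors(sem_type):
    """Return the type itself followed by its ancestors, nearest first."""
    chain = [sem_type]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def is_sanctioned(domain, relation, range_):
    """True if a triple asserted for the types or their ancestors covers the pair.

    Inheritance down the tree is an assumption of this sketch.
    """
    return any(
        (d, relation, r) in TRIPLES
        for d in ancestors(domain)
        for r in ancestors(range_)
    )
```

A triple asserted for Tissue does not apply to its ancestors, so `is_sanctioned("Anatomical_Structure", "location_of", "Diagnostic_Procedure")` is false in this fragment.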



Comparison UMLS-SN - BioTop



Methodology

  • Prerequisite: give the UMLS SN a description logic semantics (umlssn.owl)

  • Building a bridging ontology

    • Subsumption
    • Equivalence


Redefinition of UMLS SN semantics

  • Semantic Types, e.g.: Tissue, Diagnostic_Procedure:

    • Types extend to classes of individuals
    • subsumption hierarchies = is-a hierarchies (every instance of a child is also an instance of each parent)
    • no explicit disjoint partitions
  • Semantic Relations, e.g.: treats, location_of:

    • Reified as classes, not represented as OWL object properties
  • Triples, e.g.: {Tissue; location_of; Diagnostic_Procedure}

    • domain and range restrictions = value restrictions on the roles has-domain and has-range


UMLS SN: Why SRs as classes … and not OWL object properties? (I)



UMLS SN: Why SRs as classes … and not OWL object properties? (II)

  • Source Representation

  • Target Representation



Representation of SRs and triples

  • All triples including R are defined as subclasses of R: Affects_Domain_Cell_Component_Range_Physiologic_Function ⊑ Affects ⊓ ∀has_domain.Cell_Component ⊓ ∀has_range.Physiologic_Function

  • All parents are fully defined by the union of their children: Brings_About ≡ Produces ⊔ Causes
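
The naming pattern for triple classes and the covering axiom for parent relations are regular enough to generate mechanically. The following is a minimal sketch; both helper names are hypothetical, and the output is a plain string in DL notation rather than OWL.

```python
def triple_class_name(relation, domain, range_):
    """Build the class name for a reified triple, following the slide's pattern."""
    return f"{relation}_Domain_{domain}_Range_{range_}"


def covering_axiom(parent, children):
    """Render 'parent is equivalent to the union of its children' in DL notation."""
    return f"{parent} ≡ " + " ⊔ ".join(children)


print(triple_class_name("Affects", "Cell_Component", "Physiologic_Function"))
print(covering_axiom("Brings_About", ["Produces", "Causes"]))
```

Run on the two examples from the slide, the sketch reproduces the class name and the axiom shown above.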



Mapping

  • Done fully manually in Protégé 4; consistency checked with FaCT++ and Pellet 1.5, supported by the explanation plugin

  • Analyzing

    • UMLS SN hierarchies and free-text definitions
    • BioTop formal and free-text definitions
  • Iterative check of

    • logical consistency (DL classifier)
    • domain adequacy (analysis of new entailments)


Mapping workflow



Mapping of UMLS Types

  • Direct match (often after content addition to BioTop): sn:Plant ≡ bt:Plant

  • Restriction mapping: sn:AnatomicalAbnormality ≡ bt:OrganismPart ⊓ ∃bt:bearerOf.bt:PathologicalCondition

  • Union: sn:Gene_Or_Genome ≡ bt:Gene ⊔ bt:Genome

  • Out of scope: sn:Daily_Or_Recreational_Activity ⊑ bt:Action ⊓ ∃bt:hasParticipant.bt:Human

  • No mapping: sn:Idea_or_concept
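
What the restriction and union mappings say can be checked on a toy interpretation. The sketch below is illustrative only: the individuals and the bearerOf link are invented, the function names are hypothetical, and the restriction is read as existential (at least one filler), which is an assumption of this example.

```python
# Hypothetical individuals and the BioTop classes they instantiate.
BT_TYPES = {
    "liver_1": {"OrganismPart"},
    "liver_2": {"OrganismPart"},
    "cirrhosis_1": {"PathologicalCondition"},
    "gene_1": {"Gene"},
    "genome_1": {"Genome"},
}

# Hypothetical bearerOf assertions as (bearer, borne entity) pairs.
BEARER_OF = {("liver_2", "cirrhosis_1")}


def extension(bt_class):
    """All individuals asserted to instantiate the given BioTop class."""
    return {ind for ind, types in BT_TYPES.items() if bt_class in types}


def some_values_from(pairs, filler):
    """Individuals linked by the relation to at least one member of filler."""
    return {subj for subj, obj in pairs if obj in filler}


def sn_anatomical_abnormality():
    """Restriction mapping: organism parts bearing some pathological condition."""
    return extension("OrganismPart") & some_values_from(
        BEARER_OF, extension("PathologicalCondition")
    )


def sn_gene_or_genome():
    """Union mapping: everything that is a gene or a genome."""
    return extension("Gene") | extension("Genome")
```

Only `liver_2` falls under sn:AnatomicalAbnormality here, because `liver_1` bears no pathological condition in the toy data.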



Mapping of UMLS Relations

  • Mapping of domain and range: sn:hasDomain ≡ bt:hasAgent; sn:hasRange ≡ bt:hasPatient

  • Mapping of (reified) SN relations: sn:Affects ≡ bt:Affecting

  • Linkage of (reified) SN relations to BioTop relations by augmented restrictions: sn:hasDomain  (bt:physicalPartOf  (ImmaterialPhysicalEntity ⊔ MaterialEntity)) ⊓ sn:hasRange  (bt:hasPhysicalPart  (ImmaterialPhysicalEntity ⊔ MaterialEntity))



Assessment: Cross-evaluation

  • Formative evaluation of BioTop: mapping and subsequent classification unveil hidden problems in BioTop:

    • Faulty disjointness axioms (e.g. bt:Organic Chemical was declared disjoint from bt:Carbohydrate)
    • ambiguities: Sequence as information entity vs. sequence as molecular structure
    • granularity mismatches: e.g. Chromosome as molecule


Assessment: NE co-occurrences

  • Named entity (NE) tagging: pairs of co-occurring UMLS concepts identified in 15 million PubMed abstracts

  • Expert rating with sample of co-occurrences: which are semantically related?



Assessment: NE co-occurrences

  • Using SN alone: very low agreement with expert rating

  • Using SN+BioTop: very few rejections (only 3)

  • Reasons:

    • False positives: experts rated at the NE level (e.g. Superoxide reductase is unrelated to Aldehyde), but the system judges at the type level (sn:Enzyme is related to sn:Organic Chemical)
    • Few rejections: under DL's open-world semantics, a pair is rejected only if it contradicts an explicit axiom


Assessment: finding incompatible semantic types

  • Each UMLS concept is categorized by one or more UMLS SN types

  • 397 different SN type combinations

  • Using UMLS-SN BioTop Bridge: 133 combinations inconsistent, affecting 6116 UMLS concepts

  • Main reason: hidden ambiguities, e.g. the combination of sn:Manufactured Object and sn:HealthCareRelatedOrganization (Hospital as building vs. organization).
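
The consistency check on type combinations can be approximated without a DL reasoner when only subsumption and disjointness are involved. The following is a minimal sketch under that simplification: the bt: class names and the three bridge axioms are placeholders invented for the example, not taken from the actual bridge ontology, and the function names are hypothetical.

```python
# Hypothetical bridge fragment: SN type -> subsuming BioTop class.
SUBCLASS_OF = {
    "sn:ManufacturedObject": "bt:MaterialObject",
    "sn:HealthCareRelatedOrganization": "bt:Organization",
    "sn:Plant": "bt:Plant",
}

# Hypothetical disjointness axiom between two BioTop classes.
DISJOINT = [frozenset({"bt:MaterialObject", "bt:Organization"})]


def superclasses(cls):
    """Return the class together with all classes that subsume it."""
    chain = [cls]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return set(chain)


def is_consistent(type_combination):
    """False if the combined superclasses contain a declared disjoint pair."""
    closure = set()
    for sem_type in type_combination:
        closure |= superclasses(sem_type)
    return not any(pair <= closure for pair in DISJOINT)
```

Here `is_consistent(["sn:ManufacturedObject", "sn:HealthCareRelatedOrganization"])` is false, while a combination such as sn:Plant with sn:HealthCareRelatedOrganization passes simply because no disjointness axiom covers it, which mirrors the open-world behaviour noted in the assessment.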



Conclusion

  • Successful alignment between the (legacy) SN and the (novel) BioTop ontology

  • Necessary: formal re-interpretation of SN

  • Prospect: combine the large amount of data annotated with the SN with the formal rigor of BioTop

  • Strength: machine inference, consistency checking

  • Challenge: counteract unwanted effects of the open-world semantics by making exhaustive use of disjoint partitions

  • More use cases!



Acknowledgements

  • EC STREP project “BOOTStrep” (FP6 – 028099)

  • Intramural Research Program of the National Institutes of Health (NIH), US National Library of Medicine

  • Martin Boeker (Freiburg)

  • Holger Stenzhorn (Freiburg)

  • Anonymous Reviewers





Ontology Stack




