Towards a Reference Terminology
for Talking about Ontologies and
Related Artifacts
Barry Smith
http://ontology.buffalo.edu/smith
with thanks to
Werner Ceusters, Waclaw Kusnierczyk, Daniel Schober
1
Problem of ensuring sensible
cooperation in a massively
interdisciplinary community
concept
type
instance
model
representation
data
2
What do these mean?
‘conceptual data model’
‘semantic knowledge model’
‘reference information model’
‘an ontology is a specification of a
conceptualization’
3
4
natural language labels
to make the data cognitively
accessible to human beings
and algorithmically tractable
5
compare: legends for maps
6
ontologies are legends for data
7
compare: legends for cartoons
8
legends
help human beings use and understand
complex representations of reality
help human beings create useful complex
representations of reality
help computers process complex
representations of reality
9
computationally tractable legends
help human beings find things in very
large complex representations of reality
10
legends for mathematical equations
x_i = vector of measurements of gene i
k = the state of the gene (as “on” or “off”)
θ_i = set of parameters of the Gaussian model
...
...
11
Glue-ability / integration
rests on the existence of a common benchmark
called ‘reality’
the ontologies we want to glue together are
representations of what exists in the world
not of what exists in the heads of different
groups of people
12
truth is correspondence to reality
13
simple representations can be true
14
a network diagram can be a
veridical representation of reality
15
16
maps may be correct by reflecting
topology, rather than geometry
17
an image can be a veridical
representation of reality
a labeled image can be a more
useful veridical representation
of reality
18
an image labeled with
computationally tractable labels can
be an even more useful veridical
representation of reality
19
annotations using common ontologies
can yield integration of image data
20
if you’re going to semantically
annotate piles of data, better work
out how to do it right from the start
21
two kinds of annotations
22
names of types
23
names of instances
24
First basic distinction
type vs. instance
(science text vs. diary)
(human being vs. Tom Cruise)
25
For ontologies
it is generalizations that are
important: ontologies are
about types, or kinds
26
Ontology
types
Instances
27
Ontology = A Representation of types
28
An ontology is a representation
of types
We learn about types in reality from looking
at the results of scientific experiments in the
form of scientific theories
experiments relate to what is particular
science describes what is general
29
There are created types
bicycle
steering wheel
aspirin
Ford Pinto
we learn about these by looking at manufacturers’
catalogs
30
measurement units are created types
31
Inventory vs. Catalog
Two kinds of representational
artifact
Roughly:
Databases represent instances
Ontologies represent types
32
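The catalog/inventory contrast can be sketched as two data structures. The catalog holds one entry per type. The inventory holds one record per particular item, and each record points to the catalog type it instantiates. This is a minimal illustration only: the part numbers, serial numbers and names below are hypothetical and are not taken from the slide.

```python
from collections import Counter
from dataclasses import dataclass

# Catalog: one entry per type (hypothetical part numbers and names).
CATALOG = {
    "P-100": "dust collector fan",
    "P-200": "drive belt",
}

@dataclass(frozen=True)
class Item:
    """An inventory record: one particular item, identified by a serial number."""
    serial: str
    part_number: str  # the catalog type this item instantiates

# Inventory: one entry per instance (hypothetical serial numbers).
INVENTORY = [
    Item("S-0001", "P-100"),
    Item("S-0002", "P-200"),
    Item("S-0003", "P-200"),
]

def instances_of(part_number: str) -> list[Item]:
    """Return all inventory items that instantiate the given catalog type."""
    if part_number not in CATALOG:
        raise KeyError(f"unknown type: {part_number}")
    return [item for item in INVENTORY if item.part_number == part_number]

def stock_levels() -> Counter:
    """Count the instances of each type currently in the inventory."""
    return Counter(item.part_number for item in INVENTORY)
```

Selling every belt would change the inventory, but the catalog entry for the belt type would stay exactly as it is.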
Catalog vs. inventory
A
B
C
515287
521683
521682
DC3300 Dust Collector Fan
Gilmer Belt
Motor Drive Belt
33
Catalog vs. inventory
34
Catalog of types/Types
35
types
object
organism
animal
mammal
cat
siamese
frog
instances
36
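The types on this slide form an is_a hierarchy. Each type has a parent type, and an instance of a type is also an instance of all of that type's ancestors. The sketch below assumes, as the diagram appears to show, that frog sits directly under animal. The dictionary encoding and the instance names "a" and "b" are hypothetical.

```python
# Parent links for the is_a hierarchy (child -> parent).
IS_A = {
    "organism": "object",
    "animal": "organism",
    "mammal": "animal",
    "cat": "mammal",
    "siamese": "cat",
    "frog": "animal",  # assumption: frog placed directly under animal
}

# Hypothetical instances, each assigned to its most specific type.
INSTANCE_OF = {"a": "siamese", "b": "frog"}

def ancestors(type_name: str) -> list[str]:
    """Return the chain of supertypes of type_name, nearest first."""
    chain = []
    while type_name in IS_A:
        type_name = IS_A[type_name]
        chain.append(type_name)
    return chain

def is_instance(individual: str, type_name: str) -> bool:
    """An individual instantiates its asserted type and every ancestor of that type."""
    asserted = INSTANCE_OF[individual]
    return type_name == asserted or type_name in ancestors(asserted)
```

With this encoding, ancestors("siamese") walks up through cat, mammal, animal and organism to object. is_instance("b", "mammal") is False, because a frog is an animal but not a mammal.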
Ontologies are here
37
or here
38
ontologies represent general
structures in reality (leg)
39
Ontologies do not represent
concepts in people’s heads
40
They represent types in reality
41
which provide the benchmark for
integration
42
if you’re going to semantically
annotate piles of data, better work
out how to do it right from the start
43
Entity =def
anything which exists, including things and
processes, functions and qualities, beliefs
and actions, documents and software
(Levels 1, 2 and 3)
44
what are the kinds of entity?
45
First basic distinction
universal vs. instance
(science text vs. diary)
(human being vs. Tom Cruise)
46
Ontology
Universals
Instances
47
Ontology = A Representation of
Universals
48
Each node of an ontology
consists of:
• preferred term (aka term)
• term identifier (TUI, aka CUI)
• synonyms
• definition, glosses, comments
49
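The node structure listed above can be sketched as a record type, together with a lookup that resolves either the preferred term or any synonym to the same node. This is a hypothetical illustration. The field names, the identifier format "EX:0000001" and the placeholder definition text are assumptions, and the example does not follow any particular ontology file format.

```python
from dataclasses import dataclass, field

@dataclass
class OntologyNode:
    """One node: preferred term, identifier, synonyms, definition, comments."""
    preferred_term: str
    term_id: str  # hypothetical identifier format, e.g. "EX:0000001"
    synonyms: list[str] = field(default_factory=list)
    definition: str = ""
    comments: list[str] = field(default_factory=list)

class Ontology:
    """Minimal container that maps identifiers and labels to nodes."""

    def __init__(self, nodes):
        self.by_id = {n.term_id: n for n in nodes}
        self.by_label = {}
        for n in nodes:
            for label in [n.preferred_term, *n.synonyms]:
                key = label.casefold()
                # A label shared by two different nodes would make lookup ambiguous.
                if key in self.by_label and self.by_label[key] is not n:
                    raise ValueError(f"label {label!r} is ambiguous")
                self.by_label[key] = n

    def lookup(self, label: str) -> OntologyNode:
        """Resolve a preferred term or synonym (case-insensitively) to its node."""
        return self.by_label[label.casefold()]

# Usage with a hypothetical node.
LUNG = OntologyNode("lung", "EX:0000001", synonyms=["pulmo"],
                    definition="(hypothetical definition text)")
ONTOLOGY = Ontology([LUNG])
```

Keying on the identifier instead of the term means a node can be renamed without breaking any annotations that point to it.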
An ontology is a representation
of universals
We learn about universals in reality from
looking at the results of scientific
experiments in the form of scientific theories
experiments relate to what is particular
science describes what is general
50
universals
substance
organism
animal
mammal
cat
siamese
frog
instances
Domain =def
a portion of reality that forms the subject matter of a single science or technology or
mode of study or administrative practice ...;
proteomics
HIV
epidemiology
52
Representation =def
an image, idea, map, picture, name or
description ... of some entity or entities.
53
Ontologies are representational
artifacts
comparable to science texts
and subject to the same sorts
of constraints (including
need for update)
54
Representational units =def
terms, icons, alphanumeric identifiers ...
which refer, or are intended to refer, to
entities
and which are minimal (atoms)
55
Composite representation =def
representation
(1) built out of representational units
which
(2) form a structure that mirrors, or is intended
to mirror, the entities in some domain
56
Analogue representations
no representational
units, no ‘atoms’
57
The Periodic Table
58
Language has the power to create
general terms
which go beyond the domain of universals
studied by science and documented in
catalogs
59
Problem: fiat demarcations
male over 30 years of age with family history
of diabetes
abnormal curvature of spine
participant in trial #2030
60
Problem: roles
fist
patient
FDA-approved drug
61
Administrative ontologies often need
to go beyond universals
Fall on stairs or ladders in water transport injuring
occupant of small boat, unpowered
Railway accident involving collision with rolling
stock and injuring pedal cyclist
Nontraffic accident involving motor-driven snow
vehicle injuring pedestrian
62
Class =def
a maximal collection of particulars
determined by a general term
(‘cell’, ‘electron’, but also ‘restaurant in
Palo Alto’, ‘Italian’)
the class A
= the collection of all particulars x for
which ‘x is A’ is true
63
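In set-builder notation, the definition of a class reads as follows. Maximality is built in, because every particular that satisfies the condition belongs to the class:

```latex
\text{the class } A \;=\; \{\, x \mid \text{``}x\text{ is }A\text{'' is true} \,\}
```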
universals vs. their extensions
universals
{a,b,c,...}
collections of particulars
64
Extension =def
The extension of a universal A is the class
‘instance of the universal A’
(it is the class of A’s instances)
(the class of all entities to which the term ‘A’
applies)
65
Problem
The same general term can be used to refer
both to universals and to collections of
particulars. Consider:
HIV is an infectious retrovirus
HIV is spreading very rapidly through Asia
66
universals vs. classes
universals
{c,d,e,...}
classes
67
universals vs. classes
universals
~ defined classes
68
universals vs. classes
universals
e.g. populations, ...
69
Defined class =def
a class defined by a general term which
does not designate a universal
the class of all diabetic patients in
Leipzig on 4 June 1952
70
OWL is a good representation of
defined classes
• sibling of Finnish spy
• member of Abba aged > 50 years
• pizza with > 4 different toppings
71
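A defined class is fixed by a condition on particulars, not by a universal. In OWL, a class like "pizza with > 4 different toppings" would typically be expressed with a cardinality restriction on a topping property. The Python below does not use OWL. It only mimics the idea by filtering a hypothetical set of particulars with a predicate.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pizza:
    name: str
    toppings: frozenset[str]

# Hypothetical particulars.
PIZZAS = [
    Pizza("p1", frozenset({"tomato", "mozzarella"})),
    Pizza("p2", frozenset({"tomato", "mozzarella", "ham", "mushroom", "olive"})),
    Pizza("p3", frozenset({"tomato", "mozzarella", "ham", "mushroom", "olive", "egg"})),
]

def defined_class(particulars, condition):
    """Collect (by name) every particular that satisfies the defining condition."""
    return {p.name for p in particulars if condition(p)}

# 'pizza with > 4 different toppings': a condition on particulars, not a universal.
many_toppings = defined_class(PIZZAS, lambda p: len(p.toppings) > 4)
```

Here many_toppings contains p2 and p3. Adding a new pizza to the collection can change the class's membership, but no new universal comes into existence as a result.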
Terminology =def.
a representational artifact whose
representational units are natural language
terms (with IDs, synonyms, comments, etc.)
which are intended to designate universals
together with defined classes, with no
particular attention to composite
representations
72
universals, classes, concepts
universals
defined classes
‘concepts’
?
73
universals < defined classes <
‘concepts’
‘concepts’ which do not correspond to
defined classes:
‘Surgical or other procedure not carried out
because of patient's decision’
‘Congenital absent nipple’
(because they do not correspond to anything)
74
(Scientific) Ontology =def.
a representational artifact whose representational
units (which may be drawn from a natural or from
some formalized language) are intended to
represent
1. universals in reality
2. those relations between these universals which
obtain universally (= for all instances)
lung is_a anatomical structure
lobe of lung part_of lung
75
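The phrase "obtain universally (= for all instances)" can be made concrete as a check over instance-level data. Under an all-some reading, lobe of lung part_of lung holds if every instance of lobe of lung is part of some instance of lung. Lung is_a anatomical structure holds if every instance of lung is also an instance of anatomical structure. The sketch below assumes this reading and uses hypothetical instances. It is an illustration, not a reasoner.

```python
# Hypothetical instance-level data: which universals each particular instantiates.
INSTANCE_OF = {
    "lungL": {"lung", "anatomical structure"},
    "lungR": {"lung", "anatomical structure"},
    "lobe1": {"lobe of lung", "anatomical structure"},
    "lobe2": {"lobe of lung", "anatomical structure"},
}
PART_OF = {("lobe1", "lungL"), ("lobe2", "lungR")}

def instances(universal):
    """All particulars that instantiate the given universal."""
    return {x for x, types in INSTANCE_OF.items() if universal in types}

def is_a_holds(a, b):
    """A is_a B: every instance of A is an instance of B."""
    return instances(a) <= instances(b)

def part_of_holds(a, b):
    """A part_of B (all-some): every instance of A is part of some instance of B."""
    bs = instances(b)
    return all(any((x, y) in PART_OF for y in bs) for x in instances(a))
```

Under this reading, part_of_holds("lobe of lung", "lung") is True, but part_of_holds("lung", "lobe of lung") is False. A relation that holds only for some instances fails the check.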
Rules for Scientific Ontology
How ontology development can be
evidence-based
76
Basis in textbook science
OBO Foundry ontologies are created by
biologist-curators with a thorough
knowledge of the underlying science
Ontology quality is measured in terms of
biological accuracy and usefulness to
working biologists (measured in turn by the
numbers of independent users, of
associated software applications, and of
papers published, ...).
77
Measure of success for OBO
Foundry initiative
= degree to which it serves the integration
of ever more heterogeneous types of data /
is exploited in the creation of new types of
software or of new types of informatics-based experimentation
78
Ontology building closely tied to
needs of users with data to annotate
In the GO/Uniprot collaboration, the Foundry
methodology is applied by domain experts who
enjoy joint control of ontology, data and
annotations.
All three get to be curated in tandem.
As results of experiments are described in
annotations, this leads to extensions or
corrections of the ontology, which in turn lead to
better annotations, the whole process being
governed by the querying needs of users in a way
which fosters widespread adoption.
Blake J, et al. Gene Ontology annotations: Proceedings of Bio-Ontologies Workshop, ISMB/ECCB,
Vienna, July 20, 2007
79
Science-based vs. arm’s-length ontology
This yields superior outcomes when
measured by the results achieved by third
parties who apply the ontologies to tasks
external to those for which they were
created
superior = to those generated on the basis
of arm’s-length methodologies such as
automatic mining from published literature.
PLoS Biol. 2005 Feb;3(2):e65.
80
81