HAN
09-ch02-039-082-9780123814791
2011/6/1
3:15
Page 41
#3
2.1 Data Objects and Attribute Types
41
The type of an attribute is determined by the set of possible values—nominal, binary,
ordinal, or numeric—the attribute can have. In the following subsections, we introduce
each type.
2.1.2 Nominal Attributes
Nominal means “relating to names.” The values of a nominal attribute are symbols or
names of things. Each value represents some kind of category, code, or state, and so
nominal attributes are also referred to as categorical. The values do not have any
meaningful order. In computer science, the values are also known as enumerations.
Example 2.1
Nominal attributes. Suppose that hair color and marital status are two attributes
describing person objects. In our application, possible values for hair color are black,
brown, blond, red, auburn, gray, and white. The attribute marital status can take on
the values single, married, divorced, and widowed. Both hair color and marital status
are nominal attributes. Another example of a nominal attribute is occupation, with the
values teacher, dentist, programmer, farmer, and so on.
Although we said that the values of a nominal attribute are symbols or “names
of things,” it is possible to represent such symbols or “names” with numbers. With
hair color, for instance, we can assign a code of 0 for black, 1 for brown, and so on.
Another example is customer ID, with possible values that are all numeric. However,
in such cases, the numbers are not intended to be used quantitatively. That is,
mathematical operations on values of nominal attributes are not meaningful. It makes no
sense to subtract one customer ID number from another, unlike, say, subtracting one age
value from another (where age is a numeric attribute). Even though a nominal attribute
may have integers as values, it is not considered a numeric attribute because the
integers are not meant to be used quantitatively. We will say more on numeric attributes in
Section 2.1.5.
Because nominal attribute values do not have any meaningful order about them and
are not quantitative, it makes no sense to find the mean (average) value or median
(middle) value for such an attribute, given a set of objects. One thing that is of
interest, however, is the attribute’s most commonly occurring value. This value, known as
the mode, is one of the measures of central tendency. You will learn about measures of
central tendency in Section 2.2.
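As a minimal sketch of this point, the Python snippet below finds the mode of a nominal attribute and then shows why the mean of integer codes is not meaningful. The sample values, the code assignments beyond 0 for black and 1 for brown, and the helper name `mode` are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical sample of hair color values for eight person objects.
hair_colors = ["black", "brown", "black", "blond",
               "brown", "black", "red", "red"]

def mode(values):
    """Return the most frequently occurring value (first seen wins ties)."""
    counts = Counter(values)
    return counts.most_common(1)[0][0]

hair_color_mode = mode(hair_colors)

# Arbitrary integer codes: the numbers are labels, not quantities.
codes = {"black": 0, "brown": 1, "blond": 2, "red": 3}
coded = [codes[c] for c in hair_colors]

coded_mode = mode(coded)               # still identifies the category "black"
coded_mean = sum(coded) / len(coded)   # computable, but matches no category

print(hair_color_mode, coded_mode, coded_mean)
```

The mode is the same category whether it is computed on names or on codes. The mean, by contrast, depends entirely on how the codes happened to be assigned, and here it does not correspond to any hair color.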
2.1.3 Binary Attributes
A binary attribute is a nominal attribute with only two categories or states: 0 or 1, where
0 typically means that the attribute is absent, and 1 means that it is present. Binary
attributes are referred to as Boolean if the two states correspond to true and false.
Example 2.2
Binary attributes. Given the attribute smoker describing a patient object, 1 indicates
that the patient smokes, while 0 indicates that the patient does not. Similarly, suppose
the patient undergoes a medical test that has two possible outcomes. The attribute
medical test is binary, where a value of 1 means the result of the test for the patient
is positive, while 0 means the result is negative.
A binary attribute is symmetric if both of its states are equally valuable and carry
the same weight; that is, there is no preference on which outcome should be coded
as 0 or 1. One such example could be the attribute gender having the states male and
female.
A binary attribute is asymmetric if the outcomes of the states are not equally
important, such as the positive and negative outcomes of a medical test for HIV. By convention,
we code the most important outcome, which is usually the rarest one, by 1 (e.g., HIV
positive) and the other by 0 (e.g., HIV negative).
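The coding convention for asymmetric binary attributes can be sketched in Python as follows. The sketch assumes that rarity stands in for importance, which the text says is usually but not always the case. The function name `encode_asymmetric` and the sample test results are hypothetical.

```python
from collections import Counter

def encode_asymmetric(outcomes):
    """Map two-state outcomes to 0/1, coding the rarer state as 1.

    Assumption: the rarer state is the more important one. If the two
    states are equally frequent, the state seen first is coded 0.
    """
    counts = Counter(outcomes)
    if len(counts) != 2:
        raise ValueError("expected exactly two distinct states")
    # most_common() sorts by descending frequency: common state first.
    (common, _), (rare, _) = counts.most_common()
    mapping = {common: 0, rare: 1}
    return [mapping[o] for o in outcomes], mapping

# Hypothetical medical test results for five patients.
results = ["negative", "negative", "positive", "negative", "negative"]
encoded, mapping = encode_asymmetric(results)
print(encoded, mapping)
```

For a symmetric binary attribute, no such rule is needed, because either state can be coded as 0 or 1 without loss.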
2.1.4 Ordinal Attributes
An ordinal attribute is an attribute with possible values that have a meaningful order or
ranking among them, but the magnitude between successive values is not known.
Example 2.3
Ordinal attributes. Suppose that drink size corresponds to the size of drinks available at
a fast-food restaurant. This ordinal attribute has three possible values: small, medium,
and large. The values have a meaningful sequence (which corresponds to increasing
drink size); however, we cannot tell from the values how much bigger, say, a large
is than a medium. Other examples of ordinal attributes include grade (e.g., A+, A, A−, B+,
and so on) and professional rank. Professional ranks can be enumerated in a sequential
order: for example, assistant, associate, and full for professors, and private, private first
class, specialist, corporal, and sergeant for army ranks.
Ordinal attributes are useful for registering subjective assessments of qualities that
cannot be measured objectively; thus ordinal attributes are often used in surveys for
ratings. In one survey, participants were asked to rate how satisfied they were as
customers. Customer satisfaction had the following ordinal categories: 0: very dissatisfied,
1: somewhat dissatisfied, 2: neutral, 3: satisfied, and 4: very satisfied.
Ordinal attributes may also be obtained from the discretization of numeric quantities
by splitting the value range into a finite number of ordered categories as described in
Chapter 3 on data reduction.
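A minimal Python sketch of such a discretization is shown here, using age as the numeric quantity. The cut points, the category labels, and the function name `discretize` are assumptions chosen for illustration and do not come from the text.

```python
from bisect import bisect_right

# Hypothetical cut points (ages in years) and ordered category labels.
CUT_POINTS = [13, 20, 65]
LABELS = ["child", "teen", "adult", "senior"]  # ordered low to high

def discretize(age):
    """Return the ordered category whose interval contains age.

    Intervals are [.., 13), [13, 20), [20, 65), [65, ..).
    """
    return LABELS[bisect_right(CUT_POINTS, age)]

ages = [5, 13, 19, 42, 70]
categories = [discretize(a) for a in ages]
print(categories)
```

The resulting categories keep the order of the underlying ages, but the distance between two ages can no longer be recovered from them.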
The central tendency of an ordinal attribute can be represented by its mode and its
median (the middle value in an ordered sequence), but the mean cannot be defined.
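The median of an ordinal attribute can be sketched in Python by ranking the categories and picking the middle one. The sample orders and the names `ORDER`, `RANK`, and `ordinal_median` are hypothetical. For an even number of values this sketch returns the lower of the two middle categories, which is one possible convention rather than the only one, since averaging the two would require arithmetic that ordinal values do not support.

```python
from statistics import median_low

# Ordered categories of drink size, smallest first (as in Example 2.3).
ORDER = ["small", "medium", "large"]
RANK = {label: i for i, label in enumerate(ORDER)}

def ordinal_median(values):
    """Return the middle category; for an even count, the lower middle one."""
    ranks = [RANK[v] for v in values]
    # median_low always returns an element of the data, never an average.
    return ORDER[median_low(ranks)]

# Hypothetical drink orders from seven customers.
orders = ["large", "small", "medium", "medium", "large", "small", "medium"]
drink_median = ordinal_median(orders)
print(drink_median)
```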
Note that nominal, binary, and ordinal attributes are qualitative. That is, they describe
a feature of an object without giving an actual size or quantity. The values of such
qualitative attributes are typically words representing categories. If integers are used,
they represent computer codes for the categories, as opposed to measurable quantities
(e.g., 0 for small drink size, 1 for medium, and 2 for large). In the following
subsection we look at numeric attributes, which provide quantitative measurements of an
object.