HAN
09-ch02-039-082-9780123814791
2011/6/1
3:15
Page 61
#23
2.3 Data Visualization
Figure 2.15
Visualization of the Iris data set using a scatter-plot matrix. Plotted attributes: sepal length, petal length, sepal width, and petal width (all in mm); species: Setosa, Versicolor, Virginica. Source: http://support.sas.com/documentation/cdl/en/grstatproc/61948/HTML/default/images/gsgscmat.gif.
Viewing large tables of data can be tedious. By condensing the data, Chernoff faces
make the data easier for users to digest. In this way, they facilitate visualization of
regularities and irregularities present in the data, although their ability to convey
multiple relationships is limited. Another limitation is that specific data values are not shown.
Furthermore, facial features vary in perceived importance. This means that the similarity
of two faces (representing two multidimensional data points) can vary depending on the
order in which dimensions are assigned to facial characteristics. Therefore, this mapping
should be carefully chosen. Eye size and eyebrow slant have been found to be important.
Asymmetrical Chernoff faces were proposed as an extension to the original technique.
Since a face has vertical symmetry (along the y-axis), the left and right sides of a face are
identical, which wastes space. Asymmetrical Chernoff faces double the number of facial
characteristics, thus allowing up to 36 dimensions to be displayed.
Figure 2.16
Here is a visualization that uses parallel coordinates. Source: www.stat.columbia.edu/~cook/movabletype/archives/2007/10/parallel coordi.thml.
Figure 2.17
Chernoff faces. Each face represents an n-dimensional data point (n ≤ 18).
The stick figure visualization technique maps multidimensional data to five-piece
stick figures, where each figure has four limbs and a body. Two dimensions are mapped
to the display (x and y) axes and the remaining dimensions are mapped to the angle
and/or length of the limbs. Figure 2.18 shows census data, where age and income are
mapped to the display axes, and the remaining dimensions (gender, education, and so
on) are mapped to stick figures. If the data items are relatively dense with respect to
the two display dimensions, the resulting visualization shows texture patterns, reflecting
data trends.
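The mapping from a record to a stick figure can be sketched in a few lines of Python. This is only an illustrative sketch, not the method used to produce Figure 2.18: the function names (`scale`, `stick_figure`), the angle range of -90 to 90 degrees, and the fixed limb length are all assumptions made for the example, and only limb angles (not lengths) are varied.

```python
import math

def scale(value, lo, hi, out_lo, out_hi):
    """Linearly rescale value from [lo, hi] to [out_lo, out_hi]."""
    if hi == lo:
        return (out_lo + out_hi) / 2.0
    t = (value - lo) / (hi - lo)
    return out_lo + t * (out_hi - out_lo)

def stick_figure(x, y, limb_values, ranges, body_len=1.0, limb_len=0.6):
    """Return five line segments (one body, four limbs) for one record.

    x, y        -- the two dimensions mapped to the display axes
    limb_values -- four remaining dimension values, one per limb
    ranges      -- (min, max) of each of those four dimensions
    """
    top = (x, y + body_len / 2)
    bottom = (x, y - body_len / 2)
    segments = [(bottom, top)]           # the body is a vertical segment
    anchors = [top, top, bottom, bottom]  # two limbs at each end of the body
    sides = [-1, 1, -1, 1]                # left limb, right limb, left, right
    for value, (lo, hi), anchor, side in zip(limb_values, ranges, anchors, sides):
        # Assumed encoding: the dimension value sets the limb angle.
        angle = math.radians(scale(value, lo, hi, -90.0, 90.0))
        end = (anchor[0] + side * limb_len * math.cos(angle),
               anchor[1] + limb_len * math.sin(angle))
        segments.append((anchor, end))
    return segments

# Hypothetical record: (x, y) = (30, 50), four further attributes in [0, 10].
segments = stick_figure(30, 50, [0, 5, 10, 5], [(0, 10)] * 4)
```

Each returned segment is a pair of (x, y) endpoints, so the figures can be drawn with any line-plotting routine. When many records fall close together on the display axes, the limb angles of neighboring figures form the texture patterns described above.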
Figure 2.18
Census data represented using stick figures, with age and income on the display axes. Source: Professor G. Grinstein, Department of Computer Science, University of Massachusetts at Lowell.
2.3.4 Hierarchical Visualization Techniques
The visualization techniques discussed so far focus on visualizing multiple dimensions
simultaneously. However, for a large data set of high dimensionality, it would be
difficult to visualize all dimensions at the same time. Hierarchical visualization techniques
partition all dimensions into subsets (i.e., subspaces). The subspaces are visualized in a
hierarchical manner.
“Worlds-within-Worlds,” also known as n-Vision, is a representative hierarchical
visualization method. Suppose we want to visualize a 6-D data set, where the dimensions
are $F, X_1, \ldots, X_5$. We want to observe how dimension $F$ changes with respect to the other
dimensions. We can first fix the values of dimensions $X_3, X_4, X_5$ to some selected values,
say, $c_3, c_4, c_5$. We can then visualize $F, X_1, X_2$ using a 3-D plot, called a world, as shown in
Figure 2.19. The position of the origin of the inner world is located at the point $(c_3, c_4, c_5)$
in the outer world, which is another 3-D plot using dimensions $X_3, X_4, X_5$. A user can
interactively change, in the outer world, the location of the origin of the inner world.
The user then views the resulting changes of the inner world. Moreover, a user can vary
the dimensions used in the inner world and the outer world. Given more dimensions,
more levels of worlds can be used, which is why the method is called “worlds-within-worlds.”
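The idea of fixing the outer dimensions and plotting the inner ones can be illustrated with a short Python sketch. It assumes, for simplicity, that F can be evaluated as a function of the other five dimensions; with a table of records one would instead select the records whose outer-dimension values lie near the chosen point. The names `inner_world` and `example_f` are hypothetical and exist only for this example.

```python
def inner_world(f, fixed, x1_values, x2_values):
    """Evaluate F over an (X1, X2) grid with X3, X4, X5 held at `fixed`.

    Returns the (x1, x2, F) points that a 3-D plot of the inner world
    would draw. `fixed` is the point (c3, c4, c5) chosen in the outer world.
    """
    c3, c4, c5 = fixed
    return [(x1, x2, f(x1, x2, c3, c4, c5))
            for x1 in x1_values
            for x2 in x2_values]

# Hypothetical stand-in for dimension F (an assumption for illustration).
def example_f(x1, x2, x3, x4, x5):
    return x1 + 2 * x2 + x3 * x4 - x5

grid = [0, 1, 2]
# Moving the inner world's origin in the outer world changes (c3, c4, c5),
# and therefore the surface shown in the inner world.
world_a = inner_world(example_f, (1, 1, 1), grid, grid)
world_b = inner_world(example_f, (2, 3, 0), grid, grid)
```

Comparing `world_a` and `world_b` mimics what the user sees when dragging the inner world to a new location in the outer world: the same inner axes, but a different set of F values.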
As another example of hierarchical visualization methods, tree-maps display
hierarchical data as a set of nested rectangles. For example, Figure 2.20 shows a tree-map
visualizing Google news stories. All news stories are organized into seven categories, each
shown in a large rectangle of a unique color. Within each category (i.e., each rectangle
at the top level), the news stories are further partitioned into smaller subcategories.
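How a hierarchy becomes nested rectangles can be sketched with a simple "slice-and-dice" layout, which alternates between horizontal and vertical splits at each level and gives every child an area proportional to its size. This is one of several possible tree-map layouts and is not necessarily the one used for Figure 2.20; the category names and story counts below are made up for illustration, and all sizes are assumed to be positive.

```python
def total_size(node):
    """Size of a node: its own size if it is a leaf, else the sum of its children."""
    if "children" in node:
        return sum(total_size(child) for child in node["children"])
    return node["size"]

def layout(node, x, y, w, h, depth=0):
    """Return (name, x, y, w, h) rectangles for a node and all its descendants."""
    rects = [(node["name"], x, y, w, h)]
    total = total_size(node)
    offset = 0.0  # fraction of the parent rectangle already used
    for child in node.get("children", []):
        share = total_size(child) / total
        if depth % 2 == 0:
            # Even depth: place children side by side along the x-axis.
            rects += layout(child, x + offset * w, y, share * w, h, depth + 1)
        else:
            # Odd depth: stack children along the y-axis.
            rects += layout(child, x, y + offset * h, w, share * h, depth + 1)
        offset += share
    return rects

# Hypothetical hierarchy: categories, subcategories, and story counts.
news = {"name": "news", "children": [
    {"name": "World", "children": [
        {"name": "Europe", "size": 3},
        {"name": "Asia", "size": 1},
    ]},
    {"name": "Sports", "size": 4},
]}
rects = layout(news, 0, 0, 100, 50)
```

Each top-level category receives one rectangle, and its subcategories are laid out inside that rectangle, which is the nesting described above. Coloring the rectangles by top-level category would then be a matter of looking up each rectangle's ancestor.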