Data Mining: Concepts and Techniques, 3rd Edition

HAN 09-ch02-039-082-9780123814791



HAN 09-ch02-039-082-9780123814791 2011/6/1 3:15 Page 55 #17

2.2 Basic Statistical Descriptions of Data

Figure 2.6 A histogram for the Table 2.1 data set (x-axis: Unit price ($), bins 40–59, 60–79, 80–99, 100–119, 120–139; y-axis: Count of items sold).

Figure 2.7 A scatter plot for the Table 2.1 data set (axes: Unit price ($) and Items sold).

Figure 2.8 Scatter plots can be used to find (a) positive or (b) negative correlations between attributes.


Chapter 2 Getting to Know Your Data

Figure 2.9 Three cases where there is no observed correlation between the two plotted attributes in each of the data sets.

from lower left to upper right, this means that the values of X increase as the values of Y increase, suggesting a positive correlation (Figure 2.8a). If the pattern of plotted points slopes from upper left to lower right, the values of X increase as the values of Y decrease, suggesting a negative correlation (Figure 2.8b). A line of best fit can be drawn to study the correlation between the variables. Statistical tests for correlation are given in Chapter 3 on data integration (Eq. (3.3)). Figure 2.9 shows three cases in which there is no correlation between the two attributes in each of the given data sets. Section 2.3.2 shows how scatter plots can be extended to n attributes, resulting in a scatter-plot matrix.
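The slope a scatter plot suggests can also be checked numerically. The sketch below is an illustration under stated assumptions, not the book's method: it computes the standard Pearson correlation coefficient and an ordinary least-squares line of best fit in plain Python. The book's own correlation test (Eq. (3.3) in Chapter 3) is not reproduced here, and both point sets are made-up values, not Table 2.1 data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def best_fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


# Hypothetical point sets (not Table 2.1 data).
rising = ([1, 2, 3, 4, 5], [2, 4, 5, 4, 6])   # lower left to upper right
falling = ([1, 2, 3, 4, 5], [9, 7, 6, 4, 2])  # upper left to lower right

for name, (xs, ys) in [("rising", rising), ("falling", falling)]:
    slope, intercept = best_fit_line(xs, ys)
    print(f"{name}: r = {pearson_r(xs, ys):.3f}, fit y = {slope:.2f}x + {intercept:.2f}")
# rising: r = 0.853, fit y = 0.80x + 1.80
# falling: r = -0.995, fit y = -1.70x + 10.70
```

A positive r matches the lower-left-to-upper-right pattern of Figure 2.8a and a negative r the pattern of Figure 2.8b. Pearson's r measures only linear association, so a value near 0 does not by itself rule out a nonlinear relationship.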

In conclusion, basic data descriptions (e.g., measures of central tendency and measures of dispersion) and graphic statistical displays (e.g., quantile plots, histograms, and scatter plots) provide valuable insight into the overall behavior of your data. By helping to identify noise and outliers, they are especially useful for data cleaning.

2.3 Data Visualization

How can we convey data to users effectively? Data visualization aims to communicate data clearly and effectively through graphical representation. Data visualization has been used extensively in many applications, for example, at work for reporting, managing business operations, and tracking the progress of tasks. More generally, we can take advantage of visualization techniques to discover data relationships that are otherwise not easily observable by looking at the raw data. Nowadays, people also use data visualization to create fun and interesting graphics.

In this section, we briefly introduce the basic concepts of data visualization. We start with multidimensional data such as those stored in relational databases. We discuss several representative approaches, including pixel-oriented techniques, geometric projection techniques, icon-based techniques, and hierarchical and graph-based techniques. We then discuss the visualization of complex data and relations.


2.3.1 Pixel-Oriented Visualization Techniques

A simple way to visualize the value of a dimension is to use a pixel where the color of the pixel reflects the dimension's value. For a data set of m dimensions, pixel-oriented techniques create m windows on the screen, one for each dimension. The m dimension values of a record are mapped to m pixels at the corresponding positions in the windows. The colors of the pixels reflect the corresponding values.

Inside a window, the data values are arranged in some global order shared by all windows. The global order may be obtained by sorting all data records in a way that's meaningful for the task at hand.
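The window layout described above can be sketched in a few lines. This is a minimal illustration, assuming each value is min-max scaled to [0, 1] as a stand-in for a pixel color and that each window is filled row by row (the linear layout whose drawbacks are discussed later in this section). The function name `pixel_windows` and its parameters are hypothetical.

```python
import math


def pixel_windows(records, order_dim, width):
    """Lay out m-dimensional records as m pixel windows sharing one global order.

    records   : list of equal-length tuples, one value per dimension
    order_dim : index of the dimension whose ascending sort gives the global order
    width     : pixels per window row; windows are filled row by row
    Returns m windows, each a list of rows holding values scaled to [0, 1]
    (the scaled value stands in for a pixel color); unused cells stay None.
    """
    m = len(records[0])
    ordered = sorted(records, key=lambda r: r[order_dim])
    height = math.ceil(len(ordered) / width)
    windows = []
    for d in range(m):
        column = [r[d] for r in ordered]
        lo, hi = min(column), max(column)
        span = (hi - lo) or 1  # constant dimension: avoid dividing by zero
        grid = [[None] * width for _ in range(height)]
        for i, value in enumerate(column):
            # Record i occupies the same (row, col) position in every window.
            grid[i // width][i % width] = (value - lo) / span
        windows.append(grid)
    return windows


records = [(3, 10), (1, 30), (2, 20), (4, 40)]  # hypothetical 2-D records
for d, window in enumerate(pixel_windows(records, order_dim=0, width=2)):
    print(d, [[None if v is None else round(v, 2) for v in row] for row in window])
```

Because every window is filled from the same sorted list, the pixel for a given record sits at the same (row, column) position in all m windows, which is what lets the eye compare dimensions across windows.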

Example 2.16 Pixel-oriented visualization. AllElectronics maintains a customer information table, which consists of four dimensions: income, credit limit, transaction volume, and age. Can we analyze the correlation between income and the other attributes by visualization?

We can sort all customers in income-ascending order and use this order to lay out the customer data in the four visualization windows, as shown in Figure 2.10. The pixel colors are chosen so that the smaller the value, the lighter the shading. Using pixel-based visualization, we can easily observe the following: credit limit increases as income increases; customers whose income is in the middle range are more likely to purchase more from AllElectronics; there is no clear correlation between income and age.
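The shading rule in Example 2.16 (smaller value, lighter shading) can be sketched as a mapping to 8-bit gray levels, where 255 is white and 0 is black. The customer rows below are made-up values chosen only for illustration; they are not the AllElectronics data behind Figure 2.10, and `gray_levels` is a hypothetical helper.

```python
def gray_levels(values):
    """Map values to 8-bit gray levels: smallest -> 255 (lightest), largest -> 0 (darkest)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # constant attribute: every pixel gets the lightest shade
    return [round(255 * (1 - (v - lo) / span)) for v in values]


# Hypothetical customers: (income, credit_limit, transaction_volume, age).
customers = [
    (52_000, 8_000, 14, 41),
    (18_000, 1_500, 3, 23),
    (95_000, 20_000, 6, 35),
    (33_000, 4_000, 11, 60),
]
attributes = ["income", "credit_limit", "transaction_volume", "age"]

# Global order shared by all four windows: income ascending.
by_income = sorted(customers, key=lambda c: c[0])
for d, name in enumerate(attributes):
    print(f"{name:>18}: {gray_levels([c[d] for c in by_income])}")
```

Reading each printed row left to right follows the income order. An attribute that rises with income, such as credit_limit in these invented rows, darkens steadily along the row, while an unrelated attribute produces shading with no steady trend.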

In pixel-oriented techniques, data records can also be ordered in a query-dependent way. For example, given a point query, we can sort all records in descending order of similarity to the point query.
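A query-dependent order can be sketched as a sort on similarity. The text does not fix a similarity measure, so this sketch assumes similarity = 1 / (1 + Euclidean distance); any measure that decreases with distance yields the same order. It uses `math.dist` (Python 3.8+), and the function name `order_by_similarity` is hypothetical.

```python
import math


def order_by_similarity(records, query):
    """Return records sorted in descending order of similarity to a point query.

    Similarity is assumed to be 1 / (1 + Euclidean distance); any measure that
    decreases as distance grows yields the same ordering.
    """
    def similarity(record):
        return 1.0 / (1.0 + math.dist(record, query))

    return sorted(records, key=similarity, reverse=True)


records = [(0, 0), (5, 5), (1, 1), (3, 0)]  # hypothetical 2-D records
print(order_by_similarity(records, query=(1, 1)))
# [(1, 1), (0, 0), (3, 0), (5, 5)]
```

The returned list could then serve as the global order shared by all windows, so that the records closest to the query occupy the first pixels of every window.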

Filling a window by laying out the data records in a linear way may not work well for a wide window. The first pixel in a row is far away from the last pixel in the previous row, though they are next to each other in the global order. Moreover, a pixel is next to the one above it in the window, even though the two are not next to each other in the global order. To solve this problem, we can lay out the data records in a space-filling curve

Figure 2.10 Pixel-oriented visualization of four attributes by sorting all customers in income-ascending order: (a) income, (b) credit_limit, (c) transaction_volume, (d) age.
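The text does not say which space-filling curve to use. The sketch below assumes a Hilbert curve, one common choice, and a square window whose side n is a power of two (an assumption, since the text discusses wide windows). It uses the widely published iterative index-to-coordinate routine for the Hilbert curve and compares the largest jump between pixels of consecutive records under the linear row-by-row layout with the jump under the Hilbert layout. All function names are hypothetical.

```python
def hilbert_d2xy(n, d):
    """Map position d along a Hilbert curve to cell (x, y) of an n x n grid.

    n must be a power of two.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate/flip the current sub-square
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def row_major_xy(n, d):
    """Linear layout: fill the window left to right, top to bottom."""
    return d % n, d // n


def max_step(layout, n):
    """Largest Manhattan distance between pixels of consecutive records."""
    cells = [layout(n, d) for d in range(n * n)]
    return max(abs(x1 - x2) + abs(y1 - y2)
               for (x1, y1), (x2, y2) in zip(cells, cells[1:]))


n = 8
print("row-major max step:", max_step(row_major_xy, n))  # 8 (wrap to next row)
print("Hilbert max step:  ", max_step(hilbert_d2xy, n))  # 1
```

Under the Hilbert layout, consecutive records in the global order always land in adjacent pixels, which removes the row-wrap jump of the linear layout. Pixels that are adjacent on the screen are usually, but not always, close in the global order.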
