HAN
10-ch03-083-124-9780123814791
2011/6/1
3:16
Page 101
#19
3.4 Data Reduction
101
cleaning as well. Given a set of coefficients, an approximation of the original data can be
constructed by applying the inverse of the DWT used.
The DWT is closely related to the discrete Fourier transform (DFT), a signal process-
ing technique involving sines and cosines. In general, however, the DWT achieves better
lossy compression. That is, if the same number of coefficients is retained for a DWT and
a DFT of a given data vector, the DWT version will provide a more accurate approxima-
tion of the original data. Hence, for an equivalent approximation, the DWT requires less
space than the DFT. Unlike the DFT, wavelets are quite localized in space, contributing
to the conservation of local detail.
There is only one DFT, yet there are several families of DWTs. Figure 3.4 shows
some wavelet families. Popular wavelet transforms include the Haar-2, Daubechies-4,
and Daubechies-6. The general procedure for applying a discrete wavelet transform uses
a hierarchical pyramid algorithm that halves the data at each iteration, resulting in fast
computational speed. The method is as follows:
1.
The length, L, of the input data vector must be an integer power of 2. This condition
can be met by padding the data vector with zeros as necessary (L ≥ n).
2.
Each transform involves applying two functions. The first applies some data smooth-
ing, such as a sum or weighted average. The second performs a weighted difference,
which acts to bring out the detailed features of the data.
3.
The two functions are applied to pairs of data points in X, that is, to all pairs of
measurements
(x
2i
, x
2i+1
). This results in two data sets of length L/2. In general,
these represent a smoothed or low-frequency version of the input data and the high-
frequency content of it, respectively.
4.
The two functions are recursively applied to the data sets obtained in the previous
loop, until the resulting data sets obtained are of length 2.
5.
Selected values from the data sets obtained in the previous iterations are designated
the wavelet coefficients of the transformed data.
0
2
4
6
0.8
0.6
0.4
0.2
0.0
Ϫ1.0 Ϫ0.5 0.0 0.5
(a) Haar-2
(b) Daubechies-4
1.0 1.5 2.0
0.6
0.4
0.2
0.0
Figure 3.4
Examples of wavelet families. The number next to a wavelet name is the number of vanishing
moments of the wavelet. This is a set of mathematical relationships that the coefficients must
satisfy and is related to the number of coefficients.
HAN
10-ch03-083-124-9780123814791
2011/6/1
3:16
Page 102
#20
102
Chapter 3 Data Preprocessing
Equivalently, a matrix multiplication can be applied to the input data in order to
obtain the wavelet coefficients, where the matrix used depends on the given DWT. The
matrix must be orthonormal, meaning that the columns are unit vectors and are mutu-
ally orthogonal, so that the matrix inverse is just its transpose. Although we do not have
room to discuss it here, this property allows the reconstruction of the data from the
smooth and smooth-difference data sets. By factoring the matrix used into a product of
a few sparse matrices, the resulting “fast DWT” algorithm has a complexity of O
(n) for
an input vector of length n.
Wavelet transforms can be applied to multidimensional data such as a data cube. This
is done by first applying the transform to the first dimension, then to the second, and so
on. The computational complexity involved is linear with respect to the number of cells
in the cube. Wavelet transforms give good results on sparse or skewed data and on data
with ordered attributes. Lossy compression by wavelets is reportedly better than JPEG
compression, the current commercial standard. Wavelet transforms have many real-
world applications, including the compression of fingerprint images, computer vision,
analysis of time-series data, and data cleaning.
3.4.3
Principal Components Analysis
In this subsection we provide an intuitive introduction to principal components analy-
sis as a method of dimesionality reduction. A detailed theoretical explanation is beyond
the scope of this book. For additional references, please see the bibliographic notes
(Section 3.8) at the end of this chapter.
Suppose that the data to be reduced consist of tuples or data vectors described
by n attributes or dimensions. Principal components analysis (PCA; also called the
Karhunen-Loeve, or K-L, method) searches for k n-dimensional orthogonal vectors that
can best be used to represent the data, where k ≤ n. The original data are thus projected
onto a much smaller space, resulting in dimensionality reduction. Unlike attribute sub-
set selection (Section 3.4.4), which reduces the attribute set size by retaining a subset of
the initial set of attributes, PCA “combines” the essence of attributes by creating an alter-
native, smaller set of variables. The initial data can then be projected onto this smaller
set. PCA often reveals relationships that were not previously suspected and thereby
allows interpretations that would not ordinarily result.
The basic procedure is as follows:
1.
The input data are normalized, so that each attribute falls within the same range. This
step helps ensure that attributes with large domains will not dominate attributes with
smaller domains.
2.
PCA computes k orthonormal vectors that provide a basis for the normalized input
data. These are unit vectors that each point in a direction perpendicular to the others.
These vectors are referred to as the principal components. The input data are a linear
combination of the principal components.
3.
The principal components are sorted in order of decreasing “significance” or
strength. The principal components essentially serve as a new set of axes for the data,