Data Mining. Concepts and Techniques, 3rd Edition

HAN 08-ch01-001-038-9780123814791




the major topics in a collection of documents and, for each document in the collection, the major topics involved.

Increasingly large amounts of text and multimedia data have been accumulated and made available online due to the fast growth of the Web and applications such as digital libraries, digital governments, and health care information systems. Their effective search and analysis have raised many challenging issues in data mining. Therefore, text mining and multimedia data mining, integrated with information retrieval methods, have become increasingly important.

1.6 Which Kinds of Applications Are Targeted?

Where there are data, there are data mining applications. As a highly application-driven discipline, data mining has seen great successes in many applications. It is impossible to enumerate all applications where data mining plays a critical role. Presentations of data mining in knowledge-intensive application domains, such as bioinformatics and software engineering, require more in-depth treatment and are beyond the scope of this book. To demonstrate the importance of applications as a major dimension in data mining research and development, we briefly discuss two highly successful and popular application examples of data mining: business intelligence and search engines.

1.6.1 Business Intelligence

It is critical for businesses to acquire a better understanding of the commercial context of their organization, such as their customers, the market, supply and resources, and competitors. Business intelligence (BI) technologies provide historical, current, and predictive views of business operations. Examples include reporting, online analytical processing, business performance management, competitive intelligence, benchmarking, and predictive analytics.

“How important is business intelligence?” Without data mining, many businesses may not be able to perform effective market analysis, compare customer feedback on similar products, discover the strengths and weaknesses of their competitors, retain highly valuable customers, and make smart business decisions.

Clearly, data mining is the core of business intelligence. Online analytical processing tools in business intelligence rely on data warehousing and multidimensional data mining. Classification and prediction techniques are the core of predictive analytics in business intelligence, for which there are many applications in analyzing markets, supplies, and sales. Moreover, clustering plays a central role in customer relationship management, which groups customers based on their similarities. Using characterization mining techniques, we can better understand features of each customer group and develop customized customer reward programs.
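To make the customer-grouping idea concrete, the following is a minimal sketch of k-means clustering, one common clustering method (the text does not prescribe a particular algorithm). The customer features (annual spend, visits per month), the data values, and the function name `kmeans` are all assumptions made for illustration.

```python
from math import dist

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(values) / len(cluster) for values in zip(*cluster))
            if cluster else centroids[i]  # keep an empty cluster's centroid in place
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers: (annual spend in thousands, visits per month).
customers = [(1.0, 1.0), (1.5, 2.0), (1.2, 1.5), (9.0, 8.0), (8.5, 9.0), (9.5, 8.5)]

# Seed with one low-spend and one high-spend customer (an arbitrary choice).
centroids, clusters = kmeans(customers, centroids=[customers[0], customers[3]])

for centroid, cluster in zip(centroids, clusters):
    print(f"group profile {centroid}: {len(cluster)} customers")
```

Each resulting centroid can be read as a crude characterization of its group (average spend and visit rate), which is the kind of summary a reward program could be tailored to.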


Chapter 1 Introduction

1.6.2 Web Search Engines

A Web search engine is a specialized computer server that searches for information on the Web. The search results of a user query are often returned as a list (sometimes called hits). The hits may consist of web pages, images, and other types of files. Some search engines also search and return data available in public databases or open directories. Search engines differ from web directories in that web directories are maintained by human editors whereas search engines operate algorithmically or by a mixture of algorithmic and human input.

Web search engines are essentially very large data mining applications. Various data mining techniques are used in all aspects of search engines, including crawling (e.g., deciding which pages should be crawled and how often), indexing (e.g., selecting pages to be indexed and deciding to what extent the index should be constructed), and searching (e.g., deciding how pages should be ranked, which advertisements should be added, and how the search results can be personalized or made “context aware”).

Search engines pose grand challenges to data mining. First, they have to handle a huge and ever-growing amount of data. Typically, such data cannot be processed using one or a few machines. Instead, search engines often need to use computer clouds, which consist of thousands or even hundreds of thousands of computers that collaboratively mine the huge amount of data. Scaling up data mining methods over computer clouds and large distributed data sets is an area for further research.

Second, Web search engines often have to deal with online data. A search engine may be able to afford constructing a model offline on huge data sets. For example, it may construct a query classifier that assigns a search query to predefined categories based on the query topic (e.g., whether the search query “apple” is meant to retrieve information about a fruit or a brand of computers). Even when a model is constructed offline, applying it online must be fast enough to answer user queries in real time.
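The split between offline construction and fast online application can be sketched with a small word-based naive Bayes classifier. This is only one possible way to build a query classifier; the training queries, the category names, and the functions `train` and `classify` are hypothetical and chosen for illustration.

```python
from collections import Counter, defaultdict
from math import log

def train(labeled_queries):
    """Offline step: count words per category over a labeled query log."""
    word_counts = defaultdict(Counter)   # category -> word -> count
    category_counts = Counter()          # category -> number of queries
    vocabulary = set()
    for query, category in labeled_queries:
        category_counts[category] += 1
        for word in query.lower().split():
            word_counts[category][word] += 1
            vocabulary.add(word)
    return word_counts, category_counts, vocabulary

def classify(query, model):
    """Online step: a few dictionary lookups per query word."""
    word_counts, category_counts, vocabulary = model
    total = sum(category_counts.values())
    best_category, best_score = None, float("-inf")
    for category, n in category_counts.items():
        # Add-one smoothing so unseen words do not zero out a category.
        denominator = sum(word_counts[category].values()) + len(vocabulary)
        score = log(n / total)
        for word in query.lower().split():
            score += log((word_counts[category][word] + 1) / denominator)
        if score > best_score:
            best_category, best_score = category, score
    return best_category

# Hypothetical labeled queries for the "apple" example.
model = train([
    ("apple pie recipe", "fruit"),
    ("apple tree care", "fruit"),
    ("apple laptop price", "computers"),
    ("apple phone charger", "computers"),
])

print(classify("apple laptop charger", model))
print(classify("apple pie", model))
```

The expensive pass over the data happens once in `train`; `classify` touches only the words of the incoming query, which is what keeps the online step cheap.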

Another challenge is maintaining and incrementally updating a model on fast-growing data streams. For example, a query classifier may need to be maintained incrementally and continuously, since new queries keep emerging and both the predefined categories and the data distribution may change. Most existing model training methods are offline and static and thus cannot be used in such a scenario.
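As a rough illustration of incremental maintenance, the sketch below folds each newly labeled query into per-category word counts without retraining from scratch, and lets new categories appear on the fly. The class name `IncrementalQueryClassifier`, its simple word-overlap score, and the example queries are assumptions for illustration, not a method described in the text.

```python
from collections import Counter, defaultdict

class IncrementalQueryClassifier:
    """Toy classifier whose state is updated one labeled query at a time."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count

    def update(self, query, category):
        """Fold one newly labeled query into the model; no full retraining."""
        self.word_counts[category].update(query.lower().split())

    def classify(self, query):
        """Return the category whose previously seen words overlap most with the query."""
        words = query.lower().split()
        scores = {
            category: sum(counts[word] for word in words)
            for category, counts in self.word_counts.items()
        }
        return max(scores, key=scores.get) if scores else None

clf = IncrementalQueryClassifier()
clf.update("apple pie recipe", "fruit")
clf.update("apple laptop price", "computers")

# A new category emerges in the stream and is absorbed without rebuilding the model.
clf.update("apple watch strap", "wearables")
clf.update("smart watch band", "wearables")

print(clf.classify("apple watch band"))
```

A production system would also need to age out stale counts as the data distribution drifts; that part is omitted here.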

Third, Web search engines often have to deal with queries that are asked only a very small number of times. Suppose a search engine wants to provide context-aware query recommendations. That is, when a user poses a query, the search engine tries to infer the context of the query using the user’s profile and query history in order to return more customized answers within a small fraction of a second. However, although the total number of queries asked can be huge, most of the queries may be asked only once or a few times. Such severely skewed data are challenging for many data mining and machine learning methods.
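The skew described above can be measured directly on a query log. The following sketch computes the share of distinct queries that were asked exactly once; the function name `singleton_share` and the tiny log are made up for illustration and do not reflect real search traffic.

```python
from collections import Counter

def singleton_share(query_log):
    """Fraction of distinct queries that appear exactly once in the log."""
    counts = Counter(query_log)
    if not counts:
        return 0.0
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts)

# Hypothetical log: two popular queries and six one-off queries.
query_log = ["weather"] * 5 + ["news"] * 3 + ["q1", "q2", "q3", "q4", "q5", "q6"]

share = singleton_share(query_log)
print(f"{share:.0%} of distinct queries were asked only once")
```

In this toy log, 6 of the 8 distinct queries occur once, so any per-query model has a single observation to learn from for most of its inputs.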

A Web crawler is a computer program that browses the Web in a methodical, automated manner.
