Data Mining and Knowledge Discovery in Databases (KDD) State of the Art, Prof. Dr. T. Nouri

Conference overview

Overview of data mining

What is data mining?

Why data mining?

Data mining goals

Data mining operations

Data mining process

Related fields

Need for data mining tools

Conference overview

Data mining methods

Data mining techniques

Research challenges for KDD

Types of data mining tasks

Components of DM methods

Data mining techniques

What is association mining?

Support & Confidence

Association mining example

What is association mining?

What is sequence mining?

Sequence mining

Predictive modeling

Models

What is Classification?

Classification learning

Decision-tree classification

From tree to rules

What is clustering?

Clustering

Clustering schemes

K-means algorithm

Deviation detection

K-nearest neighbors

Conference overview

Conclusions

KDD resources pointers

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art

Prof. Dr. T. Nouri

Computer Science Department

FHNW Switzerland

Conference overview

Overview of KDD and data mining

Data mining techniques

Demo

Summary

Overview of data mining

What is KDD?

Why is KDD necessary?

The KDD process

KDD operations and methods

The iterative and interactive process of discovering valid, novel, useful, and understandable knowledge (patterns, models, rules, etc.) in massive databases

What is data mining?

Valid: generalize to the future

Novel: what we don't know

Useful: be able to take some action

Understandable: leading to insight

Iterative: takes multiple passes

Interactive: human in the loop

Why data mining?

Data volume too large for classical analysis

Increased opportunity for access

Data mining goals

Prediction

Description

Data mining operations

Verification-driven

Discovery-driven

Data mining process

Understand application domain

Create target dataset

Data cleaning and transformation

Apply data mining algorithm

Interpret, evaluate and visualize patterns

Manage discovered knowledge

Related fields

AI

Machine learning

Statistics

Databases and data warehousing

High performance computing

Visualization

Need for data mining tools

Human analysis breaks down with volume and dimensionality

What is done by non-statisticians?

Conference overview

Overview of KDD and data mining

Data mining techniques

Demo

Summary

KDD resources pointers

Data mining methods

Predictive modeling (classification, regression)

Segmentation (clustering)

Dependency modeling (graphical models, density estimation)

Summarization (associations)

Change and deviation detection

Data mining techniques

Association rules: detect sets of attributes that frequently co-occur, and rules among them, e.g., 90% of the people who buy cookies also buy milk (60% of all grocery shoppers buy both).

Sequence mining (categorical): discover sequences of events that commonly occur together, e.g., in a set of DNA sequences, ACGTC is followed by GTCA after a gap of 9, with 30% probability.
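The cookies-and-milk rule above quotes two numbers: a confidence of 90% and a support of 60%. The following is a minimal sketch of how these two measures could be computed. The 30 transactions are invented toy data, constructed only so that the results match the figures on the slide, and the function names `support` and `confidence` are illustrative rather than taken from any particular tool.

```python
def count_containing(transactions, itemset):
    """Number of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(transactions, itemset):
    """Fraction of all transactions that contain `itemset`."""
    if not transactions:
        return 0.0
    return count_containing(transactions, itemset) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Among transactions containing `antecedent`, the fraction that
    also contain `consequent` (rule: antecedent -> consequent)."""
    n_antecedent = count_containing(transactions, antecedent)
    if n_antecedent == 0:
        return 0.0
    both = frozenset(antecedent) | frozenset(consequent)
    return count_containing(transactions, both) / n_antecedent


# Hypothetical toy data: 18 baskets with cookies and milk,
# 2 with cookies only, 10 with neither.
transactions = (
    [{"cookies", "milk"}] * 18
    + [{"cookies"}] * 2
    + [{"bread"}] * 10
)

rule_support = support(transactions, {"cookies", "milk"})
rule_confidence = confidence(transactions, {"cookies"}, {"milk"})
print(f"support    = {rule_support:.0%}")
print(f"confidence = {rule_confidence:.0%}")
```

Support is measured against all shoppers, while confidence is measured only against shoppers who bought the antecedent (cookies), which is why the two percentages differ.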

Case-based reasoning (CBR) or similarity search: given a database of objects and a “query” object, find the object(s) that are within a user-defined distance of the query object, or find all pairs within some distance of each other.

Deviation detection: find the records that differ most from the other records, i.e., find all outliers. These may be thrown away as noise or may be the “interesting” ones.
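Both techniques above rest on a distance between records. The sketch below illustrates them on a handful of made-up 2-D points, assuming Euclidean distance. The outlier score used here (mean distance to all other records) is only one possible choice and is an assumption of this example, not a method named in the slides; the function names are likewise illustrative.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def range_query(objects, query, radius):
    """Similarity search: all objects within `radius` of `query`."""
    return [o for o in objects if euclidean(o, query) <= radius]


def most_deviant(objects):
    """Deviation detection (assumed score): the object whose mean
    distance to all other objects is largest."""
    def mean_distance(i):
        others = [o for j, o in enumerate(objects) if j != i]
        return sum(euclidean(objects[i], o) for o in others) / len(others)

    return objects[max(range(len(objects)), key=mean_distance)]


# Hypothetical toy data: four nearby points and one far away.
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (8.0, 8.0)]

near = range_query(points, query=(1.0, 1.0), radius=0.5)
outlier = most_deviant(points)
print("within 0.5 of (1, 1):", near)
print("most deviant record: ", outlier)
```

A linear scan like this compares the query with every record, so it only suits small datasets; large databases typically need an index structure to avoid that cost.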

Classification and regression: classification assigns a new data record to one of several predefined categories or classes; regression predicts real-valued fields. Both are forms of supervised learning.

Clustering: partition the dataset into subsets or groups such that elements of a group share a common set of properties, with high within-group similarity and low between-group similarity. A form of unsupervised learning.
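The difference between the supervised and unsupervised settings can be sketched on one small dataset. The example below uses a k-nearest-neighbors vote for classification (labels are given) and a basic k-means loop for clustering (labels are ignored). The data, the labels "low" and "high", the choice of initial centroids, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter


def dist(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(labeled, record, k=3):
    """Supervised: majority label among the k nearest labeled records."""
    nearest = sorted(labeled, key=lambda pair: dist(pair[0], record))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def kmeans(points, initial_centroids, iterations=10):
    """Unsupervised: alternate between assigning each point to its
    nearest centroid and moving each centroid to its cluster mean."""
    centroids = list(initial_centroids)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[idx].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters


# Hypothetical toy data: (record, class label) pairs.
labeled = [
    ((1.0, 1.0), "low"), ((1.5, 2.0), "low"), ((1.0, 0.5), "low"),
    ((8.0, 8.0), "high"), ((9.0, 9.5), "high"), ((8.5, 9.0), "high"),
]
points = [p for p, _ in labeled]

predicted = knn_classify(labeled, (2.0, 1.5), k=3)
centroids, clusters = kmeans(points, [points[0], points[3]])
print("predicted class:", predicted)
print("clusters:", clusters)
```

The classifier needs the labels to make a prediction, whereas k-means recovers the two groups from the coordinates alone; its result generally depends on the initial centroids, which are fixed here to keep the run deterministic.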

Many other methods, such as

Research challenges for KDD

Scalability

Automation

Types of data mining tasks