Preparation Techniques. Plan: Framework for Data Preparation Techniques in Machine Learning; Challenge of Data Preparation


Data Preparation Techniques
This section explores the five high-level groups of data preparation techniques defined in the previous section and suggests specific techniques that may fall within each group.
Did I miss one of your preferred or favorite data preparation techniques?
Let me know in the comments below.
Data Preparation for Rows
This group is for data preparation techniques that add or remove rows of data.
In machine learning, rows are often referred to as samples, examples, or instances.
These techniques are often used to augment a limited training dataset or to remove errors or ambiguity from the dataset.
The main class of techniques that comes to mind is the set of data preparation techniques often used for imbalanced classification.
This includes techniques such as SMOTE, which creates synthetic rows of training data for under-represented classes, and random undersampling, which removes examples from over-represented classes.
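To make the two ideas concrete, here is a minimal sketch in plain Python. It is a simplified, SMOTE-style interpolation and not the reference implementation: the function names, the toy data, and the parameters `k`, `n_new`, and `n_keep` are all assumptions made for illustration. It also assumes at least two minority-class rows.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between minority-class neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbors of the chosen row (excluding itself)
        neighbors = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


def random_undersample(majority, n_keep, seed=0):
    """Keep a random subset of n_keep rows from the over-represented class."""
    return random.Random(seed).sample(majority, n_keep)


# Toy data (hypothetical values): 4 minority rows and 10 majority rows.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
majority = [[float(i), float(i % 3)] for i in range(5, 15)]

new_rows = smote_like_oversample(minority, n_new=4)
balanced_minority = minority + new_rows                      # 4 real + 4 synthetic rows
balanced_majority = random_undersample(majority, n_keep=8)   # 8 of the 10 original rows
```

Each synthetic row lies on a line segment between two existing minority rows, so it stays inside the region the minority class already occupies. In practice, a maintained library such as imbalanced-learn is likely a better choice than hand-rolled code like this.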
For more on SMOTE data sampling, see the tutorial:

It also includes more advanced combined over- and undersampling techniques that attempt to identify ambiguous examples along the decision boundary of a classification problem and either remove them or change their class label.
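One way to picture the "ambiguous examples along the decision boundary" step is with Tomek links: pairs of rows from different classes that are each other's nearest neighbor. The text above does not name a specific method, so treat Tomek links as one illustrative choice. The sketch below covers only the cleaning step, not the oversampling that a combined method would run first, and the function names and toy data are hypothetical.

```python
import math


def find_tomek_links(X, y):
    """Return index pairs (i, j) of mutual nearest neighbors with different labels."""
    def nearest(i):
        return min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(X[i], X[j]),
        )

    nn = [nearest(i) for i in range(len(X))]
    return [
        (i, j) for i, j in enumerate(nn)
        if i < j and nn[j] == i and y[i] != y[j]
    ]


def remove_majority_in_links(X, y, majority_label):
    """Drop the majority-class member of every Tomek link."""
    drop = set()
    for i, j in find_tomek_links(X, y):
        drop.add(i if y[i] == majority_label else j)
    keep = [idx for idx in range(len(X)) if idx not in drop]
    return [X[idx] for idx in keep], [y[idx] for idx in keep]


# Toy 1-D data (hypothetical values): the rows at 2.9 and 3.1 straddle the class boundary.
X = [[0.0], [0.5], [1.0], [2.9], [3.1], [5.0]]
y = [0, 0, 0, 0, 1, 1]

links = find_tomek_links(X, y)
X_clean, y_clean = remove_majority_in_links(X, y, majority_label=0)
```

Here the only link is the pair at 2.9 and 3.1, and the class-0 row at 2.9 is dropped. A relabeling variant would change that row's label instead of removing it.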
For more on these types of data preparation, see the tutorial:

This class of data preparation techniques also includes algorithms for identifying and removing outliers from the data. These are rows of data that may be far from the center of probability mass in the dataset and, in turn, may be unrepresentative of the data from the domain.
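As a rough illustration of row-wise outlier removal, the sketch below applies the interquartile range (IQR) rule to each column and drops any row with a value outside the resulting bounds. The IQR rule is only one of several possible approaches and is an assumption here, as are the function name, the cutoff factor `k=1.5`, and the toy data.

```python
import statistics


def remove_outlier_rows(rows, k=1.5):
    """Drop rows where any column value falls outside [Q1 - k*IQR, Q3 + k*IQR]."""
    bounds = []
    for column in zip(*rows):
        q1, _, q3 = statistics.quantiles(column, n=4)
        iqr = q3 - q1
        bounds.append((q1 - k * iqr, q3 + k * iqr))
    return [
        row for row in rows
        if all(lo <= value <= hi for value, (lo, hi) in zip(row, bounds))
    ]


# Toy data (hypothetical values): the last row has an extreme first feature.
rows = [
    [10.0, 1.0], [11.0, 1.1], [12.0, 0.9], [13.0, 1.2],
    [14.0, 1.0], [15.0, 0.8], [100.0, 1.1],
]

cleaned = remove_outlier_rows(rows)
```

With this data, only the row containing 100.0 is removed. A per-column rule like this ignores relationships between features, so it may miss rows that are unusual only in combination.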
For more on outlier detection and removal methods, see the tutorial:

  • How to Remove Outliers for Machine Learning
