Preparation techniques plan: framework for data preparation techniques in machine learning challenge of data preparation


Data Preparation for Values
This group is for data preparation techniques that change the raw values in the data.
These techniques are often required to meet the expectations or requirements of specific machine learning algorithms.
The main class of techniques that comes to mind is data transforms that change the scale or distribution of input variables.
For example, data transforms such as standardization and normalization change the scale of numeric input variables. Data transforms like ordinal encoding change the type of categorical input variables.
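As a rough illustration of the three transforms just mentioned, the sketch below implements them with the Python standard library only. The function names and sample data are hypothetical and chosen for this example; in practice a library such as scikit-learn (StandardScaler, MinMaxScaler, OrdinalEncoder) would normally be used instead.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumes the values are not all identical
    return [(v - mu) / sigma for v in values]


def normalize(values):
    """Rescale values linearly to the range [0, 1] (min-max scaling)."""
    lo, hi = min(values), max(values)  # assumes hi > lo
    return [(v - lo) / (hi - lo) for v in values]


def ordinal_encode(labels):
    """Map each distinct category to an integer, using sorted label order."""
    mapping = {label: i for i, label in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping


# Hypothetical sample data.
heights = [150.0, 160.0, 170.0, 180.0, 190.0]
colors = ["red", "green", "blue", "green"]

standardized = standardize(heights)
normalized = normalize(heights)
encoded, mapping = ordinal_encode(colors)
```

Standardization and normalization keep the variable numeric and only change its scale, whereas ordinal encoding changes the type of the variable from labels to integers.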
There are also many data transforms for changing the distribution of input variables.
For example, discretization (binning) changes a numerical input variable into a categorical variable with an ordinal ranking.
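A minimal sketch of one common binning strategy, equal-width bins, is shown here. The helper name and the sample ages are assumptions made for illustration; other strategies, such as equal-frequency bins, are equally valid, and scikit-learn offers KBinsDiscretizer for this purpose.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width intervals (0 .. n_bins - 1)."""
    lo, hi = min(values), max(values)  # assumes hi > lo
    width = (hi - lo) / n_bins
    bins = []
    for v in values:
        index = int((v - lo) / width)
        # The maximum value falls on the upper edge; keep it in the last bin.
        bins.append(min(index, n_bins - 1))
    return bins


# Hypothetical sample data.
ages = [3, 17, 25, 40, 58, 63, 79, 83]
age_bins = equal_width_bins(ages, n_bins=4)
```

The resulting bin indices are ordered, so bin 0 holds the smallest values and bin 3 the largest, which is what gives the new categorical variable its ordinal ranking.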
For more on this type of data transform, see the tutorial:

The power transform can be used to change the distribution of data to remove a skew and make the distribution more normal (Gaussian).
For more on this method, see the tutorial:

  • How to Use Power Transforms for Machine Learning
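To make the idea concrete, the following sketch applies a Box-Cox power transform with a fixed lambda to a small right-skewed sample and compares the skewness before and after. The function names and data are hypothetical. Real implementations, such as scikit-learn's PowerTransformer, estimate lambda from the data rather than fixing it as done here.

```python
import math
from statistics import mean, pstdev


def box_cox(values, lam):
    """Box-Cox power transform for strictly positive values and a given lambda."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


def skewness(values):
    """Population skewness: the mean cubed z-score (0 for a symmetric sample)."""
    mu, sigma = mean(values), pstdev(values)
    return mean([((v - mu) / sigma) ** 3 for v in values])


# Hypothetical right-skewed sample: most values are small, one is very large.
raw = [1, 2, 2, 3, 3, 4, 5, 8, 13, 40]

# lambda = 0 is the log transform; it is an assumption here, not a fitted value.
transformed = box_cox(raw, lam=0)

skew_before = skewness(raw)
skew_after = skewness(transformed)
```

For this sample the skewness after the transform is closer to zero than before, meaning the long right tail has been pulled in.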

The quantile transform is a flexible type of data preparation technique that can map a numerical input variable to a different type of distribution, such as a normal (Gaussian) distribution.
You can learn more about this data preparation technique here:

  • How to Use Quantile Transforms for Machine Learning
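One way to picture a quantile transform is as two steps: replace each value by its empirical quantile, then pass that quantile through the inverse cumulative distribution function of the target distribution. The sketch below does this for a standard normal target using only the standard library. The function name and data are hypothetical, and the mid-rank handling of ties is one possible choice among several; scikit-learn's QuantileTransformer is the usual practical tool.

```python
from bisect import bisect_left, bisect_right
from statistics import NormalDist


def quantile_to_normal(values):
    """Map values to a standard normal distribution via their empirical quantiles."""
    ordered = sorted(values)
    n = len(ordered)
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    result = []
    for v in values:
        # Mid-rank of v among the sorted values, so tied values share one quantile.
        rank = (bisect_left(ordered, v) + bisect_right(ordered, v)) / 2
        # rank lies strictly between 0 and n, so rank / n is strictly inside (0, 1).
        result.append(standard_normal.inv_cdf(rank / n))
    return result


# Hypothetical sample with a strongly skewed (geometric) spacing.
values = [8, 1, 64, 2, 32, 4, 16]
gaussianized = quantile_to_normal(values)
```

The transform depends only on the rank of each value, so the ordering of the data is preserved while the spacing between values is reshaped to follow the target distribution.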

Another type of data preparation technique that belongs to this group consists of methods that systematically change values in the dataset.
This includes techniques that identify and replace missing values, often referred to as missing value imputation. This can be achieved using statistical methods or more advanced model-based methods.
For more on these methods, see the tutorial:

  • Statistical Imputation for Missing Values in Machine Learning
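As a small sketch of the statistical approach, the function below fills missing entries with a summary statistic computed from the observed values. Representing missing values as None, the function name, and the sample readings are all assumptions made for this example; scikit-learn's SimpleImputer provides the same idea for real datasets.

```python
from statistics import mean, median


def impute(values, strategy=mean):
    """Replace None entries with a statistic computed from the observed values."""
    observed = [v for v in values if v is not None]
    fill = strategy(observed)
    return [fill if v is None else v for v in values]


# Hypothetical readings with two missing entries and one outlier.
readings = [21.0, None, 19.0, 23.0, None, 57.0]

mean_filled = impute(readings)
median_filled = impute(readings, strategy=median)
```

With an outlier present, the mean and the median give noticeably different fill values, which is one reason the choice of statistic is usually treated as something to evaluate rather than assume.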

All of the methods discussed could also be considered feature engineering methods (i.e. fitting into the previously discussed group of data preparation methods) if the results of the transforms are appended to the raw data as new columns rather than replacing the original values.
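The distinction can be sketched in a few lines: the transform is the same, but its output is stored next to the raw column instead of overwriting it. The column names and values below are hypothetical, and the log transform is only a stand-in for any of the transforms above.

```python
import math

# Hypothetical dataset: one row per record, one raw numeric column.
rows = [
    {"income": 20000.0},
    {"income": 45000.0},
    {"income": 310000.0},
]

# Keep the raw "income" column and append the transformed values as a new one.
for row in rows:
    row["income_log"] = math.log(row["income"])
```

A model trained on these rows can then draw on both the raw and the transformed representation of the variable.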
