Preparation techniques plan: framework for data preparation techniques in machine learning challenge of data preparation
Date: 30.12.2023
PREPARATION TECHNIQUES
Data Preparation for Columns
This group is for data preparation techniques that add or remove columns of data.
In machine learning, columns are often referred to as variables or features.
These techniques are often required to either reduce the complexity (dimensionality) of a prediction problem or to unpack compound input variables or complex interactions between features.
The main class of techniques that comes to mind is feature selection.
This includes techniques that use statistics to score the relevance of input variables to the target variable based on the data type of each.
For more on these types of data preparation techniques, see the tutorial:
How to Choose a Feature Selection Method for Machine Learning
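As a minimal sketch of statistics-based feature selection, the example below scores each input with the ANOVA F-test and keeps the highest-scoring columns. It assumes scikit-learn is installed; the synthetic dataset, the choice of f_classif (suited to numeric inputs and a categorical target), and k=4 are illustrative assumptions, not recommendations from the text.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data (an assumption for illustration): 10 numeric inputs, 4 informative.
X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           n_redundant=0, random_state=1)

# Score every column against the target, then keep the 4 best-scoring columns.
selector = SelectKBest(score_func=f_classif, k=4)
X_selected = selector.fit_transform(X, y)

for index, score in enumerate(selector.scores_):
    print(f"feature {index}: F = {score:.2f}")
print("kept columns:", selector.get_support(indices=True))
print("shape before:", X.shape, "after:", X_selected.shape)
```

A different score function would be needed for other data type pairings, for example chi2 for non-negative categorical counts or f_regression for a numeric target.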
This also includes feature selection techniques that systematically test the impact of different combinations of input variables on the predictive skill of a machine learning model.
For more on these types of methods, see the tutorial:
Recursive Feature Elimination (RFE) for Feature Selection in Python
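The following is a rough sketch of one such systematic search, recursive feature elimination, using scikit-learn's RFE class. The decision tree estimator, the synthetic dataset, and the target of 4 features are assumptions chosen only to keep the example short.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (an assumption for illustration): 10 inputs, 4 informative.
X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           n_redundant=0, random_state=1)

# RFE fits the model, drops the weakest feature, and repeats until 4 remain.
rfe = RFE(estimator=DecisionTreeClassifier(random_state=1),
          n_features_to_select=4)
rfe.fit(X, y)

# support_ marks the columns that were kept; ranking_ is 1 for kept columns.
for index in range(X.shape[1]):
    print(f"column {index}: kept={rfe.support_[index]}, rank={rfe.ranking_[index]}")

X_reduced = rfe.transform(X)
print("reduced shape:", X_reduced.shape)
```

In practice the number of features to keep is usually tuned, for example by evaluating the wrapped model with cross-validation for each candidate count.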
Related are techniques that use a model to score the importance of input features based on their use by a predictive model, referred to as feature importance methods. These methods are often used for data interpretation, although they can also be used for feature selection.
For more on these types of methods, see the tutorial:
How to Calculate Feature Importance With Python
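One way to obtain model-based importance scores is sketched below with a random forest, whose fitted feature_importances_ attribute reports how much each input contributed to the splits. The model choice, its settings, and the synthetic dataset are assumptions for illustration; other models expose importance differently, for example through coefficients.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (an assumption for illustration): 10 inputs, 4 informative.
X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           n_redundant=0, random_state=1)

model = RandomForestClassifier(n_estimators=50, random_state=1)
model.fit(X, y)

# Impurity-based importances, normalized so that they sum to 1.0.
importances = model.feature_importances_

# Report the columns from most to least important.
ranked = sorted(enumerate(importances), key=lambda pair: pair[1], reverse=True)
for index, score in ranked:
    print(f"feature {index}: {score:.3f}")
```

To use the scores for feature selection rather than interpretation, the lowest-ranked columns could be dropped and the model re-evaluated.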
This group also brings to mind techniques for creating or deriving new columns of data, that is, new features. This is often referred to as feature engineering, although sometimes the whole field of data preparation is referred to as feature engineering.
For example, new features that represent values raised to exponents or multiplicative combinations of features can be created and added to the dataset as new columns.
For more on these types of data preparation techniques, see the tutorial:
How to Use Polynomial Feature Transforms for Machine Learning
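As a small illustration of deriving such columns, the sketch below applies scikit-learn's PolynomialFeatures with degree 2 to a tiny hand-made array. The input values and the degree are assumptions chosen so the result is easy to check by hand.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two rows, two input features a and b (illustrative values).
X = np.array([[2, 3],
              [4, 5]])

# Degree 2 produces the columns: 1, a, b, a^2, a*b, b^2.
transform = PolynomialFeatures(degree=2, include_bias=True)
X_poly = transform.fit_transform(X)

print(X_poly)
# Row [2, 3] becomes [1, 2, 3, 4, 6, 9]
# Row [4, 5] becomes [1, 4, 5, 16, 20, 25]
```

The number of derived columns grows quickly with the degree and the number of inputs, so this kind of transform is often paired with feature selection.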
This might also include data transforms that change a variable type, such as creating dummy variables for a categorical variable, often referred to as a one-hot encoding.
For more on these types of data preparation techniques, see the tutorial:
Ordinal and One-Hot Encodings for Categorical Data
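A one-hot encoding can be sketched with scikit-learn's OneHotEncoder as follows. The color values are made-up example data, and the default sparse output is converted to a dense array only to make it easy to print.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# One categorical column with three distinct labels (illustrative values).
colors = np.array([["red"], ["green"], ["blue"], ["green"]])

encoder = OneHotEncoder()
# fit_transform returns a sparse matrix by default; densify it for display.
encoded = encoder.fit_transform(colors).toarray()

# Categories are sorted, so the new columns are: blue, green, red.
print(encoder.categories_[0])
print(encoded)
# "red"   -> [0, 0, 1]
# "green" -> [0, 1, 0]
# "blue"  -> [1, 0, 0]
```

Each original row ends up with exactly one 1 among the new binary columns, which is where the name one-hot comes from.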