
Preparation techniques plan: framework for data preparation techniques in machine learning; challenge of data preparation

Page 6/8 · Date: 30.12.2023 · Size: 60.65 KB · #164088
Data Preparation for Columns + Values
This group is for data preparation techniques that change both the number of columns and the values in the data.
The main class of techniques that comes to mind is dimensionality reduction: methods that reduce the number of columns and, in doing so, change the scale and distribution of the numerical input variables.
This includes matrix factorization methods from linear algebra as well as manifold learning algorithms from high-dimensional statistics.
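As a rough illustration of a matrix factorization method changing both the columns and the values, here is a minimal sketch of principal component analysis built on NumPy's singular value decomposition. The function name `pca_project` and the synthetic dataset are assumptions made for this example, not something from the framework itself; in practice a library implementation would normally be used instead.

```python
import numpy as np

def pca_project(X, n_components):
    """Project the rows of X onto the top principal components via SVD."""
    X = np.asarray(X, dtype=float)
    # Center each column so the components describe variance, not the mean.
    X_centered = X - X.mean(axis=0)
    # Thin SVD: rows of Vt are principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

# Hypothetical dataset: 100 rows, 5 columns on very different scales.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5)) * np.array([1.0, 10.0, 0.1, 5.0, 2.0])

X_reduced = pca_project(X, n_components=2)
print(X.shape, "->", X_reduced.shape)
```

The result has fewer columns than the input, and none of the new columns is one of the original variables: each is a centered linear combination of all five, so both the column count and the values have changed.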
For more information on these techniques, see the tutorial:

Introduction to Dimensionality Reduction for Machine Learning
Although these techniques are designed to create projections of rows in a lower-dimensional space, perhaps this also leaves the door open to techniques that do the inverse. That is, use all or a subset of the input variables to create a projection into a higher-dimensional space, perhaps unpacking complex nonlinear relationships.
Perhaps polynomial transforms where the results replace the raw dataset would fit into this class of data preparation methods.
Let me know in the comments below.
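To make the polynomial transform idea above concrete, here is a small sketch that replaces the raw columns with all monomials up to a chosen degree. The helper `polynomial_expand` is a hypothetical name written for this example; it is only meant to show the shape of the transform, not a recommended implementation.

```python
from itertools import combinations_with_replacement
import numpy as np

def polynomial_expand(X, degree=2):
    """Return all monomials of the columns of X from degree 1 up to `degree`."""
    X = np.asarray(X, dtype=float)
    n_cols = X.shape[1]
    columns = []
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(n_cols), d):
            # Product of the selected columns, e.g. (0, 1) -> x0 * x1.
            columns.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(columns)

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Output columns are ordered: x0, x1, x0^2, x0*x1, x1^2
X_poly = polynomial_expand(X, degree=2)
print(X.shape, "->", X_poly.shape)
print(X_poly)
```

Here two input columns become five. If `X_poly` is used in place of `X`, rather than appended to it, the transform changes both the number of columns and the values, which is what would place it in this group.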
Data Preparation for Rows + Values
This group is for data preparation techniques that change both the number of rows and the values in the data.
I have not explicitly considered data transforms of this type before, but the group falls out of the framework as defined.
One group of methods that comes to mind is clustering algorithms, where all or a subset of the rows in the dataset are replaced with data samples at the cluster centers, referred to as cluster centroids.
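As a sketch of replacing rows with cluster centroids, the following uses a plain k-means loop (Lloyd's algorithm) written in NumPy. The function `kmeans_centroids`, the farthest-first initialisation, and the two-blob dataset are all assumptions chosen to keep the example small and deterministic.

```python
import numpy as np

def kmeans_centroids(X, k, n_iter=100):
    """Plain Lloyd's algorithm; returns k centroids to stand in for the rows of X."""
    X = np.asarray(X, dtype=float)
    # Farthest-first initialisation: start from row 0, then repeatedly add
    # the row that is farthest from every centroid chosen so far.
    chosen = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in chosen], axis=0)
        chosen.append(X[d.argmax()])
    centroids = np.array(chosen)
    for _ in range(n_iter):
        # Assign each row to its nearest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its rows; keep it if the cluster is empty.
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return centroids

# Hypothetical dataset: two well-separated blobs of 50 rows each.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=10.0, scale=0.5, size=(50, 2)),
])

X_small = kmeans_centroids(X, k=2)
print(X.shape, "->", X_small.shape)
```

The 100 original rows are replaced by 2 rows, and those rows are averages rather than observed samples, so both the row count and the values differ from the input.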
Related might be replacing rows with exemplars (aggregates of rows) taken from specific machine learning algorithms, such as support vectors from a support vector machine, or the codebook vectors taken from a learning vector quantization.
Naturally, if these aggregate rows are simply added to the dataset rather than replacing rows, then they would fit into the “Data Preparation for Rows” group described above.
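The codebook vectors mentioned above can be sketched with a minimal LVQ1 loop. This is an illustrative simplification under stated assumptions: one codebook vector per class, a linearly decaying learning rate, and hypothetical names (`lvq1_codebook`, `lr0`) and data throughout. It is not the full learning vector quantization family.

```python
import numpy as np

def lvq1_codebook(X, y, n_epochs=20, lr0=0.3):
    """Minimal LVQ1 with one codebook vector per class."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    # Initialise each codebook vector with the first row of its class.
    codebook = np.array([X[y == c][0] for c in classes])
    for epoch in range(n_epochs):
        # Linearly decaying learning rate.
        lr = lr0 * (1.0 - epoch / n_epochs)
        for x, label in zip(X, y):
            # Best matching unit: the nearest codebook vector.
            bmu = ((codebook - x) ** 2).sum(axis=1).argmin()
            step = lr * (x - codebook[bmu])
            # Attract on a class match, repel on a mismatch.
            if classes[bmu] == label:
                codebook[bmu] += step
            else:
                codebook[bmu] -= step
    return codebook, classes

# Hypothetical labelled dataset: two blobs, one per class.
rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(10.0, 0.5, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

X_proto, y_proto = lvq1_codebook(X, y)
print(X.shape, "->", X_proto.shape)
```

Using `X_proto` and `y_proto` in place of `X` and `y` changes both rows and values; stacking them onto the original arrays instead would only add rows, which matches the distinction drawn in the text.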

