The EMCLUS Procedure
Output from PROC EMCLUS
The beginning of the output shows the initial model parameter estimates. Next, the estimated model
parameters, sample means, and sample variances for the active primary clusters are displayed. Inactive
clusters are shown with missing values. The sample means and variances are calculated from the
observations that are summarized in the primary clusters.
In the cluster summary table, the following statistics are listed:
Current Frequency
the number of observations that are summarized in a cluster during the current iteration.
Total Frequency
the cumulative sum of the current frequencies for each cluster.
Proportion of Data Summarized
the total frequency divided by the Obs read in.
Nearest Cluster
the primary cluster that is closest to a given primary cluster, based on the Euclidean distance
between the estimated means of the two primary clusters.
Distance
the Euclidean distance from a primary cluster to its nearest cluster.
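As a rough illustration of how Nearest Cluster and Distance relate, the following Python sketch finds, for each primary cluster, the closest other cluster by the Euclidean distance between estimated means. It is not SAS code and not the procedure's implementation. The `means` dictionary, its values, and the helper `nearest_cluster` are hypothetical and exist only for this example.

```python
import math

# Hypothetical estimated means for three primary clusters (two variables each).
means = {1: (0.0, 0.0), 2: (3.0, 4.0), 3: (10.0, 0.0)}

def nearest_cluster(cluster_id, means):
    """Return (nearest cluster id, Euclidean distance) among the other clusters."""
    best_id, best_dist = None, math.inf
    for other_id, other_mean in means.items():
        if other_id == cluster_id:
            continue  # a cluster is never its own nearest cluster
        dist = math.dist(means[cluster_id], other_mean)
        if dist < best_dist:
            best_id, best_dist = other_id, dist
    return best_id, best_dist

# One (Nearest Cluster, Distance) pair per primary cluster.
nearest = {cid: nearest_cluster(cid, means) for cid in means}
```

With these made-up means, cluster 1 and cluster 2 are each other's nearest cluster, and cluster 3 is nearest to cluster 2.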
The iteration summary table displays:
Log-likelihood
the average log-likelihood over all the observations that are read in.
Obs read in this iteration
the number of observations that are read in during the current iteration.
Obs read in
the cumulative sum of observations that are read in.
Current Summarized
the sum of the current frequencies across the primary clusters.
Total Summarized
the sum of the total frequencies across the primary clusters.
Proportion Summarized
the Total Summarized divided by the Obs read in.
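The arithmetic behind the summarized counts can be sketched in Python as follows. This is an illustration only, not the procedure's implementation. The per-cluster frequencies and the value of `obs_read_in` are hypothetical numbers.

```python
# Hypothetical per-cluster counts after some iteration.
current_freq = {1: 120, 2: 80, 3: 50}   # Current Frequency per primary cluster
total_freq = {1: 400, 2: 250, 3: 100}   # Total Frequency per primary cluster
obs_read_in = 1000                      # cumulative Obs read in

# Iteration summary quantities.
current_summarized = sum(current_freq.values())
total_summarized = sum(total_freq.values())
proportion_summarized = total_summarized / obs_read_in

# Per-cluster Proportion of Data Summarized from the cluster summary table.
proportion_by_cluster = {cid: f / obs_read_in for cid, f in total_freq.items()}
```

Because every cluster's total frequency is divided by the same Obs read in, the per-cluster proportions add up to Proportion Summarized.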
If there are secondary clusters, the sample mean, sample variance, and the number of observations in
secondary clusters are also displayed after the iteration summary table.
Note: The estimated variance parameter for each variable is bounded from below by the value
(var)*(eps), where var is the sample variance of that variable obtained from the observations read in at
the first iteration, and eps is 10^-6. Both the standard and the scaled EM algorithms are sometimes slow to
converge; however, the scaled EM algorithm generally runs faster than the standard EM algorithm.
Convergence may be sped up by increasing p and/or eps, or by using the CLEAR option. Changing
these values may alter the parameter estimates.
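The lower bound on the variance estimates can be pictured with a small Python sketch. It assumes the bound is applied as a simple per-variable floor of var times eps, with eps taken as 10^-6 from the note. The function name `floor_variances` and all numbers are hypothetical, and this is not the procedure's own code.

```python
EPS = 1e-6  # eps as stated in the note

def floor_variances(estimated, first_iter_sample_var, eps=EPS):
    """Clamp each variable's estimated variance to at least var * eps."""
    return [max(est, var * eps)
            for est, var in zip(estimated, first_iter_sample_var)]

# Hypothetical values for two variables.
first_iter_sample_var = [4.0, 250.0]  # sample variances from the first iteration
estimated = [1e-9, 3.2]               # current variance estimates for one cluster

bounded = floor_variances(estimated, first_iter_sample_var)
```

Here the first estimate is below its floor of 4.0 * 10^-6 and is raised to it, while the second estimate is already above its floor and is left unchanged.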
Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.
Example
Example 1: Syntax for PROC FASTCLUS
Example 2: Use of the EMCLUS Procedure
Example 1: Syntax for PROC FASTCLUS
PROC FASTCLUS DATA = libref.SAS-data-set
   OUTSEEDS = libref.SAS-data-set
   MAXCLUSTERS = positive integer;
Example 2: Use of the EMCLUS Procedure
A portion of the output returned by PROC FASTCLUS is summarized in the following table:
Cluster   Frequency   RMS Std. Deviation
   1         500            101.34
   2           3           1000.34
   3         100              3.79
   4         150              4.05
Clusters 1, 3, and 4 should be used as initial estimates for PROC EMCLUS, because of their high
frequency counts. Cluster 1 may actually be a group of clusters because of its high RMS Std. Deviation.
Therefore, the syntax when using PROC EMCLUS could look like:
PROC EMCLUS DATA=
CLUSTERS = 5
INIT = FASTCLUS
SEED =
INITSTD = 50.0
INITCLUS 1, 3, 4;
run;
Note that CLUSTERS is set to 5, but any integer greater than or equal to 3 is appropriate because
three clusters are specified in the INITCLUS option. Also, INITSTD could have been set to any number
greater than 4.05 and less than 101.34.
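The choice of clusters 1, 3, and 4 can be mimicked with a short Python sketch that filters the PROC FASTCLUS table by frequency. The rows come from the table in this example. The cutoff `MIN_FREQUENCY` is a hypothetical value chosen only so that the low-frequency cluster 2 is dropped; the source does not state a numeric threshold.

```python
# (cluster, frequency, RMS std. deviation) rows from the PROC FASTCLUS table.
fastclus_rows = [
    (1, 500, 101.34),
    (2, 3, 1000.34),
    (3, 100, 3.79),
    (4, 150, 4.05),
]

MIN_FREQUENCY = 100  # hypothetical cutoff, not taken from the documentation

# Keep only clusters whose frequency is high enough to serve as initial estimates.
selected = [cluster for cluster, freq, _ in fastclus_rows if freq >= MIN_FREQUENCY]

# Text for the cluster list that follows INITCLUS.
initclus_list = ", ".join(str(cluster) for cluster in selected)
```

The resulting list matches the clusters named after INITCLUS in the example, and its length (3) is the smallest value that makes sense for CLUSTERS.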
COMBINATION FUNCTIONS
Details
Examples
Example 1: Developing a Simple Multilayer Perceptron (Rings Data)
Example 2: Developing a Neural Network for a Continuous Target
Example 3: Neural Network Hill-and-Plateau Example (Surf Data)
References