The arboretum procedure




The EMCLUS Procedure

Output from PROC EMCLUS

The beginning of the output shows the initial model parameter estimates. Next, the estimated model parameters, sample means, and sample variances for the active primary clusters are displayed. Inactive clusters are shown with missing values. The sample mean and variance are calculated from the observations that are summarized in the primary clusters.

In the cluster summary table, the following statistics are listed:



Current Frequency
the number of observations that are summarized in a cluster during the current iteration.

Total Frequency
the cumulative sum of the current frequencies for each cluster.

Proportion of Data Summarized
the total frequency divided by Obs read in.

Nearest Cluster
the primary cluster closest to a given primary cluster, based on the Euclidean distance between the estimated means of the two primary clusters.

Distance
the Euclidean distance from a primary cluster to its nearest cluster.
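The Nearest Cluster and Distance columns can be illustrated with a small Python sketch. This is not the procedure's own code: the function name `nearest_clusters` and the cluster means are hypothetical, and the only thing taken from the text is that the distance is the Euclidean distance between estimated means.

```python
import math

def nearest_clusters(means):
    """For each cluster id, return (nearest other cluster id, Euclidean distance).

    `means` maps a cluster id to its estimated mean vector.
    """
    result = {}
    for i, mean_i in means.items():
        best_id, best_dist = None, math.inf
        for j, mean_j in means.items():
            if i == j:
                continue  # a cluster is not its own nearest cluster
            d = math.dist(mean_i, mean_j)  # Euclidean distance between the means
            if d < best_dist:
                best_id, best_dist = j, d
        result[i] = (best_id, best_dist)
    return result

# Hypothetical estimated means for three primary clusters on two variables.
means = {1: (0.0, 0.0), 2: (3.0, 4.0), 3: (10.0, 0.0)}
table = nearest_clusters(means)
for cluster_id, (nearest, distance) in sorted(table.items()):
    print(cluster_id, nearest, round(distance, 3))
```

With these made-up means, clusters 1 and 2 are each other's nearest cluster, and cluster 3 is nearest to cluster 2.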

The iteration summary table displays:

Log-likelihood
the average log-likelihood over all the observations that are read in.

Obs read in this iteration
the number of observations that are read in during the current iteration.

Obs read in
the cumulative sum of observations that are read in.

Current Summarized
the sum of the current frequencies across the primary clusters.

Total Summarized
the sum of the total frequencies across the primary clusters.

Proportion Summarized
the Total Summarized divided by Obs read in.
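The relationships among these counts can be sketched in Python. The function name `iteration_summary` and the input numbers are hypothetical; the arithmetic follows only the definitions listed above, and the log-likelihood is left out because it depends on the fitted model.

```python
def iteration_summary(obs_per_iteration, current_freqs_per_iteration):
    """Build one summary row per iteration from per-cluster current frequencies."""
    obs_read_in = 0
    total_freqs = None
    rows = []
    for obs_this_iter, current_freqs in zip(obs_per_iteration,
                                            current_freqs_per_iteration):
        obs_read_in += obs_this_iter  # cumulative observations read in
        if total_freqs is None:
            total_freqs = [0] * len(current_freqs)
        # Total Frequency: cumulative sum of current frequencies, per cluster.
        total_freqs = [t + c for t, c in zip(total_freqs, current_freqs)]
        total_summarized = sum(total_freqs)
        rows.append({
            "obs_this_iteration": obs_this_iter,
            "obs_read_in": obs_read_in,
            "current_summarized": sum(current_freqs),
            "total_summarized": total_summarized,
            "proportion_summarized": total_summarized / obs_read_in,
            # Per-cluster Proportion of Data Summarized from the cluster table.
            "cluster_proportions": [t / obs_read_in for t in total_freqs],
        })
    return rows

# Hypothetical run: two iterations of 100 observations, two primary clusters.
rows = iteration_summary([100, 100], [[30, 20], [25, 35]])
for row in rows:
    print(row)
```

In this made-up run, 110 of the 200 observations read in are summarized after the second iteration, so Proportion Summarized is 0.55.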

If there are secondary clusters, the sample mean, sample variance, and the number of observations in secondary clusters are also displayed after the iteration summary table.



Note:   The estimated variance parameter for each variable is bounded below by the value (var)*(eps), where var is the sample variance of that variable obtained from the observations read in at the first iteration, and eps is 10^-6. Both the standard and the scaled EM algorithms are sometimes slow to converge; however, the scaled EM algorithm generally runs faster than the standard EM algorithm. Convergence may be sped up by increasing p and/or eps, or by using the CLEAR option. Changing these values may alter the parameter estimates.
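The lower bound in the note can be illustrated with a short Python sketch. The function names and data are hypothetical, and the sketch assumes that "sample variance" means the usual n-1 estimator; the procedure's internal computation is not shown in the text.

```python
import math
import statistics

EPS = 1e-6  # the eps value stated in the note

def variance_floor(first_iteration_values, eps=EPS):
    """Lower bound (var)*(eps), with var taken from first-iteration observations."""
    # Assumption: sample variance uses the n-1 denominator.
    return statistics.variance(first_iteration_values) * eps

def bound_variance(estimate, floor):
    """Keep a variance estimate from dropping below the floor."""
    return max(estimate, floor)

# Hypothetical values of one variable read in at the first iteration.
first_iteration = [1.0, 2.0, 3.0, 4.0, 5.0]   # sample variance is 2.5
floor = variance_floor(first_iteration)        # about 2.5e-6

tiny = bound_variance(1e-9, floor)    # raised to the floor
normal = bound_variance(0.3, floor)   # left unchanged
print(floor, tiny, normal)
```

A larger eps raises the floor, which is consistent with the note's warning that changing eps may alter the parameter estimates.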

Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.





Example


Example 1: Syntax for PROC FASTCLUS

PROC FASTCLUS DATA = libref.SAS-data-set
     OUTSEEDS = libref.SAS-data-set
     MAXCLUSTERS = positive integer;


Example 2: Use of the EMCLUS Procedure

A portion of the output that PROC FASTCLUS returns is summarized in the following table:

Cluster    Frequency    RMS Std. Deviation
1          500          101.34
2          3            1000.34
3          100          3.79
4          150          4.05

Clusters 1, 3, and 4 should be used as initial estimates for PROC EMCLUS, because of their high frequency counts. Cluster 1 may actually be a group of clusters because of its high RMS Std. Deviation. Therefore, the syntax when using PROC EMCLUS could look like:

PROC EMCLUS DATA=
     CLUSTERS = 5
     INIT = FASTCLUS
     SEED =
     INITSTD = 50.0
     INITCLUS 1, 3, 4;
run;

Note that CLUSTERS is set to 5, but any integer greater than or equal to 3 is appropriate, since three clusters are specified in the INITCLUS option. Also, INITSTD could have been set to any number less than 101.34 and greater than 4.05.
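The reasoning in this example can be checked with a short Python sketch. The frequency cutoff of 50 is a hypothetical value chosen for illustration, because the text says only that the selected clusters have high frequency counts; the table values and the CLUSTERS and INITSTD settings are taken from the example.

```python
# Rows from the PROC FASTCLUS summary: (cluster, frequency, RMS std. deviation).
fastclus_summary = [
    (1, 500, 101.34),
    (2, 3, 1000.34),
    (3, 100, 3.79),
    (4, 150, 4.05),
]

MIN_FREQUENCY = 50  # hypothetical cutoff, not a value given in the text

def pick_initial_clusters(rows, min_frequency=MIN_FREQUENCY):
    """Return the ids of clusters whose frequency meets the cutoff."""
    return [cluster for cluster, freq, _rms in rows if freq >= min_frequency]

initial = pick_initial_clusters(fastclus_summary)

clusters_option = 5    # CLUSTERS = 5 in the example
initstd_option = 50.0  # INITSTD = 50.0 in the example

# CLUSTERS must be at least the number of clusters listed in INITCLUS.
clusters_ok = clusters_option >= len(initial)
# The example states that INITSTD may be any number between 4.05 and 101.34.
initstd_ok = 4.05 < initstd_option < 101.34

print(initial, clusters_ok, initstd_ok)
```

Cluster 2 is dropped because it contains only 3 observations, which leaves the three clusters named in the INITCLUS option.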




The NEURAL Procedure


Overview

Procedure Syntax

PROC NEURAL Statement

ARCHITECTURE Statement

CODE Statement

CONNECT Statement

CUT Statement

DECISION Statement

DELETE Statement

FREEZE Statement

FREQ Statement

HIDDEN Statement

INITIAL Statement

INPUT Statement

NETOPTIONS Statement

NLOPTIONS Statement

PERTURB Statement

PRELIM Statement

QUIT Statement

RANOPTIONS Statement

SAVE Statement

SCORE Statement

SET Statement

SHOW Statement

TARGET Statement

THAW Statement

TRAIN Statement

USE Statement

ACTIVATION FUNCTIONS




COMBINATION FUNCTIONS

Details

Examples

Example 1: Developing a Simple Multilayer Perceptron (Rings Data)

Example 2: Developing a Neural Network for a Continuous Target

Example 3: Neural Network Hill-and-Plateau Example (Surf Data)



References
