**ROWPRO|WORDPRO = ***SAS-data-set*
Specifies the data set to which the projection of the rows of the input matrix onto V is written. If the IN_V option is specified, the data in the set specified by the IN_V option is used for the projection. Otherwise, V is calculated from the input data set. If SCALEROW|SCALEWORD or SCALEALL is specified and IN_V is specified, then IN_S must also be specified.

**SCALECOL|SCALEDOC, SCALEROW|SCALEWORD, SCALEALL**
Requests that the associated projections (column, row, or all) be scaled by the inverse of the singular values. SCALEALL specifies that both the document (column) and the word (row) projections be scaled. SCALECOL or SCALEDOC specifies that the document (columns of the input matrix) projections be scaled. SCALEROW or SCALEWORD specifies that the term (rows of the input matrix) projections be scaled. If $p_{ij}$ is the $i$th coordinate of the projected image of the $j$th document, then scaling replaces $p_{ij}$ with $p_{ij}/s_i$, where $s_i$ is the $i$th singular value (the $i$th entry on the diagonal of S).

Scaling has two functions. First, it puts more weight on those themes in a document that are uncommon in the document collection. Second, if either the terms or the documents, but not both, are scaled, and both are placed in the same space, then terms and documents that are highly associated are more likely to be near each other.
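The effect of scaling can be illustrated with a minimal Python sketch. It is not part of the procedure; the function name `scale_projection` and all numbers are hypothetical, chosen only to show how dividing by the singular values raises the relative weight of dimensions with small singular values.

```python
def scale_projection(p, singular_values):
    """Divide each coordinate p[i] by the matching singular value s[i]."""
    if len(p) != len(singular_values):
        raise ValueError("need one singular value per coordinate")
    return [p_i / s_i for p_i, s_i in zip(p, singular_values)]


# Hypothetical values, for illustration only.
singular_values = [10.0, 2.0, 0.5]   # diagonal of S, largest first
projection = [5.0, 1.0, 0.25]        # unscaled coordinates of one document

scaled = scale_projection(projection, singular_values)
print(scaled)  # [0.5, 0.5, 0.5]
```

Before scaling, the third coordinate is twenty times smaller than the first. After scaling, all three coordinates contribute equally, which is the sense in which less common themes receive more weight.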

**NORMCOL|NORMDOC, NORMROW|NORMWORD, NORMALL**

Requests that the Euclidean length of the document (column) projections, the word (row) projections, or both be normalized. For example, if NORMCOL, NORMDOC, or NORMALL is specified, then each observation in the data set specified by the DOCPRO option has a length of 1. This is useful because it brings documents with similar content but different lengths closer together. For most text mining applications, NORMALL is suggested.
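As a rough illustration of why normalization helps, the following Python sketch rescales two hypothetical document projections to unit Euclidean length. The function `normalize` and the sample vectors are assumptions made for this example and are not taken from the procedure.

```python
import math


def normalize(vec):
    """Return vec rescaled to Euclidean length 1 (a zero vector is returned unchanged)."""
    length = math.sqrt(sum(x * x for x in vec))
    if length == 0.0:
        return list(vec)
    return [x / length for x in vec]


# Two hypothetical projections with the same direction but different lengths,
# such as a short and a long document about the same topic.
short_doc = [3.0, 4.0]
long_doc = [30.0, 40.0]

print(normalize(short_doc))  # [0.6, 0.8]
print(normalize(long_doc))   # [0.6, 0.8]
```

Both documents map to the same point once their length is removed, so any distance-based comparison treats them as identical in content.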

Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.

*The SPSVD Procedure*
**Example**
**Example 1: Use the SPSVD procedure for training and validation**
Suppose there are two data sets, SASUSER.TRAIN and SASUSER.VALID, produced by the **Text Parsing** node in Enterprise Miner. You want to use SASUSER.TRAIN for training a predictive model and SASUSER.VALID for validation.

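    /* Step 1: weight the training matrix, compute the SVD, */
    /* and project the training documents.                  */
    PROC SPSVD DATA=SASUSER.TRAIN K=200 P=50 LOCAL=LOG GLOBAL=ENTROPY;
       ROW KEY;
       COL DOC;
       ENTRY COUNT;
       OUTPUT U=SASUSER.U V=SASUSER.V S=SASUSER.S NORMALL SCALEALL
              DOCPRO=SASUSER.TRAINDP GWGT=SASUSER.WEIGHTS;
    RUN;

    /* Step 2: project the validation documents, reusing the */
    /* matrices and global weights saved in step 1.          */
    PROC SPSVD DATA=SASUSER.VALID IN_U=SASUSER.U IN_S=SASUSER.S
               LOCAL=LOG IN_GLOBAL=SASUSER.WEIGHTS;
       ROW KEY;
       COL DOC;
       ENTRY COUNT;
       OUTPUT NORMALL SCALEALL DOCPRO=SASUSER.VALIDDP;
    RUN;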

The first PROC SPSVD step applies a local log, global entropy weighting scheme to the training data set. The ROW, COL, and ENTRY statements specify the names given to these variables by the Text Parsing node in Enterprise Miner. Once the procedure has weighted the matrix, it calculates 200 columns (specified in the K= option) of U, S, and V based on this weighted matrix. The weighted training data set is then projected onto the first 200 columns of U, scaled by the inverse singular values, and normalized to unit length. The result of the projection is written to SASUSER.TRAINDP. The calculated global weights (entropy in this case) are saved to the data set SASUSER.WEIGHTS.

The second PROC SPSVD step projects the validation data set using the calculations from the training data set. This is done by specifying the U and S matrices calculated in the first PROC step with the IN_U and IN_S options. Notice that you do not need to specify the V data set because you are not projecting the terms. To project the documents in the validation data set, specify the same local weighting option for the validation data set and pass the calculated global weights with the IN_GLOBAL option. Then request that the normalized, scaled projection be written to SASUSER.VALIDDP. This way the validation data set is weighted in exactly the same way as the training data set. Using the GLOBAL option on the validation data set would cause new global weights to be calculated from the data in that set, which is not appropriate in this example because you want each dimension in the validation data set to correspond to a dimension in the training data set.
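The reason for reusing the training weights can be sketched numerically in Python. This sketch is an illustration under stated assumptions, not the procedure's implementation: it assumes the common log-entropy definition (local weight log2(1 + count), global weight 1 + sum of p log2 p divided by log2 of the number of documents), which this section does not spell out, and the function names and count matrices are hypothetical.

```python
import math


def entropy_weights(counts):
    """Global entropy weight for each term (row) of a term-by-document count matrix.

    Assumed formula: 1 + sum_j(p_ij * log2(p_ij)) / log2(n_docs), with
    p_ij = count_ij / (row total). Requires at least two documents and
    at least one nonzero count in every row.
    """
    n_docs = len(counts[0])
    weights = []
    for row in counts:
        total = sum(row)
        h = 0.0
        for f in row:
            if f > 0:
                p = f / total
                h += p * math.log2(p)
        weights.append(1.0 + h / math.log2(n_docs))
    return weights


def weight_matrix(counts, global_weights):
    """Apply the local log weight and the supplied global weight to every entry."""
    return [[g * math.log2(1.0 + f) for f in row]
            for row, g in zip(counts, global_weights)]


# Hypothetical counts: 3 terms (rows) by 4 training documents (columns).
train = [[1, 1, 1, 1],   # term spread evenly over all documents
         [3, 0, 0, 0],   # term concentrated in one document
         [0, 1, 0, 0]]
# Hypothetical counts for the same 3 terms in 2 validation documents.
valid = [[2, 0],
         [1, 1],
         [0, 3]]

train_weights = entropy_weights(train)                # saved once, like GWGT=
valid_weighted = weight_matrix(valid, train_weights)  # reused, like IN_GLOBAL=
recomputed = entropy_weights(valid)                   # what GLOBAL= would do instead

print(train_weights)
print(recomputed)
print(valid_weighted)
```

In this toy example the first term receives weight 0 from the training data but weight 1 when the weights are recomputed on the validation data, so the two data sets would no longer be measured on the same scale. Reusing the training weights avoids that mismatch.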

**The STDIZE Procedure**
**Overview**
**Procedure Syntax**
PROC STDIZE Statement

BY Statement

FREQ Statement

LOCATION Statement

SCALE Statement

VAR Statement

WEIGHT Statement

**Details**
**Examples**
Example 1: Getting Started with the STDIZE Procedure

Example 2: Unstandardizing a Data Set

Example 3: Replacing Missing Values with Standardizing

Example 4: Replacing Missing Values without Standardizing the Variables

**References**
