GLOBAL = NORMAL|GFIDF|IDF|ENTROPY
Specifies a global weight, $G_i$, to be used to weight the entries of the input matrix prior to any
calculations. If the WGT= option is specified, the weighted matrix is written out. Local and
global weights are combined so that an entry, $\hat{a}_{ij}$, of the new matrix is calculated from an entry,
$a_{ij}$, of the old matrix as $\hat{a}_{ij} = G_i L_{ij}$. If the
local weight is not specified, it defaults to $L_{ij} = a_{ij}$. If a global weight is not specified, it
defaults to $G_i = 1$. The GLOBAL
option may not be used in conjunction with the IN_GLOBAL option. The GWGT option on the
OUTPUT statement enables you to save the calculated global weights so they can be applied to
subsequent data sets by using the IN_GLOBAL option.
Global weights are functions of the row entries of the original, noncompressed, sparse matrix. The
following table lists the available global weights:
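Table of Row Weights
Global Weight                                                     Formula
Normal                                                            $G_i = 1 / \sqrt{\sum_j f_{ij}^2}$
Global Frequency divided by Inverse Document Frequency (GFIDF)    $G_i = g_i / d_i$
Inverse Document Frequency (IDF)                                  $G_i = \log_2(n / d_i) + 1$
Entropy                                                           $G_i = 1 + \sum_j \frac{p_{ij} \log_2 p_{ij}}{\log_2 n}$
where $f_{ij}$ is the frequency of term i in document j, $d_i$ is the number of documents in which term i
appears, $g_i$ is the number of times that term i appears in the whole document collection, n is the
number of documents in the collection, and $p_{ij} = f_{ij} / g_i$.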
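As a rough illustration, the global weights can be computed from the nonzero entries of a term-by-document matrix. The Python sketch below assumes the commonly used definitions of the four weights (spelled out in the code comments) and assumes that, with no local weight, each entry is simply multiplied by the global weight of its row. The helper names `global_weights` and `apply_weights` are hypothetical and are not part of the procedure.

```python
import math
from collections import defaultdict


def global_weights(entries, n_docs, kind):
    """Return {term i: G_i} for one of NORMAL, GFIDF, IDF, ENTROPY.

    entries: (term i, document j, frequency f_ij) triples, nonzero cells only,
             with each (i, j) pair appearing at most once.
    n_docs:  n, the number of documents in the collection (must exceed 1
             for ENTROPY, because the formula divides by log2(n)).
    """
    rows = defaultdict(list)
    for i, _j, f in entries:
        rows[i].append(f)

    weights = {}
    for i, freqs in rows.items():
        g = sum(freqs)   # g_i: occurrences of term i in the whole collection
        d = len(freqs)   # d_i: number of documents in which term i appears
        if kind == "NORMAL":
            # Assumed definition: 1 / sqrt(sum_j f_ij^2)
            w = 1.0 / math.sqrt(sum(f * f for f in freqs))
        elif kind == "GFIDF":
            # Assumed definition: g_i / d_i
            w = g / d
        elif kind == "IDF":
            # Assumed definition: log2(n / d_i) + 1
            w = math.log2(n_docs / d) + 1.0
        elif kind == "ENTROPY":
            # Assumed definition: 1 + sum_j p_ij log2(p_ij) / log2(n), p_ij = f_ij / g_i
            w = 1.0 + sum((f / g) * math.log2(f / g) for f in freqs) / math.log2(n_docs)
        else:
            raise ValueError(f"unknown global weight: {kind}")
        weights[i] = w
    return weights


def apply_weights(entries, weights):
    """Scale each entry by its row's global weight (assumes no local weight)."""
    return [(i, j, weights[i] * f) for i, j, f in entries]
```

For example, a term that occurs equally often in every document receives an entropy weight of 0 under these definitions, while a term confined to a single document receives a weight of 1.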
TOL = number
Specifies the tolerance at which the procedure stops finding eigenvalues. The procedure
actually finds eigenvalues of $A^T A$. Given an eigenvalue estimate and an eigenvector
estimate y, the procedure terminates when all k pairs of estimates satisfy the tolerance.
If TOL is not specified, it defaults to $10^{-6}$, which is more
than adequate for most text mining problems.
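To make the role of a tolerance concrete, the Python sketch below checks k eigenpair estimates of $A^T A$ against a threshold. The exact stopping rule used by the procedure is not reproduced here; the sketch assumes a plain residual test, namely that the 2-norm of $A^T A y - \lambda y$ is at most the tolerance, and the names `residual_norm` and `converged` are hypothetical.

```python
import math


def residual_norm(A, lam, y):
    """2-norm of (A^T A) y - lam * y, with A given as a list of dense rows."""
    # First compute A y, then A^T (A y), so A^T A is never formed explicitly.
    Ay = [sum(a * v for a, v in zip(row, y)) for row in A]
    AtAy = [sum(A[r][c] * Ay[r] for r in range(len(A))) for c in range(len(y))]
    return math.sqrt(sum((t - lam * v) ** 2 for t, v in zip(AtAy, y)))


def converged(A, pairs, tol=1e-6):
    """True when every (eigenvalue, eigenvector) estimate meets the assumed test."""
    return all(residual_norm(A, lam, y) <= tol for lam, y in pairs)
```

With A = [[3, 0], [0, 2]], the exact eigenpairs (9, [1, 0]) and (4, [0, 1]) pass the test, whereas replacing 9 with 9.1 leaves a residual of about 0.1 and fails it.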
The SPSVD Procedure
ROW Statement
Specifies the row variable. This statement is not required if the row variable has a name of ROW.
ROW variable;
variable
Specifies the name of the variable in the input data set that contains the row variable for the
compressed matrix format as described in the overview.
Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.
COL Statement
Specifies the column variable. This statement is not required if the column variable is named
COL.
COL variable;
variable
Specifies the name of the variable in the input data set that contains the column variable for the
compressed matrix format as described in the overview.
ENTRY Statement
Specifies the variable name of the entry values. This statement is not required if the variable has a
name of ENTRY.
ENTRY variable;
variable
Specifies the name of the variable in the input data set that contains the entry values for the
compressed matrix format as described in the overview.
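The ROW, COL, and ENTRY variables together describe one matrix cell per observation. The overview that defines the compressed matrix format is not reproduced here, so the Python sketch below assumes the usual coordinate layout: one record per nonzero cell, with 1-based row and column indices. The function name `to_compressed` is hypothetical.

```python
def to_compressed(dense):
    """Return one {ROW, COL, ENTRY} record per nonzero cell of a dense matrix.

    Assumes 1-based indices and that zero cells are simply omitted.
    """
    return [
        {"ROW": r, "COL": c, "ENTRY": value}
        for r, row in enumerate(dense, start=1)
        for c, value in enumerate(row, start=1)
        if value != 0
    ]
```

Under these assumptions, the 2-by-3 matrix [[2, 0, 1], [0, 0, 3]] becomes three records: (1, 1, 2), (1, 3, 1), and (2, 3, 3).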
OUTPUT Statement
Specifies the data sets to be output.
OUTPUT <option(s)>;
Options
S = SAS-data-set
Specifies the name of the data set to store the calculated S matrix. The matrix is written with rows
of the matrix as observations in the SAS data set and columns as variables. The variables are
named COL1-COLk. You cannot specify S= if the IN_S option has been specified.
U = SAS-data-set
Specifies the name of the data set to store the calculated U matrix. The matrix is written with
rows of the matrix as observations in the SAS data set and columns as variables. The variables are
named COL1-COLk. You cannot specify U= if the IN_U option has been specified.
V = SAS-data-set
Specifies the name of the data set to store the calculated V matrix. The matrix is written with
rows of the matrix as observations in the SAS data set and columns as variables. The variables are
named COL1-COLk. You cannot specify V= if the IN_V option has been specified.
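The layout shared by these output data sets (matrix rows as observations, matrix columns as variables COL1 through COLk) can be pictured with a short Python sketch. It only illustrates the naming scheme; `matrix_to_observations` is a hypothetical helper, and each observation is modeled as a plain dictionary.

```python
def matrix_to_observations(matrix):
    """Map each matrix row to one observation keyed by COL1..COLk."""
    k = len(matrix[0])
    names = [f"COL{c}" for c in range(1, k + 1)]
    return [dict(zip(names, row)) for row in matrix]
```

A matrix with k = 2 columns therefore yields observations that each carry the two variables COL1 and COL2.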
GWGT = SAS-data-set
Specifies the name of the data set that contains the calculated global weights. This data set can be
applied to other data sets by using the IN_GLOBAL option. This option must be used in
conjunction with the GLOBAL option.
WGT = SAS-data-set
Specifies the name of the data set to which the procedure writes the weighted matrix, if the
LOCAL option, the GLOBAL option, or both are used. If LOCAL and/or GLOBAL is specified, but
WGT= is not, then all calculations performed by the procedure are still based on the weighted
matrix, but the weighted matrix is not saved. If WGT= is specified and COLPRO,
ROWPRO, U=, S=, and V= are not specified, then the matrix is weighted and written to disk;
no other calculations are performed.
COLPRO|DOCPRO = SAS-data-set
Specifies the data set to which the procedure writes the projection of the columns of the input
matrix onto the columns of the U matrix. If the IN_U option is specified, the data in the set
specified by the IN_U option is used for the projection. Otherwise, U is calculated from the input
data set. If SCALECOL|SCALEDOC or SCALEALL is specified and IN_U is specified, then IN_S
must also be specified.
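Projecting a column of the input matrix onto the columns of U amounts to taking the inner product of that column with each of the k columns of U. The Python sketch below assumes this unscaled form and does not model the SCALECOL|SCALEDOC or SCALEALL options. It takes the matrix as (row, column, value) triples with 1-based indices, and `project_columns` is a hypothetical name.

```python
def project_columns(entries, U):
    """Return {column j: k-vector} holding U^T a_j for each column a_j.

    entries: (row i, column j, value) triples, nonzero cells only, 1-based.
    U:       list of rows, each a list of k numbers (one row per matrix row).
    """
    k = len(U[0])
    projections = {}
    for i, j, value in entries:
        vec = projections.setdefault(j, [0.0] * k)
        for c in range(k):
            # Accumulate U[i][c] * a_ij into component c of column j's projection.
            vec[c] += U[i - 1][c] * value
    return projections
```

Each input column, a document in a term-by-document matrix, is thereby reduced to k numbers, which matches the COL1-COLk layout used for the other output data sets.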