Automation of Data Mining Using Integration Services


Extract and edit the XMLA statement



Yüklə 145,11 Kb.
səhifə3/9
tarix08.10.2017
ölçüsü145,11 Kb.
#3798
1   2   3   4   5   6   7   8   9

Extract and edit the XMLA statement


Next, use the time series model from the previous step to extract an XMLA statement that will serve as a template for all other models.

You can get the XMLA statement for any model or structure by using the scripting options in SQL Server Management Studio:



  1. In SQL Server Management Studio, right-click the time series model.

  2. Click Script Mining Model as.

  3. Save the XMLA to a text file.

  4. Open the text file in Notepad or another plain-text editor.

After you generate XMLA for the default time series model by using the script option, it looks like the following code. (The XMLA statement for models can be lengthy, so only an excerpt is shown here.) The XMLA statement always includes the database, the mining structure, metadata such as the model name, and the algorithm used for analysis. It can optionally include multiple parameters.
Forecasting Models

Forecasting




ARIMA_1-10-30

ARIMA_1-10-30

Microsoft_Time_Series





FORECAST_METHOD

ARIMA





PERIODICITY_HINT

{1,10,30}









Next, make the following changes to the command text that you extracted:



  • Add the parameters that you want to change, if they are not already present in the model. Default parameters are part of the XMLA output, so if your base model does not contain any parameters, you will need to add the XMLA section that contains parameters.

  • Remove unnecessary white space and all line breaks. For this walkthrough, the XMLA is stored as a string in a variable, which cannot contain line breaks. If you leave in any line breaks, the problem is not detected at package validation, but at run time, the Analysis Services engine attempts to execute the XMLA and fails with an error.

To clean up the file, use your favorite text editor. White spaces such as tabs and multiple space characters are okay but you can remove them if you like, to shorten the string variable. There is no limit on the size of string variables, but there is a 4,000-character limit in the expression editor.

  1. If your model does not already contain the parameters FORECAST_METHOD and PERIODICITY_HINT, use the code listed earlier, and copy the XML node that begins with and ends with . Paste it into the text file containing the XMLA command, directly below the line that defines the algorithm, and before the section that defines the columns.

  2. Edit the entire XMLA statement to remove line breaks. You can use any text editor that you like, so long as you verify that the result is a single line of text.

Prepare the replacement parameters


To create new models, you must update the basic XMLA command that you just created by inserting different values for the parameters. Among the parameters you must update are the model names and the model ID. Before you do this, you may find it helpful to review the format of the parameters you will change:

  • FORECAST_METHOD – Can have the values MIXED (default), ARIMA, or ARTXP.

  • PERIODICITY_HINT – Can have any combination of numbers separated by commas and enclosed by curly braces.

  • MODEL_ID – Must be unique for each model you create or an error will be generated.

  • MODEL_NAME – Should match the MODEL_ID; optional, but having them match makes the process easier to understand.

Integration Services is extremely flexible, so there are many different ways to store the parameters and insert them into the model XMLA. For example, you could:

  • Store the parameters as text in a SQL Server table, and then insert them into the XMLA within a Foreach Loop, by using an ADO.NET iterator.

  • Save the XMLA command as a text file and read it using a flat file connection. Save the variables in a configuration file and apply them at run time.

  • Save the XMLA command as an .xml file, and then read it into a package by using an XML Source. Insert the variables into the XML by using the properties and methods of the XML task.

  • Create multiple XMLA files in advance and then read the files with a combination of a Foreach loop and an XML Source connection.

However, for this scenario, you need to be able to easily add new sets of parameters, and to view and update the complete list of models and parameters.

Therefore, you’ll use the first method: create the parameter-value pairs as records in a SQL Server database, and then read in the new values at run time by using a Foreach Loop container. This way, you can easily view or update the parameters by using SQL queries.

Run the following statement to create the parameters table.

USE [DMReporting] -- substitute your database name here

GO

/****** Object: Table [dbo].[ModelParameters] Script Date: 11/09/2010 10:56:26 ******/



SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON



GO

CREATE TABLE [dbo].[ModelParameters](

[RecordID] [int] IDENTITY(1,1) NOT NULL,

[RecordDate] [datetime] NULL,

[ModelID] [nvarchar](50) NULL,

[ModelName] [nvarchar](50) NULL,

[ForecastMethod] [nvarchar](50) NULL,

[PeriodicityHint] [nvarchar](50) NULL

) ON [PRIMARY]

GO

The following table lists the parameters that are used to build models in this walkthrough. Insert these values into the parameters table you created by using the script.



ModelID

ForecastMethod

PeriodicityHint

ARIMA_1-7-10

ARIMA

{1,7,10}

ARIMA_1-10-30

ARIMA

{1,10,30}

ARIMA_nohints

ARIMA

{1}

MIXED_1-7-10

MIXED

{1,7,10}

MIXED_1-10-30

MIXED

{1,10,30}

MIXED_nohints

MIXED

{1}

ARTXP_1-7-10

ARTXP

{1,7,10}

ARTXP_1-10-30

ARTXP

{1,10,30}

ARTXP_nohints

ARTXP

{1}

This scenario uses the parameters FORECAST_METHOD and PERIODICITY_HINT because they are among the most important parameters for time series models (also because they are string values and easy to change!).

However, the parameters that you change will be completely different for other algorithms. For example, if you build a clustering model, you might decide to change the CLUSTERING_METHOD parameter and build models using each of the four clustering methods, such as K-Means. You might also try altering the MINIMUM_SUPPORT parameter, or trying a variety of cluster seeds. For a list of the parameters provided by the different algorithms, see the algorithm technical reference topics (http://msdn.microsoft.com/en-us/library/cc280427.aspx) in MSDN.



Important note for data miners: Altering parameter values can strongly affect the model results. Therefore, you should have some sort of plan for analyzing the results and weeding out badly fit models.

For example, because the time series algorithm is very sensitive to periodicity hints, it can produce poor results if you provide the wrong hint. If you specify that the data contains weekly cycles and it actually contains monthly cycles, the algorithm attempts to fit the data to the suggested weekly cycle and might produce odd results. Some of the models generated by this automation process demonstrate this behavior.

There are many ways that you can check the validity of models:


  • Use descriptive statistics and metadata for the individual models to eliminate models that have characteristics of overfitting or poor fit.

  • Validate data sets and models by using cross-validation or one of the other accuracy measures provided by SQL Server. For more information, see Validating Data Mining Models (http://technet.microsoft.com/en-us/library/ms174493.aspx) in SQL Server Books Online.

  • Choose only parameters and values that make sense for the business problem; use business rules to guide your modeling.

This completes the preparations, and you can now build the three packages.

The instructions for each package begin with a diagram that illustrates the package workflow and briefly describes the package components.

The diagram is followed by steps that you can follow to configure each task or destination.


Yüklə 145,11 Kb.

Dostları ilə paylaş:
1   2   3   4   5   6   7   8   9




Verilənlər bazası müəlliflik hüququ ilə müdafiə olunur ©genderi.org 2024
rəhbərliyinə müraciət

    Ana səhifə