Search This Blog

Tuesday 13 September 2011

Variables and Parameters in an Incremental Strategy

Variables and Parameters in an Incremental Strategy

Variables and parameters can enhance incremental strategies. The following example uses a mapping variable, an expression transformation object, and a parameter file for restarting.

Scenario

Company X wants to start with an initial load of all data, but wants subsequent process runs to select only new information. The environment data has an inherent Post_Date that is defined within a column named Date_Entered that can be used. Process will run once every twenty-four hours.

Solution

Create a mapping with source and target objects. From the menu create a new mapping variable named $$Post_Date with the following attributes:
  • TYPE Variable
  • DATATYPE Date/Time
  • AGGREGATION TYPE MAX
  • INITIAL VALUE 01/01/1900

 Note that there is no need to encapsulate the INITIAL VALUE with quotation marks. However, if this value is used within the Source Qualifier SQL, it is necessary to use the native RDBMS function to convert (e.g., TO DATE(--,--)).  Within the Source Qualifier Transformation, use the following in the Source_Filter Attribute: DATE_ENTERED > to_Date(' $$Post_Date','MM/DD/YYYY HH24:MI:SS')
Also note that the initial value 01/01/1900 will be expanded by the PowerCenter Server to 01/01/1900 00:00:00, hence the need to convert the parameter to a datetime.
The next step is to $$Post_Date and Date_Entered to an Expression transformation. This is where the function for setting the variable will reside. An output port named Post_Date is created with data type of date/time. In the expression code section, place the following function:
SETMAXVARIABLE($$Post_Date,DATE_ENTERED)

The function evaluates each value for DATE_ENTERED and updates the variable with the Max value to be passed forward.  For example:

DATE_ENTEREDResultant POST_DATE
9/1/20009/1/2000
10/30/200110/30/2001
9/2/200010/30/2001


Consider the following with regard to the functionality:
  1. In order for the function to assign a value and ultimately store it in the repository, the port must be connected to a downstream object. It need not go to the target, but it must go to another Expression Transformation. The reason is that that memory will not be instantiated unless it is used in a downstream transformation object.
  2. In order for the function to work correctly, the rows have to be marked for insert. If the mapping is an update only mapping (i.e., Treat Rows As is set to Update in the session properties) the function will not work. In this case, make the session Data Driven and add an Update Strategy after the transformation containing the SETMAXVARIABLE function, but before the Target.
  3. If the intent is to store the original Date_Entered per row and not the evaluated date value, then add an ORDER BY clause to the Source Qualifier. That way the dates are processed and set in order and data is preserved.

The first time this mapping is run the SQL will select from the source where Date_Entered is > 01/01/1900 providing an initial load. As data flows through the mapping, the variable gets updated to the Max Date_Entered it encounters. Upon successful completion of the session, the variable is updated in the Repository for use in the next session run. To view the current value for a particular variable associated with the session, right-click on the session and choose View Persistent Values.
The following graphic shows that after the initial run, the Max Date_Entered was 02/03/1998. The next time this session is run, based on the variable in the Source Qualifier Filter, only sources where Date_Entered > 02/03/1998 will be processed.

Resetting or Overriding Persistent Values

To reset the persistent value to the initial value declared in the mapping, view the persistent value from Server Manager (see graphic above) and press Delete Values. This will delete the stored value from the Repository, causing the Order of Evaluation to use the Initial Value declared from the mapping.
If a session run is needed for a specific date, use a parameter file. There are two basic ways to accomplish this:
  • Create a generic parameter file, place it on the server, and point all sessions to that parameter file. A session may (or may not) have a variable, and the parameter file need not have variables and parameters defined for every session using the parameter file. To override the variable, either change, uncomment, or delete the variable in the parameter file.
Run PMCMD for that session but declare the specific parameter file within the PMCMD command.

1 comment:

  1. As opposed to incremental strategy, is there a subtraction-based one? I had some regression concepts introduced in statistics classes in college, but that never got clear enough… Now I'm going thru academy.vertabelo.com
    course in SQL basics and I'm wondering are there any common points.

    ReplyDelete