Search This Blog

Friday, 2 December 2011

Informatica and Stored Procedures

A. Described below is a scenario where the requirement is to have a stored procedure that returns a cursor as a source. PowerCenter does not support a stored procedure that returns a cursor as a source. The workaround for this is
1. The procedure that will load the data to a new table:
CREATE OR REPLACE procedure load (p_initial_date in date, p_final_Date in date) as
str_load varchar2 (500);
str_clean varchar2 (500);
begin
str_clean:= ‘DELETE FROM EMP’;
str_load:= ‘INSERT INTO EMP select * from EMPLOYEE where DOJ between trunc
(p_initial_date) and trunc (p_final_Date) ‘;
execute immediate str_clean;
execute immediate str_load;
EXCEPTION
WHEN OTHERS
THEN
ROLLBACK;
end load;

2. Create the table that will receive the data from the procedure:
SQL> create table EMP as SELECT * from EMPLOYEE where 1 > 2;
3. Add a Store Procedure transformation to the PowerCenter mapping. This transformation will execute this new procedure called as LOAD on this example.
4. Set the run method to be Source Pre Load, to be executed before read the source table.
5. Import the EMP table as a Source Definition. This table will be populated by the new Store Procedure.
A. If the original store procedure is used by the customer application and you can’t change the source code, you can create a new store procedure that call the original one (without inserting into a table), and execute the insert on the new table executing a loop on the returned cursor.
B. Given below is a situation where you wanted to pass a mapping variable to a stored procedure transformation (it can either be connected or unconnected).
Connected Stored Procedure
The parameters that are passed to a connected Stored Procedure have to be linked from another transformation.
Given below are the steps to pass mapping variable to a connected Stored Procedure transformation:
  1. Create an Expression transformation.
  2. Create an output port in the Expression transformation with the following expression:
$$mapping_variable
This sets the value of this output port to the mapping variable.
  1. Link this output port to the Stored Procedure transformation.
Unconnected Stored Procedure
For unconnected Stored Procedure transformations you can use the mapping variable in the expression calling the stored procedure.
Follow the steps below to pass mapping variable to a unconnected Stored Procedure transformation:
  1. Create an Expression transformation.
  2. Create an output port in the Expression transformation with the following expression:
: SP.GET_NAME_FROM_ID ($$mapping_variable, PROC_RESULT)

In case if you are attempting to use a mapping variable to store the output value of the stored procedure, the session will fail with the below error. “TE_7002 Transformation Parse Fatal Error; transformation stopped: invalid function reference. Failed to Initialize Server Transformation.”

To resolve the issue replace the mapping variable with the PROC_RESULT system variable.
Example:
--Incorrect, using a mapping variable:
:SP.PROCEDURE(FIELD1, $$mapping_variable)
Correct, using the PROC_RESULT system variable:
:SP.PROCEDURE(FIELD1,PROC_RESULT)
Or
:SP.PROCEDURE($$mapping_variable,PROC_RESULT)
The PROC_RESULT system variable assigns the stored procedure output to the port with this expression.
You are welcome to share your thoughts/suggestions to inflate any “Infa Tip” -

Monday, 14 November 2011

Process Control / Audit of Workflows in Informatica

1. Process Control – Definition
Process control or Auditing of a workflow in an Informatica is capturing the job information like start time, end time, read count, insert count, update count and delete count. This information is captured and written into table as the workflow executes
2. Structure of Process Control/Audit table
The table structure of process control table is given below,
Process Control structure

PROCESS_RUN_ID
Number(p,s)
11
A unique number used to identify a specific process run.
PROCESS_NME
Varchar2
120
The name of the process (this column will be populated with the names of the Informatica mappings.)
START_TMST
Date
19
The date/time when the process started.
END_TMST
Date
19
The date/time when the process ended.
ROW_READ_CNT
Number(p,s)
16
The number of rows read by the process.
ROW_INSERT_CNT
Number(p,s)
16
The number of rows inserted by the process.
ROW_UPDATE_CNT
Number(p,s)
16
The number of rows updated by the process.
ROW_DELETE_CNT
Number(p,s)
16
The number of rows deleted by the process
ROW_REJECT_CNT
Number(p,s)
16
The number of rows rejected by the process.
USER_ID
Varchar2
32
The etl user identifier associated with the process.

3.  Mapping Logic and Build Steps
The process control flow has two data flows, one is an insert flow and the other is an update flow. The insert flow runs before the main mapping and update flows runs after the main mapping, this option is chosen in “Target Load Plan”. The source for both the flows could be a dummy source which will return one record as output, for example select ‘process’ from dual or select count(1) from above Table. The following list of mapping variable is to be created,


Mapping Parameter and variables

$$PROCESS_ID
$$PROCESS_NAME
$$INSERT_COUNT
$$UPDATE_COUNT
$$DELETE_COUNT
$$REJECT_COUNT




Steps to create Insert flow:
  • 1. Have “select ‘process’ from dual” as Sequel in source qualifier
  • 2. Have a sequence generator to create running process_run_Id ’s
  • 3. In an expression SetVariable ($$PROCESS_RUN_ID,NEXTVAL), $$PROCESS_NAME to o_process_name, a output only field
  • 4. In an expression assign $$SessionStarttime to o_Starttime, an output only field
  • 5. In an expression accept the sequence id from sequence generator
  • 6. Insert into target’ process control table’ with all the above three values
Process Control Image after Insert flow

PROCESS_RUN_ID
1
PROCESS_NME
VENDOR_DIM_LOAD
START_TMST
8/23/2009 12:23
END_TMST
ROW_READ_CNT
ROW_INSERT_CNT
ROW_UPDATE_CNT
ROW_DELETE_CNT
ROW_REJECT_CNT
USER_ID
INFA8USER

Steps in main mapping,
  • 1. After the source qualifier, increment the read count in a variable (v_read_count) for each record been read in an expression and SetMaxVariable ($$READ_COUNT,v_read_count)
  • 2. Before the update strategy of target instances, do the same for Insert/Update/Delete counts; all the variables are now set with all their respective counts


Steps to create Update flow:
  • 1. Have “select ‘process’ from dual” as Sequel in source qualifier
  • 2. Use SetMaxvariable to get the process_run_id created in insert flow
  • 3. In an expression assign $$INSERT_COUNT to an o_insert_count, a output only field, assign all the counts in the same way
  • 4. In an expression assign $$SessionEndtime to o_Endtime, an output only field
  • 5. Update the target ‘Process Control Table’ with all the above three values where process_run_id equals the process_run_id generated in Insert flow
Process Control Image after Update flow

PROCESS_RUN_ID
1
PROCESS_NME
VENDOR_DIM_LOAD
START_TMST
8/23/2009 12:23
END_TMST
8/23/2009 12:30
ROW_READ_CNT
1000
ROW_INSERT_CNT
900
ROW_UPDATE_CNT
60
ROW_DELETE_CNT
40
ROW_REJECT_CNT
0
USER_ID
INFA8USER

4. Merits over Informatica Metadata
This information is also available in Informatica metadata, however maintaining this within our system has following benefits,
  • Need not write complex query to bring in the data from metadata tables
  • Job names need not be mapping names and can be user friendly names
  • Insert/Delete/Update counts of all as well as individual target can be audited
  • This audit information can be maintained outside the metadata security level and can be used by other mappings in their transformations
  • Can be used by mappings that build parameter files
  • Can be used by mappings that govern data volume
  • Can be used by Production support to find out the quick status of load
You are welcome to share your thoughts/suggestions to inflate any “Infa Tip” -

crontab -e

bash
$crontab –l  > filename.txt

$export EDITOR=vi

crontab –e

Data Integration Challenge – Capturing Changes

When we receive the data from source systems, the data file will not carry a flag indicating whether the record provided is new or has it changed. We would need to build process to determine the changes and then push them to the target table.
There are two steps to it
  1. Pull the incremental data from the source file or table
  2. Process the pulled incremental data and determine the impact of it on the target table as Insert or Update or Delete

Step 1: Pull the incremental data from the source file or table

If source system has audit columns like date then we can find the new records else we will not be able to find the new records and have to consider the complete data
For source system’s file or table that has audit columns, we would follow the below steps
  1. While reading the source records for a day (session), find the maximum value of date(audit filed) and store in a persistent variable or a temporary table
  2. Use this persistent variable value as a filter in the next day to pull the incremental data from the source table

Step 2: Determine the impact of the record on target table as Insert/Update/ Delete

Following are the scenarios that we would face and the suggested approach
  1. Data file has only incremental data from Step 1 or the source itself provide only incremental data
    • do a lookup on the target table and determine whether it’s a new record or an existing record
    • if an existing record then compare the required fields to determine whether it’s an updated record
    • have a process to find the aged records in the target table and do a cleanup for ‘deletes’
  2. Data file has full complete data because no audit columns are present
    • The data is of higher
      • have a back up of the previously received file
      • perform a comparison of the current file and prior file; create a ‘change file’ by determining the inserts, updates and deletes. Ensure both the ‘current’ and ‘prior’ file are sorted by key fields
      • have a process that reads the ‘change file’ and loads the data into the target table
      • based on the ‘change file’ volume, we could decide whether to do a ‘truncate & load’
    • The data is of lower volume
      • do a lookup on the target table and determine whether it’s a new record or an existing record
      • if an existing record then compare the required fields to determine whether it’s an updated record
      • have a process to find the aged records in the target table and do a clean up or delete

You are welcome to share your thoughts/suggestions to inflate any “Infa Tip”

Monday, 31 October 2011

Concurrent Workflow Execution

What is concurrent work flow?
A concurrent workflow is a workflow that can run as multiple instances concurrently.
What is workflow instance?
A workflow instance is a representation of a workflow.

How to configure concurrent workflow?

1) Allow concurrent workflows with the same instance name:
Configure one workflow instance to run multiple times concurrently. Each instance has the same source, target, and variables parameters.
Eg: Create a workflow that reads data from a message queue that determines the source data and targets. You can run the instance multiple times concurrently and pass different connection parameters to the workflow instances from the message queue.
2) Configure unique workflow instances to run concurrently:
Define each workflow instance name and configure a workflow parameter file for the instance. You can define different sources, targets, and variables in the parameter file.

Eg: Configure workflow instances to run a workflow with different sources and targets. For example, your organization receives sales data from three divisions. You create a workflow that reads the sales data and writes it to the database. You configure three instances of the workflow. Each instance has a different workflow parameter file that defines which sales file to process. You can run all instances of the workflow concurrently.

How concurrent workflow Works?

A concurrent workflow group’s logical sessions and tasks together, like a sequential workflow, but runs all the tasks at one time.
Advantages of Concurrent workflow?
This can reduce the load times into the warehouse, taking advantage of hardware platforms’ Symmetric Multi-Processing (SMP) architecture.
LOAD SCENARIO:
Source table records count:  150,622,276

Friday, 21 October 2011

Impact Analysis on Source & Target Definition Changes

Impact Analysis on Source & Target Definition Changes

Changes to Source and Target definition will impact the current state of the Informatica mapping and below lists the possible changes at Source and the Target with impact.
Updating Source Definitions: When we update a source definition, the Designer propagates the changes to all mappings using that source. Some changes to source definitions can invalidate mappings.
Below table describes how the mappings get impacted when the source definition is edited:

Modification
Result  of the source after modifying the source definition
Add a column.
Mappings are not invalidated.
Change a column Data type.
Mappings may be invalidated. If the column is connected to an input port that uses a Data type incompatible with the new one, the mapping is invalidated.
Change a column name.
Mapping may be invalidated. If you change the column name for a column you just added, the mapping remains valid. If you change the column name for an existing column, the mapping is invalidated.
Delete a column.
Mappings can be invalidated if the mapping uses values from the deleted column.


Adding a new column in the existing source definition:

  • When we add a new column to a source in the Source Analyzer, all mappings using the source definition remain valid.
  • However, when we add a new column and change some of its properties, the Designer invalidates mappings using the source definition.
  • We can change the following properties for a newly added source column without invalidating a mapping: 
1. Name
2. Data type
3. Format
4. Usage
5. Redefines
6. Occurs
7. Key type
If the changes invalidate the mapping, we must open and edit the mapping. Then click Repository > Save to save the changes to the repository. If the invalidated mapping is used in a session, we must validate the session.
Updating Target Definitions:
When we change a target definition, the Designer propagates the changes to any mapping using that target. Some changes to target definitions can invalidate mappings.
The following table describes how the mappings get impacted when we edit target definitions:

Modification
Result  of the source after modifying the target definition
Add a column.
Mapping not invalidated.
Change a column Data type.
Mapping may be invalidated. If the column is connected to an input port that uses a Data type that is incompatible with the new one (for example, Decimal to Date), the mapping is invalid.
Change a column name.
Mapping may be invalidated. If you change the column name for a column you just added, the mapping remains valid. If you change the column name for an existing column, the mapping is invalidated.
Delete a column.
Mapping may be invalidated if the mapping uses values from the deleted column.
Change the target definition type.
Mapping not invalidated.


Adding a new column in the existing target definition:

  • When we add a new column to a target in the Target Designer, all mappings using the target definition remain valid.
  • However, when you add a new column and change some of its properties, the Designer invalidates mappings using the target definition.
  • We can change the following properties for a newly added target column without invalidating a mapping:
1. Name
2. Data type
3. Format

If the changes invalidate the mapping, validate the mapping and any session using the mapping. We can validate objects from the Query Results or View Dependencies window or from the Repository Navigator. We can validate multiple objects from these locations without opening them in the workspace. If we cannot validate the mapping or session from one of these locations, open the object in the workspace and edit it.
Re-importing a Relational Target Definition:
If a target table changes, such as when we change a column data type, we can edit the definition or we can re-import the target definition. When we re-import the target, we can either replace the existing target definition or rename the new target definition to avoid a naming conflict with the existing target definition.
To re-import a target definition:
  • In the Target Designer, follow the same steps to import the target definition, and select the    Target to import. The Designer notifies us that a target definition with that name already exists in the repository. If we have multiple tables to import and replace, select apply to All Tables.
  • Click Rename, Replace, Skip, or Compare.
  • If we click Rename, enter the name of the target definition and click OK.
  • If we have a relational target definition and click Replace, specify whether we want to retain primary key-foreign key information and target descriptions
The following table describes the options available in the Table Exists dialog box when re-importing and replacing a relational target definition:

Option
Description
Apply to all Tables
Select this option to apply rename, replaces, or skips all tables in the folder.
Retain User-Defined PK-FK Relationships
Select this option to keep the primary key-foreign key relationships in the target definition being replaced. This option is disabled when the target definition is non-relational.
Retain User-Defined Descriptions
Select this option to retain the target description and column and port descriptions of the target definition being replaced.