Search This Blog

Monday, 31 October 2011

Concurrent Workflow Execution

What is concurrent work flow?
A concurrent workflow is a workflow that can run as multiple instances concurrently.
What is workflow instance?
A workflow instance is a representation of a workflow.

How to configure concurrent workflow?

1) Allow concurrent workflows with the same instance name:
Configure one workflow instance to run multiple times concurrently. Each instance has the same source, target, and variables parameters.
Eg: Create a workflow that reads data from a message queue that determines the source data and targets. You can run the instance multiple times concurrently and pass different connection parameters to the workflow instances from the message queue.
2) Configure unique workflow instances to run concurrently:
Define each workflow instance name and configure a workflow parameter file for the instance. You can define different sources, targets, and variables in the parameter file.

Eg: Configure workflow instances to run a workflow with different sources and targets. For example, your organization receives sales data from three divisions. You create a workflow that reads the sales data and writes it to the database. You configure three instances of the workflow. Each instance has a different workflow parameter file that defines which sales file to process. You can run all instances of the workflow concurrently.

How concurrent workflow Works?

A concurrent workflow group’s logical sessions and tasks together, like a sequential workflow, but runs all the tasks at one time.
Advantages of Concurrent workflow?
This can reduce the load times into the warehouse, taking advantage of hardware platforms’ Symmetric Multi-Processing (SMP) architecture.
LOAD SCENARIO:
Source table records count:  150,622,276

Friday, 21 October 2011

Impact Analysis on Source & Target Definition Changes

Impact Analysis on Source & Target Definition Changes

Changes to Source and Target definition will impact the current state of the Informatica mapping and below lists the possible changes at Source and the Target with impact.
Updating Source Definitions: When we update a source definition, the Designer propagates the changes to all mappings using that source. Some changes to source definitions can invalidate mappings.
Below table describes how the mappings get impacted when the source definition is edited:

Modification
Result  of the source after modifying the source definition
Add a column.
Mappings are not invalidated.
Change a column Data type.
Mappings may be invalidated. If the column is connected to an input port that uses a Data type incompatible with the new one, the mapping is invalidated.
Change a column name.
Mapping may be invalidated. If you change the column name for a column you just added, the mapping remains valid. If you change the column name for an existing column, the mapping is invalidated.
Delete a column.
Mappings can be invalidated if the mapping uses values from the deleted column.


Adding a new column in the existing source definition:

  • When we add a new column to a source in the Source Analyzer, all mappings using the source definition remain valid.
  • However, when we add a new column and change some of its properties, the Designer invalidates mappings using the source definition.
  • We can change the following properties for a newly added source column without invalidating a mapping: 
1. Name
2. Data type
3. Format
4. Usage
5. Redefines
6. Occurs
7. Key type
If the changes invalidate the mapping, we must open and edit the mapping. Then click Repository > Save to save the changes to the repository. If the invalidated mapping is used in a session, we must validate the session.
Updating Target Definitions:
When we change a target definition, the Designer propagates the changes to any mapping using that target. Some changes to target definitions can invalidate mappings.
The following table describes how the mappings get impacted when we edit target definitions:

Modification
Result  of the source after modifying the target definition
Add a column.
Mapping not invalidated.
Change a column Data type.
Mapping may be invalidated. If the column is connected to an input port that uses a Data type that is incompatible with the new one (for example, Decimal to Date), the mapping is invalid.
Change a column name.
Mapping may be invalidated. If you change the column name for a column you just added, the mapping remains valid. If you change the column name for an existing column, the mapping is invalidated.
Delete a column.
Mapping may be invalidated if the mapping uses values from the deleted column.
Change the target definition type.
Mapping not invalidated.


Adding a new column in the existing target definition:

  • When we add a new column to a target in the Target Designer, all mappings using the target definition remain valid.
  • However, when you add a new column and change some of its properties, the Designer invalidates mappings using the target definition.
  • We can change the following properties for a newly added target column without invalidating a mapping:
1. Name
2. Data type
3. Format

If the changes invalidate the mapping, validate the mapping and any session using the mapping. We can validate objects from the Query Results or View Dependencies window or from the Repository Navigator. We can validate multiple objects from these locations without opening them in the workspace. If we cannot validate the mapping or session from one of these locations, open the object in the workspace and edit it.
Re-importing a Relational Target Definition:
If a target table changes, such as when we change a column data type, we can edit the definition or we can re-import the target definition. When we re-import the target, we can either replace the existing target definition or rename the new target definition to avoid a naming conflict with the existing target definition.
To re-import a target definition:
  • In the Target Designer, follow the same steps to import the target definition, and select the    Target to import. The Designer notifies us that a target definition with that name already exists in the repository. If we have multiple tables to import and replace, select apply to All Tables.
  • Click Rename, Replace, Skip, or Compare.
  • If we click Rename, enter the name of the target definition and click OK.
  • If we have a relational target definition and click Replace, specify whether we want to retain primary key-foreign key information and target descriptions
The following table describes the options available in the Table Exists dialog box when re-importing and replacing a relational target definition:

Option
Description
Apply to all Tables
Select this option to apply rename, replaces, or skips all tables in the folder.
Retain User-Defined PK-FK Relationships
Select this option to keep the primary key-foreign key relationships in the target definition being replaced. This option is disabled when the target definition is non-relational.
Retain User-Defined Descriptions
Select this option to retain the target description and column and port descriptions of the target definition being replaced.

Tuesday, 18 October 2011

Informatica Row Error Logging

Informatica Inbuilt Error Logging feature can be leveraged to implement Row Error logging in a central location.  When a row error occurs, the Integration service logs error information which can be used to determine the cause and source of the error.
These errors can be logged either in a relational table or in a flat file. When the error logging is enabled, the Integration service creates the error table or error log file the first time when it runs the session.  If the error table or error log file exists already, then the error data will be appended.
Following are the activities that need to be performed to implement the Informatica Row Error Logging:
1.  In the “Config object” tab of “Error Handling” option , set the “Error Log type” attribute to  “Relational database” or “Flat File”.  By Default error logging is disabled.
2.  SET Stop On Errors = 1
3.  If the  Error Log Type is set to “Relational”, specify the Database connection & Table Name Prefix
Following are the tables which will be created by Integration service and which will be populated as and when the error occurs.
PMERR_DATA
Stores data and metadata about a transformation row error and its corresponding source row.
PMERR_MSG
Stores metadata about an error and the error message.
PMERR_SESS
Stores metadata about the session.
PMERR_TRANS
Stores metadata about the source and transformation ports, such as name and datatype, when a transformation error occurs.
4.  If the  Error Log Type is set to “Flatfile”, specify the “Error log file directory” and “Error log file name”
Database Error Messages and the Error messages that Integration service writes to Bad File/ Reject file can also be captured and stored in the Error log tables / Flat files.
Following are the few database error messages which will be logged in the Error Log Tables / Flat files.

Error Messages
Cannot Insert the value NULL into column ‘<<Column name>>’, table ‘<<Table_name>>’
Violation of PRIMARY KEY constraint ‘<<Primary key constraint name>>’
Violation of UNIQUE KEY constraint ‘<<Unique Key Constraint>>’
Cannot Insert Duplicate key in object ‘<<Table_name>>’

Row Error Logging Implementation
Advantages
Since the Informatica Inbuilt feature is leveraged, the Error log information would be very accurate with very minimal development effort.
Pitfall
Enabling Error logging will have an impact to performance, since the integration service processes one row at a time instead of block of rows.

Tuesday, 11 October 2011

Transaction Control Transformation

Transaction Control Transformation
Power Center lets you control commit and roll back transactions based on a set of rows that pass through a Transaction Control transformation. A transaction is the set of rows bound by commit or roll back rows. You can define a transaction based on a varying number of input rows. You might want to define transactions based on a group of rows ordered on a common key, such as employee ID or order entry date.
In Power Center, you define transaction control at the following levels:
  • Within a mapping: Within a mapping, you use the Transaction Control transformation to define a transaction. You define transactions using an expression in a Transaction Control transformation. Based on the return value of the expression, you can choose to commit, roll back, or continue without any transaction changes.
  • Within a session: When you configure a session, you configure it for user-defined commit. You can choose to commit or roll back a transaction if the Integration Service fails to transform or write any row to the target.
When you run the session, the Integration Service evaluates the expression for each row that enters the transformation. When it evaluates a commit row, it commits all rows in the transaction to the target or targets. When the Integration Service evaluates a roll back row, it rolls back all rows in the transaction from the target or targets. If the mapping has a flat file target you can generate an output file each time the Integration Service starts a new transaction. You can dynamically name each target flat file.
Properties Tab:
On the Properties tab, you can configure the following properties:
  • Transaction control expression
  • Tracing level
Enter the transaction control expression in the Transaction Control Condition field. The transaction control expression uses the IIF function to test each row against the condition. Use the following syntax for the expression:
IIF (condition, value1, value2) :
The expression contains values that represent actions the Integration Service performs based on the return value of the condition. The Integration Service evaluates the condition on a row-by-row basis. The return value determines whether the Integration Service commits, rolls back, or makes no transaction changes to the row.
When the Integration Service issues a commit or roll back based on the return value of the expression, it begins a new transaction. Use the following built-in variables in the Expression Editor when you create a transaction control expression:
  • TC_CONTINUE_TRANSACTION: The Integration Service does not perform any transaction change for this row. This is the default value of the expression.
  • TC_COMMIT_BEFORE: The Integration Service commits the transaction, begins a new transaction, and writes the current row to the target. The current row is in the new transaction.
  • TC_COMMIT_AFTER: The Integration Service writes the current row to the target, commits the transaction, and begins a new transaction. The current row is in the committed transaction.
  • TC_ROLLBACK_BEFORE: The Integration Service rolls back the current transaction, begins a new transaction, and writes the current row to the target. The current row is in the new transaction.
  • TC_ROLLBACK_AFTER: The Integration Service writes the current row to the target, rolls back the transaction, and begins a new transaction. The current row is in the rolled back transaction.
Mapping Guidelines and Validation:
Use the following rules and guidelines when you create a mapping with a Transaction Control transformation:
  • If the mapping includes an XML target, and you choose to append or create a new document on commit, the input groups must receive data from the same transaction control point.
  • Transaction Control transformations connected to any target other than relational, XML, or dynamic MQSeries targets are ineffective for those targets.
  • You must connect each target instance to a Transaction Control transformation.
  • You can connect multiple targets to a single Transaction Control transformation.
  • You can connect only one effective Transaction Control transformation to a target.
  • You cannot place a Transaction Control transformation in a pipeline branch that starts with a Sequence Generator transformation.
  • If you use a dynamic Lookup transformation and a Transaction Control transformation in the same mapping, a rolled-back transaction might result in unsynchronized target data.
  • A Transaction Control transformation may be effective for one target and ineffective for another target. If each target is connected to an effective Transaction Control transformation, the mapping is valid.
  • Either all targets or none of the targets in the mapping should be connected to an effective Transaction Control transformation.
Example:
Step 1: Create a mapping with the following transformations
i)                    Source Definition /  Source Qualifier
ii)                   Transaction Control
iii)                 Target Definition


Note : Follow the steps to create the Transaction Control Transformation
  • In the Mapping Designer, click Transformation > Create. Select the Transaction Control transformation.
  • Enter a name for the transformation.
  • Enter a description for the transformation.
  • Click Create.
  • Click Done.
  • Drag the ports into the transformation.
  • Open the Edit Transformations dialog box, and select the Ports tab.
Select the Properties tab. Enter the transaction control expression that defines the commit and roll back behavior.

Go to the Properties tab and click on the down arrow to get in to the expression editor window. Later go to the Variables tab and Type IIF(KEY_COL=123,) select the below things from the built in functions.
IIF (KEY_COL=123,TC_COMMIT_BEFORE,TC_CONTINUE_TRANSACTION)
  • Connect all the columns from the transformation to the target table and save the mapping.
Click OK.

DB procedure for TYPE1 and TYPE2 mappings (SCD)

Type 2
====== 
CREATE OR REPLACE PROCEDURE HR.TYPE2(EMP_ID INTEGER,E_NAME CHAR,SAL REAL)
IS
EMP_NO INTEGER;
C_FLAG CHAR;
REC_CNT INTEGER;
BEGIN
SELECT 'Y' INTO C_FLAG FROM DUAL;
SELECT COUNT(*) INTO REC_CNT FROM TYPE1 WHERE EMPID=EMP_ID;
IF REC_CNT >=2 THEN
DELETE FROM TYPE1 WHERE EMPID=EMPID AND CURRENTFLAG='N';
COMMIT;
END IF;
SELECT EMPID INTO EMP_NO FROM TYPE1 WHERE EMPID=EMP_ID;
IF EMP_NO IS NOT NULL THEN
UPDATE TYPE1 SET CURRENTFLAG='N'
WHERE EMPID=EMP_ID;
END IF;
IF EMP_NO IS NOT NULL THEN
INSERT INTO TYPE1 VALUES(EMP_ID,E_NAME,SAL,C_FLAG);
END IF;
EXCEPTION
WHEN NO_DATA_FOUND THEN
INSERT INTO TYPE1 VALUES(EMP_ID,E_NAME,SAL,C_FLAG);
END
TYPE2;
/
 ~~~~~~
Type1
===== 
CREATE OR REPLACE PROCEDURE HR.TYPE_HIST(EMP_ID INTEGER,E_NAME VARCHAR,SAL REAL)
IS
EMP_NO INTEGER;
BEGIN
SELECT EMPID INTO EMP_NO FROM TYPE1 WHERE EMPID=EMP_ID;
IF EMP_NO IS NOT NULL THEN
UPDATE TYPE1 SET EMPID=EMP_ID,ENAME=E_NAME,SALARY=SAL WHERE EMPID=EMP_ID;
END IF;
EXCEPTION
WHEN NO_DATA_FOUND THEN
INSERT INTO TYPE1 VALUES(EMP_ID,E_NAME,SAL);
END
TYPE_HIST;
/