Software Technology FAQs and Books

Here you can find information on information technology, software techology related faqs and books, documentation

2.10.05

tuning mappings -2

Optimize Lookup Transformations
There are a number of ways to optimize lookup transformations that are setup in a mapping.
When to Cache Lookups
When caching is enabled, the PowerCenter Server caches the lookup table and queries the lookup cache during the session. When this option is not enabled, the PowerCenter Server queries the lookup table on a row-by-row basis.

Sharing Lookup Caches
There are a number of methods for sharing lookup caches.
• Within a specific session run for a mapping, if the same lookup is used multiple times in a mapping, the PowerCenter Server will re-use the cache for the multiple instances of the lookup. Using the same lookup multiple times in the mapping will be more resource intensive with each successive instance. If multiple cached lookups are from the same table but are expected to return different columns of data, it may be better to setup the multiple lookups to bring back the same columns even though not all return ports are used in all lookups. Bringing back a common set of columns may reduce the number of disk reads.
• Across sessions of the same mapping, the use of an unnamed persistent cache allows multiple runs to use an existing cache file stored on the PowerCenter Server. If the option of creating a persistent cache is set in the lookup properties, the memory cache created for the lookup during the initial run is saved to the PowerCenter Server. This can improve performance because the Server builds the memory cache from cache files instead of the database. This feature should only be used when the lookup table is not expected to change between session runs.
• Across different mappings and sessions, the use of a named persistent cache allows sharing of an existing cache file.
Reducing the Number of Cached Rows
There is an option to use a SQL override in the creation of a lookup cache. Options can be added to the WHERE clause to reduce the set of records included in the resulting cache.
NOTE: If you use a SQL override in a lookup, the lookup must be cached.
Optimizing the Lookup Condition
In the case where a lookup uses more than one lookup condition, set the conditions with an equal sign first in order to optimize lookup performance.
Indexing the Lookup Table
The PowerCenter Server must query, sort and compare values in the lookup condition columns. As a result, indexes on the database table should include every column used in a lookup condition. This can improve performance for both cached and un-cached lookups.
¨ In the case of a cached lookup, an ORDER BY condition is issued in the SQL statement used to create the cache. Columns used in the ORDER BY condition should be indexed. The session log will contain the ORDER BY statement.
¨ In the case of an un-cached lookup, since a SQL statement created for each row passing into the lookup transformation, performance can be helped by indexing columns in the lookup condition.
Optimize Filter and Router Transformations
Filtering data as early as possible in the data flow improves the efficiency of a mapping. Instead of using a Filter Transformation to remove a sizeable number of rows in the middle or end of a mapping, use a filter on the Source Qualifier or a Filter Transformation immediately after the source qualifier to improve performance.
Avoid complex expressions when creating the filter condition. Filter transformations are most effective when a simple integer or TRUE/FALSE expression is used in the filter condition.
Filters or routers should also be used to drop rejected rows from an Update Strategy transformation if rejected rows do not need to be saved.
Replace multiple filter transformations with a router transformation. This reduces the number of transformations in the mapping and makes the mapping easier to follow.
Optimize Aggregator Transformations
Aggregator Transformations often slow performance because they must group data before processing it.
Use simple columns in the group by condition to make the Aggregator Transformation more efficient. When possible, use numbers instead of strings or dates in the GROUP BY columns. Also avoid complex expressions in the Aggregator expressions, especially in GROUP BY ports.
Use the Sorted Input option in the aggregator. This option requires that data sent to the aggregator be sorted in the order in which the ports are used in the aggregators group by. The Sorted Input option decreases the use of aggregate caches. When it is used, the PowerCenter Server assumes all data is sorted by group and, as a group is passed through an aggregator, calculations can be performed and information passed on to the next transformation. Without sorted input, the Server must wait for all rows of data before processing aggregate calculations. Use of the Sorted Inputs option is usually accompanied by a Source Qualifier which uses the Number of Sorted Ports option.
Use an Expression and Update Strategy instead of an Aggregator Transformation. This technique can only be used if the source data can be sorted. Further, using this option assumes that a mapping is using an Aggregator with Sorted Input option. In the Expression Transformation, the use of variable ports is required to hold data from the previous row of data processed. The premise is to use the previous row of data to determine whether the current row is a part of the current group or is the beginning of a new group. Thus, if the row is a part of the current group, then its data would be used to continue calculating the current group function. An Update Strategy Transformation would follow the Expression Transformation and set the first row of a new group to insert and the following rows to update.