Optimizing the Source
- Optimizing the Query: Single-table SELECT statements with an ORDER BY or GROUP BY clause may benefit from optimizations such as adding an index on the ordered or grouped columns
- Using Conditional Filters: If multiple sessions read from the same source simultaneously, a PowerCenter conditional filter may improve performance
- Increasing Database Network Packet Size in Oracle (listener.ora and tnsnames.ora)
- Connecting to Oracle Database Sources: Use the IPC protocol to connect to Oracle when the Integration Service runs on the same machine as the Oracle instance
- Using Teradata FastExport (use FastExport reader instead of Relational reader)
- For Sybase or Microsoft SQL Server tables, if the database uses tempdb to store intermediate results, allocate enough memory for tempdb to operate as an in-memory database
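The Oracle packet-size point above can be sketched as follows. This is a hedged example: the alias, host, service name, and paths are hypothetical, and the SDU value must be set on both the client side (tnsnames.ora) and the server side (listener.ora) to take effect.

```
# tnsnames.ora (client side) -- alias and host are hypothetical
ORCL =
  (DESCRIPTION =
    (SDU = 32767)
    (ADDRESS = (PROTOCOL = TCP)(HOST = dbhost)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = orcl))
  )

# listener.ora (server side)
SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (SDU = 32767)
      (SID_NAME = orcl)
      (ORACLE_HOME = /u01/app/oracle/product/19.0.0/dbhome_1)
    )
  )
```

A larger SDU reduces the number of round trips needed to move a given volume of data; the effective value is the smaller of the client and server settings.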
1.1.1 Optimizing Mappings Overview
- Reduce the number of transformations in the mapping
- Delete unnecessary links between transformations
- Consolidate logic so that each Expression transformation does as much work as possible
1.1.2 Optimizing Datatype Conversions
- Use integer values in place of other datatypes when performing comparisons using Lookup and Filter transformations
- Convert the source dates to strings through port-to-port conversions to increase session performance
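The port-to-port conversion point can be sketched as follows; the port names are hypothetical:

```
-- Slower: explicit conversion function in an Expression transformation
OUT_ORDER_DATE_STR = TO_CHAR(ORDER_DATE, 'MM/DD/YYYY')

-- Faster: define OUT_ORDER_DATE_STR as a String output port and link the
-- Date/Time port ORDER_DATE to it directly; the Integration Service then
-- performs the datatype conversion port-to-port, without a function call.
```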
1.1.3 Optimizing Expressions
- When possible, isolate slow expressions and simplify them
- Choose Numeric fields in place of strings as much as possible
- Choose Decode instead of Lookup as much as possible
- Use Operators instead of Functions
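The last two points can be illustrated with a short expression sketch; the port names and values are hypothetical:

```
-- Nested IIF: each level adds an extra function call
IIF(REGION = 'E', 0.05, IIF(REGION = 'W', 0.03, IIF(REGION = 'N', 0.02, 0)))

-- Equivalent DECODE: a single call, usually faster and easier to read
DECODE(REGION, 'E', 0.05, 'W', 0.03, 'N', 0.02, 0)

-- Operator instead of function: || in place of nested CONCAT calls
CONCAT(CONCAT(FIRST_NAME, ' '), LAST_NAME)
FIRST_NAME || ' ' || LAST_NAME
```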
1.1.4 Optimizing Aggregator Transformations
- Group by simple columns
- Use sorted input
- Use incremental aggregation
- Filter data before you aggregate it
- Limit port connections
1.1.5 Optimizing Custom Transformations
- Decrease the number of function calls the Integration Service and procedure make
- Increase the locality of memory access space for the data
- Write procedure code to perform an algorithm on a block of data instead of each row of data
1.1.6 Optimizing Joiner Transformations
- Designate the master source as the source with fewer duplicate key values
- Designate the master source as the source with fewer rows
- Perform joins in a database when possible
- Join sorted data when possible
1.1.7 Optimizing Lookup Transformations
- Use the optimal database driver
- Cache lookup tables
- Optimize the lookup condition
- Filter lookup rows
- Index the lookup table
- Optimize multiple lookups
- Create a pipeline Lookup transformation and configure partitions in the pipeline that builds the lookup source
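Two of the points above, filtering lookup rows and optimizing the lookup condition, can be sketched as follows; the column and port names are hypothetical:

```
-- Lookup Source Filter property: cache only the rows the session needs
CUST_STATUS = 'ACTIVE'

-- Lookup condition: list equality conditions before range conditions,
-- since equal comparisons are evaluated first
CUST_ID  =  IN_CUST_ID
EFF_DATE <= IN_ORDER_DATE
```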
1.1.8 Optimizing Normalizer Transformation
- Place the Normalizer Transformation as close to the target as possible
1.1.9 Optimizing Sequence Generator Transformations
- Create a reusable Sequence Generator and use it in multiple mappings simultaneously
- Consider configuring the Number of Cached Values to a value greater than 1,000
- If you do not have to cache values, set the Number of Cached Values to 0
1.1.10 Optimizing Sorter Transformations
- Allocate enough memory to sort the data.
- Specify a different work directory for each partition in the Sorter transformation
- Run the PowerCenter Integration Service in ASCII mode
1.1.11 Optimizing Source Qualifier Transformations
- Use the Select Distinct option for the Source Qualifier transformation if you want the Integration Service to select unique values from a source
1.1.12 Optimizing SQL Transformations
- Use SQL transformations in Query mode instead of Script mode
1.1.13 Eliminating Transformation Errors
- Set Error Threshold to an appropriate value
- If there are multiple errors in a session, set the tracing level to a low level (e.g., Terse)
- Check the session log for transformation errors
1.1.14 Optimize the Sessions
- Use a Grid to improve session / workflow performance
- Use Pushdown Optimization
- Run Concurrent Sessions and Workflows
- Buffer Memory: Adjust DTM Buffer Size & Default Buffer Block Size
- Caches: Analyze Transformation Caches & configure them accordingly
- Target-Based Commit: Choose appropriately
- Real-time Processing: See if Source Based Commit can be used
- Remove Staging Areas as much as possible
- Generate fewer log files, or keep session logs in the default binary format rather than writing backward-compatible text logs
- Choose Appropriate Error Tracing
- Post-Session Emails: Set flat file Logging for Post Session Emails
1.1.15 Optimize Grid Deployment
- Add nodes to the grid
- Increase storage capacity and bandwidth
- Use shared file systems
- Use a high-throughput network
1.1.16 Optimizing the PowerCenter Components
Optimizing PowerCenter Repository Performance:
- Ensure the PowerCenter repository is on the same machine as the Repository Service process
- Order conditions in object queries
- Use a single-node tablespace for the PowerCenter repository if you install it on a DB2 database
- Optimize the database schema for the PowerCenter repository if you install it on a DB2 or Microsoft SQL Server database
Optimizing Integration Service Performance:
- Use native drivers instead of ODBC drivers for the Integration Service
- Run the Integration Service in ASCII data movement mode if all character data is 7-bit ASCII or EBCDIC
- Cache PowerCenter metadata for the Repository Service
- Run Integration Service with high availability
1.1.17 Optimize the Systems
- Improve network speed
- Use multiple CPUs
- Reduce paging
- Use processor binding
- Use Local Disk if possible: A local disk can move data 5 to 20 times faster than a disk accessed over a network
- Minimize the number of network hops between the source and target databases and the Integration Service
- Move Target to a Server System if possible
- Ask Network Engineer to provide enough Bandwidth
- Do the target tables have constraints?
- Does a parent table need to be loaded first? Check for PK-FK relationships
- Naming conventions for mappings, workflows, sessions, and transformations
- SCD Type 1 / Type 2 approach followed
- Check if DTM buffer size is auto or manually set
- Use Decode instead of nested IIF functions
- Use of variables or parameters ($Source / $DBConnection_Src) for source and target table connections
- Use of Unconnected Lookups wherever the lookup needs to return only one value
- Using Dynamic Lookups for SCD implementations
- Use of Shortcuts in case of multiple folders
- Re-usable transformations
- Memory Utilization – Source Bottleneck (can be checked only by execution of mapping)
- Use of Re-usable sessions
- Memory Utilization – Target Bottleneck (can be checked only by execution of mapping)
- Memory Utilization – Transformation Bottleneck (can be checked only by execution of mapping)
- Different Commit Type & Commit Interval (set in Session Properties; the default is target-based commit with an interval of 10,000)
- Column precision too high / implicit datatype conversion
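The $DBConnection point in the checklist above can be sketched with a minimal parameter file; the folder, workflow, session, and connection object names are all hypothetical:

```
[MyFolder.WF:wf_daily_load.ST:s_m_load_orders]
$DBConnection_Src=ORA_SRC_DEV
$DBConnection_Tgt=ORA_TGT_DEV
```

Assigning connection objects through session parameters like this lets the same mapping and session move between environments (dev, test, prod) by swapping the parameter file rather than editing the session.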