3.22 Optimizing the Source and Other Transformations

Optimizing the Source

  • Optimizing the Query: single-table SELECT statements with an ORDER BY or GROUP BY clause may benefit from indexes on the ordering or grouping columns
  • Using Conditional Filters: if multiple sessions read from the same source simultaneously, a PowerCenter conditional filter may improve performance
  • Increasing Database Network Packet Size in Oracle: raise the session data unit (SDU) in listener.ora and tnsnames.ora (see the sketch after this list)
  • Connecting to Oracle Database Sources: use the IPC protocol when the Integration Service runs on the same machine as the Oracle instance
  • Using Teradata FastExport: use the FastExport reader instead of the Relational reader
  • Using tempdb to Join Sybase or Microsoft SQL Server Tables: configure tempdb as an in-memory database and allocate it sufficient memory
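
For the two Oracle items above, the settings live in Oracle Net configuration files. A minimal tnsnames.ora sketch, assuming placeholder alias, host, and service names; the SDU value must also be set on the server side in listener.ora:

    # Larger session data unit (SDU) for bulk reads over TCP/IP; set the
    # same SDU in the matching listener.ora entry on the database server.
    ORCL_BIGSDU =
      (DESCRIPTION =
        (SDU = 32767)
        (ADDRESS = (PROTOCOL = TCP)(HOST = dbhost)(PORT = 1521))
        (CONNECT_DATA = (SERVICE_NAME = orcl))
      )

    # IPC connection for an Integration Service that runs on the same
    # machine as the Oracle instance; bypasses the network stack.
    ORCL_IPC =
      (DESCRIPTION =
        (ADDRESS = (PROTOCOL = IPC)(KEY = orcl))
        (CONNECT_DATA = (SERVICE_NAME = orcl))
      )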

1.1.1 Optimizing Mappings Overview

  • Reduce the number of transformations in the mapping
  • Delete unnecessary links between transformations
  • Configure the mapping so the fewest transformations and expressions do the most work possible

1.1.2 Optimizing Datatype Conversions

  • Use integer values in place of other datatypes when performing comparisons using Lookup and Filter transformations
  • Convert source dates to strings through port-to-port conversion (connect the date/time port directly to a string port) to increase session performance

1.1.3 Optimizing Expressions

  • When possible, isolate slow expressions and simplify them
  • Prefer numeric fields over strings; the Integration Service processes numeric operations faster than string operations
  • Use DECODE instead of a lookup when searching a small, static set of values
  • Use operators instead of functions where an equivalent operator exists (see the expression sketch after this list)
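
A short sketch of the last two points in the PowerCenter expression language; the port names (FIRST_NAME, LAST_NAME, SALES) are placeholders:

    -- Operator instead of function: || avoids nested CONCAT calls
    -- Slower:
    CONCAT(CONCAT(FIRST_NAME, ' '), LAST_NAME)
    -- Faster:
    FIRST_NAME || ' ' || LAST_NAME

    -- DECODE instead of a lookup (or nested IIFs) for a small, static
    -- set of values: the values live in the expression, so no separate
    -- table lookup is needed
    DECODE(TRUE,
           SALES > 100000, 'HIGH',
           SALES > 50000,  'MEDIUM',
           'LOW')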

1.1.4 Optimizing Aggregator Transformations

  • Group by simple columns
  • Use sorted input
  • Use incremental aggregation
  • Filter data before you aggregate it (see the SQL sketch after this list)
  • Limit the number of connected input/output or output ports to reduce the amount of data the Aggregator stores in its data cache
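
Filtering and sorted input can both come from a Source Qualifier SQL override; a sketch with a hypothetical sales table, where the ORDER BY matches the Aggregator group-by ports so its Sorted Input option can be enabled:

    -- Drop unneeded rows before they reach the Aggregator, and pre-sort
    -- on the group-by ports (store_id, product_id)
    SELECT store_id, product_id, amount
    FROM   sales
    WHERE  amount > 0
    ORDER  BY store_id, product_id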

1.1.5 Optimizing Custom Transformations

  • Decrease the number of function calls the Integration Service and procedure make
  • Increase the locality of memory access for the data
  • Write procedure code to run an algorithm on a block of data instead of one row at a time (see the sketch after this list)
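
The block-based point, sketched in plain C. This shows only the pattern, not the actual Informatica Custom Transformation API; the type and function names are hypothetical:

    #include <stddef.h>

    typedef struct { long key; double amount; } Row;

    /* Row-based: the engine would call this once per row, paying the
       function-call overhead on every single row. */
    void process_row(Row *r) {
        r->amount *= 1.1;               /* example computation */
    }

    /* Block-based: one call handles a whole buffer of rows, amortizing
       the call overhead and keeping memory access sequential (local). */
    void process_block(Row *rows, size_t n) {
        for (size_t i = 0; i < n; i++)
            rows[i].amount *= 1.1;
    }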

1.1.6 Optimizing Joiner Transformations

  • For a sorted Joiner transformation, designate as master the source with fewer duplicate key values
  • For an unsorted Joiner transformation, designate as master the source with fewer rows
  • Perform joins in a database when possible (see the SQL sketch after this list)
  • Join sorted data when possible
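
When both inputs live in the same database, a Source Qualifier override (hypothetical table and column names) can replace the Joiner entirely:

    -- Join in the database rather than in a Joiner transformation; the
    -- database can use its own indexes and statistics
    SELECT o.order_id, o.amount, c.customer_name
    FROM   orders o
    JOIN   customers c ON c.customer_id = o.customer_id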

1.1.7 Optimizing Lookup Transformations

  • Use the optimal database driver
  • Cache lookup tables
  • Optimize the lookup condition
  • Filter lookup rows so the cache holds only the rows the session needs (see the SQL sketch after this list)
  • Index the lookup table on the columns used in the lookup condition
  • Optimize multiple lookups
  • Create a pipeline Lookup transformation and configure partitions in the pipeline that builds the lookup source
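
Two of the items above, sketched in SQL with hypothetical names: a lookup SQL override that filters rows so the cache holds only what the session needs, and an index on the lookup condition column:

    -- Lookup SQL override: cache only the rows the session can match
    SELECT customer_id, customer_name
    FROM   customers
    WHERE  status = 'ACTIVE'

    -- Index the column used in the lookup condition
    CREATE INDEX idx_customers_id ON customers (customer_id);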

1.1.8 Optimizing Normalizer Transformations

  • Place the Normalizer Transformation as close to the target as possible

1.1.9 Optimizing Sequence Generator Transformations

  • Create a reusable Sequence Generator and use it in multiple mappings simultaneously
  • Consider configuring the Number of Cached Values to a value greater than 1,000
  • If you do not have to cache values, set the Number of Cached Values to 0

1.1.10 Optimizing Sorter Transformations

  • Allocate enough memory to sort the data
  • Specify a different work directory for each partition in the Sorter transformation
  • Run the PowerCenter Integration Service in ASCII mode

1.1.11 Optimizing Source Qualifier Transformations

  • Use the Select Distinct option for the Source Qualifier transformation if you want the Integration Service to select unique values from a source

1.1.12 Optimizing SQL Transformations

  • Use SQL transformations in Query mode rather than Script mode where possible

1.1.13 Eliminating Transformation Errors

  • Set the error threshold (Stop On Errors) to an appropriate value
  • If a session generates many errors, lower the tracing level (for example, Terse)
  • Review the session log to locate transformation errors

1.1.14 Optimizing Sessions

  • Use a Grid to improve session / workflow performance
  • Use Pushdown Optimization
  • Run sessions and workflows concurrently (see the pmcmd sketch after this list)
  • Buffer Memory: Adjust DTM Buffer Size & Default Buffer Block Size
  • Caches: Analyze Transformation Caches & configure them accordingly
  • Target-Based Commit: Choose appropriately
  • Real-time Processing: See if Source Based Commit can be used
  • Remove staging areas where possible
  • Reduce the amount of data written to session logs, or keep logs in the default binary format instead of writing text log files
  • Choose an appropriate error tracing level
  • Post-Session Emails: if a post-session email attaches the session log, configure the session to write the log as a flat file
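
A sketch of launching workflows concurrently with pmcmd; the service, domain, user, folder, and workflow names are placeholders:

    # -nowait returns immediately, so the two workflows run in parallel
    pmcmd startworkflow -sv Int_Svc -d Domain_A -u admin -p '***' -f SalesFolder -nowait wf_load_orders
    pmcmd startworkflow -sv Int_Svc -d Domain_A -u admin -p '***' -f SalesFolder -nowait wf_load_customers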

1.1.15 Optimizing Grid Deployment

  • Add nodes to the grid
  • Increase storage capacity and bandwidth
  • Use shared file systems
  • Use a high-throughput network

1.1.16 Optimizing the PowerCenter Components

Optimizing PowerCenter Repository Performance:

  • Ensure the PowerCenter repository is on the same machine as the Repository Service process
  • Order conditions in object queries
  • Use a single-node tablespace for the PowerCenter repository if you install it on an IBM DB2 database (see the sketch after this list)
  • Optimize the database schema for the PowerCenter repository if you install it on a DB2 or Microsoft SQL Server database
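
A sketch of the single-node tablespace idea in DB2 SQL; the names are placeholders and the exact options vary by DB2 version:

    -- Create a single-partition database partition group, then put the
    -- repository tablespace in it
    CREATE DATABASE PARTITION GROUP repo_group ON DBPARTITIONNUM (0);
    CREATE TABLESPACE repo_ts IN DATABASE PARTITION GROUP repo_group;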

Optimizing Integration Service Performance:

  • Use native drivers instead of ODBC drivers for the Integration Service
  • Run the Integration Service in ASCII data movement mode if all character data is 7-bit ASCII or EBCDIC
  • Cache PowerCenter metadata for the Repository Service
  • Run Integration Service with high availability

1.1.17 Optimizing the Systems

  • Improve network speed
  • Use multiple CPUs
  • Reduce paging
  • Use processor binding
  • Use Local Disk if possible: A local disk can move data 5 to 20 times faster than a network
  • Minimize the number of network hops between the source and target databases and the Integration Service
  • Move the target database to a server system if possible
  • Work with the network engineers to ensure sufficient bandwidth

1.1.18 Mapping and Session Review Checklist

  • Do the target tables have constraints?
  • Does a parent table need to be loaded first? Check for PK-FK relationships
  • Follow naming conventions for mappings, workflows, sessions, and transformations
  • Verify that the SCD Type 1 / Type 2 design is followed
  • Check whether the DTM buffer size is set to Auto or set manually
  • Use DECODE instead of nested IIF functions (see the expression sketch in 1.1.3)
  • Use variables or parameters ($Source, $DBConnection_Src) for source and target connections (see the parameter file sketch after this list)
  • Use unconnected Lookups wherever the lookup returns only one value
  • Use dynamic Lookups for SCD implementations
  • Use shortcuts when objects are shared across multiple folders
  • Use reusable transformations
  • Memory utilization: source bottleneck (can be checked only by executing the mapping)
  • Use reusable sessions
  • Memory utilization: target bottleneck (can be checked only by executing the mapping)
  • Memory utilization: transformation bottleneck (can be checked only by executing the mapping)
  • Review the commit type and commit interval (in session properties; the default is target-based commit with an interval of 10,000)
  • Check for column precision set too high and for implicit datatype conversions
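
A sketch of the connection-variable item as it might appear in a session parameter file; the folder, workflow, session, and connection names are placeholders:

    [SalesFolder.WF:wf_load_orders.ST:s_m_load_orders]
    $DBConnection_Src=Ora_Sales_Src
    $DBConnection_Tgt=Ora_DW_Tgt

The built-in $Source and $Target variables can instead be assigned connection values directly in the session properties.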