How to Identify Throughput Issues
How do you determine whether your performance issues are due to the read or write throughput of your session? Below are a few typical issues that result in poor performance (low throughput values).
Target Commit Point is too small
The default target commit point of 10,000 is often left in place when processing
large data volumes. This results in a commit every 10,000 rows even if the
mapping is processing millions or hundreds of millions of rows. Setting a
commit point of 100,000 or even 1 million rows is not unreasonable as data
volumes grow. There is a point where rollback exposure and commit frequency
should be balanced, but if processing 40 million rows, a commit point of under
500,000 is probably too small.
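As a quick illustration, the following arithmetic (written here as an Oracle query against DUAL, reusing the 40-million-row example above rather than figures from any specific job) compares the number of commits issued at the default interval versus a 500,000-row interval:
-- Number of commits for a 40 million row load at the default 10,000-row
-- interval versus a 500,000-row interval.
SELECT CEIL (40000000 / 10000)  AS commits_at_default,   -- 4,000 commits
       CEIL (40000000 / 500000) AS commits_at_500k       -- 80 commits
  FROM dual;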
Auto Memory settings are too small
Adding memory to the session is the single change with the biggest
impact if you see throughput values in the Workflow Monitor (or in the query
results) that are relatively small in relation to the number of rows being
processed. Testing is needed to find the sweet spot: add memory until it no
longer helps, while keeping in mind the resource requirements of other jobs
running concurrently. Usually adjusting Auto Memory to 1 GB or 2 GB can make a
difference without being out of line. Be sure to balance the Maximum value for
the Auto Memory with the Maximum Percentage of available memory, since it is
the smaller of the two that is used.
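To see which of the two settings actually takes effect, the following sketch computes the effective Auto Memory; the 2048 MB Maximum value, 5 percent Maximum Percentage, and 64 GB of machine memory are illustrative assumptions, not values from this environment:
-- Hypothetical: Maximum Memory Allowed = 2048 MB, Maximum Percentage = 5%,
-- machine memory = 65536 MB. The session gets the smaller of the two values.
SELECT LEAST (2048, 65536 * 5 / 100) AS effective_auto_memory_mb   -- 2048, since 5% is about 3277 MB
  FROM dual;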
Multi-pass Sorts
Review the session log for messages about multi-pass sorts. This
is an indication that more memory is required. Sorter transformations
default to Auto for the sorter cache size (in the past the default was 200 for
this value). Follow the message in the log and increase the Auto Memory, or
explicitly set the sorter cache to a larger value, so that the sort can be
performed in one or two passes. Note that the Auto Memory setting for Sorters
and other transformations that cache data is shared across those
transformations, so you may need to adjust the Auto Memory setting to an
appropriate value.
Not Sorting before Aggregators and Joiners
Review the session log for messages about nested-loop algorithms
for Aggregators and Joiners. These are indications that you did not sort before
them, or that you did not mark the transformation as having sorted input.
Sorting does take some cycles, but the sort-merge algorithm is much faster and
makes the other cache files much smaller, so it is worth the effort to improve
performance.
Complex SQL Overrides
Using SQL Overrides pushes the logic to the database when
initially querying the data, in lookups, and so on. This is not a best practice
for a number of reasons:
a. It hides the metadata of the mapping
b. It hides transformation logic being done on the data
c. It does not allow the Informatica server to take some of the load of the transformation of data
It is a good practice to take the SQL that will be run against the
database from the session log, run an explain plan on it, and tune it
appropriately in the database by adding the right indexes, etc., that assist in
query performance.
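Assuming an Oracle database (as the repository queries later in this document suggest), a minimal sketch of that step looks like the following; the SELECT shown is only a placeholder for whatever SQL you copied out of the session log, with hypothetical table and column names:
-- Paste the SQL taken from the session log after EXPLAIN PLAN FOR, then
-- display the plan to look for full scans, missing indexes, etc.
EXPLAIN PLAN FOR
SELECT o.cust_id, o.order_amt                  -- hypothetical override SQL
  FROM orders o
 WHERE o.order_date > TO_DATE ('20181101', 'YYYYMMDD');

SELECT * FROM TABLE (DBMS_XPLAN.DISPLAY);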
Using a Lookup instead of a Join
Both Lookups and Joins can be used to reference data in another
table based on data in the main source and specified logic. Lookup results
are cached either in memory, if they are small enough, or in a file on disk. If
the results are large, then it does not make sense to use a lookup; it is
likely more efficient to use a join instead. This can be a join performed in a
Source Qualifier if the lookup table comes from the same database instance as
the source table, or with a Joiner transformation if the tables are in
disparate database sources.
It was observed that some mappings were caching lookup
results for 40 million or more rows. This is a clear indication that a Joiner
should have been used instead. Generally, for any table with more than a
million rows it is better to use a Join instead of a Lookup. The items marked
in the session log below are lookups that should become joins because
of the number of rows returned.
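When the lookup table lives in the same database instance as the source, the equivalent join can be expressed in the Source Qualifier rather than caching millions of lookup rows; a hypothetical sketch, with made-up table and column names:
-- Join the source table to the former lookup table in one query instead of
-- building a huge lookup cache (orders, customers, and cust_name are illustrative).
SELECT o.order_id,
       o.cust_id,
       c.cust_name                               -- the column the lookup used to return
  FROM orders o, customers c
 WHERE o.cust_id = c.cust_id;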
Unsorted Input in a Lookup
The session log might contain messages with error code TT_11195
regarding unsorted input for a lookup transformation.
This occurs because the SQL Override eliminated the sort, because the
ports in the lookup transformation are not the same datatype and exact
precision as the definition in the database, or because there is binary data as
part of the lookup results and the sort order in the mapping is ASCII rather
than binary.
Some of the items with poor throughput were flagged, and we reviewed those
mappings, sessions, and session logs to look at Auto Memory settings, DTM
Buffer Size settings, session log errors and bottlenecks, and mapping logic
regarding SQL Overrides, lookup size, sorting before Aggregators and Joiners,
etc. The candidates for this review were selected because the execution time
was longer than 1 hour and the throughput was less than 10,000 rows per second
while the number of rows processed was greater than 10,000. This was a starting
point to find some of the worst jobs and review them. We did not get input from
the development or production support teams to guide our efforts.
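The candidate list itself can be pulled with a query along the following lines; this is only a sketch built on the same REP_SESS_TBL_LOG and REP_WORKFLOWS columns used in the queries below, with the thresholds described above:
SELECT l.subject_area,
       w.workflow_name,
       l.session_name,
       l.table_name,
       l.successful_rows,
       l.throughput,
       ROUND ( (l.end_time - l.start_time) * 24 * 60 * 60) run_time_secs
  FROM rep_sess_tbl_log l, rep_workflows w
 WHERE l.workflow_id = w.workflow_id
   AND (l.end_time - l.start_time) * 24 > 1      -- ran longer than 1 hour
   AND l.throughput < 10000                      -- under 10,000 rows per second
   AND l.successful_rows > 10000                 -- while processing more than 10,000 rows
 ORDER BY l.throughput;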
List "WRITE" Throughput of the session
SELECT l.subject_area,
       w.workflow_name,
       l.session_name,
       l.table_name,
       l.successful_rows,
       l.successful_affected_rows,
       l.throughput AS write_thruput,
       l.start_time,
       l.end_time,
       ROUND ( (l.end_time - l.start_time) * 24 * 60 * 60) run_time
  FROM rep_sess_tbl_log l, rep_workflows w
 WHERE l.start_time > TO_DATE ('20181101', 'YYYYMMDD')
   AND l.last_error_code IN (0, 7004)            -- last one without an error
   AND (l.start_time, l.session_name) IN
         ( SELECT MAX (start_time),              -- latest run date; comment out this clause to get all runs
                  session_name
             FROM rep_sess_tbl_log
            GROUP BY session_name)
   AND l.successful_rows > 0
   AND l.workflow_id = w.workflow_id
 ORDER BY l.subject_area,
          w.workflow_name,
          l.session_name,
          l.start_time
List "READ" Throughput of the session
SELECT t.task_name AS workflow_name,
       ti.instance_name AS session_name,
       swl.instance_name AS table_name,
       swl.applied_rows,
       swl.affected_rows,
       swl.thruput AS read_thruput,
       swl.start_time,
       swl.end_time,
       ROUND ( (swl.end_time - swl.start_time) * 24 * 60 * 60) run_time
  FROM opb_swidginst_log swl, opb_task_inst ti, opb_task t
 WHERE swl.last_err_code IN (0, 7004)
   AND (swl.start_time, swl.instance_name) IN
         ( SELECT MAX (start_time), instance_name
             FROM opb_swidginst_log
            GROUP BY instance_name)
   AND swl.session_id = ti.task_id
   AND ti.workflow_id = t.task_id
   AND (swl.applied_rows > 0 OR swl.affected_rows > 0)
 GROUP BY t.task_name,
          ti.instance_name,
          swl.instance_name,
          swl.applied_rows,
          swl.affected_rows,
          swl.thruput,
          swl.start_time,
          swl.end_time,
          ROUND ( (swl.end_time - swl.start_time) * 24 * 60 * 60)
 ORDER BY t.task_name,
          ti.instance_name,
          swl.instance_name,
          swl.start_time
Average Run Time and Throughput
SELECT l.subject_area,
       w.workflow_name,
       l.session_name,
       l.table_name,
       AVG (l.throughput) AS write_thruput,
       AVG (ROUND ( (l.end_time - l.start_time) * 24 * 60 * 60)) run_time,
       AVG (l.successful_rows) AS avg_success_rows,
       AVG (l.successful_affected_rows) AS avg_affected_rows
  FROM rep_sess_tbl_log l, rep_workflows w
 WHERE l.start_time > TO_DATE ('20180210', 'YYYYMMDD')
   -- AND l.subject_area = 'FOLDERName'
   AND l.workflow_id = w.workflow_id
 GROUP BY l.subject_area,
          w.workflow_name,
          l.session_name,
          l.table_name
 ORDER BY l.subject_area,
          w.workflow_name,
          l.session_name,
          l.table_name