Aggregator Transformation

a) It is a connected and active transformation.
b) It allows us to perform aggregate calculations such as averages and sums.
c) It allows us to perform calculations on groups.

Properties of Aggregator Transformation:

1. Aggregate Expression
2. Group By Port
3. Sorted Input
4. Aggregate Cache

The images below show the Ports and Properties tabs of the Aggregator transformation.

i) Ports tab of the Aggregator transformation.

ii) Properties tab of the Aggregator transformation.

Let's begin with the very first component of the Aggregator transformation.

 

1) Aggregate Expressions

(a) Entered in an output port.
(b) Can include non-aggregate expressions and conditional clauses.


(A) Aggregate Functions: The transformation language includes the following aggregate functions:
    AVG, COUNT, MAX, MIN, SUM
    FIRST, LAST
    MEDIAN, PERCENTILE, STDDEV, VARIANCE

There are three variations of aggregate function usage:
(i) Single-level aggregate function, e.g. SUM(QUANTITY)
(ii) Nested aggregate function, e.g. MAX(COUNT(ITEM))
(iii) Conditional clause: use a conditional clause in the aggregate expression to reduce the number of rows used in the aggregation. The conditional clause can be any clause that evaluates to TRUE or FALSE, e.g. SUM(COMMISSION, COMMISSION > 100)

When using nested aggregate functions, observe the rules below (see the sketch after this list):
1. You can include multiple single-level functions or multiple nested functions in the output ports of an Aggregator.
2. You cannot have both types of functions together in the same Aggregator.
3. If you need to create both single-level and nested functions, create separate Aggregator transformations.
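
A sketch of how these rules play out in port expressions, reusing the QUANTITY and ITEM port names from the examples above (illustrative only, not a definitive mapping):

    -- Valid: multiple single-level aggregate functions in one output port
    SUM(QUANTITY) + COUNT(ITEM)

    -- Valid: a nested aggregate function
    MAX(COUNT(ITEM))

    -- Invalid: single-level and nested functions cannot coexist in the
    -- same Aggregator, whether in one port or spread across ports
    SUM(QUANTITY) + MAX(COUNT(ITEM))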
       
(B) Non-Aggregate Functions
    We can also use non-aggregate functions in the aggregate expression, e.g.
        IIF( MAX(QUANTITY) > 0, MAX(QUANTITY), 0 )
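
A conditional clause and a non-aggregate function can also be combined in a single expression. A hedged sketch, assuming QUANTITY and PRICE are input ports:

    -- Sum QUANTITY only for rows where PRICE > 2.00; return 0 if no row qualifies
    IIF( SUM(QUANTITY, PRICE > 2.00) > 0, SUM(QUANTITY, PRICE > 2.00), 0 )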

2) Group by Ports

(a) The Aggregator transformation lets you define groups for aggregations, rather than performing the aggregation across all input data.
(b) When grouping data, the Integration Service returns the last row of each group (the last row received for that group), along with the result of the aggregation, unless otherwise specified.
For example, the following Aggregator transformation groups first by STORE_ID and then by ITEM:

(Image: Aggregator transformation with STORE_ID and ITEM as group by ports)

If you send the following data through this Aggregator transformation:

    STORE_ID   ITEM        QTY   PRICE
    101        'battery'   3     2.99
    101        'battery'   1     3.19
    101        'battery'   2     2.59
    101        'AAA'       2     2.45
    201        'battery'   1     1.99
    201        'battery'   4     1.59
    301        'battery'   1     2.45

The Integration Service performs the aggregate calculation on the following unique groups:

    STORE_ID   ITEM
    101        'battery'
    101        'AAA'
    201        'battery'
    301        'battery'

The Integration Service then passes the last row received, along with the results of the aggregation, as follows:

    STORE_ID   ITEM        QTY   PRICE   SALE_PER_STORE
    101        'battery'   2     2.59    17.34
    101        'AAA'       2     2.45    4.90
    201        'battery'   4     1.59    8.35
    301        'battery'   1     2.45    2.45
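
The SALE_PER_STORE values above are consistent with an output port whose aggregate expression multiplies quantity by price within each group. A minimal sketch, assuming the port is defined this way (the original mapping image is not available):

    -- Output port SALE_PER_STORE, with STORE_ID and ITEM checked as group by ports
    SUM(QTY * PRICE)

For the (101, 'battery') group this gives 3*2.99 + 1*3.19 + 2*2.59 = 17.34, matching the table.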
 

3) Using Sorted Input

 

(a) Used to improve session performance.
(b) To use sorted input, we must pass data to the Aggregator transformation sorted by the group by ports, in ascending or descending order.
(c) If the option is checked but the data passed to the transformation is not sorted, the session fails.

Do not use sorted input if any of the following conditions are true:
(i) The aggregate expression uses nested aggregate functions.
(ii) The session uses incremental aggregation.
(iii) Source data is data driven.

To use sorted input, you pass sorted data through the Aggregator. Data must be sorted in the following ways:
(i) By the Aggregator group by ports, in the order they appear in the Aggregator transformation.
(ii) Using the same sort order configured for the session. If data is not in strict ascending or descending order based on the session sort order, the Integration Service fails the session. For example, if you configure a session to use a French sort order, data passing into the Aggregator transformation must be sorted using the French sort order.

For relational and flat file sources, use the Sorter transformation to sort data in the mapping before passing it to the Aggregator transformation.
If the session uses relational sources, you can also use the Number of Sorted Ports option in the Source Qualifier transformation to sort group by columns in the source database.

4) Aggregator Index Cache & Data Cache


(a) The PowerCenter Server stores data in the aggregate cache until it completes the aggregate calculations.
(b) It stores group values in an index cache and row data in the data cache. If the PowerCenter Server requires more space, it stores overflow values in cache files.

(A) Aggregator Index Cache: The index cache holds group information from the group by ports. If we are grouping by DEPTNO, this cache stores the DEPTNO values 10, 20, 30, and so on.

All group by columns are held in the Aggregator index cache, e.g. DEPTNO.

(B) Aggregator Data Cache: The data cache is generally larger than the Aggregator index cache.

The data cache contains:
a) Variable ports, if any.
b) Non-group-by input/output ports.
c) Non-group-by input ports used in non-aggregate output expressions.
d) Ports containing aggregate functions.

Example of Aggregator Usage: calculate the MAX, MIN, AVG, and SUM of salaries from the EMP table.

(Image: Aggregator transformation mapping)

(Image: Aggregator mapping source and target)

1. EMP will be the source table.
2. Create a target table EMP_AGG_EG in the Target Designer. The table should contain DEPTNO, MAX_SAL, MIN_SAL, AVG_SAL, and SUM_SAL (the result below also carries SAL, the last row's salary received for each group).
3. Create the shortcuts in your folder.

Creating Mapping:

1. Open the folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> give the mapping a name, e.g. m_aggregator.
4. Drag EMP from the sources into the mapping.
5. Click Transformation -> Create -> select AGGREGATOR from the list. Give it a name, click Create, then click Done.
6. Pass SAL and DEPTNO only from SQ_EMP to the Aggregator transformation.
7. Edit the Aggregator transformation and go to the Ports tab. Check Group By for DEPTNO so the calculations are performed per department.
8. Create 4 output ports: OUT_MAX_SAL, OUT_MIN_SAL, OUT_AVG_SAL, and OUT_SUM_SAL.
9. Open the Expression Editor for each output port and enter the calculations, e.g. MAX(SAL), MIN(SAL), AVG(SAL), SUM(SAL) (see the expression listing after these steps).
10. Click Apply -> OK.
11. Now drag in the target table.
12. Connect the output ports from the Aggregator to the target table.
13. Click Mapping -> Validate.
14. Repository -> Save.
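
The four output port expressions from step 9 would look roughly as follows in the Expression Editor, one expression per port (a sketch of the standard aggregate calls, using the port names created in step 8):

    -- OUT_MAX_SAL
    MAX(SAL)
    -- OUT_MIN_SAL
    MIN(SAL)
    -- OUT_AVG_SAL
    AVG(SAL)
    -- OUT_SUM_SAL
    SUM(SAL)

With DEPTNO checked as a group by port, each expression is evaluated once per department.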

Create a session and workflow, run the workflow, and check the data in the target table:


    DEPTNO   SAL    MAX_SAL   MIN_SAL   AVG_SAL   SUM_SAL
    10       1300   5000      1300      2917      8750
    20       3000   3000      800       2175      10875
    30       950    2850      950       1567      9400
