Translate to your Language

Wednesday, December 26, 2012

What is ETL Mapping Document ?A Real Time Example

by Dinix C  |  in DW at  10:10 PM

One of the regular viewer of this blog requested me to explain the important's of the ETL mapping document.

What is ETL Mapping Document :
                               The ETL mapping document contains the source,target and business rules information's, this document will be the most important document for the ETL developer to design and develop the ETL jobs.

                             A typical mapping document should contain the following information's

1) Mapping indicator(Values A:ADD, D:Delete,C:Change)
2) Change description (Used to indicate mapping changes)
3) Key Indicator( Indicates whether the field is Primary key or not)
4) Source Table/File Name
5) Source Field Name
6) Source Field Data Type
7) Source Field Length
8) Source Field Description(The description will be used as a meta data for end user)
9)Business Rule
10)Target Table Name
11) Target Field Name
12) Target Data Type
13) Target Field Length
14) Comment

You can add more fields based on your requirement.

Real Time Example:
Let's take a look at a real time example here, consider that we are getting a source file called "Employee_info" which contains employee information's and it should get loaded into the EMP_DIM table in target. 

One of the business requirement is to convert Sales -->S,Packing-->P and Transport-->T


Row id Emp First Name Emp Last Name EMP No EMP Dep
1 John Beck 2001 Sales
2 Mike Morry 2002 Sales
3 Kevin Peter 2003 Packing
4 Steve Morry 2004 Packing
5 Jim Chen 2005 Transport


Here is how the mapping will look like



Once the mapping is prepared, it needs to be reviewed with end user and get the sign off from the them.

Lets say that during the review they users want to add "E" before the EMP_NUM field in target.
The mapping document should be changed first, here it's how it will look after the change



 As you can see in the image that the "Mapping Indicator" column got changed from A to C and "Change description" contains change details.

7 comments:

  1. how to identify the facts and dimensions

    ReplyDelete
    Replies
    1. Facts contain measures (quantitative information). Dimensions contain descriptive (qualitative) information.

      Delete
  2. how to identify the fact and dimension tables?

    ReplyDelete
  3. The article is informative especially on the breakdown over some of the typical while important elements of a ETL mapping document.

    Most ETL mapping are written in Excel spreadsheets, do you find it effective, or it is used simply because there isn’t any alternate productive tool?

    I often found data professionals frustrated choosing the following when it comes to data mapping exercise:
    1) Excel/Word (not scalable – painful in version controls, scattered copies and team collaboration, etc.); or
    2) Heavy & costly tool-set (etl/migration execution focused instead & big learning curve)

    Much appreciated if you can share some of your thoughts!

    ReplyDelete
    Replies
    1. I have experienced both :). IMHO Excel is the best solution, it is convertable to database, back and forth. Inside of a DB, it si not difficult to track logical data lineage and support both versioning and team collaboration. Now I work for another DWH, all the source target mapping are literally described in Power Designer Annotation at target table level, similar to a cooking book. It is nother well-arranged nor efficient. Now I have to convince people that think that this is the world's best solution to accept the first approach and evaluate the conversion.

      Delete
  4. Tony,
    In my experience, I have seen that most of the Business analyst's are having very good hands on in all Microsoft products and they feel very comfortable in working on it then any other tools, so excel becomes the number one tool they prefer for any mapping documents.Having said that, we do have some external third part tool available, but you are talking about additional cost and learning curve.

    The Tool selection is completely based on our requirement,volume of data and skill availability,I have recently seen many companies are moving towards opensource system due cost and flexibility to change.

    ReplyDelete

© Copyright © 2015Big Data - DW & BI. by