I am not so expert to teach or preach you but sharing some steps which can help you to easy start with any ETL tool, These steps can be differ from person to person and their understanding or Knowledge :-)
ETL -- Extract, Transform and Load
Step 1: Learn Basics of SQLs
This step is very crucial, You should have understanding of RDBMS concept and hands on SQL basics, so that, you can correlate with ETL functionality. You should be aware with the fact that there is nothing which can be done in ETL but not in SQL one way or other.
Step 2: Understanding of ETL tool architecture
This is the step where we got confuse and start learning or studying the Installation of tool and Administration. Actually, we are trying to be Sachin Tendulkar (a cricket reference) without having any knowledge of Batting :-) . So don't do that.
ETL Architecture means - understanding of tools type and its functionality, how they communicate with each other, how the request flows from one tool client to another. Try to get basics, If you feel little uncomfortable with it, no worries, everyone feels the same way when they start learning how to walk.
Step 3: Understanding of Tool functionality
We will now try to go deep down, basically, with every ETL tool, there are 3 tools/clients available (can be more) - Admin Client, Designing Client, Monitoring Client. Pick the last 2 first - Designing and Monitoring if want to be an ETL developer and pick first and last if going for admin. You can find huge database on ETL tools helping docs over internet.
I am not going to discuss Admin or Monitoring but Designer client as this is the place where you are going to fight most of your battle.
Designer tool - Learning basics of E(extract) and L(load) part is quite easy, so skipping these, we should put our focus on T(transformation) part, like - What are the stages/functions provided by the tool to manipulate the data? Can we relate them with SQL functions?
Try to design very simple design/jobs which is simply extracting the data from DB or flat file, doing some basic manipulations/transformation on data and loading that data to a flat file. Don't try to design a very complex or smart job in first go but if you think you can, go ahead, who am I to hold you.
Step 4: How to design a job/flow
A lot of preaching till now, now moving to action. Next, what I am going to share may or may not work for you, this is just an idea which might be useful. I am directly jumping on designing part.
As soon as, I get the requirement, I tend to analyze what are my sources/targets and what transformations are required, after than I pseudo-code them on a paper or in mind ;-). Next, start with the same on ETL designer tool, add source and target without adding any transformation in between and voila!! you just developed your first job.
Now, relate the SQL functions with ETL tool component/stage and with the ask. Suppose, you have to get the max salary from each department to get the highest paid employee, you have to think, how you have to achieve the same in SQL (need to work with group by clause on department and use max function on salary) and now same you have to perform on ETL, check which component/stage provide you the same functionality and which kind of input data it is asking for. Now, it's time to align the input data as expected by transformation component/stage and link with the output. You need to continue the same till you get your desired result, it might be possible, it might be possible, in starting, you feel fuzzy or confused but as you design, you grow your knowledge in that area.
As I told, there is nothing in ETL which can't be done in SQLs. So relate the tools functionality with SQL term, this will help you to remember the stage functions. If I can say about DataStage, this is basic functionality provide by stage.
Please let me know what do you think in comment sections.
SQL | DataStage |
Group By | Aggregator |
Where | Filter |
union all | Funnel |
join | join, lookup, merge |
surrogate key | surrogate key generator |
dummy column | column generator |
top rows | head |
botton rows | tail |
order by | sort |
case stmt | switch |
Please let me know what do you think in comment sections.
Till then....Happy Learning !!!
No comments:
Post a Comment