Tuesday, April 28, 2015

ETL or ELT? Conversation I had with a participant

Everyone is familiar with the term "ETL" and everyone knows about the processes related to it. This concept refers extracting data from one or more heterogeneous sources, transforms them into a structure that destination can accept while applying all possible validations and business logic, and loads the destination. Is this always ETL? Can I refer this as ELT too? I referred this process as ELT in one of my recent presentations while describing Big Data and Microsoft APS (Analytics Platform System), and of course I was questioned by one of participants. Here is the conversation;

Participant: Hey Dinesh, it is a great session, learnt a lot from it, I need a clarification on an area you were discussing, have a minute?

Me: Thanks, sure, what is the area?

Participant: I have been implementing BI solutions for last 4 years and have done many different implementations on ETL side. During your presentation, you mentioned that the ETL term as ELT, not once twice, so, is ELT a known term or you just referred it?

Me: Yes, I referred it, and your first question, whether it is a known and standard term, I am 100% sure that it is a known term but I do not know whether it is a standard term. But I believe that the process can be referred as ELT based on the way it has been implemented.

Participant: Have you seen or heard this term before? Or you just created it :) ?

Me: :), I did not, this is being referred by many experts, I have seen it with videos and articles too. It is not an uncommon one but I think it is bit new to the industry, or it is not, but no one bothered to refer the process as ELT.

Participant: What this ELT exactly referring? Is it all about transforming loaded data violating the traditional rule?

Me: Yes it is, it refers applying transformation after loading your destination but it does not mean that we are violating any rules or standards. Think about loading data into staging environment. We load data and do some validation, apply some business logic to loaded data. If we just isolate "loading of staging", considering the staging as the destination, what we have done is ELT, not ETL.

Participant: Yes, I can understand, I have done the same too. How this involves with Big Data?

Me: I was explaining about Microsoft Analytics Platform System. With PDW (Parallel Data Warehouse Edition) or with standard Hadoop environment, we distribute the process into multiple nodes for getting the process done quickly. So, if the data warehouse is implemented with PDW, means millions of records, we can load data into distributed table in PDW database as the first step and then process them, applying complex business logic rather than doing them during the loading phase. This speeds up the whole ETL process. This is not extract, transform and loading, this is extract, loading and transform. And we can simply refer this as ELT.

Participant: Great, good explanation. Now it is clear. Thanks Dinesh 

Me: Thank you too, I always prefer to hear questions, it improves my knowledge and encourages me to do more exploration.


No comments: