Thursday, December 17, 2015

What is Data Wrangling? Is it same as ETLing?

While I was reading a newsletter received from Hortonworks, I noticed an article related to data, titled with Data Wrangling. With my experience, dozens of business intelligence implementations, though I have worked with hetorogenious data sets, I have never used this term, even when discussing ETLing. Asked few, this is an unknown to many, hence thought to make this post, just discussing how I see the Data Wrangling and how I see it differently comapring well known ETLing.

ETL, Extract, Transform and Loading, is a common technique we use with Data Integration (DI), Data Warehousing and Business Intelligence. This is more on structured data with well know data sources and mostly with automated tools. This extracts data from various, scattered systems, and prepares data as rich-consumable and loads to the destination, specifically data warehouse. Data Wrangling does the same but few differences.

[image taken from: http://www.datawatch.com/what-is-data-wrangling/]

Data Wrangling works more on unorganized, unstructured, large data set rather a set of structured data. This talks about a manual process that coverts data from one raw form to another format which is more readable and organized for analyzing data. As per the articles read, the term was introduced by Trifacta that offers a tool to help on this process. More on this, the person who does this process is called as Data Wrangler.


No comments:

Post a Comment