Tuesday, June 16, 2015

Different query patterns you see with HiveQL when comparing with TSQL

In recent past......... a casual conversation turned to a technical conversation......

Friend: It is a surprise to see an MVP studying hard on Open Source, anyway good see you in the club :).

Me: This does not mean that I will be completely moving to Open Source, but I like to lean some additional things and get involved with a good implementation.

Friend: Okay, you have been studying HIVE for last few weeks, tell me three types of queries you have not seen with SQL Server?

Me: Yes, have seen few, let me remind three, you can interchange blocks in SELECT like FROM clause before SELECT, one CREATE TABLE statement for creating table and loading data, and of course duplicating table with CREATE TABLE LIKE :).

Conversation continued with more interesting topics related to two different platforms, will make more posts on my-usage of open source components such as Hadoop, Pig and HIVE for solutions I work on, but thought to make a post on this three items;

There are no big differences between TSQL (or standard SQL) and HiveQL queries, but some are noticeable if you are an experienced TSQL developer. If you have used MySQL, you will not see much as HiveQL offers similar patterns.

Let me elaborate three things I mentioned with my reply, that will surely encourage you to start studying on HIVE if you have not started using it.

1. 
SELECT statement of SQL Server always start with SELECT clause and then FROM;

SELECT Col1, Col2
FROM Table1
WHERE Col3 = 1;

HiveQL allows the same but it can be written like this too;

FROM Table1
SELECT Col1, Col2
WHERE Col3 = 1;

2.
If we need to create a table with SQL Server and load data, there are two ways with pros and cons.

CREATE TABLE NewTable
(
 Col1 int
 , Col2 varchar
);

INSERT INTO Table2
SELECT Col1, COl2
FROM OldTable;

-- or

SELECT Col1, Col2
INTO NewTable
FROM OldTable;

But HIVE offers an easy way of doing it.

CREATE TABLE NewTable
STORED AS RCFile
AS SELECT Col1, Col2 FROM OldTable;

3.
When we need to duplicate a table structure, this is what we do with SQL Server;

SELECT Col1, Col2
INTO NewTable
FROM OldTable
WHERE 1=0;

and HiveQL facilitates this;

CREATE TABLE NewTable LIKE OldTable;

If I am not mistaken, some of these are supported with APS. Not only that HDInsight supports HIVE in all the ways, let's try to understand the usage of HIVE and combining it with SQL Server implementations with future posts.

No comments: