Thursday, June 18, 2015

Looking for a Hadoop Cluster for testing? Let's configure Hortonworks HDP Sanbox

The latest buzzword in IT, or more particularly in data analytic is Big Data. It does not come alone, it always comes with Hadoop which offers distributed storage and processing. Everyone loves to do some experiments with new technologies, or popular technologies, hence everyone loves to do the same with Big Data and Hadoop. But setting it up is not an easy task and cloud HDP subscriptions offered by various vendors are not so flexible in terms of trial-period given. However, if you really want, you can use Sanboxes offered by some vendors for testing Hadoop implementations.

One of the HDP cluster I used for testing is HDInsight. Since the trail is limited, searched for alternatives and found the Sandbox offered by Hortonworks. This sandbox is configured as a self-contained virtual machine and it can be simply used without connecting to cloud. It does not come with multiple nodes, means that all Name Node, Job Tracker, Data Node, etc. are in same virtual machine. You will not be able to get the exact picture of distribution but you can do everything you need to do with Hadoop with this.

Here are the steps for configuring HDP sandbox.

Visit Click on Hortonworks Sandbox under Get Started menu that is the top menu.

This takes you to a Download and Install page. At this moment, HDP 2.2.4 is the stable and reliable version, but it offers HDP 2.3 - Preview too. HDP 2.2.4 comes in three flavors; VirtualBox, VMWare and HyperV. Download the best suited for you.

Make sure you download the Install Guides too. Configuration and usage is same for all three types, for this post, I will assume that you download VirtualBox virtual machine which is Sandbox_HDP_2.2.4.2_VirtualBox.ova. Once this is downloaded, you need to import it, it is fairly straight forward, all instructions are given with the Installation Guide, follow it for importing into your VirtualBox environment (or VMWare or HyperV).

After importing, all you have to do is, click on Start button. If your machine is ready for running virtual machines, it should start without any issue, however it is very common to see following error message with most of VMs;

Failed to open a session for the virtual machine Hortonworks Sandbox with HDP

VT-x is not available. (VERR_VMX_NO_VMX).

Result Code: E_FAIL (0x80004005)
Component: Console
Interface: IConsole {8ab7c520-2442-4b66-8d74-4ff1e195d2b6}

There can be two reasons for that. Once is, not enabling Visualization in BIOS. Second can be, incompatibility with other virtual environments. If Virtualization is not enabled in your machine, boot the machine with BIOS and enable it. If you still get the same error, and if you are running with Windows 8, make sure you disable HyperV. This thread discusses the same, follow it:

And this video shows how to disable HyperV for addressing the same error:

Once everything is done, you should be able to start it and you should see a scree like below.

As it says, there are two ways of accessing this; you can press ALlt+F5 for logging, user id is root and password is hadoop.

Once login you can continue with your commands for working with Hadoop.

In addition to that, GUI is given too. As the first screen explains, open a browser and go for

Then click on for opening hue (Hadoop User Experience). It allows you to do your Hadoop work easily.

No comments: