
Tuesday, January 24, 2017

Creating a Testing Hadoop Cluster

I once wrote a post on the Hortonworks HDP Sandbox (Looking for a Hadoop Cluster for testing? Let's configure Hortonworks HDP Sandbox), which can be used for learning and testing many Hadoop-related sub-projects. A new version is now available and it is a bit different. Since I did a session on Hive yesterday (https://www.meetup.com/LKBigData/events/236812354/) and used it for the demonstration, I thought to write a new post on it: how to get the Sandbox configured on your machine.

The Hortonworks Sandbox is available as a virtual machine. It is essentially a portable Apache Hadoop 2.5 installation with many sub-projects, and can be downloaded as a Sandbox for VMware, VirtualBox, or Docker. To download it, visit http://hortonworks.com/products/sandbox.

This post covers Oracle VirtualBox. If you download the Sandbox for VirtualBox, follow the steps below to add it to your VirtualBox environment.

1. Open VirtualBox and go to the File menu -> Import Appliance.


2. Browse to the downloaded HDP_2.5_virtualbox.ova file and click on Next.


3. Change the configuration as you need; you may want to increase the CPU and Memory. To change a value, double-click on the default value and edit it.


4. That's all. Click on Import; it will add the Sandbox to your virtual environment. This is what you should see when it completes.


Now you can start the virtual machine: select it and click on Start, or just double-click it.


Once it has started, you can log in by pressing Alt+F5 and using root as the user and hadoop as the password.


However, unlike the older version, you can actually work with this one without logging in to it. As the image says, you can use a browser to start a session using the given URL, which is http://127.0.0.1:8888/.
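Before opening the browser, you can also check from the host that the splash page is reachable. A minimal sketch in Python, assuming the sandbox's default port forwarding (127.0.0.1:8888) is in place:

```python
# Check from the host whether the sandbox splash page answers HTTP
# requests; assumes VirtualBox's default port forwarding to 8888.
from urllib.request import urlopen

SPLASH_URL = "http://127.0.0.1:8888/"

def sandbox_is_up(url: str = SPLASH_URL, timeout: float = 5.0) -> bool:
    """Return True if the given URL answers with HTTP 200."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # connection refused, timeout, unreachable, ...
        return False

if __name__ == "__main__":
    print("sandbox up:", sandbox_is_up())
```

If this prints False, the VM is either still booting or the port forwarding was changed during import.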


You can open the dashboard to find various information about the components running inside, but before that you need to get the password for each component. Here is the way of accessing some of the components.

1. Click on Quick Links to open ADVANCE HDP QUICK LINKS.
2. Hover the mouse over Ambari and note the user id and password.


3. Go back to the main page and click on Dashboard.
4. Use the noted user id and password and click on Sign In.


5. This shows all running services and metrics related to the Hadoop components. If you need to see HDFS, or need a window for executing Hive queries, click on the button next to Admin and select the menu item you need.


6. This is how you see the Files View.


7. This is how you see the Hive View.
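The service list the dashboard shows can also be fetched through Ambari's REST API. A hedged sketch, assuming the sandbox is running, Ambari is on its default port 8080, and the cluster is named Sandbox (the sandbox default); the user id and password are the ones you noted from Quick Links:

```python
# Fetch the cluster's service list from Ambari's REST API instead of
# the dashboard. Port 8080 and cluster name "Sandbox" are the sandbox
# defaults; adjust them if your setup differs.
import base64
import json
from urllib.request import Request, urlopen

def ambari_services_request(user: str, password: str,
                            cluster: str = "Sandbox",
                            host: str = "127.0.0.1",
                            port: int = 8080) -> Request:
    """Build an authenticated GET request for the cluster's services."""
    url = f"http://{host}:{port}/api/v1/clusters/{cluster}/services"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return Request(url, headers={
        "Authorization": f"Basic {token}",
        "X-Requested-By": "ambari",  # CSRF header Ambari expects
    })

if __name__ == "__main__":
    # Replace with the user id and password noted from Quick Links.
    request = ambari_services_request("admin", "admin")
    try:
        with urlopen(request, timeout=10) as response:
            for item in json.load(response)["items"]:
                print(item["ServiceInfo"]["service_name"])
    except OSError as exc:
        print("Ambari not reachable:", exc)
```

This prints one service name per line (HDFS, HIVE, and so on) when the sandbox is up.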


As you can see, it is very easy to set up and start using Hadoop. Of course, you have no way of setting up multiple nodes, but this is good enough for learning and testing.


Thursday, June 18, 2015

Looking for a Hadoop Cluster for testing? Let's configure Hortonworks HDP Sandbox

The latest buzzword in IT, or more particularly in data analytics, is Big Data. It does not come alone; it always comes with Hadoop, which offers distributed storage and processing. Everyone loves to experiment with new and popular technologies, hence everyone loves to do the same with Big Data and Hadoop. But setting it up is not an easy task, and the cloud HDP subscriptions offered by various vendors are not very flexible in terms of the trial period given. However, if you really want to, you can use the Sandboxes offered by some vendors for testing Hadoop implementations.

One of the HDP clusters I used for testing is HDInsight. Since the trial is limited, I searched for alternatives and found the Sandbox offered by Hortonworks. This sandbox is configured as a self-contained virtual machine and can simply be used without connecting to the cloud. It does not come with multiple nodes, meaning that the Name Node, Job Tracker, Data Node, etc. are all in the same virtual machine. You will not get the exact picture of distribution, but you can do everything you need to do with Hadoop.

Here are the steps for configuring HDP sandbox.

Visit http://hortonworks.com/ and click on Hortonworks Sandbox under the Get Started menu at the top.


This takes you to a Download and Install page. At this moment, HDP 2.2.4 is the stable, reliable version, but HDP 2.3 - Preview is offered too. HDP 2.2.4 comes in three flavors: VirtualBox, VMware, and Hyper-V. Download the one best suited for you.


Make sure you download the Install Guides too. Configuration and usage are the same for all three types; for this post, I will assume that you downloaded the VirtualBox virtual machine, Sandbox_HDP_2.2.4.2_VirtualBox.ova. Once it is downloaded, you need to import it. This is fairly straightforward; all instructions are given in the Installation Guide, so follow it to import the appliance into your VirtualBox environment (or VMware or Hyper-V).
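If you prefer the command line to the GUI import, VirtualBox ships a VBoxManage tool that can import the appliance too. A small sketch, assuming VBoxManage is on your PATH and the file name is as downloaded:

```python
# Import the downloaded appliance with VBoxManage instead of the GUI.
# Assumes VBoxManage (shipped with VirtualBox) is on the PATH.
import subprocess

def import_command(ova_path: str) -> list:
    """Build the VBoxManage command line that imports an .ova appliance."""
    return ["VBoxManage", "import", ova_path]

if __name__ == "__main__":
    cmd = import_command("Sandbox_HDP_2.2.4.2_VirtualBox.ova")
    try:
        subprocess.run(cmd, check=True)
    except (OSError, subprocess.CalledProcessError) as exc:
        print("import failed:", exc)
```

The GUI route described in the Installation Guide does exactly the same thing, with the chance to tweak CPU and memory along the way.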


After importing, all you have to do is click the Start button. If your machine is ready to run virtual machines, it should start without any issue; however, it is very common to see the following error message with most VMs:

Failed to open a session for the virtual machine Hortonworks Sandbox with HDP 2.2.4.2.

VT-x is not available. (VERR_VMX_NO_VMX).

Result Code: E_FAIL (0x80004005)
Component: Console
Interface: IConsole {8ab7c520-2442-4b66-8d74-4ff1e195d2b6}



There can be two reasons for this. One is not enabling Virtualization in the BIOS. The second can be incompatibility with other virtual environments. If Virtualization is not enabled on your machine, boot into the BIOS and enable it. If you still get the same error, and you are running Windows 8, make sure you disable Hyper-V. This thread discusses the same issue: http://h30434.www3.hp.com/t5/Desktop-Hardware/How-to-Enable-Intel-Virtualization-Technology-vt-x-on-HP/td-p/3198063.

And this video shows how to disable Hyper-V to address the same error: https://www.youtube.com/watch?v=Y56boAsdptw.
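On a Linux host, you can check whether the CPU even reports hardware virtualization support (the capability the VERR_VMX_NO_VMX error complains about) before digging into the BIOS. A sketch; on a Windows host, use msinfo32 or the BIOS instead:

```python
# Linux-host sketch: look for the hardware virtualization CPU flags
# (vmx = Intel VT-x, svm = AMD-V) in /proc/cpuinfo.
def virtualization_flags(cpuinfo_text: str) -> set:
    """Return any virtualization flags found in /proc/cpuinfo content."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
    return flags & {"vmx", "svm"}

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            found = virtualization_flags(f.read())
        print("virtualization flags:", found or "none - enable in BIOS")
    except OSError:
        print("/proc/cpuinfo not available (non-Linux host)")
```

If no flags show up, the feature is either missing or disabled in the BIOS, which matches the first cause above.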

Once everything is done, you should be able to start it, and you should see a screen like the one below.



As it says, there are two ways of accessing this. You can press Alt+F5 to log in; the user id is root and the password is hadoop.


Once logged in, you can continue with your commands for working with Hadoop.


In addition to that, a GUI is given too. As the first screen explains, open a browser and go to http://127.0.0.1:8888/.


Then click on http://127.0.0.1:8000 to open Hue (Hadoop User Experience). It allows you to do your Hadoop work easily.