Tuesday, January 24, 2017

Creating a Testing Hadoop Cluster

I once wrote a post on the Hortonworks HDP Sandbox (Looking for a Hadoop Cluster for testing? Let's configure Hortonworks HDP Sandbox), which can be used for learning and testing many Hadoop-related sub-projects. A new version is now available and it is a bit different. Since I did a session on Hive yesterday (https://www.meetup.com/LKBigData/events/236812354/) and used the new version for the demonstration, I thought I would write a new post on it: how to get the Sandbox configured on your machine.

The Hortonworks Sandbox is available as a virtual machine. It is a portable HDP 2.5 environment, built on Apache Hadoop, with many sub-projects, and it can be downloaded as a Sandbox for VMware, VirtualBox or Docker. To download it, visit http://hortonworks.com/products/sandbox.

I will use Oracle VirtualBox for this post. If you downloaded the Sandbox for VirtualBox, follow the steps below to add it to VirtualBox.

1. Open VirtualBox and click File -> Import Appliance.


2. Browse to the downloaded HDP_2.5_virtualbox.ova file and click Next.


3. Change the configuration as needed. You may increase the CPU and memory. To change a setting, double-click its default value and edit it.


4. That's all. Click Import. It will add the Sandbox to your virtual environment. This is what you should see at the end.
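If you prefer the command line, the same import can be sketched with VirtualBox's VBoxManage tool. The snippet below only builds and prints the command, so you can review it before running it. The function name and the CPU and memory values (4 CPUs, 8192 MB) are my own assumptions for illustration, not values recommended by Hortonworks.

```python
import shlex


def build_import_command(ova_path, cpus=None, memory_mb=None):
    """Build a VBoxManage command that imports an appliance (.ova) file.

    cpus and memory_mb are optional overrides for the appliance defaults,
    which is what step 3 does in the Import Appliance dialog.
    """
    command = ["VBoxManage", "import", ova_path]
    # --vsys 0 selects the first virtual system described in the appliance;
    # it is only needed when a setting is overridden.
    if cpus is not None or memory_mb is not None:
        command += ["--vsys", "0"]
    if cpus is not None:
        command += ["--cpus", str(cpus)]
    if memory_mb is not None:
        command += ["--memory", str(memory_mb)]
    return command


# Hypothetical values: adjust them to what your machine can spare.
command = build_import_command("HDP_2.5_virtualbox.ova", cpus=4, memory_mb=8192)
print(shlex.join(command))
```

To actually run it, pass the list to subprocess.run(command, check=True) on a machine where VBoxManage is on the PATH.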


Now you can start the virtual machine. Select it and click Start, or just double-click it.
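Starting the machine can also be sketched from the command line with VBoxManage startvm. The VM name used below is a placeholder I made up; the real name is whatever VirtualBox shows after the import (running VBoxManage list vms prints it). The snippet only builds the command and does not start anything by itself.

```python
def build_start_command(vm_name, headless=False):
    """Build a VBoxManage command that starts a registered virtual machine.

    With headless=True the VM runs without opening a console window, which
    is enough if you only plan to use the Sandbox through the browser.
    """
    vm_type = "headless" if headless else "gui"
    return ["VBoxManage", "startvm", vm_name, "--type", vm_type]


# "Hortonworks Sandbox" is a hypothetical name used for illustration only.
print(build_start_command("Hortonworks Sandbox", headless=True))
```

As before, the returned list can be handed to subprocess.run(..., check=True) when you are ready to start the VM.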


Once it has started, you can log in by pressing Alt+F5 and using root as the user and hadoop as the password.


However, unlike the older version, we can work with this one without logging in to it. As the image says, we can start a session from a browser on the host machine (not inside the VM) using the given URL, which is http://127.0.0.1:8888/.
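The VM can take a while to boot, so the page may not load immediately. Here is a minimal sketch, using only the Python standard library, for checking from the host machine whether the URL answers yet. The function name and the timeout are my own choices.

```python
import http.client
import urllib.error
import urllib.request


def sandbox_is_reachable(url="http://127.0.0.1:8888/", timeout=5):
    """Return True if an HTTP GET on the URL succeeds, otherwise False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # urlopen follows redirects and raises HTTPError for 4xx/5xx,
            # so reaching this point normally means a successful response.
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False
```

Calling sandbox_is_reachable() on the host should return False while the VM is stopped or still booting, and True once the Sandbox page is being served.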


You can open the dashboard, which shows various information about the components running inside, but first you need to get the user ID and password for each component. Here is how to access some of the components.

1. Click Quick Links to open ADVANCED HDP QUICK LINKS.
2. Move the mouse over Ambari and note the user ID and password.


3. Go back to the main page and click Dashboard.
4. Enter the noted user ID and password and click Sign In.


5. This shows all running services and metrics related to the Hadoop components. If you need to browse HDFS or open a window for executing Hive queries, click the button next to Admin and select the menu item you need.


6. This is how the Files View looks.


7. This is how the Hive View looks.


As you can see, it is very easy to set up and start using Hadoop. Of course, you cannot set up multiple nodes with it, but it is good for learning and testing.


6 comments:

Unknown said...

Hi Dinesh, everything is downloaded and the setup is ready, but I have one issue: how can I open the browser? Is there a command to run? Please advise.

Dinesh Priyankara said...

Hi Neero,

You need to make sure that the VM is running. Once it has started, you can browse the VM using the http://127.0.0.1:8888 URL.

Unknown said...

Hi Dinesh,

Got it working. The issue was that I didn't try the browser on my local machine; instead, I tried to open a browser in the VM.

Dinesh Priyankara said...

Great :), let me know if you need more clarification.

Unknown said...

I tried a few samples with some small txt files. Are there any web sites where we can download sample datasets as txt files with more than 1 million records to try out? I tried a few but couldn't find proper samples.

Dinesh Priyankara said...

Hi Neero,

Did you download the sample dataset that comes with the tutorial?

http://hortonworks.com/hadoop-tutorial/loading-data-into-the-hortonworks-sandbox/#download-sample-data

This has a large dataset. I have not yet tried this out but it may satisfy your requirement.