Saturday, July 11, 2015

Hadoop cluster and how it stores a file

Hadoop is no longer a new word; everyone knows it and knows why we need it. During my last two presentations, I explained the Hadoop cluster and how it stores a file when one is placed into it. Here is the image I used for explaining the Hadoop cluster.


A Hadoop cluster consists of a couple of component types, with Master nodes and Slave nodes as the main components. Master nodes are responsible for managing and coordinating services and tasks (e.g. via the Name Node), and Slave nodes are responsible for storing and processing data, providing resources such as CPU and memory.
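
To make this division of labor concrete, here is a minimal Java sketch of a client talking to the master. Listing a directory is a pure metadata operation, so only the Name Node is involved; the hdfs://namenode:8020 address and the class name are placeholders for illustration, not part of any real cluster.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListRoot {
    public static void main(String[] args) throws Exception {
        // Point the client at the master; host and port are placeholders.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020");

        try (FileSystem fs = FileSystem.get(conf)) {
            // A directory listing is served entirely from the Name Node's
            // metadata; no slave (Data Node) is contacted for this call.
            for (FileStatus status : fs.listStatus(new Path("/"))) {
                System.out.println(status.getPath() + "  " + status.getLen());
            }
        }
    }
}
```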

Generally, Hadoop is deployed on rack-based servers. On top of each rack, a network switch is configured for intra-rack communication, and another network switch handles the communication between the rack switches and the client machines that run Hadoop client-related software.
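
Hadoop can be told about this rack layout so that it places block replicas in a rack-aware way. One way is a custom mapping class wired in through the net.topology.node.switch.mapping.impl property in core-site.xml; the sketch below is a toy version, and the class name and host-naming convention are assumptions, not a real deployment's logic.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.net.DNSToSwitchMapping;

// Toy rack mapper: hosts with "rack1" in their name map to /rack1,
// everything else to the default rack. A real mapper would look the
// hosts up in a site-specific table or script.
public class SimpleRackMapping implements DNSToSwitchMapping {
    @Override
    public List<String> resolve(List<String> names) {
        List<String> racks = new ArrayList<>(names.size());
        for (String host : names) {
            racks.add(host.contains("rack1") ? "/rack1" : "/default-rack");
        }
        return racks;
    }

    @Override
    public void reloadCachedMappings() { /* nothing cached */ }

    @Override
    public void reloadCachedMappings(List<String> names) { /* nothing cached */ }
}
```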

Hadoop uses HDFS for holding files. HDFS is responsible for breaking large files into smaller blocks (128 MB by default, configurable), placing them on different slave nodes, and replicating them to provide high availability. Here is a video that shows how a file is distributed in the Hadoop Distributed File System:


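The same flow can also be seen from code. This sketch writes a roughly 300 MB file with an explicit 128 MB block size and a replication factor of 3, then asks the Name Node where each block's replicas landed; the cluster address and file path are placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteAndLocateBlocks {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // placeholder address

        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/demo/big-file.bin");   // hypothetical path
            long blockSize = 128L * 1024 * 1024;          // 128 MB per block
            short replication = 3;                        // 3 copies of each block

            // create(path, overwrite, bufferSize, replication, blockSize)
            try (FSDataOutputStream out =
                     fs.create(file, true, 4096, replication, blockSize)) {
                byte[] chunk = new byte[1024 * 1024];
                for (int i = 0; i < 300; i++) {           // ~300 MB -> 3 blocks
                    out.write(chunk);
                }
            }

            // Ask the Name Node which slave nodes hold each block's replicas.
            FileStatus status = fs.getFileStatus(file);
            BlockLocation[] blocks =
                    fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                System.out.println("offset " + block.getOffset()
                        + " -> hosts " + String.join(", ", block.getHosts()));
            }
        }
    }
}
```

With three blocks and a replication factor of 3, the listing should show each block replicated on three slave nodes, subject to the cluster's rack-aware placement.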