Wednesday, July 6, 2016

HDFS Architecture



HDFS has a master/slave architecture and is built using the Java language.



Five daemons exist in the Hadoop master-slave architecture.
On the master node: NameNode, Secondary NameNode and JobTracker
On the slave nodes: DataNode and TaskTracker

An HDFS cluster consists of:
i) a single NameNode
ii) a number of DataNodes


HDFS LAYER: For Storage
NameNode: A master server that manages the file system namespace and regulates access to files by clients. It maintains the filesystem tree and the metadata for all the files and directories in the tree. This information is stored persistently on the local disk in the form of two files: 
- the namespace image and 
- the edit log.
The NameNode is a Single Point of Failure for the Hadoop Cluster.
The namenode also knows the datanodes on which all the blocks for a given file are located; however, it does not store block locations persistently, since this information is reconstructed from the datanodes when the system starts.
Example:
Suppose File1 has been stored in HDFS and has been split into Block1 + Block2.
NameNode: knows that File1 -> Block1 (stored on DN2, DN3, DN4) + Block2 (stored on DN2, DN3, DN5).
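To see this mapping from a client program, the HDFS Java API can be asked for block locations. The sketch below is an illustration only (the path /user/demo/File1 is a made-up example):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlockLocations {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // reads core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/demo/File1");   // hypothetical file
        FileStatus status = fs.getFileStatus(file);
        // Ask the NameNode which DataNodes hold each block of the file.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("offset " + block.getOffset()
                    + " length " + block.getLength()
                    + " hosts " + String.join(",", block.getHosts()));
        }
        fs.close();
    }
}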

DataNodes:  Datanodes are the workhorses of the filesystem. They store and retrieve blocks when they are told to (by clients or the namenode), and they report back to the namenode periodically with lists of blocks that they are storing.


Without the namenode, the filesystem cannot be used. If the machine running the namenode failed, all the files on the filesystem would be lost since there would be no way of knowing how to reconstruct the files from the blocks on the datanodes. For this reason, it is important to make the namenode resilient to failure, and Hadoop provides two mechanisms for this.
1=> The first way is to back up the files that make up the persistent state of the filesystem metadata. Hadoop can be configured so that the namenode writes its persistent state to multiple filesystems. These writes are synchronous and atomic. The usual configuration choice is to write to local disk as well as a remote NFS mount.
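As a sketch, this is normally configured in hdfs-site.xml by giving the metadata directory property a comma-separated list of directories (the property is dfs.name.dir in Hadoop 1.x and dfs.namenode.name.dir in later releases; the paths below are examples only):

<!-- hdfs-site.xml: keep NameNode metadata on a local disk and on a remote NFS mount -->
<property>
  <name>dfs.name.dir</name>
  <value>/data/1/dfs/nn,/mnt/nfs/dfs/nn</value>
</property>
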
2=> It is also possible to run a secondary namenode, which despite its name does not act as a namenode. Its main role is to periodically merge the namespace image with the edit log to prevent the edit log from becoming too large. The secondary namenode usually runs on a separate physical machine, since it requires plenty of CPU and as much memory as the namenode to perform the merge. It keeps a copy of the merged namespace image, which can be used in the event of the namenode failing. However, the state of the secondary namenode lags that of the primary, so in the event of total failure of the primary, data loss is almost certain.
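As a sketch, how often the secondary namenode checkpoints (merges the fsimage with the edit log) and where it keeps the merged copy can be tuned in hdfs-site.xml. The property names below are the Hadoop 2.x ones (Hadoop 1.x uses fs.checkpoint.period / fs.checkpoint.dir) and the values are examples only:

<!-- hdfs-site.xml: checkpoint every hour and keep the merged image under /data/1/dfs/snn -->
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600</value>
</property>
<property>
  <name>dfs.namenode.checkpoint.dir</name>
  <value>/data/1/dfs/snn</value>
</property>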

MAPREDUCE LAYER: For Processing
A MapReduce job is a unit of work that the client wants to be performed: it consists of the input data, the MapReduce program, and configuration information. Hadoop runs the job by dividing it into tasks, of which there are two types: map tasks and reduce tasks.
There are two types of nodes that control the job execution process: a jobtracker and a number of tasktrackers. 
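A minimal word-count job is sketched below to show the three pieces of a job in code: the input data (the input path), the MapReduce program (a map class and a reduce class) and the configuration information packed into the Job object. The class names and paths are examples only, not taken from any real cluster:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map task: runs on one split of the input data and emits (word, 1) pairs.
    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce task: sums the counts emitted for each word.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");       // configuration information
        job.setJarByClass(WordCount.class);                  // the MapReduce program
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/user/demo/input"));    // the input data (example path)
        FileOutputFormat.setOutputPath(job, new Path("/user/demo/output")); // example path
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}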


JobTracker: 
It handles client requests and assigns tasks to TaskTrackers. The jobtracker coordinates all the jobs run on the system by scheduling tasks to run on tasktrackers, taking data locality into account. If a TaskTracker node fails, the JobTracker reassigns its tasks to other nodes where replicas of the data are available. This ensures that no job fails even when a node in the cluster fails.

TaskTracker:
Tasktrackers run tasks and send progress reports to the jobtracker, which keeps a record of the overall progress of each job. If a task fails, the jobtracker can reschedule it on a different tasktracker. Each TaskTracker periodically sends a heartbeat to the JobTracker, which also carries the slots available on that node and the progress of task execution.
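As a sketch, the map and reduce slots that a TaskTracker reports are fixed per node in mapred-site.xml (Hadoop 1.x property names; the values below are examples only):

<!-- mapred-site.xml: each TaskTracker offers 4 map slots and 2 reduce slots -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>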





When a client wants to write a file to HDFS, it first contacts the NameNode, which returns the list of DataNodes that should hold the file's blocks.
The client then streams the data to the first DataNode, and that DataNode takes care of replication using a pipelined write.
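A minimal client-side write looks like the sketch below; block placement and the pipelined replication happen behind the fs.create() call, so the client code never talks to individual DataNodes directly (the path and content are examples only):

import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/demo/File1");         // hypothetical path
        try (FSDataOutputStream out = fs.create(file)) {  // NameNode picks the target DataNodes
            // Bytes go to the first DataNode, which forwards them down the replication pipeline.
            out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
        }
        fs.close();
    }
}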

To understand HDFS writes and reads in detail, refer to the next posts.

