Requirement: We have to configure federated namenodes for our new Hadoop cluster. We need two federated namenodes, one for each of our departments, sales and analytics. In my last post http://pe-kay.blogspot.in/2016/02/change-single-namenode-setup-to.html I wrote about a federated namenode setup; you can refer to that as well. The HDFS federated cluster that we are creating is of high priority and users will be completely dependent on it, so we cannot afford any downtime. Therefore we have to enable HA for our federated namenodes. In total there will be 4 namenodes: two namenodes (1 active and 1 standby) for the sales namespace and two for the analytics namespace.
For demonstration of this requirement, I am going to use four VirtualBox VMs running Ubuntu 14.04. The VMs are named server1, server2, server3 and server4.
There are two ways we can configure HDFS High Availability: using the Quorum Journal Manager or using Shared Storage. With shared storage in an HA design, there is always a risk of failure of the shared storage itself. As I am going to use a more recent version of Hadoop (2.6), I will use the Quorum Journal Manager, which provides an additional level of HA by having a group of JournalNodes. When any namespace modification is performed by the Active node, it durably logs a record of the modification to a majority of these JNs. So if we configure multiple JournalNodes, we can tolerate JournalNode failures as well. For details you can read the official Hadoop documentation.
For configuring HA with automatic failover we also need ZooKeeper. For details on ZooKeeper, you can visit https://zookeeper.apache.org/
Placement of different Services
Namenodes:
1) server1: Will run the active/standby namenode for namespace sales
2) server2: Will run the active/standby namenode for namespace analytics
3) server3: Will run the active/standby namenode for namespace sales
4) server4: Will run the active/standby namenode for namespace analytics
Datanodes:
For this demo, I will run datanodes on all four VMs.
Journal Nodes:
JournalNodes will run on three VMs: server1, server2 and server3.
Zookeeper:
ZooKeeper will run on three VMs: server2, server3 and server4.
Zookeeper Configuration
First I am going to configure a three-node ZooKeeper cluster. I am not going to write details about ZooKeeper and its configuration here; I will only write the configs that are necessary for this document. I have downloaded and deployed ZooKeeper on server2, server3 and server4. I am using the default ZooKeeper configuration; for our ZooKeeper ensemble I have added the below lines to the ZooKeeper configuration file:
server.1=server2:2222:2223
server.2=server3:2222:2223
server.3=server4:2222:2223
Create directory for storing Zookeeper data and log files:
hadoop@server2:~$ mkdir /home/hadoop/hdfs_data/zookeeper
hadoop@server3:~$ mkdir /home/hadoop/hdfs_data/zookeeper
hadoop@server4:~$ mkdir /home/hadoop/hdfs_data/zookeeper
Create Zookeeper ID files:
hadoop@server2:~$ echo 1 > /home/hadoop/hdfs_data/zookeeper/myid
hadoop@server3:~$ echo 2 > /home/hadoop/hdfs_data/zookeeper/myid
hadoop@server4:~$ echo 3 > /home/hadoop/hdfs_data/zookeeper/myid
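Note that the myid files above will only be picked up if dataDir in conf/zoo.cfg points to the same directory. A minimal sketch of the relevant zoo.cfg entries, assuming the default client port 2181 (which matches the ha.zookeeper.quorum value used later):
dataDir=/home/hadoop/hdfs_data/zookeeper
clientPort=2181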
Start Zookeeper cluster
hadoop@server2:~$ zkServer.sh start
hadoop@server3:~$ zkServer.sh start
hadoop@server4:~$ zkServer.sh start
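To confirm the ensemble has formed, you can also run the status sub-command on each node; it should report one leader and two followers:
hadoop@server2:~$ zkServer.sh status
hadoop@server3:~$ zkServer.sh status
hadoop@server4:~$ zkServer.sh status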
HDFS Configuration
core-site.xml:
We are going to use ViewFS and define which paths map to which namenode. For details about ViewFS you may visit https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/ViewFs.html. The clients will load the ViewFS plugin and look for the mount table information in the configuration file.
<property>
<name>fs.defaultFS</name>
<value>viewfs:///</value>
</property>
Here we are mapping a folder to a namespace.
Note: In my last blog post on HDFS federation, http://pe-kay.blogspot.in/2016/02/change-single-namenode-setup-to.html, I mapped each path to a namenode's URL, but as we are now in an HA configuration, I have to map each path to the respective nameservice.
<property>
<name>fs.viewfs.mounttable.default.link./sales</name>
<value>hdfs://sales</value>
</property>
<property>
<name>fs.viewfs.mounttable.default.link./analytics</name>
<value>hdfs://analytics</value>
</property>
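With this mount table in place, clients address everything through viewfs paths. As a quick sketch of how the mounts are used from the command line once the cluster is up:
hadoop@server1:~$ hdfs dfs -ls /sales
hadoop@server1:~$ hdfs dfs -ls /analytics
The first listing is resolved to the sales nameservice and the second to analytics.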
We have to give a directory on the JournalNode machines where the edits and other local state used by the JournalNodes will be stored. Create this directory on all the nodes where the JournalNodes will run.
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/hadoop/hdfs_data/journalnode</value>
</property>
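The directory from the property above can be created on the three JournalNode hosts like this:
hadoop@server1:~$ mkdir -p /home/hadoop/hdfs_data/journalnode
hadoop@server2:~$ mkdir -p /home/hadoop/hdfs_data/journalnode
hadoop@server3:~$ mkdir -p /home/hadoop/hdfs_data/journalnode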
hdfs-site.xml
Note: I have included only those configurations which are necessary to configure federated HA cluster.
<!-- Below properties are added for NameNode Federation and HA -->
<!-- Nameservices for our two federated namespaces sales and analytics -->
<property>
<name>dfs.nameservices</name>
<value>sales,analytics</value>
</property>
For each nameservice we will define two namenodes; one will be the active namenode and the other the standby.
<!-- Unique identifiers for each NameNodes in the sales nameservice -->
<property>
<name>dfs.ha.namenodes.sales</name>
<value>sales-nn1,sales-nn2</value>
</property>
<!-- Unique identifiers for each NameNodes in the analytics nameservice -->
<property>
<name>dfs.ha.namenodes.analytics</name>
<value>analytics-nn1,analytics-nn2</value>
</property>
<!-- RPC address for each NameNode of sales namespace to listen on -->
<property>
<name>dfs.namenode.rpc-address.sales.sales-nn1</name>
<value>server1:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.sales.sales-nn2</name>
<value>server3:8020</value>
</property>
<!-- RPC address for each NameNode of analytics namespace to listen on -->
<property>
<name>dfs.namenode.rpc-address.analytics.analytics-nn1</name>
<value>server2:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.analytics.analytics-nn2</name>
<value>server4:8020</value>
</property>
<!-- HTTP address for each NameNode of sales namespace to listen on -->
<property>
<name>dfs.namenode.http-address.sales.sales-nn1</name>
<value>server1:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.sales.sales-nn2</name>
<value>server3:50070</value>
</property>
<!-- HTTP address for each NameNode of analytics namespace to listen on -->
<property>
<name>dfs.namenode.http-address.analytics.analytics-nn1</name>
<value>server2:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.analytics.analytics-nn2</name>
<value>server4:50070</value>
</property>
A single set of JournalNodes can provide storage for multiple federated namesystems, so I will configure the same set of JournalNodes running on server1, server2 and server3 for both the nameservices sales and analytics.
<!-- Addresses of the JournalNodes which provide the shared edits storage, written to by the Active NameNode and read by the Standby NameNode -->
Shared edits storage in the JournalNodes for the sales namespace:
<property>
<name>dfs.namenode.shared.edits.dir.sales</name>
<value>qjournal://server1:8485;server2:8485;server3:8485/sales</value>
</property>
Shared edits storage in the JournalNodes for the analytics namespace:
<property>
<name>dfs.namenode.shared.edits.dir.analytics</name>
<value>qjournal://server1:8485;server2:8485;server3:8485/analytics</value>
</property>
<!-- Configuring automatic failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>server2:2181,server3:2181,server4:2181</value>
</property>
<!-- Fencing method that will be used to fence the Active NameNode during a failover -->
<!-- sshfence: SSH to the Active NameNode and kill the process -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value>
</property>
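sshfence assumes the user running the ZKFC can SSH to the other NameNode host without a password using the key configured above. If that is not already in place, a setup along these lines would be needed (shown here for the sales pair; the reverse direction and the analytics pair need the same treatment):
hadoop@server1:~$ ssh-keygen -t rsa
hadoop@server1:~$ ssh-copy-id hadoop@server3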
<!-- Configure the name of the Java class which will be used by the DFS Client to determine which NameNode is the current Active -->
<property>
<name>dfs.client.failover.proxy.provider.sales</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.analytics</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
For more about the configuration properties, you can visit
https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
ZooKeeper Initialization
For creating the required znode for the sales namespace, run the below command from one of the NameNodes of the sales nameservice:
hadoop@server1:~/.ssh$ hdfs zkfc -formatZK
For creating the required znode for the analytics namespace, run the below command from one of the NameNodes of the analytics nameservice:
hadoop@server2:~/.ssh$ hdfs zkfc -formatZK
Zookeeper znode tree before running above commands:

Zookeeper znode tree after running above commands:
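If you want to inspect the tree yourself instead of relying on the screenshots, the zkCli.sh client that ships with ZooKeeper can do it; assuming the default parent znode /hadoop-ha, the command below should list the sales and analytics znodes after the format:
hadoop@server2:~$ zkCli.sh -server server2:2181 ls /hadoop-ha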
Start JournalNodes
hadoop@server1:~$ hadoop-daemon.sh start journalnode
hadoop@server2:~$ hadoop-daemon.sh start journalnode
hadoop@server3:~$ hadoop-daemon.sh start journalnode
Format Namenodes
While formatting the namenodes, I will use the -clusterID option to provide a name for the Hadoop cluster we are creating. This enables us to provide the same clusterID for all the namenodes of the cluster.
Formatting the namenodes of the sales nameservice: We have to run the format command (hdfs namenode -format) on one of the NameNodes of the sales nameservice. Our sales nameservice will be on server1 and server3. I am running the format command in server1.
hadoop@server1:~$ hdfs namenode -format -clusterID myCluster
One of the namenodes (in server1) of the sales nameservice has been formatted, so we should now copy over the contents of the NameNode metadata directories to the other, unformatted NameNode, server3. Start the namenode in server1 and run the hdfs namenode -bootstrapStandby command in server3.
hadoop@server1:~$ hadoop-daemon.sh start namenode
hadoop@server3:~$ hdfs namenode -bootstrapStandby
Start the namenode in server3:
hadoop@server3:~$ hadoop-daemon.sh start namenode
Formatting the namenodes of the analytics nameservice: We have to run the format command (hdfs namenode -format) on one of the NameNodes of the analytics nameservice. Our analytics nameservice will be on server2 and server4. I am running the format command in server2.
hadoop@server2:~$ hdfs namenode -format -clusterID myCluster
One of the namenodes (in server2) of the analytics nameservice has been formatted, so we should now copy over the contents of the NameNode metadata directories to the other, unformatted NameNode, server4. Start the namenode in server2 and run the hdfs namenode -bootstrapStandby command in server4.
hadoop@server2:~$ hadoop-daemon.sh start namenode
hadoop@server4:~$ hdfs namenode -bootstrapStandby
Start the namenode in server4:
hadoop@server4:~$ hadoop-daemon.sh start namenode
Start remaining services
Start the ZKFailoverController process (zkfc, a ZooKeeper client which also monitors and manages the state of the NameNode) on all the VMs where the namenodes are running.
hadoop@server1:~$ hadoop-daemon.sh start zkfc
hadoop@server2:~$ hadoop-daemon.sh start zkfc
hadoop@server3:~$ hadoop-daemon.sh start zkfc
hadoop@server4:~$ hadoop-daemon.sh start zkfc
Start DataNodes
hadoop@server1:~$ hadoop-daemon.sh start datanode
hadoop@server2:~$ hadoop-daemon.sh start datanode
hadoop@server3:~$ hadoop-daemon.sh start datanode
hadoop@server4:~$ hadoop-daemon.sh start datanode
Checking our cluster
sales namespace:
Let's check the namenodes of the sales namespace. Open the URLs http://server1:50070 and http://server3:50070 in a web browser. We can see that the namenode in server1 is active right now.

Namenode in server3 is standby.
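The same can be checked from the command line with the haadmin tool; note the -ns flag to select the nameservice in a federated setup. The first command should report active and the second standby:
hadoop@server1:~$ hdfs haadmin -ns sales -getServiceState sales-nn1
hadoop@server1:~$ hdfs haadmin -ns sales -getServiceState sales-nn2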
analytics namespace:
Open the URLs http://server2:50070 and http://server4:50070 in a web browser.
Namenode in server2 is active now.
Namenode in server4 is standby.
Now I will copy two files into our two namespace folders, /sales and /analytics, and check whether they land on the correct namenode:
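The exact files used are shown in the screenshot below; the commands themselves look like this (the file names here are only placeholders):
hadoop@server1:~$ hdfs dfs -put file1.txt /sales/
hadoop@server1:~$ hdfs dfs -put file2.txt /analytics/
hadoop@server1:~$ hdfs dfs -ls /sales /analytics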

Checking in server1 (active namenode for the /sales nameservice), we can see that the namenode running on server1 has only the files belonging to the sales namespace.
Similarly, checking in server2 (active namenode for the /analytics nameservice), we can see that the namenode running on server2 has only the files belonging to the analytics namespace.
If we try to read from a standby namenode, we will get an error. In the below screenshot I tried to read from server3 (standby namenode for the /sales nameservice) and got an error.
Let's put a few more files in and check whether automatic failover is working:

I am killing the active namenode of the /sales namespace running on server1.
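One way to simulate the failure is to kill the NameNode process directly; a rough sketch (the PID is whatever jps reports on your machine):
hadoop@server1:~$ jps | grep NameNode
hadoop@server1:~$ kill -9 <namenode-pid>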

If we check the cluster health, we can see that server1 is down.

Now if we check the standby namenode of the sales namespace running on server3, we can see that it has become active:

Let's check the files; we can see that they are still available.

HA is working, and so is automatic failover.

I am starting the namenode in server1 again.
Now if we check the status of the namenode in server1, we can see that it has become standby as expected.
One Final Note:
Once the initial configuration is done, you can start the cluster in the following order:
First start the Zookeeper services:
hadoop@server2:~$ zkServer.sh start
hadoop@server3:~$ zkServer.sh start
hadoop@server4:~$ zkServer.sh start
After that, start the JournalNodes:
hadoop@server1:~$ hadoop-daemon.sh start journalnode
hadoop@server2:~$ hadoop-daemon.sh start journalnode
hadoop@server3:~$ hadoop-daemon.sh start journalnode
Finally, start all the namenodes, datanodes and ZKFailoverController processes using the start-dfs.sh script.
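Assuming the slaves file lists all four VMs, that last step is a single command from any one of the namenode hosts:
hadoop@server1:~$ start-dfs.sh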
