CopyDisable

Friday, 5 February 2016

Configuring Federated HDFS Cluster with High Availability (HA) and Automatic Failover

Requirement: We have to configure federated namenodes for our new Hadoop cluster. We need two federated namenodes each for our departments sales and analytics. In my last post http://pe-kay.blogspot.in/2016/02/change-single-namenode-setup-to.html I had written about federated namenode setup, you can refere that also. The HDFS federated cluster that we are creating is of high priority and users will be completely dependent on our cluster, so we can not afford to have downtime. So we have to enable HA for our federated namenodes. So there will be total 4 namenodes, two namenodes (1 active and 1 standby) for sales namespace and two for analytics.
Slide1

For demonstration of this requirement, I am going to use 4 virtual box with Ubuntu 14.04 VMs. The VMs are named as server1, server2, server3 and server4.
There are two ways we can configure HDFS High Availability, using Using the Quorum Journal Manager or using Shared Storage. Using shared storage in HA design, there is always a risk of failure of the shared storage. As I am going to use more latest version of hadoop (i.e. 2.6), so I will use Quorum Journal Manager which provides additional level of HA by having a group of JournalNodes. When any namespace modification is performed by the Active node, it durably logs a record of the modification to a majority of these JNs. So if we configure multiple journalnodes, then we can afford jornalnode failures also. For details you can read offical hadoop documents.
For configuring HA with automatic failover we also need Zookeeper. For details of Zookeeper, you can visit the link https://zookeeper.apache.org/

Placement of different Services

Namenodes:
1) server1 : Will run active/standby namenode for namespace sales
2) server2: Will run active/standby namenode for namespace analytics
3) server3: Will run active/standby namenode for namespace sales
4) server4: Will run active/standby namenode for namespace analytics

Datanodes:
For this demo, I will run datanodes on all the four VMs.

Journal Nodes:
Journal nodes will run on three VMs server1, server2 and server3

Zookeeper:
Jookeeper will run on three VMs server2, server3 and server4


Zookeeper Configuration

First I am going to configure a Zookeeper three node cluster. I am not going to write details about Zookeeper and its configuration, I will only write the configs that is necessary for this document. I have downloaded and deployed Zookeeper in server2, server3 and server4 servers. I am using the default Zookeeper configuration, for our zookeeper ensemble I have added the below lines in Zookeeper configuration file:

server.1=server2:2222:2223
server.2=server3:2222:2223
server.3=server4:2222:2223


Create directory for storing Zookeeper data and log files:
hadoop@server2:~$ mkdir /home/hadoop/hdfs_data/zookeeper
hadoop@server3:~$ mkdir /home/hadoop/hdfs_data/zookeeper
hadoop@server4:~$ mkdir /home/hadoop/hdfs_data/zookeeper
Create Zookeeper ID files:
hadoop@server2:~$ echo 1 > /home/hadoop/hdfs_data/zookeeper/myid
hadoop@server3:~$ echo 2 > /home/hadoop/hdfs_data/zookeeper/myid
hadoop@server4:~$ echo 3 > /home/hadoop/hdfs_data/zookeeper/myid

Start Zookeeper cluster
hadoop@server2:~$ zkServer.sh start
hadoop@server3:~$ zkServer.sh start
hadoop@server4:~$ zkServer.sh start


HDFS Configuration

core-site.xml:
We are going to use ViewFS and define which paths map to which namenode. For details about ViewFS you may visit this https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/ViewFs.html link. Now the clients will load the ViewFS plugin and look for mount table information in the configuration file.
<property>
    <name>fs.defaultFS</name>
    <value>viewfs:///</value>
</property>

Here we are mapping a folder to a namespace
Note: In last blog post on HDFS federation http://pe-kay.blogspot.in/2016/02/change-single-namenode-setup-to.html, I mapped the path to a namenode’s URL, but as we are now into a HA configuration so I have to set the mapping to the respective nameservice.
<property>
    <name>fs.viewfs.mounttable.default.link./sales</name>
    <value>hdfs://sales</value>
</property>
<property>
    <name>fs.viewfs.mounttable.default.link./analytics</name>
    <value>hdfs://analytics</value>
</property>


We have to give a directory name in JournalNode machines where the edits and other local state used by the JournalNodes will be stored. Create this directory in all the nodes where the journalnodes will run.

<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/home/hadoop/hdfs_data/journalnode</value>
</property>



hdfs-site.xml
Note: I have included only those configurations which are necessary to configure federated HA cluster.

<!-- Below properties are added for NameNode Federation and HA -->
<!-- Nameservices for our two federated namespaces sales and analytics -->
<property>
    <name>dfs.nameservices</name>
    <value>sales,analytics</value>
</property>

In each nameservice we will define 2 namenodes, one will be active namenode and the other one will be standby namenode.
<!-- Unique identifiers for each NameNodes in the sales nameservice -->
<property>
  <name>dfs.ha.namenodes.sales</name>
  <value>sales-nn1,sales-nn2</value>
</property>

<!-- Unique identifiers for each NameNodes in the analytics nameservice -->
<property>
  <name>dfs.ha.namenodes.analytics</name>
  <value>analytics-nn1,analytics-nn2</value>
</property>


<!-- RPC address for each NameNode of sales namespace to listen on -->
<property>
    <name>dfs.namenode.rpc-address.sales.sales-nn1</name>
        <value>server1:8020</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.sales.sales-nn2</name>
        <value>server3:8020</value>
</property>

<!-- RPC address for each NameNode of analytics namespace to listen on -->
<property>
    <name>dfs.namenode.rpc-address.analytics.analytics-nn1</name>
        <value>server2:8020</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.analytics.analytics-nn2</name>
        <value>server4:8020</value>
</property>



<!-- HTTP address for each NameNode of sales namespace to listen on -->
<property>
    <name>dfs.namenode.http-address.sales.sales-nn1</name>
        <value>server1:50070</value>
</property>
<property>
    <name>dfs.namenode.http-address.sales.sales-nn2</name>
        <value>server3:50070</value>
</property>

<!-- HTTP address for each NameNode of analytics namespace to listen on -->
<property>
    <name>dfs.namenode.http-address.analytics.analytics-nn1</name>
        <value>server2:50070</value>
</property>
<property>
    <name>dfs.namenode.http-address.analytics.analytics-nn2</name>
        <value>server4:50070</value>
</property>


A single set of JournalNodes can provide storage for multiple federated namesystems. So I will configure the same set of JournalNodes running on server1, server2 and server3 for both the nameservices sales and analytics.
<!-- Addresses of the JournalNodes which provide the shared edits storage, written to by the Active nameNode and read by the Standby NameNode –>
Shared edit storage in JournalNodes for sales namespace.
<property>
  <name>dfs.namenode.shared.edits.dir.sales</name>
  <value>qjournal://server1:8485;server2:8485;server3:8485/sales</value>
</property>

Shared edit storage in JournalNodes for analytics namespace.
<property>
  <name>dfs.namenode.shared.edits.dir.analytics</name>
  <value>qjournal://server1:8485;server2:8485;server3:8485/analytics</value>
</property>


<!-- Configuring automatic failover -->
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>

<property>
   <name>ha.zookeeper.quorum</name>
   <value>server2:2181,server3:2181,server4:2181</value>
</property>

<!-- Fencing method that will be used to fence the Active NameNode during a failover -->
<!-- sshfence: SSH to the Active NameNode and kill the process -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hadoop/.ssh/id_rsa</value>
</property>


<!-- Configure the name of the Java class which will be used by the DFS Client to determine which NameNode is the current Active -->
<property>
  <name>dfs.client.failover.proxy.provider.sales</name>  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>


<property>
  <name>dfs.client.failover.proxy.provider.analytics</name>  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>


For more about the configuration properties, you can visit https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html

ZooKeeper Initialization

For creating required znode of sales namespace, run the below command from one of the Namenodes of sales nameservice
hadoop@server1:~/.ssh$ hdfs zkfc –formatZK

For creating required znode of analytics namespace, run the below command from one of the Namenodes of analytics nameservice
hadoop@server2:~/.ssh$ hdfs zkfc –formatZK

Zookeeper znode tree before running above commands:
image
Zookeeper znode tree after running above commands:
image

Start JournalNodes

hadoop@server1:~$ hadoop-daemon.sh start journalnode
hadoop@server2:~$ hadoop-daemon.sh start journalnode
hadoop@server3:~$ hadoop-daemon.sh start journalnode

Format Namenodes

While formating the namenodes, I will use the -clusterID option to provide a name for the hadoop cluster we are creating. This will enable us to provide the same clusterID for all the namenodes of my cluster.
Formating the namenodes of sales nameservice:We have to run the format command (hdfs namenode -format) on one of NameNodes of sales nameservice. Our sales nameservice will be on server1 and server3. I am running the format command in server1.
hadoop@server1:~$ hdfs namenode -format -clusterID myCluster
One of the namenodes (in server1) of sales nameservice has been formated, so we should now copy over the contents of the NameNode metadata directories to the other, unformatted NameNode server3.
Start the namenode in server1 and run the hdfs namenode –bootstrapStandby command  in server3.
hadoop@server1:~$ hadoop-daemon.sh start namenode
hadoop@server3:~$ hdfs namenode –bootstrapStandby

Start the namenode in server3:
hadoop@server3:~$ hadoop-daemon.sh start namenode

Formating the namenodes of analytics nameservice:We have to run the format command (hdfs namenode -format) on one of NameNodes of analytics nameservice. Our analytics nameservice will be on server2 and server4. I am running the format command in server2.
hadoop@server2:~$ hdfs namenode -format -clusterID myCluster
One of the namenodes (in server2) of analytics nameservice has been formated, so we should now copy over the contents of the NameNode metadata directories to the other, unformatted NameNode server4.
Start the namenode in server2 and run the hdfs namenode –bootstrapStandby command  in server4.
hadoop@server2:~$ hadoop-daemon.sh start namenode
hadoop@server4:~$ hdfs namenode –bootstrapStandby
Start the namenode in server4:
hadoop@server4:~$ hadoop-daemon.sh start namenode

Start remaining services


Start the ZKFailoverController process (zkfs, it is a ZooKeeper client which also monitors and manages the state of the NameNode.) in all the VMs where the namenodes are running.

hadoop@server1:~$ hadoop-daemon.sh start zkfc
hadoop@server2:~$ hadoop-daemon.sh start zkfc
hadoop@server3:~$ hadoop-daemon.sh start zkfc
hadoop@server4:~$ hadoop-daemon.sh start zkfc


Start DataNodes
hadoop@server1:~$ hadoop-daemon.sh start datanode
hadoop@server2:~$ hadoop-daemon.sh start datanode
hadoop@server3:~$ hadoop-daemon.sh start datanode
hadoop@server4:~$ hadoop-daemon.sh start datanode



Checking our cluster

sales namespace:
Lets check the namenodes of sales namespace
Open the URLs in web-browser http://server1:50070 and http://server3:50070
We can see that the namenode in server1 is active right now.
image
Namenode in server3 is standby
image


analytics namespace:
Open the URLs in web-browser http://server2:50070 and http://server4:50070
server2 is active nowimage

server4 is standbyimage

Now I will copy two files into our two namespace folders /sales and /analytics and will check if they are in correct namenode:
image
Checking in server1 (active namenode for /sales nameservice), we can see that namenode running on server1 has files only related to sales namespace.
image

Similarly, server2 (active namenode for /analytics nameservice), we can see that namenode running on server2 has files only related to analytics namespace.
image

If we try to read from a standby namenode, we will get error. In the below screenshot I tried to read from server3 (standby namenode for /sales nameservice) and got error.

image

Lets put few more files and check if the automatic failover is working:
image
I am killing the active namenode of /sales namespace running on server1
image
If we check the cluster health, we can see that serve1 is down.
image
Now if we check the standby namenode of sales namespace running on server3, we can see that it has become active now:
image
Lets check the files, we can see the files are also available, Smile HA is working and so automatic failover.
image
I am starting the namenode in server1 again.
image

Now if we check the status of the namenode in server1, we can see that it has become standby as expected.
image


One Final Note:
Once initial configurations are done, you can start the cluster in the following order:
First start the Zookeeper services:
hadoop@server2:~$ zkServer.sh start
hadoop@server3:~$ zkServer.sh start
hadoop@server4:~$ zkServer.sh start
After that start journalnodes
hadoop@server1:~$ hadoop-daemon.sh start journalnode
hadoop@server2:~$ hadoop-daemon.sh start journalnode
hadoop@server3:~$ hadoop-daemon.sh start journalnode
Finally all the namenodes, datanodes and ZKFailoverController processes using the start-dfs.sh script.
image