Installing Hadoop 2.6.0 on CentOS 7

by Marek Gagolewski, Maciej Bartoszuk, Anna Cena, and Jan Lasek (Rexamine).

Configuring a working Hadoop 2.6.0 environment on CentOS 7 is a bit of a struggle. Here are the steps we took to get a working Hadoop cluster. Of course, there are many tutorials on this topic on the internet, but none of the solutions presented there worked in our case. Thus, there is a high chance that this step-by-step guide will also leave you very frustrated. Anyway, resolving the errors generated by Hadoop should help you understand this environment much better. No pain, no gain.

Basic CentOS setup

Let’s assume that we have a fresh CentOS install. On each node:

  1. Edit /etc/hosts
# nano /etc/hosts

Add one line per node that maps its IP address to its hostname (change IP addresses accordingly). The hostnames used throughout this guide are hmaster, hslave1, hslave2, and hslave3.
  2. Create user hadoop
# useradd hadoop
# passwd hadoop
  3. Set up key-based (passwordless) login:
# su hadoop
$ ssh-keygen -t rsa
$ ssh-copy-id -i ~/.ssh/ hadoop@hmaster
$ ssh-copy-id -i ~/.ssh/ hadoop@hslave1
$ ssh-copy-id -i ~/.ssh/ hadoop@hslave2
$ ssh-copy-id -i ~/.ssh/ hadoop@hslave3
$ chmod 0600 ~/.ssh/authorized_keys

This will be useful when we want to start all the necessary Hadoop services on all the slave nodes.
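The key distribution step can also be written as a loop. This is only a sketch: it assumes that ssh-keygen -t rsa wrote the public key to its default location (~/.ssh/id_rsa.pub), and the PUBKEY, NODES, and DRY_RUN names are ours, introduced for illustration. By default it only prints the commands it would run.

```shell
# Sketch: push the hadoop user's public key to every node.
# Assumption: the key sits at the ssh-keygen default path.
PUBKEY="${PUBKEY:-$HOME/.ssh/id_rsa.pub}"
NODES="hmaster hslave1 hslave2 hslave3"   # hostnames from /etc/hosts

# DRY_RUN=1 (the default here) only prints the commands;
# set DRY_RUN=0 to actually run them.
DRY_RUN="${DRY_RUN:-1}"

copy_key_to_all() {
    for node in $NODES; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "ssh-copy-id -i $PUBKEY hadoop@$node"
        else
            ssh-copy-id -i "$PUBKEY" "hadoop@$node"
        fi
    done
}

copy_key_to_all
```

Run it as user hadoop on each node. Afterwards, ssh hadoop@hslave1 should log in without asking for a password.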

Installing Oracle Java SDK

  1. Download the latest Oracle JDK and save it in the /opt directory.

  2. On hmaster, unpack Java:

# cd /opt
# tar -zxf jdk-8u31-linux-x64.tar.gz
# mv jdk1.8.0_31 jdk

Now propagate /opt/jdk to all the slaves:

# scp -r jdk hslave1:/opt
# scp -r jdk hslave2:/opt
# scp -r jdk hslave3:/opt
  3. On each node, use the alternatives tool to set up Oracle Java as the default Java implementation.
# alternatives --install /usr/bin/java java /opt/jdk/bin/java 2
# alternatives --config java # select appropriate program (/opt/jdk/bin/java)
# alternatives --install /usr/bin/jar jar /opt/jdk/bin/jar 2
# alternatives --install /usr/bin/javac javac /opt/jdk/bin/javac 2
# alternatives --set jar /opt/jdk/bin/jar
# alternatives --set javac /opt/jdk/bin/javac 

Check if everything is OK by executing java -version.

  4. Set up environment variables:
# nano /etc/bashrc

Add the following:

export JAVA_HOME=/opt/jdk
export JRE_HOME=/opt/jdk/jre
export PATH=$PATH:/opt/jdk/bin:/opt/jdk/jre/bin

And also possibly:

alias ll='ls -l --color'
alias cp='cp -i'
alias mv='mv -i'
alias rm='rm -i'

Check if everything is OK:

# source /etc/bashrc
# echo $JAVA_HOME
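
If you would rather have a slightly stricter check than echo, the sketch below verifies that the directory JAVA_HOME points at really contains the three tools registered with alternatives above. The check_jdk function is a hypothetical helper of ours, not part of the JDK or CentOS.

```shell
# check_jdk DIR: succeed only if DIR/bin holds executable java, javac and jar.
check_jdk() {
    jdk_dir="$1"
    for tool in java javac jar; do
        if [ ! -x "$jdk_dir/bin/$tool" ]; then
            echo "missing or not executable: $jdk_dir/bin/$tool" >&2
            return 1
        fi
    done
    echo "JDK looks complete: $jdk_dir"
}

# Uses JAVA_HOME if it is set, otherwise the /opt/jdk location from this guide.
check_jdk "${JAVA_HOME:-/opt/jdk}" ||
    echo "fix JAVA_HOME or the JDK install before continuing"
```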

Installing and configuring Hadoop 2.6.0

On master:

# cd /opt
# wget
# tar -zxf hadoop-2.6.0.tar.gz
# rm hadoop-2.6.0.tar.gz
# mv hadoop-2.6.0 hadoop

Propagate /opt/hadoop to slave nodes:

# scp -r hadoop hslave1:/opt
# scp -r hadoop hslave2:/opt
# scp -r hadoop hslave3:/opt

Add the following lines to /home/hadoop/.bashrc on all the nodes (you may play with scp for that too):

export HADOOP_PREFIX=/opt/hadoop
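
The commands used later in this guide (hdfs, hadoop) are called by name, so the bin directory presumably has to be on the PATH as well. A possible fuller version of the fragment is sketched below. The PATH and HADOOP_CONF_DIR lines are our assumptions and are not taken from the steps above.

```shell
# Sketch of a fuller ~/.bashrc fragment for the hadoop user.
export HADOOP_PREFIX=/opt/hadoop

# Assumption: bin/ (hdfs, hadoop, yarn) and sbin/ (start/stop scripts)
# go on PATH so that later commands can be called by name.
export PATH="$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin"

# Assumption: point Hadoop at the configuration directory edited below.
export HADOOP_CONF_DIR="$HADOOP_PREFIX/etc/hadoop"
```

After source ~/.bashrc, running hadoop version is a quick way to see whether the PATH change took effect.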

Edit /opt/hadoop/etc/hadoop/core-site.xml – set up NameNode URI on every node:
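
A minimal sketch of what this file might contain, assuming the standard fs.defaultFS property. The port 9000 is our assumption. The script writes to a scratch directory (CONF_OUT, a hypothetical name used only here) so that nothing under /opt/hadoop gets overwritten by accident.

```shell
# Sketch: write a minimal core-site.xml into a scratch directory for review.
# CONF_OUT is a hypothetical staging directory, not something Hadoop knows about.
CONF_OUT="${CONF_OUT:-./hadoop-conf-sketch}"
mkdir -p "$CONF_OUT"

cat > "$CONF_OUT/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <!-- URI of the NameNode; the port 9000 is an assumption -->
    <name>fs.defaultFS</name>
    <value>hdfs://hmaster:9000/</value>
  </property>
</configuration>
EOF

echo "wrote $CONF_OUT/core-site.xml"
```

Once the file looks right, copy it to /opt/hadoop/etc/hadoop/ on every node.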


Create HDFS DataNode data dirs on every node and change ownership of /opt/hadoop:

# chown hadoop /opt/hadoop/ -R
# chgrp hadoop /opt/hadoop/ -R
# mkdir /home/hadoop/datanode
# chown hadoop /home/hadoop/datanode/
# chgrp hadoop /home/hadoop/datanode/    

Edit /opt/hadoop/etc/hadoop/hdfs-site.xml – set up DataNodes:


Create HDFS NameNode data dirs on master:

# mkdir /home/hadoop/namenode
# chown hadoop /home/hadoop/namenode/
# chgrp hadoop /home/hadoop/namenode/    

Edit /opt/hadoop/etc/hadoop/hdfs-site.xml on master. Add further properties:
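
The two hdfs-site.xml variants (every node versus master) could look roughly like the output of the sketch below. It assumes the standard dfs.datanode.data.dir and dfs.namenode.name.dir properties and the data directories created above. The write_hdfs_site helper, its "master"/"slave" labels, and the CONF_OUT scratch directory are hypothetical names of ours.

```shell
# Sketch: generate hdfs-site.xml for both roles in a scratch directory.
CONF_OUT="${CONF_OUT:-./hadoop-conf-sketch}"

# write_hdfs_site ROLE: ROLE is "master" or "slave".
write_hdfs_site() {
    role="$1"
    mkdir -p "$CONF_OUT/$role"
    {
        echo '<?xml version="1.0"?>'
        echo '<configuration>'
        echo '  <property>'
        echo '    <name>dfs.datanode.data.dir</name>'
        echo '    <value>file:///home/hadoop/datanode</value>'
        echo '  </property>'
        # Only the master stores NameNode metadata.
        if [ "$role" = "master" ]; then
            echo '  <property>'
            echo '    <name>dfs.namenode.name.dir</name>'
            echo '    <value>file:///home/hadoop/namenode</value>'
            echo '  </property>'
        fi
        echo '</configuration>'
    } > "$CONF_OUT/$role/hdfs-site.xml"
}

write_hdfs_site master
write_hdfs_site slave
```

The master variant is a superset of the slave one, so you could also ship the master file everywhere. Keeping them separate simply mirrors the two editing steps.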


Edit /opt/hadoop/etc/hadoop/mapred-site.xml on master.

   <value>yarn</value> <!-- and not local (!) -->
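
For context, the complete file might look like what the sketch below writes, assuming the value belongs to the standard mapreduce.framework.name property. As far as we know, the 2.6.0 tarball ships only mapred-site.xml.template, so mapred-site.xml has to be created first. CONF_OUT is again a hypothetical scratch directory.

```shell
# Sketch: write a minimal mapred-site.xml into a scratch directory.
CONF_OUT="${CONF_OUT:-./hadoop-conf-sketch}"
mkdir -p "$CONF_OUT"

cat > "$CONF_OUT/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <!-- run MapReduce jobs on YARN instead of the local runner -->
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF

echo "wrote $CONF_OUT/mapred-site.xml"
```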

Edit /opt/hadoop/etc/hadoop/yarn-site.xml – set up ResourceManager and NodeManagers:

        <value>hmaster</value> <!-- or hslave1, hslave2, hslave3 -->
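
Since one value differs per node, it may be easier to generate the file. The sketch below is a guess at the layout: it assumes the per-node value is yarn.nodemanager.hostname, that yarn.resourcemanager.hostname always points at hmaster, and that the mapreduce_shuffle auxiliary service is wanted for MapReduce jobs. The write_yarn_site helper and CONF_OUT directory are hypothetical.

```shell
# Sketch: generate one yarn-site.xml per node in a scratch directory.
CONF_OUT="${CONF_OUT:-./hadoop-conf-sketch}"

# write_yarn_site NODE: write yarn-site.xml for the node named NODE.
write_yarn_site() {
    node="$1"
    mkdir -p "$CONF_OUT/$node"
    cat > "$CONF_OUT/$node/yarn-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hmaster</value>
  </property>
  <property>
    <!-- assumption: this is the value that changes from node to node -->
    <name>yarn.nodemanager.hostname</name>
    <value>$node</value>
  </property>
  <property>
    <!-- shuffle service used by MapReduce jobs running on YARN -->
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF
}

for node in hmaster hslave1 hslave2 hslave3; do
    write_yarn_site "$node"
done
```

Each generated file then goes to /opt/hadoop/etc/hadoop/yarn-site.xml on the node it was written for.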

Edit /opt/hadoop/etc/hadoop/slaves on master (so that master may start all necessary services on slaves automagically):
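
The slaves file is a plain list of hostnames, one per line. A sketch of how it could be produced is shown below. We assume hmaster is listed too, because the master is later described as running a DataNode and a NodeManager of its own. CONF_OUT is a hypothetical scratch directory.

```shell
# Sketch: write the slaves file into a scratch directory.
CONF_OUT="${CONF_OUT:-./hadoop-conf-sketch}"
mkdir -p "$CONF_OUT"

# One hostname per line; drop hmaster if the master should not store data
# or run tasks.
printf '%s\n' hmaster hslave1 hslave2 hslave3 > "$CONF_OUT/slaves"

cat "$CONF_OUT/slaves"
```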


Now the important step: disable the firewall and IPv6 (Hadoop does not support IPv6, which causes problems with listening on all the interfaces):

# systemctl stop firewalld

Add the following lines to /etc/sysctl.conf:

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
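
If you repeat this on several nodes, appending the lines by hand tends to produce duplicates. The sketch below adds each line only once. The ensure_line helper and the SYSCTL_CONF variable are hypothetical names of ours. By default the script edits a scratch file, so point SYSCTL_CONF at /etc/sysctl.conf (as root) for real use.

```shell
# ensure_line FILE LINE: append LINE to FILE unless it is already there.
ensure_line() {
    file="$1"
    line="$2"
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Defaults to a scratch file so that a test run cannot damage the system.
SYSCTL_CONF="${SYSCTL_CONF:-./sysctl.conf.sketch}"
ensure_line "$SYSCTL_CONF" "net.ipv6.conf.all.disable_ipv6 = 1"
ensure_line "$SYSCTL_CONF" "net.ipv6.conf.default.disable_ipv6 = 1"

# Then, as root on each node:
#   sysctl -p                    # load the new settings without a reboot
#   systemctl disable firewalld  # keep the firewall off after a reboot
```

Note that systemctl stop only lasts until the next reboot, which is why the disable command is mentioned in the comments.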

Format NameNode:

# su hadoop
$ hdfs namenode -format

Start HDFS (as user hadoop):


Use jps to check that a DataNode is running on each slave and that DataNode, NameNode, and SecondaryNameNode are running on master. Also try accessing http://hmaster:50070/

Start YARN on master:


Now NodeManagers should be alive (jps) on all nodes and a ResourceManager on master too.

We see that the master node consists of a ResourceManager, NodeManager (YARN), NameNode and DataNode (HDFS). A slave node acts as both a NodeManager and a DataNode.
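
The jps checks can be automated a little. The sketch below assumes that the two start steps use the stock scripts shipped in /opt/hadoop/sbin (start-dfs.sh and start-yarn.sh), and it encodes the daemon lists described above. The expected_daemons and missing_daemons functions are hypothetical helpers of ours, and the process IDs in the demo are made up.

```shell
# Assumed start commands (run as user hadoop on master):
#   $HADOOP_PREFIX/sbin/start-dfs.sh
#   $HADOOP_PREFIX/sbin/start-yarn.sh

# expected_daemons ROLE: daemons this guide expects on a node of that role.
expected_daemons() {
    case "$1" in
        master) echo "NameNode SecondaryNameNode DataNode ResourceManager NodeManager" ;;
        slave)  echo "DataNode NodeManager" ;;
        *)      echo "unknown role: $1" >&2; return 2 ;;
    esac
}

# missing_daemons ROLE: read jps output on stdin, print each expected
# daemon that does not appear in it (one per line).
missing_daemons() {
    jps_output=$(cat)
    for d in $(expected_daemons "$1"); do
        # jps prints "<pid> <name>"; compare the name column exactly.
        printf '%s\n' "$jps_output" | awk '{ print $2 }' | grep -qx "$d" ||
            echo "$d"
    done
}

# Demo with canned jps output for a slave whose NodeManager is down.
missing_daemons slave <<'EOF'
4242 DataNode
4243 Jps
EOF
# prints: NodeManager
```

On a live node the call would be jps | missing_daemons master (or slave). Empty output means that everything expected is running.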

Testing Hadoop 2.6.0

You may want to check whether you can copy a local file to HDFS and run the bundled Hadoop Hello World (i.e. wordcount) job.

$ hdfs dfsadmin -safemode leave # only needed if the NameNode is stuck in safe mode
$ hdfs dfs -mkdir /input
$ hdfs dfs -copyFromLocal test.txt /input
$ hdfs dfs -cat /input/test.txt | head
$ hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /input/test.txt /output1
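
To see whether the job produced sensible numbers, you can recompute the counts locally and compare. The sketch below roughly mimics the example's behaviour: it splits on whitespace and emits word, tab, count, sorted bytewise by word. The local_wordcount function is a hypothetical helper of ours, and the comparison assumes that the job ran with a single reducer, so that all results sit in /output1/part-r-00000.

```shell
# local_wordcount FILE: print "word<TAB>count" for every whitespace-separated
# word in FILE, sorted bytewise by word.
local_wordcount() {
    tr -s '[:space:]' '\n' < "$1" | grep -v '^$' | LC_ALL=C sort | uniq -c |
        awk '{ print $2 "\t" $1 }'
}

# Small demo on a throwaway file.
sample=$(mktemp)
printf 'to be or not to be\n' > "$sample"
local_wordcount "$sample"
rm -f "$sample"

# Comparing with the cluster result (assumes a single output part file):
#   hdfs dfs -cat /output1/part-r-00000 > cluster.txt
#   local_wordcount test.txt | diff - cluster.txt
```

The demo prints four lines: be 2, not 1, or 1, to 2 (tab-separated). An empty diff means the cluster and the local count agree.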

If anything went wrong, check out /opt/hadoop/logs/*.log. Good luck :)

