How To Install Apache Hadoop on CentOS 7, Ubuntu 18.04 & Debian 9

Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed process, and it handles the very large size of data sets by distributing it across computer clusters.

Rather than rely on hardware high availability, Hadoop modules are designed to detect and handle the failure at the application layer, so gives you high-available service.

Hadoop framework consists of following modules,

  • Hadoop Common – It contains a common set of libraries and utilities that support other Hadoop modules
  • Hadoop Distributed File System (HDFS) – is a Java-based distributed file-system that stores data, providing very high-throughput to the application.
  • Hadoop YARN – It manages resources on compute clusters and using them for scheduling user’s applications.
  • Hadoop MapReduce – is a framework for large-scale data processing.

This guide will help you to install Apache Hadoop on CentOS 7, Ubuntu 18.04 & Debian 9. This guide should also work on Ubuntu 16.04.


Switch to the root user.

su -


sudo su -

Install Java

Apache Hadoop requires Java version 8 only. So, you can choose to install either OpenJDK or Oracle JDK.

READ: How To Install Oracle Java on CentOS 7 / RHEL 7

READ: How To Install Oracle Java on Ubuntu 18.04

READ: How To Install Oracle Java on Debian 9

Here, for this demo, I will be installing OpenJDK 8.

### CentOS 7 / RHEL 7 ###

# yum -y install java-1.8.0-openjdk wget

### Ubuntu 18.04 / 16.04 & Debian 9 ###

# apt update 
# apt install -y openjdk-8-jdk wget

Check the Java version.

# java -version


openjdk version "1.8.0_212"
OpenJDK Runtime Environment (build 1.8.0_212-b04)
OpenJDK 64-Bit Server VM (build 25.212-b04, mixed mode)

Create Hadoop user & Enable Passwordless Authentication

It is recommended to create a regular user to configure and run Apache Hadoop. So, create a user named hadoop and set a password.

# useradd -m -d /home/hadoop -s /bin/bash hadoop

# passwd hadoop

Once you created a user, configure passwordless ssh to the local system. Create an ssh key using the following commands.

# su - hadoop

$ ssh-keygen

$ cat ~/.ssh/ >> ~/.ssh/authorized_keys

$ chmod 600 ~/.ssh/authorized_keys

Verify the passwordless communication to the local system. If you are doing ssh for the first time, type yes to add RSA keys to known hosts.

$ ssh

Install Apache Hadoop

Download Hadoop

You can visit Apache Hadoop page to download the latest Hadoop package, or you can issue the following command in terminal to download Hadoop v3.2.0.

$ wget

$ tar -zxvf hadoop-3.2.0.tar.gz 

$ mv hadoop-3.2.0 hadoop

Install Hadoop

Hadoop supports three modes of clusters

  1. Local (Standalone) Mode – It runs as a single java process.
  2. Pseudo-Distributed Mode – Each Hadoop daemon runs in a separate process.
  3. Fully Distributed Mode – It is an actual multinode cluster ranging from few nodes to extremely large cluster.

Setup environmental variables

Here, we will be configuring Hadoop in Pseudo-Distributed mode. To start with, set environmental variables in the ~/.bashrc file.

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-   ## Change it according to your system
export HADOOP_HOME=/home/hadoop/hadoop   ## Change it according to your system

Apply environmental variables to the current session.

$ source ~/.bashrc

Modify Configuration files

Edit the Hadoop environmental file.

vi $HADOOP_HOME/etc/hadoop/

Set JAVA_HOME environment variable.

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-

Hadoop has many configuration files, and we need to edit them depends on the cluster modes we set up (Pseudo-Distributed).

$ cd $HADOOP_HOME/etc/hadoop

Edit core-site.xml.


Edit hdfs-site.xml.


Create NameNode and DataNode directories in hadoop user’s home directory.

 mkdir -p ~/hadoopdata/hdfs/{namenode,datanode}

Edit mapred-site.xml.


Edit yarn-site.xml.


Now format the NameNode using the following command. Do not forget to check the storage directory.

$ hdfs namenode -format


.    .    .
.    .    .

2019-05-12 06:38:42,066 INFO common.Storage: Storage directory /home/hadoop/hadoopdata/hdfs/namenode has been successfully formatted.
2019-05-12 06:38:42,169 INFO namenode.FSImageFormatProtobuf: Saving image file /home/hadoop/hadoopdata/hdfs/namenode/current/fsimage.ckpt_0000000000000000000 using no compression
2019-05-12 06:38:42,483 INFO namenode.FSImageFormatProtobuf: Image file /home/hadoop/hadoopdata/hdfs/namenode/current/fsimage.ckpt_0000000000000000000 of size 401 bytes saved in 0 seconds .
2019-05-12 06:38:42,501 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2019-05-12 06:38:42,516 INFO namenode.NameNode: SHUTDOWN_MSG:
SHUTDOWN_MSG: Shutting down NameNode at server.itzgeek.local/


Allow Apache Hadoop through the firewall. Run the below commands as the root user.


firewall-cmd --permanent --add-port=9870/tcp
firewall-cmd --permanent --add-port=8088/tcp
firewall-cmd --reload


ufw allow 9870/tcp
ufw allow 8088/tcp
ufw reload

Start the NameNode daemon and DataNode daemon by using the scripts in the /sbin directory, provided by Hadoop.



Starting namenodes on [server.itzgeek.local]
server.itzgeek.local: Warning: Permanently added 'server.itzgeek.local,' (ECDSA) to the list of known hosts.
Starting datanodes
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Starting secondary namenodes [server.itzgeek.local]
2019-05-12 06:39:14,171 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Open your web browser and go to the below URL to browse the NameNode.
Install Apache Hadoop on CentOS 7 – Hadoop NameNode Information

Start ResourceManager daemon and NodeManager daemon.



Starting resourcemanager
Starting nodemanagers

Open your web browser and go to the below URL to access the ResourceManager.
Install Apache Hadoop on CentOS 7 – Resource Manager

Test Hadoop Single Node Cluster

Before carrying out the upload, let us create a directory at HDFS.

$ hdfs dfs -mkdir /raj

Let us upload a file into HDFS directory called raj.

$ hdfs dfs -put ~/.bashrc /raj

Uploaded files can be viewed by going to NameNode >> Utilities >> Browse the file system in NameNode.
Install Apache Hadoop on CentOS 7 – Hadoop File System

Copy the files from HDFS to your local file systems.

$ hdfs dfs -get /raj /tmp/

You can delete the files and directories using the following commands.

hdfs dfs -rm -f /raj/.bashrc
hdfs dfs -rmdir /raj


That’s All. You have successfully configured a single node Hadoop cluster and tested the hadoop filesystem. Please share your feedback in the comments section.

centos 7debian 9hadooprhel 7ubuntu 16.04ubuntu 18.04
Comments (1)
Add Comment
  • Alex He

    Hi, I followed your instruction, however, namenode cannot be started when i try to format namenode. (I installed jdk1.8.0_40).
    The error message is showing as:
    15/04/11 16:32:13 FATAL namenode.NameNode: Fialed to start namenode
    java.lang.IllegalArgumentException: URI has an authority component
    at org.apache.hadoop.hdfs.server.namenode.NNSStorage.getStorageDirectory(
    org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournals( 270)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.format(
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(
    15/04/11 16:32:13 INFO util.ExitUtil: Exiting with status 1
    15/04/11 16:32:14 INFO namenode.NameNode: SHUTDOWN_MSG:
    SHUTDOWN_MSG: Shutting down NameNode at ThinkPad-Edge-E540/
    Could you advise what’s wrong here? Thanks.