It is crucial to keep data highly available so that applications and users can always access it. GlusterFS achieves this by distributing data, and optionally replicating it, across multiple bricks on multiple storage nodes.
Client machines can access the storage as if it were local storage. Whenever a user writes data to Gluster storage, the data is mirrored or distributed to the other storage nodes, depending on the volume type.
What is GlusterFS?
GlusterFS is an open-source, scalable network filesystem suitable for data-intensive workloads such as media streaming, cloud storage, and CDNs (Content Delivery Networks). GlusterFS was originally developed by Gluster, Inc. and, after Red Hat acquired the company, by Red Hat.
Below are the important terms used throughout this article.
Brick – the basic unit of storage (a directory) on a server in the trusted storage pool.
Volume – a logical collection of bricks.
Cluster – a group of linked computers working together as a single computer.
Distributed File System – a filesystem in which data is spread across multiple storage nodes and clients access it over a network.
Client – a machine that mounts the volume.
Server – a machine that hosts the actual file system where the data is stored.
Replicate – making multiple copies of data to achieve redundancy.
FUSE (Filesystem in Userspace) – a loadable kernel module that lets non-privileged users create their own file systems without editing kernel code.
glusterd – the daemon that runs on all servers in the trusted storage pool.
RAID – Redundant Array of Inexpensive Disks, a technology that increases storage reliability through redundancy.
As mentioned earlier, a volume is a collection of bricks, and most Gluster operations, such as reads and writes, happen on the volume. GlusterFS supports several volume types to suit different requirements: some scale storage capacity, some improve performance, and some do both.
In this article, we will configure a replicated GlusterFS volume on CentOS 7 / RHEL 7.
A replicated GlusterFS volume is similar to RAID 1: the volume maintains exact copies of the data on all bricks. You choose the number of replicas when creating the volume, so you need at least two bricks to create a volume with two replicas, or three bricks to create a volume with three replicas.
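For a plain replicated volume, the number of bricks equals the replica count; if you list more bricks, GlusterFS builds a distributed-replicated volume, and the brick count must then be a multiple of the replica count. The POSIX shell helper below is a minimal sketch of that rule: it validates a brick list and prints the matching `gluster volume create` command instead of running it. The function name `replica_create_cmd`, the volume name `gvol0`, and the brick subdirectory `/data/gluster/gvol0` are hypothetical placeholders, not values taken from this article.

```shell
#!/bin/sh
# Sketch: validate a replica layout and print (not run) the matching
# "gluster volume create" command. Names used here are hypothetical.

# Usage: replica_create_cmd VOLUME REPLICA HOST:BRICK [HOST:BRICK ...]
replica_create_cmd() {
    vol=$1
    replica=$2
    shift 2
    bricks=$#
    # A replicated volume needs at least two replicas.
    if [ "$replica" -lt 2 ]; then
        echo "error: replica count must be at least 2" >&2
        return 1
    fi
    # The brick count must be a non-zero multiple of the replica count
    # (exactly equal to it for a plain replicated volume).
    if [ "$bricks" -eq 0 ] || [ $((bricks % replica)) -ne 0 ]; then
        echo "error: $bricks brick(s) is not a multiple of replica $replica" >&2
        return 1
    fi
    echo "gluster volume create $vol replica $replica $*"
}

# Example with the two storage nodes used in this article:
# replica_create_cmd gvol0 2 gluster1.itzgeek.local:/data/gluster/gvol0 gluster2.itzgeek.local:/data/gluster/gvol0
```

Printing the command rather than executing it lets you review the brick order first; in a distributed-replicated volume, GlusterFS groups consecutive bricks in the list into replica sets, so each set should span different servers.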
You may also want to read about the remaining types of GlusterFS volumes.
Here, we are going to configure a GlusterFS volume with two replicas. Make sure you have two 64-bit systems (virtual or physical), each with at least 1 GB of memory and one spare hard disk, plus a client machine.
|Host Name|IP Address|OS|Memory|Disk|Purpose|
|---|---|---|---|---|---|
|gluster1.itzgeek.local|192.168.12.16|CentOS 7|1 GB|/dev/sdb (5 GB)|Storage Node 1|
|gluster2.itzgeek.local|192.168.12.17|RHEL 7|1 GB|/dev/sdb (5 GB)|Storage Node 2|
|client.itzgeek.local|192.168.12.8|Ubuntu 16.04|NA|NA|Client Machine|
GlusterFS components use DNS for name resolution, so either configure DNS or set up hosts entries. If you do not have DNS in your environment, update the /etc/hosts file on all machines accordingly.
sudo vi /etc/hosts

192.168.12.16 gluster1.itzgeek.local gluster1
192.168.12.17 gluster2.itzgeek.local gluster2
192.168.12.8 client.itzgeek.local client
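If you prefer not to edit the file by hand on every machine, the entries can be appended by a script. The sketch below assumes a POSIX shell and adds each entry only when its fully qualified name is not already present, so re-running it does not create duplicates. The function name `add_host_entry` and the `HOSTS_FILE` override variable are hypothetical, introduced only for this example; the client IP follows the table above.

```shell
#!/bin/sh
# Sketch: append cluster host entries only if the FQDN is not already listed,
# so the script is safe to re-run. HOSTS_FILE is a hypothetical override
# (defaults to /etc/hosts) that makes it easy to try on a scratch file.
HOSTS_FILE=${HOSTS_FILE:-/etc/hosts}

add_host_entry() {
    # $1 = IP address, $2 = fully qualified name, $3 = short name
    # -F: match the name literally (dots are not regex wildcards)
    # -w: match it as a whole word, not as part of a longer name
    if grep -qwF -- "$2" "$HOSTS_FILE" 2>/dev/null; then
        echo "skip: $2 already in $HOSTS_FILE"
    else
        printf '%s %s %s\n' "$1" "$2" "$3" >> "$HOSTS_FILE"
        echo "added: $2"
    fi
}

# Example (as root, on every machine):
# add_host_entry 192.168.12.16 gluster1.itzgeek.local gluster1
# add_host_entry 192.168.12.17 gluster2.itzgeek.local gluster2
# add_host_entry 192.168.12.8 client.itzgeek.local client
```

Afterwards, `getent hosts gluster2.itzgeek.local` should print the matching address, which confirms the resolver actually picks up the new entry.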
Add GlusterFS Repository:
Before proceeding with the installation, configure the GlusterFS repository on both storage nodes. Follow the instructions for your distribution to add the repository.
Add the Gluster repository on RHEL 7.
vi /etc/yum.repos.d/Gluster.repo

[gluster38]
name=Gluster 3.8
baseurl=http://mirror.centos.org/centos/7/storage/$basearch/gluster-3.8/
gpgcheck=0
enabled=1
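The same repository file can be written non-interactively with a here-document, which is handy when provisioning several nodes. This is a minimal sketch assuming a POSIX shell; the function name `write_gluster_repo` and the `REPO_FILE` override are hypothetical. The contents are identical to the file above, including `gpgcheck=0`, which means yum does not verify package signatures from this repository.

```shell
#!/bin/sh
# Sketch: write the Gluster 3.8 repo file without opening an editor.
# REPO_FILE is a hypothetical override; it defaults to the path used above.
REPO_FILE=${REPO_FILE:-/etc/yum.repos.d/Gluster.repo}

write_gluster_repo() {
    # Quoting 'EOF' stops the shell from expanding $basearch, so the
    # literal variable reaches yum, which substitutes the architecture.
    cat > "$REPO_FILE" <<'EOF'
[gluster38]
name=Gluster 3.8
baseurl=http://mirror.centos.org/centos/7/storage/$basearch/gluster-3.8/
gpgcheck=0
enabled=1
EOF
}

# Example (as root on the RHEL 7 node):
# write_gluster_repo && yum repolist
```

The quoted delimiter is the important design choice: with an unquoted `EOF`, the shell would replace `$basearch` with an empty string and the resulting URL would be wrong.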
On CentOS 7, install the centos-release-gluster package, which provides the required YUM repository files. This RPM is available from the CentOS Extras repository.
yum install -y centos-release-gluster
Once you have added the repository on your systems, you are ready to install GlusterFS. Install the GlusterFS server package using the following command.
yum install -y glusterfs-server
Start the glusterd service on all gluster nodes.
systemctl start glusterd
Verify that the glusterd service is running.
[root@gluster1 ~]# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; disabled; vendor preset: disabled)
   Active: active (running) since Tue 2016-09-27 16:00:19 EDT; 1s ago
  Process: 4072 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 4073 (glusterd)
   CGroup: /system.slice/glusterd.service
           └─4073 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO

Sep 27 16:00:19 gluster1.itzgeek.local systemd: Starting GlusterFS, a clustered file-system server...
Sep 27 16:00:19 gluster1.itzgeek.local systemd: Started GlusterFS, a clustered file-system server.
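In a provisioning script, reading the status output by eye is not practical. `systemctl is-active --quiet glusterd` exits with status 0 only when the unit is active, so it can be polled. The sketch below is a generic retry helper for a POSIX shell; the name `retry` and its argument order are hypothetical choices for this example, not part of GlusterFS or systemd.

```shell
#!/bin/sh
# Sketch: run a command until it succeeds or the attempts are used up.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    # Re-run the command until it exits with status 0.
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempt(s): $*" >&2
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}

# Example on a storage node: 5 attempts, 2 seconds apart.
# retry 5 2 systemctl is-active --quiet glusterd && echo "glusterd is running"
```

Keeping the command generic means the same helper can later poll other checks, for example whether a mount point is ready.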
Enable glusterd to start automatically on system boot.
systemctl enable glusterd
You need to either disable the firewall or configure it to allow all connections between the machines in the cluster.
# Option 1: disable FirewallD
systemctl stop firewalld
systemctl disable firewalld

OR

# Option 2: on each node, accept all traffic coming from the given source IP.
# --permanent is required here; without it, --reload discards the runtime rule.
firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="<ipaddress>" accept'
firewall-cmd --reload
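With several peers, the rich rule has to be added once per source address, and each node needs rules for the other members of the cluster. The sketch below assumes a POSIX shell and loops over the peer IPs, adding each rule with `--permanent` and reloading FirewallD once at the end. The function names `run` and `allow_peers` and the `DRY_RUN` switch are hypothetical; with `DRY_RUN=1`, the commands are only printed (the inner quotes around `ipv4` and the address are passed correctly when the command actually runs, even though the printed form drops the outer quotes).

```shell
#!/bin/sh
# Sketch: permanently accept IPv4 traffic from each peer, then reload once.
# Set DRY_RUN=1 to print the firewall-cmd commands instead of running them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

allow_peers() {
    # $@ = source IP addresses of the other cluster members
    for ip in "$@"; do
        run firewall-cmd --permanent --zone=public \
            --add-rich-rule="rule family=\"ipv4\" source address=\"$ip\" accept"
    done
    # Apply the permanent configuration to the running firewall.
    run firewall-cmd --reload
}

# Example on gluster1 (gluster2 and the client, per the table above):
# allow_peers 192.168.12.17 192.168.12.8
```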
Perform the following steps on both storage nodes.
This article assumes one spare hard disk on each node; here, /dev/sdb is used for the brick. Create a single partition on the spare disk, for example with fdisk.
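fdisk is interactive; as an alternative technique, `parted --script` can create the same single partition in one non-interactive command. The sketch below assumes a POSIX shell, an empty disk, and an MBR (msdos) partition label; the function names `run` and `make_brick_partition` and the `DISK`/`DRY_RUN` variables are hypothetical. Running it for real destroys the existing partition table on the disk, so try it with `DRY_RUN=1` first.

```shell
#!/bin/sh
# Sketch: create one partition spanning the whole spare disk.
# WARNING: this overwrites the existing partition table on $DISK.
# Set DRY_RUN=1 to print the commands instead of running them.
DISK=${DISK:-/dev/sdb}

run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

make_brick_partition() {
    # New msdos (MBR) label, then one primary partition from 0% to 100%.
    run parted --script "$DISK" mklabel msdos mkpart primary 0% 100%
    # Ask the kernel to re-read the partition table so ${DISK}1 appears.
    run partprobe "$DISK"
}

# Example (as root): DRY_RUN=1 make_brick_partition, then make_brick_partition
```

After it runs, `lsblk /dev/sdb` should show a single partition, /dev/sdb1.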
Format the created partition with the filesystem of your choice. This article uses ext4, which matches the /etc/fstab entry added later.
mkfs.ext4 /dev/sdb1
Mount the partition on a directory called /data/gluster.
mkdir -p /data/gluster
mount /dev/sdb1 /data/gluster
Add an entry to /etc/fstab to keep the mount persistent across reboots.
echo "/dev/sdb1 /data/gluster ext4 defaults 0 0" | tee --append /etc/fstab
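Running the `echo ... | tee --append` line twice leaves a duplicate entry in /etc/fstab. The sketch below, assuming a POSIX shell with awk, appends the entry only when no uncommented line already uses the mount point. The function name `add_fstab_entry` and the `FSTAB` override are hypothetical. The example also shows an optional variation (a swapped-in choice, not what the line above does): referencing the filesystem by UUID from `blkid`, which keeps working if the disk's device name changes.

```shell
#!/bin/sh
# Sketch: add the brick mount to fstab only once.
# FSTAB is a hypothetical override (defaults to /etc/fstab) for testing.
FSTAB=${FSTAB:-/etc/fstab}

add_fstab_entry() {
    # $1 = device (/dev/sdb1 or UUID=...), $2 = mount point, $3 = fs type
    # awk exits 0 if an uncommented line already has $2 as its 2nd field.
    if awk -v mp="$2" '$1 !~ /^#/ && $2 == mp { found = 1 } END { exit !found }' "$FSTAB" 2>/dev/null; then
        echo "skip: $2 already in $FSTAB"
        return 0
    fi
    printf '%s %s %s defaults 0 0\n' "$1" "$2" "$3" >> "$FSTAB"
    echo "added: $2"
}

# Example (as root on each storage node):
# add_fstab_entry "UUID=$(blkid -s UUID -o value /dev/sdb1)" /data/gluster ext4
# mount -a && mountpoint -q /data/gluster && echo "brick filesystem mounted"
```

Running `mount -a` right after editing the file is a cheap way to catch a typo in /etc/fstab before the next reboot, and `mountpoint -q` confirms that /data/gluster is really a mounted filesystem rather than an empty directory on the root disk.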