In fact, this is not the first GlusterFS installation in our cluster. /G installed it some time ago, but it was never used. Now I have to reactivate it and rearrange the settings in a better way. I am forced to do it now because the P5 system is still not responding.
The situation is this: each of our nodes has rather big storage, and these storages were not used optimally before. GlusterFS seems ideal, since it can gather the free space of all the nodes. I configured the hard disks using LVM last time, so the file system spans both storage devices. The total capacity is now 265 GB per node. Altogether we could have around 3.5 TB, but I decided to make the storage redundant.
Preparing GlusterFS
GlusterFS can be downloaded from this link. We used version 1.3.7. This file system uses FUSE to be mounted on the client, so FUSE must also be installed prior to GlusterFS. The patched version can be found here.
Do a normal installation with the downloaded source code: configure, make, and make install. Please refer to their installation instructions and try to solve all dependencies. If you still have trouble and have no idea how to install them, maybe you need to find another job :D.
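For both source trees the sequence is the usual one, roughly as below; the tarball names are only examples, use whatever versions you downloaded:

# the patched FUSE first, then GlusterFS itself
tar xzf fuse-2.7.0-glfs8.tar.gz
cd fuse-2.7.0-glfs8
./configure && make && make install
cd ..
tar xzf glusterfs-1.3.7.tar.gz
cd glusterfs-1.3.7
./configure && make && make install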
Configuring GlusterFS
I have, in front of my eyes, 14 nodes to configure. The first node is used as the head node, so it is excluded from the main storage brick; instead, its free space is combined with that of node-12 for a scratch space. The other nodes are enumerated from 00 to 12, so when redundant storage is preferred, only node-00 to node-11 can be used (mirroring needs an even number of bricks).
All nodes, except node-12, share their space in the same way. node-00 and node-01 are selected as the head storage; a head storage node exports an extra shared space, called the name-space. Apart from that, these two nodes are the same as the others. All nodes mount the shared partition below the /gls directory. The fstab entry is as follows:
/dev/scratch-vol/scr /gls ext3 defaults 0 0
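The device /dev/scratch-vol/scr in this entry is the LVM volume mentioned earlier, spanning the two storage devices. Its creation looked roughly like this (the physical device names here are just examples, not necessarily what the nodes really have):

pvcreate /dev/sda3 /dev/sdb1          # example devices, one per disk
vgcreate scratch-vol /dev/sda3 /dev/sdb1
lvcreate -l 100%FREE -n scr scratch-vol
mkfs.ext3 /dev/scratch-vol/scr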
The volume configurations are stored in the directory /etc/glusterfs. The following are the contents of these files:
1) server with name-space (node-00 & node-01), gfs-server-ns.vol
# the data brick exported by this node
volume wapblock
  type storage/posix
  option directory /gls/block
end-volume

# the name-space brick, used later by unify for the global directory tree
volume wapblock-ns
  type storage/posix
  option directory /gls/block-ns
end-volume

# export both bricks over TCP
volume server
  type protocol/server
  option transport-type tcp/server
  subvolumes wapblock wapblock-ns
  option auth.ip.wapblock.allow *
  option auth.ip.wapblock-ns.allow *
end-volume
2) server without name-space (node-02 to node-11), gfs-server.vol
volume wapblock
  type storage/posix
  option directory /gls/block
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  subvolumes wapblock
  option auth.ip.wapblock.allow *
end-volume
3) scratch server (node-head & node-12), gfs-server-scr.vol
volume wapblock
  type storage/posix
  option directory /gls/block
end-volume

volume wapblock-ns
  type storage/posix
  option directory /gls/block-ns
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  subvolumes wapblock wapblock-ns
  option auth.ip.wapblock.allow 192.168.100.*
  option auth.ip.wapblock-ns.allow 192.168.100.*
end-volume
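One detail that is easy to forget: the directories given in the option directory lines must exist before glusterfsd starts, otherwise the posix volumes fail to load. So, before starting anything:

mkdir -p /gls/block        # on every storage node
mkdir -p /gls/block-ns     # only on node-00, node-01, node-head, and node-12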
In the scratch server file I set the allowed addresses explicitly, because the head node is world accessible. I would like to do the same for the others, but maybe later. Those are the required configuration files for the brick/block servers. The following are the client configuration files.
1) main working space, gfs-client.vol
volume gls-node00
  type protocol/client
  option transport-type tcp/client
  option remote-host node-00
  option remote-subvolume wapblock
end-volume

# ..... the same block is repeated here for node-01 to node-11,
# ..... changing only the volume name and the remote-host name

# now the name-spaces; two of them are needed
volume gls-node00-ns
  type protocol/client
  option transport-type tcp/client
  option remote-host node-00
  option remote-subvolume wapblock-ns
end-volume

volume gls-node01-ns
  type protocol/client
  option transport-type tcp/client
  option remote-host node-01
  option remote-subvolume wapblock-ns
end-volume
# automatic file replication
volume mirror01
  type cluster/afr
  option replicate *:2
  subvolumes gls-node00 gls-node01
end-volume

# the same code blocks continue,
# pairing gls-node02 - gls-node03 .... until gls-node10 - gls-node11

# namespace mirror
volume mirror-ns
  type cluster/afr
  subvolumes gls-node00-ns gls-node01-ns
end-volume

# combine all mirrors into one unified volume
volume unify
  type cluster/unify
  option namespace mirror-ns
  option scheduler rr
  option rr.limits.min-free-disk 5%
  option rr.refresh-interval 10
  subvolumes mirror01 mirror02 mirror03 mirror04 mirror05 mirror06
end-volume

# performance translators, stacked on top of unify
volume readahead
  type performance/read-ahead
  option page-size 256 # 256KB is the default option
  option page-count 4 # 2 is the default option
  option force-atime-update off # default is off
  subvolumes unify
end-volume

volume writebehind
  type performance/write-behind
  option aggregate-size 1MB # default is 0 bytes
  option flush-behind on # default is 'off'
  subvolumes readahead # stack write-behind on read-ahead, not directly on unify
end-volume
2) scratch space, scratch.vol
The scratch space uses no redundant blocks.
volume scr-node1
  type protocol/client
  option transport-type tcp/client
  option remote-host node-head
  option remote-subvolume wapblock
end-volume

volume scr-node2
  type protocol/client
  option transport-type tcp/client
  option remote-host node-12
  option remote-subvolume wapblock
end-volume

volume scr-ns
  type protocol/client
  option transport-type tcp/client
  option remote-host node-head
  option remote-subvolume wapblock-ns
end-volume

volume unify
  type cluster/unify
  option scheduler rr
  option namespace scr-ns
  subvolumes scr-node1 scr-node2
end-volume
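Before distributing the files, the client side can be tested by hand on a node whose glusterfsd is already running; these are the same commands the init script below uses:

$ mkdir -p /usershome
$ glusterfs -f /etc/glusterfs/gfs-client.vol /usershome
$ df -h /usershome
$ umount /usershome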
After all the configuration files are prepared, they are copied to the nodes. I used rgang to do the copying. For rgang I first prepared node-list files below /etc/clusternodes, according to the definitions mentioned above: glshead, glsserv, glsscr, and world.
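For illustration, these node lists are plain files with one node name per line (a sketch based on the layout above, not a verbatim copy of the real files):

$ cat /etc/clusternodes/glshead
node-00
node-01
$ cat /etc/clusternodes/glsscr
node-head
node-12

Here glsserv would list node-02 to node-11, and world every node in the cluster.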
$ rgang -c glshead gfs-server-ns.vol /etc/glusterfs/gfs-server.vol
$ rgang -c glsserv gfs-server.vol /etc/glusterfs/gfs-server.vol
$ rgang -c glsscr gfs-server-scr.vol /etc/glusterfs/gfs-server.vol
$ rgang -c world gfs-client.vol /etc/glusterfs
$ rgang -c world scratch.vol /etc/glusterfs
Now we need a script to activate everything. I wrote the following script and put it in /etc/init.d; all the nodes have the same script, glus-load. Note that above, all three server variants were copied to the same destination name, gfs-server.vol, which is why a single script works everywhere.
#!/bin/bash
# Loading GlusterFS
# Rosandi 2008

# loading the glusterfs daemon @ server
load-srv() {
    glusterfsd -f /etc/glusterfs/gfs-server.vol || echo failed >> /var/log/glus-srv.log
}

# mounting the client file systems
load-cl() {
    echo "/usershome"
    glusterfs -f /etc/glusterfs/gfs-client.vol /usershome
    echo "/scr"
    glusterfs -f /etc/glusterfs/scratch.vol /scr
}

unload-cl() {
    umount /usershome
    umount /scr
}

case "$1" in
    start)
        echo "Loading GlusterFS daemon: "
        load-srv
        sleep 5
        echo "mounting GlusterFS: "
        load-cl
        ;;
    stop)
        unload-cl
        killall glusterfsd
        ;;
    restart)
        $0 stop
        $0 start
        ;;
    *)
        echo "Usage: glus-load {start|stop|restart}"
        exit 1
        ;;
esac
exit 0
Now that everything is ready, we only need rgang again to activate it all.
$ rgang world /etc/init.d/glus-load start
I would suggest activating the new file system only from the head node, after checking the availability of all required nodes. This can take place in rc.local, at least on a Debian system.
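A minimal sketch of such an rc.local fragment on the head node; the ping loop is only illustrative, any availability check will do:

# wait until all storage nodes answer, then start GlusterFS cluster-wide
for n in node-00 node-01 node-02 node-03 node-04 node-05 \
         node-06 node-07 node-08 node-09 node-10 node-11 node-12; do
    until ping -c 1 -W 2 $n > /dev/null 2>&1; do
        sleep 5
    done
done
rgang world /etc/init.d/glus-load start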
I also read that someone has provided a Debian repository for GlusterFS, so things will become less complicated.
So this looks like the end of our _imperfect_ NFS server, unless I can fix our AIX station. To be honest, I don't like AIX; it's only expensive. I am thinking of installing Linux on it and then, maybe, sharing its file system with GlusterFS as well?
Now look at what we have after mounting the new file system:
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 133G 6.8G 119G 6% /
tmpfs 3.9G 0 3.9G 0% /lib/init/rw
udev 10M 44K 10M 1% /dev
tmpfs 3.9G 0 3.9G 0% /dev/shm
/dev/sdb1 147G 188M 140G 1% /gls
glusterfs 1.6T 1.1G 1.5T 1% /usershome
glusterfs 411G 376M 390G 1% /scr