First I fired up two CentOS 5 instances which will be talking to each other over the private network. I'll name them node-1 and node-2 and add entries for both to /etc/hosts, like so:
10.19.128.5 node-1
10.19.128.6 node-2
They are in the same security group, so they can see each other. After you add the entries, ping the other side, like this:
[root@node-1 ~]# ping node-2
PING node-2 (10.19.128.6) 56(84) bytes of data.
64 bytes from node-2 (10.19.128.6): icmp_seq=1 ttl=64 time=0.140 ms
64 bytes from node-2 (10.19.128.6): icmp_seq=2 ttl=64 time=0.138 ms

--- node-2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.138/0.139/0.140/0.001 ms
Next, grab and install the software:
wget http://download.gluster.com/pub/gluster/glusterfs/LATEST/RHEL/glusterfs-core-3.1.3-1.x86_64.rpm
wget http://download.gluster.com/pub/gluster/glusterfs/LATEST/RHEL/glusterfs-fuse-3.1.3-1.x86_64.rpm
rpm -Uvh gluster*.rpm
rm gluster*.rpm
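If you want to double-check the install before going any further, a quick rpm query should list both packages (this is just an optional sanity check; the version strings will match whatever the wgets above pulled down):

rpm -qa | grep glusterfs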
Then, load the fuse module:
modprobe fuse
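You can confirm the module actually loaded with lsmod; if this prints nothing, fuse isn't loaded and the glusterfs mount later on will fail:

lsmod | grep fuse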
Then, start glusterd:
/etc/init.d/glusterd start
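If you also want glusterd to come back on its own after a reboot, and assuming the packaged init script is chkconfig-aware, you can enable it like any other CentOS service:

chkconfig glusterd on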
Pick some directories to use. In this case, gluster will use /export/queue-data, and you and your apps will use /queue. So, don't ever access files in /export/queue-data directly; gluster owns that directory:
mkdir /queue /export/queue-data
Set up the nodes so they can see and talk to each other; run this on each system. From node-1:
[root@node-1 ~]# gluster peer probe node-2
Probe successful

And from node-2:

[root@node-2 ~]# gluster peer probe node-1
Probe successful
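At this point you can sanity-check the peering from either node with gluster peer status; it should report one peer, connected (the exact output varies a bit by version):

[root@node-1 ~]# gluster peer status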
Next, create your directories on both systems:
[root@node-1 ~]# mkdir -p /queue /export/queue-data/
[root@node-2 ~]# mkdir -p /queue /export/queue-data/
Now, create your volume:
[root@node-1 ~]# gluster volume create queue-data replica 2 node-1:/export/queue-data node-2:/export/queue-data
Creation of volume queue-data has been successful. Please start the volume to access data.
Now we're ready to start the volume:
[root@node-1 ~]# gluster volume start queue-data
Starting volume queue-data has been successful
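If you want to review what you just built, gluster volume info will show the volume type (Replicate), the two bricks, and a Started status (I've omitted the output here since it varies slightly by version):

[root@node-1 ~]# gluster volume info queue-data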
To manually mount the volume, run:
[root@node-1 ~]# mount -t glusterfs 127.0.0.1:queue-data /queue
You can see that it's been mounted here:
[root@node-1 ~]# df -h
Filesystem                      Size  Used Avail Use% Mounted on
/dev/sda1                       2.0G  1.5G  445M  77% /
/dev/sda2                       7.9G  152M  7.4G   2% /opt
none                            256M     0  256M   0% /dev/shm
glusterfs#127.0.0.1:queue-data  2.0G  1.5G  445M  77% /queue
To mount it automatically on boot, run:
[root@node-2 ~]# echo "127.0.0.1:queue-data /queue glusterfs defaults,_netdev 0 0" >> /etc/fstab
[root@node-2 ~]# mount -a
If you're doing something different and want to be able to run VMs off your GlusterFS volume, add this to fstab instead:
127.0.0.1:queue-data /queue glusterfs direct-io-mode=disable,_netdev 0 0
And you can see that on node-2 it's also been mounted:
[root@node-2 ~]# df -h
Filesystem                      Size  Used Avail Use% Mounted on
/dev/sda1                       2.0G  1.5G  445M  77% /
/dev/sda2                       7.9G  152M  7.4G   2% /opt
none                            256M     0  256M   0% /dev/shm
glusterfs#127.0.0.1:queue-data  2.0G  1.5G  445M  77% /queue
Now, let's make sure it works. I'll create a file on node-1 and then make sure it exists on node-2:
[root@node-1 ~]# cd /queue && dd if=/dev/zero of=output.dat bs=1M count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.572936 seconds, 18.3 MB/s
And, here I see it on node-2:
[root@node-2 ~]# ls -al /queue
total 10276
drwxr-xr-x  2 root root     4096 Apr  5 12:49 .
drwxr-xr-x 25 root root     4096 Apr  5 12:43 ..
-rw-r--r--  1 root root 10485760 Apr  5 12:49 output.dat
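Out of curiosity, you can also peek at the backend brick on node-2, where the replica lives; you should see output.dat there as well. Just look, don't touch, since gluster owns that directory:

[root@node-2 ~]# ls -al /export/queue-data/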
And there you have it. If you require iptables rules, which I don't because I'm already behind EC2's security group ACLs, add something like this to node-1, changing the IP for node-2:
# /etc/sysconfig/iptables
-A INPUT -m state --state NEW -p tcp --dport 24007 --source 10.19.128.6 -j ACCEPT
-A INPUT -m state --state NEW -p tcp --dport 24008 --source 10.19.128.6 -j ACCEPT
-A INPUT -m state --state NEW -p tcp --dport 24009 --source 10.19.128.6 -j ACCEPT
-A INPUT -m state --state NEW -p tcp --dport 24010 --source 10.19.128.6 -j ACCEPT
You could also just do something like this:
-A INPUT -m state --state NEW -p tcp --dport 24007:24010 --source 10.19.128.6 -j ACCEPT
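Either way, after editing /etc/sysconfig/iptables, reload the rules the stock CentOS 5 way (adjust if you manage iptables differently):

[root@node-1 ~]# service iptables restart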
Thanks for a great writeup. I have a question related to your article, perhaps you could help?
I need to scale up a couple of Apache servers but share the docroot from a synchronized (common) source. I am, however, a little unclear how to "link" /var/www to the glusterfs client mount point (e.g. /queue). After googling, there seem to be a few options, but I'm not sure about "best practices".
1) Could one simply make the glusterfs client mount point /var/www? I'm not sure it is that simple.
2) Would one use symbolic links between /var/www and the glusterfs client mount point? I'm not exactly sure how this would be done. I am not a Linux systems administrator.
3) Would one use mount --bind between /var/www and the glusterfs client mount point? Again, not exactly sure how this would be done.
4) Or perhaps there is another best practice?
Any help would be appreciated, since I am a bit of a newbie in this area.
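For option 3, a rough sketch of a bind mount, using the /queue mount point from this article and assuming /var/www is the Apache docroot, might look something like this (plus a matching fstab entry if you want it to persist across reboots):

mount --bind /queue /var/www
# or, in /etc/fstab:
/queue /var/www none bind 0 0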
Good article, and it really works. I have a small question: have you considered other distributed file systems such as Hadoop (HDFS), Lustre, MogileFS, or XtreemFS? Which one is easy to manage and scale for a replicated volume setup? I know each has its own merits and demerits, but what is your opinion?