Friday, November 25, 2011

Linux: Splitting a Large File into Small Files

Recently I was trying to transfer a large ISO file across a horribly unstable VPN. The transfer kept failing at random points partway through. So, I thought I'd best split the file up into 10MB chunks, rsync those over, and stitch them back together.

That way if it failed 90% of the way through, I wouldn't have to resend all the data, just that last 10%. Here's how I did it:

split --bytes=10m file.iso file_part 

What happens now is, you have a bunch of 10MB files called file_partaa, file_partab, file_partac, and so on.

So, just rsync all those files to the destination:

rsync -e ssh -a --progress file_part*

When that completes, log in to the remote host and put them back together:

cat file_part* > orig_file.iso
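
Before deleting the chunks, it's worth confirming that the reassembled file matches the original. Here's a minimal sketch of that check, not a transcript of the transfer above: it assumes a small random file standing in for file.iso, 100k chunks instead of 10m, a scratch directory, and md5sum as the checksum tool. In real use you'd record the checksum on the source host, copy it over with the chunks, and compare on the destination.

```shell
#!/bin/sh
# Sketch: split a file, reassemble it, and verify the result with a checksum.
# The temp directory, the 1 MiB random stand-in and the 100k chunk size are
# demo assumptions; the post itself uses file.iso and 10m chunks.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the large ISO.
head -c 1048576 /dev/urandom > file.iso

# Record the checksum on the "source" side.
md5sum file.iso | awk '{print $1}' > file.iso.md5

# Split into chunks named file_partaa, file_partab, ...
split --bytes=100k file.iso file_part
parts=$(ls file_part* | wc -l)

# Reassemble on the "destination" side; the shell expands file_part* in
# sorted order, which matches the order split wrote the chunks in.
cat file_part* > orig_file.iso

# Compare the checksum of the reassembled file with the recorded one.
if [ "$(md5sum orig_file.iso | awk '{print $1}')" = "$(cat file.iso.md5)" ]; then
    result=match
else
    result=MISMATCH
fi
echo "chunks: $parts, checksum: $result"
```

If the result is a mismatch, re-running rsync on the chunks is usually enough, since it only resends the ones that differ.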

Done and done.

Tuesday, April 5, 2011

Gluster Setup in the Cloud: Simple, Easy

Here, I'll spin up two instances and configure distributed, replicated storage between them. I'll do two instances but I've actually done it for up to six instances. I have no idea how far it would scale but I'm guessing 30 or more would work fine using this method.

First I fired up two CentOS 5 instances which will be talking to each other over the private network. I'll name them node-1 and node-2 and add entries for both to /etc/hosts, like so:             node-1             node-2

They are in the same security group, so they can see each other. After you add the entries, ping the other side, like this:

[root@node-1 ~]# ping node-2
PING node-2 ( 56(84) bytes of data.
64 bytes from node-2 ( icmp_seq=1 ttl=64 time=0.140 ms
64 bytes from node-2 ( icmp_seq=2 ttl=64 time=0.138 ms

--- node-2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.138/0.139/0.140/0.001 ms

Next, grab and install the software:

rpm -Uvh gluster*.rpm
rm gluster*.rpm

Then, load the fuse module:
modprobe fuse

Then, start glusterd:
/etc/init.d/glusterd start

Pick some directories to use. In this case, gluster will use /export/queue-data and you and your apps will use /queue. So, don't ever access files in /export/queue-data directly; gluster owns that directory:

mkdir /queue /export/queue-data

Set up the peers so they can see and talk to each other by running a probe on each system. From node-1, then from node-2:
[root@node-1 ~]# gluster peer probe node-2
Probe successful

[root@node-2 ~]# gluster peer probe node-1
Probe successful

Next, create your directories on both systems:
[root@node-1 ~]# mkdir -p /queue /export/queue-data/
[root@node-2 ~]# mkdir -p /queue /export/queue-data/

Now, create your volume:
[root@node-1 ~]# gluster volume create queue-data replica 2 node-1:/export/queue-data node-2:/export/queue-data
Creation of volume queue-data has been successful. Please start the volume to access data.

Ready to start the volume export:
[root@node-1 ~]# gluster volume start queue-data
Starting volume queue-data has been successful

To manually mount the volume, run:
[root@node-1 ~]# mount -t glusterfs /queue

You can see that it's been mounted here:
[root@node-1 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             2.0G  1.5G  445M  77% /
/dev/sda2             7.9G  152M  7.4G   2% /opt
none                  256M     0  256M   0% /dev/shm
                      2.0G  1.5G  445M  77% /queue

To mount it automatically on boot, run:
[root@node-2 ~]# echo "    /queue     glusterfs defaults,_netdev 0 0" >> /etc/fstab
[root@node-2 ~]# mount -a
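
One catch with appending via echo is that running it twice leaves a duplicate fstab line. Here's a minimal sketch of an append that checks first. It works on a scratch file rather than the real /etc/fstab, and the `node-1:/queue-data` device field is an assumption (the usual server:/volume form for a glusterfs mount), not the value from the command above.

```shell
#!/bin/sh
# Sketch: append a glusterfs fstab entry only if the mount point is not
# already listed. FSTAB points at a scratch file for the demo; the
# node-1:/queue-data device spec is an assumption (server:/volume form).
set -eu

FSTAB=${FSTAB:-$(mktemp)}
ENTRY='node-1:/queue-data /queue glusterfs defaults,_netdev 0 0'

add_fstab_entry() {
    # Field 2 of each fstab line is the mount point; skip comment lines.
    if awk -v mp="/queue" '$1 !~ /^#/ && $2 == mp { found = 1 } END { exit !found }' "$FSTAB"; then
        echo "already present"
    else
        echo "$ENTRY" >> "$FSTAB"
        echo "added"
    fi
}

first=$(add_fstab_entry)
second=$(add_fstab_entry)
echo "first run: $first, second run: $second"
```

To use it for real, set FSTAB=/etc/fstab and run it as root, then `mount -a` as before.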

If you're doing something different and want to be able to run VMs off your glusterfs, add this to fstab:

    /queue     glusterfs direct-io-mode=disable,_netdev 0 0

And you can see that on node-2 it's also been mounted:
[root@node-2 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             2.0G  1.5G  445M  77% /
/dev/sda2             7.9G  152M  7.4G   2% /opt
none                  256M     0  256M   0% /dev/shm
                      2.0G  1.5G  445M  77% /queue

Now, let's make sure it works. I'll create a file on node-1 and then make sure it exists on node-2:

[root@node-1 ~]# cd /queue && dd if=/dev/zero of=output.dat bs=1M count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.572936 seconds, 18.3 MB/s

And, here I see it on node-2:
[root@node-2 ~]# ls -al /queue
total 10276
drwxr-xr-x  2 root root     4096 Apr  5 12:49 .
drwxr-xr-x 25 root root     4096 Apr  5 12:43 ..
-rw-r--r--  1 root root 10485760 Apr  5 12:49 output.dat

And, there you have it. If you require iptables rules (I don't, because I'm already behind EC2's ACLs), add something like this on node-1, changing the source IP to node-2's:
# /etc/sysconfig/iptables
-A INPUT -m state --state NEW -p tcp --dport 24007 --source -j ACCEPT
-A INPUT -m state --state NEW -p tcp --dport 24008 --source -j ACCEPT
-A INPUT -m state --state NEW -p tcp --dport 24009 --source -j ACCEPT
-A INPUT -m state --state NEW -p tcp --dport 24010 --source -j ACCEPT

You could also just do something like this:
-A INPUT -m state --state NEW -p tcp --dport 24007:24010 --source -j ACCEPT
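
With more than two nodes, every node needs a rule for each of its peers, so it can be easier to generate the lines. Here's a minimal sketch that prints one port-range rule per peer. The addresses in PEERS are placeholders from the 192.0.2.0/24 documentation range, not the real node IPs.

```shell
#!/bin/sh
# Sketch: print one iptables rule per peer for the gluster port range.
# PEERS holds placeholder addresses from the 192.0.2.0/24 documentation
# range; substitute the private IPs of your other nodes.
set -eu

PEERS="192.0.2.11 192.0.2.12"

gluster_rules() {
    for peer in $PEERS; do
        echo "-A INPUT -m state --state NEW -p tcp --dport 24007:24010 --source $peer -j ACCEPT"
    done
}

rules=$(gluster_rules)
echo "$rules"
```

The output is meant to be pasted into /etc/sysconfig/iptables above any final REJECT rule, followed by a restart of the iptables service.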

Sunday, January 9, 2011

Ruby and Rails and MySQL and Apache and Passenger and Git: Install on CentOS 5

My New Year's resolution was to learn Ruby on Rails, the popular web application framework. It just so happens I've been assigned a Ruby project at work. Go figure.

I've been watching some screencasts my buddy at Pivotal lent me, and the first thing to do was to install the stuff. The Rails world looks pretty much dominated by people who develop on their Macs, then push apps to some public Rails hosting company. That's fine, but I have a hankerin' to install in my own CentOS 5 environment. It actually took a lot more tinkering than it should have, which is weird. So, I figured I'd better lay it out so I don't forget how it's done. Hopefully this'll help others. The latest version of Ruby comes with RubyGems, the Ruby package manager, so you don't need to install that separately.

Something that is interesting to note, Passenger is really mod_ruby - an apache module. They don't call it mod_ruby but that's what it is. Similar in theory to mod_perl and mod_php.

You'll get the latest stable ruby as of this writing (ruby-1.9.2-p136) + rails 3.0.3.

Another note, I'm using MySQL as my back end database but to do default 'bundle installs', you need sqlite so I'm installing that as well.


# Install deps
# Install the latest version of Ruby and Rails and Git to run in Apache + Passenger.
# Includes MySQL and sqlite.

yum -y install httpd zlib-devel curl-devel openssl-devel httpd-devel apr-devel apr-util-devel

tar -xvf ruby-1.9.2-p136.tar.gz
cd ruby-1.9.2-p136
./configure --enable-shared --enable-pthread
make && make install
cd ext/zlib
ruby extconf.rb --with-zlib-include=/usr/include --with-zlib-lib=/usr/lib64
cd ../..
make install
ruby --version

# Installing sqlite
tar -zxf sqlite-autoconf-3070400.tar.gz
cd sqlite-autoconf-3070400
./configure && make && make install

gem install rails
gem install passenger
gem install mysql -- --with-mysql-conf=/usr/bin/mysql --with-mysql-lib=/usr/lib64/mysql

tar xvf git*.*
cd git*
make prefix=/usr/local all
make prefix=/usr/local install

# run this by hand because you have to answer some questions.
# passenger-install-apache2-module

Next, you'll want to configure a hosting environment with a custom http config:

vi /etc/httpd/conf.d/rails_app.conf

<VirtualHost *:80>
    DocumentRoot /opt/rails/myapp/public
    <Directory /opt/rails/myapp/public>
        Allow from all
        Options -MultiViews
    </Directory>
</VirtualHost>

Tuesday, January 4, 2011

Creating a Virtual Private Cluster with OpenVPN

I'd like to change gears and look at using the cloud in a way that's a bit new for me: as a direct extension of the LAN. This post represents a bunch of research on my part. It's what I spent the 2010 Christmas holiday working on. If you get bored with all the tech stuff here, make sure you read the last couple of paragraphs in this post so you can see what the end goal really is. Overwhelming possibilities is all I can say. It's more than the title implies, so keep an open mind after it's all set up and working.

In this post, I'll create a new security group and connect it to my office network via a VPN tunnel. Then, I'll route packets to instances running in that security group through the tunnel, just as if it were on my local office LAN. You just need a single port open to the security group to create the tunnel. I'll also mention that you can spend a lot of money and pay for something similar, however by rolling your own you have way more flexibility. It's an investment to understand how it all works together to be sure but once I got it down I feel like I'm on another, higher level of understanding networking. For me it was a really great project.

So, let's get started. First, I'm going to create a new security group and call it vpc for virtual private cloud:

ec2-add-group VPC -d "Virtual Private Cloud"
GROUP VPC Virtual Private Cloud

Now, I'm going to authorize two ports for my group, 1194 for the tunnel and 22000 for sshd while I get everything configured. Keep in mind I usually run sshd on a port other than 22 because of annoying scan-bots. I'll de-authorize port 22000 when I'm up and running.

# ec2auth -Ptcp -p22000 VPC
# ec2auth -Pudp -p1194 VPC

Now, I'll spin up an instance, install openvpn and configure the service. The following instance is an AMI I created that doesn't have OpenVPN installed.

# ec2run -k joeyssh -g VPC pmi-182a79e7

Ok, I'm going to start another instance in that VPC so we can test end-to-end connectivity once we get the tunnel set up. Same command as above.

Now I can check the IPs of my instances and log in.

# ec2din

Ok, now I'm logged into one of the instances. I'm going to rename it openvpn-server because it's going to be my OpenVPN server system.

# hostname openvpn-server

Next, I'm going to add the RPMForge repo to this system and install OpenVPN from it. The nice thing about doing it this way is that yum resolves all of OpenVPN's dependencies at the same time: things like openssl-devel, lzo-devel, pam-devel, etc.

So, install the RPMForge RPM for your architecture:

# rpm -Uhv

Then, install OpenVPN:

# yum -y install openvpn

Now that the software is installed, let's configure the OpenVPN server. We're going to create a set of certificates for authenticating connections. We're going to create 3 certs. One is the certificate authority, or CA. This will be used to sign both the server certificate and the client certificate. Next, I'll create the server and client certs that will be used to authenticate and encrypt the tunnel. In order to do this, I'll use the 'EasyRSA' software that comes with openvpn. This is specific to CentOS, your distro's methods may vary.

Copy easy-rsa into /etc/openvpn/
# rsync -a /usr/share/doc/openvpn-2.1.4/easy-rsa/2.0/ /etc/openvpn/easy-rsa/

Before we create our certificates and keys, I'm going to edit the vars file which contains the default values for my certificates. They're at the very bottom of the file, look for KEY_COUNTRY through KEY_EMAIL and customize them for your environment:

# vi /etc/openvpn/easy-rsa/vars

This is a bit of a hack but for some reason all the shell scripts that easy-rsa provides aren't marked as executable. So, to fix that I ran:

# cd /etc/openvpn/easy-rsa/
# file * | perl -lane 'system "chmod 755 $1" if (/(.*?):.*?Bourne.*?/)'
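
If you'd rather not depend on the wording of `file` output, a plain shell loop that looks for a `#!` line does a similar job. This is a minimal sketch, not the command I ran: it selects files by their first two bytes instead of by what `file` reports, and by default it works in a scratch directory containing two made-up files so it's harmless to try.

```shell
#!/bin/sh
# Sketch: make every file that starts with a "#!" line executable.
# DIR defaults to a scratch directory with two made-up files so the demo
# is harmless; point it at /etc/openvpn/easy-rsa for real use.
set -eu

DIR=${DIR:-$(mktemp -d)}
if [ -z "$(ls -A "$DIR")" ]; then
    printf '#!/bin/sh\necho hello\n' > "$DIR/build-demo"   # hypothetical script
    printf 'KEY_SIZE=1024\n' > "$DIR/notes.txt"            # hypothetical data file
    chmod 644 "$DIR/build-demo" "$DIR/notes.txt"
fi

for f in "$DIR"/*; do
    [ -f "$f" ] || continue
    # Read the first two bytes and compare them with the shebang marker.
    if [ "$(head -c 2 "$f")" = '#!' ]; then
        chmod 755 "$f"
    fi
done

ls -l "$DIR"
```

Run it as `DIR=/etc/openvpn/easy-rsa sh fix-perms.sh` (the script name is just an example) to apply it to the real directory.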

Next, source the vars file and build the Certificate Authority certificate:

# cd /etc/openvpn/easy-rsa/
# . vars
# ./clean-all
# ./build-ca

Ok, next we'll create the openvpn server's certificate:

# ./build-key-server openvpn-server

Next, we build the Diffie-Hellman key:

# ./build-dh

And finally, we'll build our client certificate, which will be installed on a PC at the office to set up the tunnel.

# ./build-key openvpn-client

Now, we'll configure the openvpn server. Before doing so, we need to know two things: the network information for my office LAN and the network information for the AWS LAN. In my case, they are:


That plays into my config file, notice the customizations, "push route" and "route". Note also that for the tunnel itself I'm using You can just leave that alone, or change it. It doesn't really matter what you use there. It's less confusing to use something that doesn't overlap with either your office or AWS networks though. Create a file (/etc/openvpn/vpc.conf) and add the following:

# OpenVPN Server Config
port 1194
proto udp
dev tun

# Certificates
ca /etc/openvpn/easy-rsa/keys/ca.crt
cert /etc/openvpn/easy-rsa/keys/openvpn-server.crt
key /etc/openvpn/easy-rsa/keys/openvpn-server.key # This file should be kept secret
dh /etc/openvpn/easy-rsa/keys/dh1024.pem

push "route"
log-append  /var/log/openvpn.vpc.log
verb           3
status         /etc/openvpn/vpc_status.log
keepalive 20 100

push "dhcp-option DNS"
client-config-dir /etc/openvpn/ccd

user nobody
group nobody
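
The config points client-config-dir at /etc/openvpn/ccd, and for a routed (dev tun) setup with a LAN behind the client, OpenVPN generally expects a file in that directory named after the client certificate's common name, containing an iroute line for that LAN. Here's a minimal sketch of creating one. The office network and netmask are placeholders, not the networks from this post, and the script writes to a scratch directory unless CCD_DIR is set.

```shell
#!/bin/sh
# Sketch: create the per-client file that client-config-dir expects.
# CCD_DIR defaults to a scratch directory for the demo (the config above
# uses /etc/openvpn/ccd). OFFICE_NET and OFFICE_MASK are placeholders,
# not the networks from this post.
set -eu

CCD_DIR=${CCD_DIR:-$(mktemp -d)}
CLIENT_CN=openvpn-client          # must match the client certificate's common name
OFFICE_NET=192.168.1.0            # placeholder office LAN
OFFICE_MASK=255.255.255.0

mkdir -p "$CCD_DIR"
# iroute tells the OpenVPN server that this client is the way to reach the office LAN.
echo "iroute $OFFICE_NET $OFFICE_MASK" > "$CCD_DIR/$CLIENT_CN"

cat "$CCD_DIR/$CLIENT_CN"
```

Note that iroute only affects routing inside OpenVPN; the kernel route for the same network comes from the route line in the server config. Since the server drops to user nobody, the ccd directory and its files need to be readable by that user.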

We haven't configured OpenVPN to use pre-shared (aka static) keys because that's less secure. Static keys are bad because if a system with the shared key is lost or stolen, the shared key must be regenerated and replaced on all systems running OpenVPN. We're running asymmetric (public key) encryption to ensure the identity of the VPN partner. The way that works is that the client and server each have a public and a private key. They trade public keys when the connection starts and start encrypting traffic for the partner with the partner's public key. They decrypt traffic with their own private key. Only the recipient's private key can decrypt data encrypted with the recipient's public key. If a system is lost or stolen, an administrator can simply revoke the certificates (on openvpn-server) that belong to the client in question.

In our configuration we're using asymmetric public key encryption to establish a session, then OpenVPN negotiates a static key between hosts for tunnel encryption. These expire on a regular basis and are regenerated. I see things in my logs like:

TLS: tls_process, killed expiring key

I think this is because if there's a man-in-the-middle type of attack, by the time they crack that session key and can actually eavesdrop on the session, the key has expired and been regenerated. By default, OpenVPN renegotiates the key every 3600 seconds (the reneg-sec option), and you can lower that if you want keys to rotate more often.

Now, before we start the service, it's important that both systems have accurate clocks. If the time is off by more than 5 minutes on either side, the tunnel goes away and doesn't come back until the clocks are back in sync. The easiest way to manage that is just run ntpd:

yum -y install ntp
chkconfig ntpd on
service ntpd start

Now, let's fire up OpenVPN:

# chkconfig openvpn on
# service openvpn start

You should now see openvpn running as user nobody and listening on port 1194:

# lsof -i:1194
openvpn 1448 nobody    4u  IPv4   4866       UDP *:openvpn 

If you have any trouble starting the service, look in: /var/log/openvpn.vpc.log for errors.

Ok, so that's the server side config. Now, we need an OpenVPN client on your LAN to create our tunnel. So, pick a box on the network and install OpenVPN. In my case, I'm going to use a Xen instance on an existing infrastructure system but you can use a desktop system or an old PC or whatever.

If you're running CentOS x86_64, use the instructions above to install RPMForge and OpenVPN.

I've called this system 'openvpn-client' and installed the software I need. So now, I need to copy down the keys I generated for this host from the OpenVPN Server in the cloud. To do this, simply:

# mkdir /etc/openvpn/keys/
# cd /etc/openvpn/keys/
# scp -P 22000 -i ~/.ssh/mykey root@publicipofopenvpnserver:/etc/openvpn/easy-rsa/keys/openvpn-client.crt . 
# scp -P 22000 -i ~/.ssh/mykey root@publicipofopenvpnserver:/etc/openvpn/easy-rsa/keys/openvpn-client.key .
# scp -P 22000 -i ~/.ssh/mykey root@publicipofopenvpnserver:/etc/openvpn/easy-rsa/keys/ca.crt .
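
Since the client's private key should be kept just as secret as the server's, it's worth tightening its permissions once it's copied down. Here's a minimal sketch, assuming GNU stat for the final check. By default it works on empty stand-in files in a scratch directory; set KEY_DIR=/etc/openvpn/keys to apply it to the real files.

```shell
#!/bin/sh
# Sketch: restrict the private key so only its owner can read it.
# KEY_DIR defaults to a scratch directory with empty stand-in files;
# use /etc/openvpn/keys on the real client.
set -eu

KEY_DIR=${KEY_DIR:-$(mktemp -d)}
for f in ca.crt openvpn-client.crt openvpn-client.key; do
    [ -e "$KEY_DIR/$f" ] || : > "$KEY_DIR/$f"   # create stand-ins for the demo
done

chmod 600 "$KEY_DIR/openvpn-client.key"                       # private: owner only
chmod 644 "$KEY_DIR/ca.crt" "$KEY_DIR/openvpn-client.crt"     # public certificates

key_mode=$(stat -c %a "$KEY_DIR/openvpn-client.key")
echo "openvpn-client.key mode: $key_mode"
```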

Ok, now we need the OpenVPN client configuration. The changes to this file you'll need to make are the name/location of the certificates and the PUBLIC IP of your OpenVPN server. Create a file on the openvpn-client system (/etc/openvpn/vpc.conf) and add the following:

dev tun
proto udp

remote 1194
resolv-retry 10
ca   /etc/openvpn/keys/ca.crt
cert /etc/openvpn/keys/openvpn-client.crt
key  /etc/openvpn/keys/openvpn-client.key

verb           3
status-version 2
log-append  /var/log/openvpn.log

mute 10

# Enable Compression

Now, fire up the service on openvpn-client, which will create the tunnel:

# chkconfig openvpn on
# service openvpn start

On the client, you should see that a new virtual tun interface has been created, tun0:

# ip addr show tun0
4: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 100
    inet peer scope global tun0

So, we've established a connection to the VPN server, which is great. That's the magic part, it makes the systems in the cloud only one hop away through the virtual network:

# traceroute
traceroute to (, 30 hops max, 40 byte packets
 1 (  110.496 ms  111.075 ms  111.100 ms

Now, you'll also notice that I have a route, automagically added by the openvpn software that makes the cloud LAN, local (in a sense). In this example, the cloud systems have LAN addresses of and I have a route for that network now, through my new tunnel:

# route -n | grep ^10
 UG    0      0        0 tun0

So, in theory, I should be able to ping the eth0 interface of my cloud instance, which you can see below:

# ping
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=64 time=202 ms
64 bytes from icmp_seq=2 ttl=64 time=131 ms

And of course I can now access that instance directly from my LAN. No need to hit the big bad internet:

ssh -l root -i ~/.ssh/yoursshkey root@ -p 22000

That's a beautiful thing! Now, I can turn off (or revoke) the ACL that allows ssh access into port 22000 from my security group:

ec2-revoke VPC -Ptcp -p22000

We're still not done yet. We need to turn on packet forwarding on both openvpn-server and openvpn-client. Simply do this on both systems:

# echo 1 > /proc/sys/net/ipv4/ip_forward
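
Writing to /proc only lasts until the next reboot. To make forwarding stick, the setting also has to go into /etc/sysctl.conf. Here's a minimal sketch of that edit, assuming GNU sed. By default it works on a scratch file seeded with the disabled setting, so nothing on the system changes unless SYSCTL_CONF is pointed at the real file.

```shell
#!/bin/sh
# Sketch: set net.ipv4.ip_forward = 1 in a sysctl.conf-style file.
# SYSCTL_CONF defaults to a scratch copy seeded with the disabled setting;
# on a real system it would be /etc/sysctl.conf, followed by "sysctl -p".
set -eu

SYSCTL_CONF=${SYSCTL_CONF:-$(mktemp)}
[ -s "$SYSCTL_CONF" ] || echo "net.ipv4.ip_forward = 0" > "$SYSCTL_CONF"

if grep -q '^net\.ipv4\.ip_forward' "$SYSCTL_CONF"; then
    # Rewrite the existing line in place (GNU sed -i).
    sed -i 's/^net\.ipv4\.ip_forward.*/net.ipv4.ip_forward = 1/' "$SYSCTL_CONF"
else
    echo "net.ipv4.ip_forward = 1" >> "$SYSCTL_CONF"
fi

grep '^net\.ipv4\.ip_forward' "$SYSCTL_CONF"
```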

Now, you're going to need to add a route, on your office gateway system. That route is going to tell all the hosts on your network that if they want to get to the AWS LAN network, they need to go through the openvpn-client system. In my case, it's something like:

# ip route add via

That's because my gateway is a Linux box but all the DSL/Cable routers have the ability to add routes. Just figure out how to do that on your gateway router.

In addition, you're going to need to add routes to each instance in AWS that tells each system how to get to your office LAN. So in my case I have two instances currently running in my VPC security group. The openvpn-server and a subversion server, called svn. So, I log into the svn server and run:

# ip route add via

Where is the eth0 IP of my openvpn-server system. So now, I can ping my SVN server from my LAN. Alternatively, you could make your office the default route for all the AWS systems which may make sense in some situations.

That's pretty much scratching the surface here..

Now, what's really pretty cool about this setup, is that if you had say a San Francisco office, you could build out another openvpn-client system in SF and route packets between sites, securely through your cloud instances. Not only that but you could just make the cloud, your default route for both office networks and install Snort or whatever IDS you want and maybe a firewall distro on an instance if you want.. how about a PBX like trixbox? Total control over your networks. This concept really offers overwhelming possibilities. An office SMB share mounted on an EBS drive for example - basically you can move your IT infrastructure to the cloud.

Another possibility: because OpenVPN supports HTTP and SOCKS proxies, you could tunnel a network from your office through the HTTP proxy and have the public AWS systems be local to your LAN, thereby making those systems totally 'greenside'. That would be handy if you worked for, say, some telecommunications giant with a tightly controlled internal network.

If you were to run OpenVPN in bridge mode (using the tap interface) you could even route broadcast traffic and even IPX and non-IP packets to the other locations. And there's more: traffic shaping isn't only possible, it's built in to OpenVPN. In addition, if you lose your IP on the DSL office line and re-up to your Internet provider, the tunnel is re-established so quickly that none of your SSH sessions will die. They'll simply freeze up for some seconds while one of the partners gets a new IP, and then continue working normally.

Ok, now that I've shown you how to do all this by hand: there are some dedicated firewall/gateway distros, like Shorewall, that I haven't used but that have built-in support for OpenVPN, so the client config and routing stuff are done in a web GUI. Might be a fun experiment at some point.

Then of course you can have your telecommuters or road warriors connect to OpenVPN from the road and have access to the corporate network. Again, the possibilities are overwhelming.

I'll have to do a post on getting Snort going, that would be really fun. As usual, post questions and/or problems. Thanks for stopping by!