This article was written and contributed by Adrian Cole, Senior Technical Evangelist at Opscode, Rackspace Cloud Tools Partner. Adrian is also founder of jclouds.org, a popular open source cloud framework for the Java community.
I've spent most of my recent life in cloud APIs. While common belief may suggest otherwise, clouds generally don't come with a "make my IT problems disappear" button. There are a lot of tools that help you launch a predefined image of an operating system. Stopping there, however, leaves you with what I'd call Cloud Toys: all the drawbacks of unmanaged servers on a cool API. Luckily, we don't have to stop there. Read on and you'll see how to get some serious cloud server action.
IaaS provisioning APIs such as Amazon EC2, GoGrid, vCloud, and Rackspace Cloud Servers are totally sweet. They take stress away from you: the hows of provisioning nodes and the scalability concerns of doing that en masse. They use HTTP, and that is still awesome: it works over proxies and in really restricted environments such as devices or Google App Engine. Integration is easier.

Software packages and policies frozen into an operating system image guarantee you a baseline state, but there's more to it than that. How are changes managed? How do I load in my users, firewall rules, etc.? While it is possible to create an image for each change, that sounds too much like work. Moreover, image construction is very hard to do portably across clouds. Say you did it anyway. Next comes the head scratching about connecting the dots (integration). Which pair of EBS volumes goes with this node? Should this machine become a slave or a master? What's the monitoring URL? Without a holistic process, your cloud servers end up like a tricycle on a major highway with a really fancy phone. Cloud servers want to be more, and you don't want to be stuck in VM sprawl.
Infrastructure as Code’s goal is to “Enable the reconstruction of the business from nothing but a source code repository, an application data backup, and bare-metal resources”. Read the book, it’s good, but here’s a hint: provisioning is only the first of at least three stages; configuration and integration follow. Moving to infrastructure as code takes you from cloud toys to cloud servers. Chef and the Opscode Platform do exactly this.
Chef defines a repository for cookbooks, roles, and other metadata that define your infrastructure. Version-control this repository and upload it to the Chef server or the Opscode Platform. Now you can completely rebuild any system, or your entire infrastructure, easily and at scale. You can even run ad-hoc queries to find which nodes are running which configuration, or which one is the master! As GI Joe taught us, knowing is half the battle, right? OK, so here's how to connect your cloud provisioning process to Chef's configuration and integration process. To integrate with Chef, a node needs a few things:
- Ruby and the Chef client libraries
- A run-list of roles to apply at startup, e.g. hadoop-slave
- A client key and the URL of a Chef server, e.g. https://api.opscode.com/organizations/acme
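The run-list, for example, lives in a small JSON file that chef-client reads on its first run. A minimal sketch, using the hadoop-slave role from the list above:

```json
{
  "run_list": ["role[hadoop-slave]"]
}
```

Drop this at a path like /etc/chef/first-boot.json and pass it to chef-client with -j.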
Push these things onto a node, run Chef, and voilà! Your really cool Ubuntu cloud node is now a defined and managed piece of your infrastructure! Technically, running the above is easy: just use knife. Knife is our command-line tool, and it uses a really cool cloud API called fog to run cloud servers. However, I'd rather show you how to fish. Let's recap: the cloud provisioning API gives you a node, and now you need to upload some files and run the chef-client command. One approach is to wait until the node is up, SCP the files over, and use SSH to run chef-client. Pretty standard, really... but there are a couple of drawbacks. Needing to SSH in implies that
- You have to “wait around” and poll for ssh to be ready, handling all the error conditions, remembering passwords, etc.
- SSH must be running (generally on a public IP address), and you must be able to gain root access through it
- You have an SSH client or library handy
- You can open TCP sockets that SSH uses across the network
- Your ability to scale is no longer limited by the cloud: it is limited by your SSH process. All of this work repeats for every node in the potential thousands you are firing up.
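To make the first chore concrete, here's a minimal Ruby sketch of the "wait around and poll" step alone. The wait_for_port helper and its timeout values are hypothetical, not part of any Chef tooling, and this is only step one: you still have to SCP files and run chef-client afterwards.

```ruby
require 'socket'
require 'timeout'

# Poll until a TCP port (e.g. SSH on 22) accepts connections, or give up.
def wait_for_port(host, port, timeout_secs: 120, interval: 5)
  deadline = Time.now + timeout_secs
  until Time.now > deadline
    begin
      Timeout.timeout(interval) { TCPSocket.new(host, port).close }
      return true                 # the port is accepting connections
    rescue Errno::ECONNREFUSED, Errno::EHOSTUNREACH, Timeout::Error, SocketError
      sleep interval              # not up yet; keep polling
    end
  end
  false                           # gave up waiting
end
```

And this sketch handles none of the messy parts: credentials, flaky networks, or running it a thousand times in parallel.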
Wouldn't you rather dodge all of this and push the work onto the provisioning cloud? For some clouds you can't, as there's no way to place files on disk except over SSH. The Rackspace Cloud Servers API, however, has a wonderful feature that makes this possible: file injection. That solves the file upload problem, but there's a constraint: the Cloud Servers API requires that all injected files be read-only and not executable.

How do we run chef-client if we cannot make a file executable? I took this constraint to the #chef-hacking IRC channel for an extra shot of creativity. Dan "The Awesome Cron Master" DeLeo types: try cron. I don't know about you, but I had totally missed that cron can do stuff at reboot time. Here's the super sweet cron line that connects the provisioning system to the Opscode Platform:
@reboot (bash /etc/install-chef && /usr/bin/chef-client -j /etc/chef/first-boot.json && rm /var/spool/cron/crontabs/root)> /var/log/chef.out 2> /var/log/chef.err
Using this cron entry (or knife), you can do everything via HTTP, which works nicely even on my iPad and behind proxies and firewalls. You also get to lean on Rackspace to scale out the hand-off from provisioning to configuration/integration.
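To show how the pieces fit together, here's a hedged Ruby sketch of the kind of JSON body a Cloud Servers create-server request carries when injecting these files: contents ride along base64-encoded in a "personality" array. The server name, image id, and flavor id below are illustrative only; check the Cloud Servers API documentation for the exact envelope your version expects.

```ruby
require 'json'
require 'base64'

# The cron line from the article, injected as root's crontab.
cron_line = '@reboot (bash /etc/install-chef && ' \
            '/usr/bin/chef-client -j /etc/chef/first-boot.json && ' \
            'rm /var/spool/cron/crontabs/root)' \
            '> /var/log/chef.out 2> /var/log/chef.err'

files = {
  '/var/spool/cron/crontabs/root' => cron_line + "\n",
  '/etc/chef/first-boot.json'     => JSON.generate('run_list' => ['role[hadoop-slave]'])
}

# Each injected file becomes a base64-encoded "personality" entry.
personality = files.map do |path, contents|
  { 'path' => path, 'contents' => Base64.strict_encode64(contents) }
end

body = JSON.generate(
  'server' => {
    'name'        => 'chef-node-1',  # illustrative values only
    'imageId'     => 49,
    'flavorId'    => 1,
    'personality' => personality
  }
)
```

POST a body like this to the create-server endpoint and the node boots with the crontab and run-list already on disk, with no SSH in sight.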