Don’t Throw Your Code Over The Wall: 5 Ways To Work With Ops Engineers

Filed in Cloud Industry Insights by Garrett Heath | November 26, 2013 11:30 am

While the tech world is shifting to more of a DevOps structure, in many places the invisible wall that separates developers and operations engineers is still pretty high. Communication between the two teams is necessary to make sure the application is deployed correctly and achieves high availability for end users. Here are some suggestions from different Rackers (Rackspace employees) on how best to hand off your app to the ops team instead of just throwing the code over the wall.

The Devil is in the Documentation Details

Documenting a well-defined list of dependencies is one of the most important things you can do when handing off your application. Rather than leaving it open-ended, be specific about what the app needs to run properly. Let’s say you’ve developed an app that relies on MongoDB for storing and retrieving data.

“Saying that your app needs MongoDB leaves many things to interpretation. For instance, what is your scalability strategy? Should this application need to be sharded? Do you want replica sets? What version do you want installed? What are your indexing needs?” Rackspace Deployment Services[1] engineer BK Box says. “If your app requires a MySQL backend, do you want Master-Master replication or Master-Slave? How much memory and disk space do you need? These are a couple of examples of the types of details operations engineers need to have. Be sure to keep the communication open to talk about requirements needed to get the application running.”
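One way to keep these questions from falling through the cracks is to hand ops a structured requirements record instead of a one-line note. The Python sketch below is illustrative only: `DatabaseRequirement` and `open_questions` are hypothetical names, not a Rackspace or MongoDB format, and the fields simply mirror the questions Box raises. Any field left unset is reported as an open question to answer before handoff.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class DatabaseRequirement:
    """One backing-store dependency, spelled out for the ops team (hypothetical format).

    Every field left as None is a question ops would otherwise have to ask.
    """
    engine: str                          # e.g. "MongoDB" or "MySQL"
    version: Optional[str] = None        # exact version to install
    topology: Optional[str] = None       # e.g. "replica set", "sharded", "master-slave"
    memory_gb: Optional[float] = None    # RAM the database needs
    disk_gb: Optional[float] = None      # disk space, including expected growth
    indexes: list[str] = field(default_factory=list)  # indexes the app relies on


def open_questions(req: DatabaseRequirement) -> list[str]:
    """Return the names of requirements the developer has not specified yet."""
    missing = [f.name for f in fields(req) if getattr(req, f.name) is None]
    if not req.indexes:  # an empty index list is also an unanswered question
        missing.append("indexes")
    return missing


# "The app needs MongoDB" leaves almost everything open:
vague = DatabaseRequirement(engine="MongoDB")
print(open_questions(vague))
# ['version', 'topology', 'memory_gb', 'disk_gb', 'indexes']
```

A record with every field filled in returns an empty list, which makes "are the requirements complete?" a question with a checkable answer rather than a judgment call.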

Set Up Descriptive Logging

Having detailed logs is one way to help operations engineers quickly diagnose and fix a problem when an alert comes in. “Looking at the application logs is one of the first ways you can tell if a developer came from an ops background,” says Farid Saad, Cloud Servers[2] engineer. Saad says that when an application begins behaving strangely, ops engineers have to determine quickly what is going on, and digging through logs that are not meaningful causes frustration and delays in getting the app back online. Avoid this by making sure your logs describe each error the app encounters: what it was doing, which data was involved, and what went wrong.
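To make the difference concrete, here is a minimal sketch using Python's standard `logging` module. The `orders.payment` logger, the `charge` function, and the simulated gateway timeout are all hypothetical; what matters is the shape of the message, which names the operation, the record involved, the limit that was exceeded, and the resulting state, and attaches the traceback via `log.exception`.

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    # Timestamp, severity, and component tell ops when, how bad, and where.
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
log = logging.getLogger("orders.payment")  # hypothetical component name


def charge(order_id: str, amount_cents: int, gateway_timeout_s: float) -> bool:
    """Hypothetical payment call that always times out, to show a useful error log."""
    try:
        raise TimeoutError("no response from gateway")  # stand-in for a real network call
    except TimeoutError:
        # Unhelpful:  log.error("Error!")
        # Helpful: what was attempted, on which record, with which limit,
        # and what state was left behind; log.exception adds the traceback.
        log.exception(
            "Charging order %s for %d cents failed: payment gateway did not "
            "respond within %.1f s; order left unpaid",
            order_id, amount_cents, gateway_timeout_s,
        )
        return False


charge("A-1001", 2599, 5.0)
```

An engineer paged at 3 a.m. can act on that line directly (which order, which dependency, which timeout) without first reproducing the failure.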

Fully Tested Code

Major Hayden[3], the Rackspace Chief Security Architect, advises that developers fully test the code to help mitigate any surprises on launch day. Hayden emphasizes that the testing should extend beyond the typical unit testing that happens in most QE processes. “When testing, it is important to understand what happens when you put all the pieces together,” Hayden says. “With integration testing, you should make sure that the code works well with other dependent applications, such as an authentication system.” Be certain to verify the integration points between your code and other applications.
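Here is a sketch of an integration-level test, assuming the app validates tokens by calling an authentication service over HTTP. Everything in it is hypothetical: the `/validate` endpoint, the `Bearer good-token` convention, and `is_authorized` stand in for your real auth system and client code. Unlike a unit test that mocks the client function, this test sends real HTTP requests over a socket to a stand-in server; in practice you would point it at a staging instance of the actual authentication system.

```python
import json
import threading
import unittest
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeAuthHandler(BaseHTTPRequestHandler):
    """Stand-in for an external authentication system (hypothetical /validate endpoint)."""

    def do_GET(self):
        ok = self.headers.get("Authorization") == "Bearer good-token"
        self.send_response(200 if ok else 401)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps({"valid": ok}).encode())

    def log_message(self, *args):  # keep test output quiet
        pass


def is_authorized(auth_url: str, token: str) -> bool:
    """Application code under test: asks the auth service whether a token is valid."""
    req = urllib.request.Request(auth_url, headers={"Authorization": f"Bearer {token}"})
    try:
        with urllib.request.urlopen(req, timeout=2) as resp:
            return json.load(resp)["valid"]
    except urllib.error.HTTPError as err:
        if err.code == 401:  # the service explicitly rejected the token
            return False
        raise  # anything else is a real integration failure


class AuthIntegrationTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Port 0 lets the OS pick a free port for the stand-in service.
        cls.server = HTTPServer(("127.0.0.1", 0), FakeAuthHandler)
        threading.Thread(target=cls.server.serve_forever, daemon=True).start()
        cls.url = f"http://127.0.0.1:{cls.server.server_port}/validate"

    @classmethod
    def tearDownClass(cls):
        cls.server.shutdown()
        cls.server.server_close()

    def test_valid_token_is_accepted(self):
        self.assertTrue(is_authorized(self.url, "good-token"))

    def test_invalid_token_is_rejected(self):
        self.assertFalse(is_authorized(self.url, "bad-token"))
```

Run it with `python -m unittest` against the module. Because the requests cross a real network boundary, mismatches in headers, status codes, or response shape surface here rather than on launch day.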

Have a Rollback Plan

Everyone expects code deployments to succeed; however, you have to assume that something could go wrong. “We see it all the time where a new piece of code is rolled out and the application starts behaving poorly,” says Kenny Gorman, co-founder of ObjectRocket, a MongoDB solution by Rackspace[4]. “Many times it could be a simple missing database index, but other times the new application could actually be logically corrupting the database. In these cases it’s great to have a rollback plan, which not only mitigates the risk of a bad deployment but also minimizes the amount of downtime if something goes awry.”
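One common rollback mechanism, sketched here under stated assumptions, is to keep each release in its own directory and have the app server run whatever a `current` symlink points to; rolling back is then just repointing the link. The directory layout, `switch_current`, `deploy_with_rollback`, and the `healthy` callback are hypothetical, and the sketch assumes a POSIX filesystem where `os.replace` swaps the link in a single step.

```python
import os
from pathlib import Path
from typing import Callable


def switch_current(releases_root: Path, release: str) -> None:
    """Point releases_root/current at the sibling directory named `release`."""
    tmp_link = releases_root / "current.tmp"
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    # Relative target: the link lives in releases_root, next to the release dirs.
    tmp_link.symlink_to(release, target_is_directory=True)
    # os.replace renames the new link over the old one in one step on POSIX,
    # so there is never a moment without a 'current' release.
    os.replace(tmp_link, releases_root / "current")


def deploy_with_rollback(releases_root: Path, new_release: str,
                         healthy: Callable[[], bool]) -> str:
    """Activate new_release; if the health check fails, restore the previous release.

    Returns the name of the release that is live when the function returns.
    A real deploy would also reload the app server after each switch (omitted here).
    """
    current = releases_root / "current"
    previous = os.readlink(current) if current.is_symlink() else None
    switch_current(releases_root, new_release)
    if healthy():
        return new_release
    if previous is None:
        raise RuntimeError(f"{new_release} failed its health check and there is no previous release")
    switch_current(releases_root, previous)
    return previous
```

Swapping code back does not undo anything the bad release already wrote to the database. For the logical-corruption case Gorman describes, the plan also needs database backups or schema changes that can be reversed.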

Communicate Availability Expectations

Rackspace Cloud Servers engineer Richard Maynard wants to know what the developer’s expectations are for the app. “How often should it be available? How often will it be updated? What is the anticipated usage?” Maynard asks. “To me, it is about requirements for availability and understanding what the operations team can do to meet those requirements.”

There is a difference between 99.99 percent and 99.999 percent uptime. While it may not sound like a lot, the number of 9s can affect how you architect the infrastructure for the app (over a year, 99.99 percent allows about 52 minutes 34 seconds of downtime, while 99.999 percent allows only about 5 minutes 15 seconds). “What it comes down to is an understanding of the cost versus complexity and the impact of downtime to the business,” Maynard says. “Are you developing a payment gateway and could lose money out the window if it goes down, or is it a non-critical app that could simply upset a handful of users?” Communicating this expectation to the ops team can help ensure that the hardware is architected to achieve the required level of uptime.
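The arithmetic is worth doing explicitly when agreeing on a target. This small Python helper converts an availability percentage into the maximum downtime it permits over a 365-day year (leap years ignored); `downtime_per_year` is just an illustrative name.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; ignores leap years


def downtime_per_year(availability_percent: float) -> str:
    """Maximum yearly downtime allowed by an availability target, as 'Xh Ym Zs'."""
    seconds = round((1 - availability_percent / 100) * HOURS_PER_YEAR * 3600)
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h {minutes}m {secs}s"


for target in (99.0, 99.9, 99.99, 99.999):
    print(f"{target}% -> {downtime_per_year(target)}")
# 99.0% -> 87h 36m 0s
# 99.9% -> 8h 45m 36s
# 99.99% -> 0h 52m 34s
# 99.999% -> 0h 5m 15s
```

Each extra 9 cuts the allowed downtime by a factor of ten, which is why the target drives decisions such as redundancy and failover rather than being a rounding detail.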

Industries are beginning to tear down the wall between operations and development, but until your organization arrives at a true DevOps structure, these tips can help make the handoff smoother.

Looking to host your newest app? Rackspace offers performance cloud hosting with all-SSD storage so you can get the most out of your app. Rackspace also offers ObjectRocket[5], a Database as a Service solution that runs MongoDB on custom, fine-tuned hardware.

The Rackspace DevOps Automation Service[6] automates application environments using DevOps tools, and includes 24×7 DevOps Engineering support.

Endnotes:
  1. Rackspace Deployment Services: http://www.rackspace.com/application-deployment/
  2. Cloud Servers: http://www.rackspace.com/cloud/servers/
  3. Major Hayden: http://www.rackspace.com/blog/author/mhayden/
  4. ObjectRocket, a MongoDB solution by Rackspace: http://objectrocket.com/
  5. ObjectRocket: http://objectrocket.com
  6. DevOps Automation Service: http://rackspace.com/devops

Source URL: http://www.rackspace.com/blog/dont-throw-your-code-over-the-wall-5-ways-to-work-with-ops-engineers/