How to Prepare Your Team to Leverage Cloud Operating Models

by Keiran Holloway, Rackspace Elastic Engineering+ Head of Service Delivery, Rackspace Technology, and Sriram Rajan, Senior Principal Architect, Rackspace Technology

Cloud operating models are essential elements of a successful cloud environment. At Rackspace Technology®, where we have managed cloud environments for many years, we often find that the operating model is what stands between an enterprise and its ability to successfully adopt a public cloud. It is also one of the areas of IT that can be difficult to define and contextualize. In this post, we explain what cloud operating models are, the features of a successful one and the keys to building it.

Let’s start with the features of a successful operating model. They include:

  • Democratizing technology via cloud native and bringing it closer to the end users
  • Gaining agility and the ability to make changes fast
  • Eliminating duplication while enabling economies of scale
  • Establishing the lines of demarcation between various functions within the cloud environment
  • Providing communications paths between various functions (adding more functions is excellent, but there is a trade-off between specialization and efficiency)

Contextualizing an effective operating model

You can break down the average cloud stack of an enterprise into three layers: applications, shared services and infrastructure.

  1. Application layer: This is where your application or business logic lives. It’s effectively the platform on which your end users consume your services. 
  2. Shared services layer: This comprises various components, depending on the technologies adopted by your company. Technologies that commonly sit in the shared services layer include your container orchestration mechanisms (e.g., Kubernetes® clusters), shared messaging buses (e.g., Apache Kafka or Google Cloud Pub/Sub), Cloud IAM patterns and networking constructs like network peering and landing zones.
  3. Infrastructure layer: While there should be a bias toward using shared service platforms where possible, there will be occasions where bespoke deployments are needed, such as running a marketplace product within a Google Compute Engine instance. Infrastructure that sits outside this shared services layer will commonly fall into the infrastructure layer.

Within each of these three layers, there are two functions:

  1. Building and deploying: Each layer should produce standard deployment patterns, for example, standard CI/CD pipelines for deploying applications.
  2. Ongoing operations: This is known as the “feeding and watering” of infrastructure to help ensure that it is operating in an effective, secure and efficient fashion.

Here is a visual depiction of how these functions could be structured.

As shown in the gray boxes above, an operating model accounts for six separate functions within your cloud environment. Depending on your organization, you might need six different teams (one for each function), or a smaller number of teams with overlapping functions. For example, the team that builds applications could also be responsible for feeding and watering the application. This could include 24x7x365 monitoring of application availability and responsibility for code updates and releasing application changes. Alternatively, you could split these activities into separate teams and functions.
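
The three layers and two functions per layer combine into the six functions described above. As a minimal sketch in Python, the grid and one possible team structure could be written down as follows. The team names and the assignment are hypothetical and only illustrate the "smaller number of teams" option; your own split may look quite different.

```python
from enum import Enum
from itertools import product


class Layer(Enum):
    APPLICATION = "application"
    SHARED_SERVICES = "shared services"
    INFRASTRUCTURE = "infrastructure"


class Activity(Enum):
    BUILD_AND_DEPLOY = "building and deploying"
    ONGOING_OPERATIONS = "ongoing operations"


# Each (layer, activity) pair is one function in the operating model.
FUNCTIONS = list(product(Layer, Activity))


def both_activities(layer):
    """Return the two functions (build/deploy and operations) for one layer."""
    return [(layer, activity) for activity in Activity]


# Hypothetical structure: three teams, each owning both functions of a layer.
team_assignments = {
    "application-team": both_activities(Layer.APPLICATION),
    "platform-team": both_activities(Layer.SHARED_SERVICES),
    "infrastructure-team": both_activities(Layer.INFRASTRUCTURE),
}

for team, functions in team_assignments.items():
    names = [f"{layer.value}: {activity.value}" for layer, activity in functions]
    print(team, "->", names)
```

Writing the mapping down explicitly, in whatever form suits you, makes it easier to see which team owns which function before you decide whether to merge or split responsibilities.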

Google Cloud offers interesting constructs that help build the above functions using key infrastructure services. For example, you can create a shared network (virtual private cloud) that’s managed by your shared operations team but used by your application operations team. Google Cloud projects enable the operations teams to centralize monitoring, group runbooks and notification processes in one place, then manage them effectively. You can also employ a library of Google deployments using Terraform infrastructure as code as the foundation for custom deployments.

Keys to building a cloud operating model:

  • Assemble the right skills: Ensure you have all the functions in the image above covered and that your team is properly skilled. An anti-pattern we have seen is having infrastructure teams on call for the application. That rarely yields an outcome much better than simply restarting the application, which means the root cause is never well understood.
  • Fill skill gaps: Document the makeup of each team. It is common to see gaps within the operating model when walking through the actual teams and individuals who fulfill each function. Identifying and filling any gaps is a quick way to improve your cloud operating approach.
  • Align functions: Establish boundaries between these functions and the responsibilities of team members within each group.
  • Streamline teamwork: Understand and record how the teams interact. Frequently, we see different teams using different IT service management (ITSM) tools, languages or terminology. This can create chaotic interactions and should be avoided.
  • Embrace cloud native: For example, go serverless to eliminate any heavy lifting on your end. Even if you cannot go fully serverless, embracing containers using Google Kubernetes Engine or standard Google Compute Engine patterns can help.
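
The "fill skill gaps" and "align functions" steps can be approached as a simple coverage check. The following Python sketch assumes you have recorded which team covers which function; the team names and the example data are hypothetical, and the check itself is just one way to walk through the model, not a prescribed method.

```python
from collections import defaultdict

LAYERS = ("application", "shared services", "infrastructure")
ACTIVITIES = ("build and deploy", "ongoing operations")


def review_coverage(team_functions):
    """Return (gaps, overlaps) for a mapping of team -> list of (layer, activity).

    gaps:     functions that no team covers
    overlaps: functions covered by more than one team, which may need
              clearer boundaries
    """
    owners = defaultdict(list)
    for team, functions in team_functions.items():
        for function in functions:
            owners[function].append(team)

    all_functions = [(layer, activity) for layer in LAYERS for activity in ACTIVITIES]
    gaps = [f for f in all_functions if f not in owners]
    overlaps = {f: sorted(teams) for f, teams in owners.items() if len(teams) > 1}
    return gaps, overlaps


# Hypothetical example: nobody is responsible for operating the application
# or the shared services, and two teams both build infrastructure.
example = {
    "developers": [("application", "build and deploy")],
    "platform": [
        ("shared services", "build and deploy"),
        ("infrastructure", "build and deploy"),
    ],
    "infrastructure": [
        ("infrastructure", "build and deploy"),
        ("infrastructure", "ongoing operations"),
    ],
}

gaps, overlaps = review_coverage(example)
print("Uncovered functions:", gaps)
print("Functions with more than one owner:", overlaps)
```

In this example the check reports two uncovered operations functions and one function with two owners. Overlaps are not necessarily a problem, but they are a prompt to establish who is responsible for what.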

The creation of cloud operating models can be a daunting task. Defining one’s own processes and organizational structure is often harder than anyone expects. Then there are the numerous cloud service providers from which to choose. Will you self-manage, try DIY or bring in hired hands to help guide the way? There is no ideal universal approach. IT environments are complex, business requirements rapidly shift, and few of us have guaranteed budgets. Trust us, take your time, do your research — and please don’t just move your existing problems to the cloud.
