McGraw Hill Supports the Education Journeys of Millions Around the World
Michael Bordash, Principal Cloud Practice Architect, Rackspace Technology
This leader in education delivers world-class services via secure cloud native applications designed to aid its future expansion.
Customer Overview
McGraw Hill's legacy AMI process supported multiple base operating systems, including various Linux distributions and Microsoft Windows versions. Development teams built application-specific requirements on top of these base AMIs, which added to the undifferentiated administrative burden and introduced multiple standards that made governance more complex. Rackspace Elastic Engineering began managing many application-specific AMIs and recommended a solution to automate and standardize the build process. The component-based model of EC2 Image Builder offered an opportunity to minimize duplication while supporting an automated pipeline for all AMI builds.
Problem Statement
Creating, maintaining, and governing customized images to meet organizational policies can be both labor-intensive and unsustainable at scale. An optimal solution must be highly automated, resilient, and integrated into pre-existing workflows to empower business growth and innovation.
Background
Security and compliance are paramount for McGraw Hill. While its existing workflow already incorporated many industry-standard security best practices, the company knew that adopting a cloud-native solution would improve operational excellence and the developer experience. The solution Rackspace Elastic Engineering delivered reduced the complexity of securing and managing multiple third-party components by introducing workflow orchestration between internal teams and third-party tools and by eliminating unnecessary or redundant handoffs.
Figure 1 describes the legacy build process. Note the multiple actors involved and entry points.
Solution Overview
The Rackspace Elastic Engineering team determined that EC2 Image Builder was the ideal platform to replace the core AMI build processes and standardize how custom AMIs are built for McGraw Hill. Once Image Builder has built an AMI, AWS Step Functions orchestrates the remaining steps of the end-to-end process. Image Builder's integration with AWS Organizations simplifies the distribution and governance of AMIs.
Figure 2 provides an overview of the EC2 Image Builder based solution.
The Rackspace Elastic Engineering solution includes several key improvements:
- Decoupled architecture supporting asynchronous steps and retry mechanisms.
- Robust governance and security controls, including encryption enforcement.
- Automated security scanning integration and codified approvals workflow following principles of least privilege.
- Reusable components to support various architectures and versions of operating systems.
- Automated ChatOps notifications with a custom Slack app supporting user interactions.
Process Deep Dive
The EC2 Image Builder pipelines are triggered via CloudWatch Events. Each AMI build uses a dedicated pipeline, allowing complete control over the frequency of builds. All build artifacts are sourced from Amazon S3 and JFrog Artifactory. Package repository mirrors are configured in Artifactory to improve visibility and governance and to reduce network latency when fetching project dependencies. JFrog Xray scans all artifacts stored within Artifactory. The EC2 Image Builder pipelines and components are all managed via Terraform and stored within GitHub.
Once a build is completed, AWS Step Functions facilitates the remainder of the build process. Amazon Simple Queue Service (SQS) decouples workflow activities so that the build process does not fail due to a timeout from a long-running or asynchronous step. This was necessary due to a 24-hour execution limitation on EC2 Image Builder pipelines. For example, AMI approval depends on both a manual response and a third-party service response. When a candidate AMI is ready for scanning, an EC2 instance is launched, and a custom Lambda function interacts with the Rapid7 REST API to initiate a review of the resource. This design minimizes the burden on the security team and provides an audit trail of approval activity within Slack.
Figure 3 demonstrates an example of a Slack notification with interactive links.
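One common way to keep a long-running approval from timing out is the Step Functions callback pattern: a task sends a message to SQS together with a task token, and the execution pauses until a worker reports back. The sketch below builds such a state in Amazon States Language as a Python dictionary. The queue URL, state names, input field, and timeout are hypothetical and are not taken from the McGraw Hill implementation.

```python
import json

# Hypothetical queue URL used only for illustration.
APPROVAL_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/111122223333/ami-approval"


def build_approval_state(queue_url: str, timeout_seconds: int) -> dict:
    """Return a Task state that sends an SQS message and waits for a callback."""
    return {
        "Type": "Task",
        # The .waitForTaskToken suffix pauses the execution until a worker
        # calls SendTaskSuccess or SendTaskFailure with the task token.
        "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
        "Parameters": {
            "QueueUrl": queue_url,
            "MessageBody": {
                # "$.imageId" is an assumed field in the state input.
                "ImageId.$": "$.imageId",
                # "$$.Task.Token" reads the token from the context object.
                "TaskToken.$": "$$.Task.Token",
            },
        },
        "TimeoutSeconds": timeout_seconds,
        "Next": "DistributeAmi",
    }


def build_state_machine(queue_url: str) -> str:
    """Return a two-state definition as a JSON string."""
    definition = {
        "StartAt": "AwaitApproval",
        "States": {
            # Allow up to seven days for scanning and manual approval (assumed).
            "AwaitApproval": build_approval_state(queue_url, 7 * 24 * 3600),
            # Placeholder for the real distribution steps.
            "DistributeAmi": {"Type": "Succeed"},
        },
    }
    return json.dumps(definition, indent=2)


if __name__ == "__main__":
    print(build_state_machine(APPROVAL_QUEUE_URL))
```

Because the waiting happens in Step Functions rather than inside a build pipeline, a slow scan or a delayed approval leaves the workflow paused instead of failed.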
The custom Slack application relies on an AWS Lambda function fronted by an AWS Application Load Balancer (ALB). The ALB uses OpenID Connect (OIDC) integration with McGraw Hill's enterprise identity provider to ensure that authentication and authorization controls are met.
Figure 4 shows the mechanisms involved when a security engineer interacts with the Slack message.
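When an ALB authenticates a user through OIDC, it forwards the user's claims to the target in the x-amzn-oidc-data header as a JWT. The following is a minimal sketch of how a Lambda target might read those claims and check the caller before recording a decision. The approver list, the "email" claim, and the "decision" query parameter are assumptions made for illustration, and the sketch deliberately skips signature verification, which a production handler should perform.

```python
import base64
import json

# Hypothetical allow-list; a real system would more likely rely on IdP groups.
APPROVERS = {"security.engineer@example.com"}


def decode_oidc_claims(oidc_data: str) -> dict:
    """Decode the payload segment of the JWT in the x-amzn-oidc-data header.

    This sketch does NOT verify the signature. A production handler should
    verify it against the load balancer's public key before trusting claims.
    """
    payload_segment = oidc_data.split(".")[1]
    # Restore any missing padding so urlsafe_b64decode accepts the segment.
    padded = payload_segment + "=" * (-len(payload_segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def _response(status: int, message: str) -> dict:
    """Build a response in the shape an ALB expects from a Lambda target."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "text/plain"},
        "body": message,
        "isBase64Encoded": False,
    }


def handler(event: dict, context: object) -> dict:
    headers = event.get("headers") or {}
    token = headers.get("x-amzn-oidc-data")
    if not token:
        return _response(401, "missing identity")
    try:
        claims = decode_oidc_claims(token)
    except (IndexError, ValueError):
        return _response(401, "malformed identity")
    email = claims.get("email")
    if email not in APPROVERS:
        return _response(403, "not an approver")
    params = event.get("queryStringParameters") or {}
    decision = params.get("decision", "")
    if decision not in ("approve", "reject"):
        return _response(400, "unknown decision")
    # A real handler would record the decision here, for example by
    # resuming the paused approval workflow.
    return _response(200, f"{decision} recorded for {email}")
```

Keeping authentication at the load balancer means the function only has to make the authorization decision, which keeps the handler small and easier to audit.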
Once an AMI is approved, the distribution process is executed via the AWS Organizations integration. All AMIs are recorded in Amazon DynamoDB, providing a centralized mechanism to govern images throughout their lifecycle. After an AMI is approved and distributed, the solution still needs a way to revoke AMIs that are no longer safe and to prevent obsolete AMIs from being launched.
A custom AWS Lambda function compares the list of currently shared AMIs with the DynamoDB tracking table. If an AMI is no longer approved for use, it is automatically unshared from the AWS organization, preventing future EC2 launches from using that AMI.
The last step of the process is distribution notification. The step function that handles this step publishes an event to an Amazon EventBridge event bus, and child accounts listen for the AMI distribution event. This allows each application team to create event rules that trigger automatic AMI rotation for its infrastructure.
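The comparison at the heart of such a function can be sketched as a set difference. The example below is an illustration only: the table attribute names ("ImageId", "Status") and the "APPROVED" value are assumptions, and the AWS calls are left out so the logic can run on its own. The request built by build_unshare_request matches the shape accepted by the EC2 ModifyImageAttribute API.

```python
from typing import Dict, Iterable, List


def find_amis_to_unshare(
    shared_ami_ids: Iterable[str],
    tracking_items: Iterable[Dict[str, str]],
) -> List[str]:
    """Return shared AMI IDs that the tracking table no longer approves.

    AMIs that are shared but missing from the table are treated as
    unapproved, which is a conservative assumption of this sketch.
    """
    approved = {
        item["ImageId"]
        for item in tracking_items
        if item.get("Status") == "APPROVED"
    }
    return sorted(set(shared_ami_ids) - approved)


def build_unshare_request(image_id: str, organization_arn: str) -> dict:
    """Build keyword arguments for EC2 ModifyImageAttribute.

    Removing the organization ARN from the launch permissions stops
    member accounts from launching new instances from the image.
    """
    return {
        "ImageId": image_id,
        "LaunchPermission": {"Remove": [{"OrganizationArn": organization_arn}]},
    }


if __name__ == "__main__":
    shared = ["ami-aaa", "ami-bbb", "ami-ccc"]
    table = [
        {"ImageId": "ami-aaa", "Status": "APPROVED"},
        {"ImageId": "ami-bbb", "Status": "REVOKED"},
    ]
    # Hypothetical organization ARN.
    org_arn = "arn:aws:organizations::111122223333:organization/o-example"
    for ami_id in find_amis_to_unshare(shared, table):
        print(build_unshare_request(ami_id, org_arn))
```

In a deployed function, the shared list would come from the EC2 API and the tracking items from a DynamoDB query, and each request would be passed to the EC2 client.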
Figure 5 shows an example build notification triggering an application deployment pipeline to rotate an AMI.
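As a rough illustration of this publish-and-subscribe step, the sketch below builds an entry in the shape accepted by the EventBridge PutEvents API and an event pattern that an application team could attach to a rule. The source name, detail type, bus name, and detail fields are hypothetical. The matches helper supports only exact-value lists and nested objects, a small subset of real EventBridge pattern matching, and exists only to show the idea locally.

```python
import json

# Hypothetical names; none of these come from the original solution.
EVENT_SOURCE = "custom.ami-pipeline"
DETAIL_TYPE = "AMI Distributed"


def build_put_events_entry(bus_name: str, image_id: str, image_name: str) -> dict:
    """Build one entry for the EventBridge PutEvents API."""
    return {
        "EventBusName": bus_name,
        "Source": EVENT_SOURCE,
        "DetailType": DETAIL_TYPE,
        # PutEvents expects the detail as a JSON string.
        "Detail": json.dumps({"imageId": image_id, "imageName": image_name}),
    }


def build_rule_pattern(image_name: str) -> dict:
    """Build an event pattern that selects distribution events for one image."""
    return {
        "source": [EVENT_SOURCE],
        "detail-type": [DETAIL_TYPE],
        "detail": {"imageName": [image_name]},
    }


def as_delivered_event(entry: dict) -> dict:
    """Convert a PutEvents entry into the shape a rule sees on delivery."""
    return {
        "source": entry["Source"],
        "detail-type": entry["DetailType"],
        "detail": json.loads(entry["Detail"]),
    }


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: exact-value lists and nested objects only."""
    for key, expected in pattern.items():
        actual = event.get(key)
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


if __name__ == "__main__":
    entry = build_put_events_entry("ami-events", "ami-0123", "base-linux")
    pattern = build_rule_pattern("base-linux")
    print(matches(pattern, as_delivered_event(entry)))
```

Filtering on a field such as the image name lets each team react only to the images it actually consumes, rather than to every distribution event.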
Additional security measures
McGraw Hill has a security mandate to encrypt all volumes at rest. The solution supports full end-to-end encryption of all Elastic Block Store (EBS) volumes. All AMIs are encrypted using a dedicated Key Management Service (KMS) customer-managed key that is trusted by the AWS organization. This allows each account to decrypt the AMI and re-encrypt it with account-specific keys that are managed by the application teams. When service-linked roles are used, such as the one for Amazon EC2 Auto Scaling, an additional step is required to complete the permissions grant. A CloudFormation custom resource was developed to fulfill this requirement, enabling the Auto Scaling service-linked role to launch an encrypted AMI. This was critical because McGraw Hill relies heavily on auto scaling to provide a scalable and resilient platform.
Figure 6 shows the end-to-end encryption support for service-linked roles.
Figure 7 provides an overview of the KMS Grant mechanism.
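The grant itself can be sketched as the set of arguments passed to the KMS CreateGrant API. The example below is not the custom resource described above; it only shows what such a resource might submit. The account ID and key ARN are placeholders, and the operation list is an assumption modeled on what Amazon EC2 Auto Scaling typically needs to launch instances from encrypted volumes, so it should be treated as a starting point rather than a verified minimum.

```python
from typing import List

# Operations assumed to be required by the Auto Scaling service-linked role.
GRANT_OPERATIONS: List[str] = [
    "Encrypt",
    "Decrypt",
    "ReEncryptFrom",
    "ReEncryptTo",
    "GenerateDataKey",
    "GenerateDataKeyWithoutPlaintext",
    "DescribeKey",
    "CreateGrant",
]


def autoscaling_service_role_arn(account_id: str) -> str:
    """Return the ARN of the default Auto Scaling service-linked role."""
    return (
        f"arn:aws:iam::{account_id}:role/aws-service-role/"
        "autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling"
    )


def build_create_grant_request(key_arn: str, account_id: str) -> dict:
    """Build keyword arguments for the KMS CreateGrant API."""
    return {
        "KeyId": key_arn,
        "GranteePrincipal": autoscaling_service_role_arn(account_id),
        # Copy the list so callers cannot mutate the module-level default.
        "Operations": list(GRANT_OPERATIONS),
    }


if __name__ == "__main__":
    # Placeholder key ARN and account ID.
    key = "arn:aws:kms:us-east-1:111122223333:key/example-key-id"
    print(build_create_grant_request(key, "444455556666"))
```

A grant is used here instead of a key policy change because it can be created and retired per account without editing the shared key's policy.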
A dedicated account was provisioned for all EC2 Image Builder operations, following AWS Organizations best practices, which simplified the implementation of identity and network controls. Network access control lists (NACLs) were developed to filter out ingress and egress network traffic that is not necessary for building an AMI. AWS Service Control Policies (SCPs) were implemented to enforce the use of sanctioned AMIs throughout the environment. Finally, a custom Lambda function detects and remediates EC2 instances that are no longer in use, reducing the risk of unmanaged devices on the network and improving resource utilization.
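One way an SCP can enforce sanctioned AMIs is to deny instance launches from any image that is not owned by the dedicated build account. The sketch below generates such a policy document. It is an assumption about how the control could be expressed, not the policy McGraw Hill uses, and the account ID and statement ID are placeholders.

```python
import json
from typing import List


def build_sanctioned_ami_scp(trusted_owner_accounts: List[str]) -> dict:
    """Return an SCP that denies RunInstances for images from other owners."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnsanctionedAmis",
                "Effect": "Deny",
                "Action": "ec2:RunInstances",
                # Scope the statement to the image resource of the launch.
                "Resource": "arn:aws:ec2:*::image/ami-*",
                "Condition": {
                    # ec2:Owner is the account ID (or alias) that owns the AMI.
                    "StringNotEquals": {"ec2:Owner": trusted_owner_accounts}
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder ID standing in for the dedicated Image Builder account.
    print(json.dumps(build_sanctioned_ami_scp(["111122223333"]), indent=2))
```

Because SCPs apply to every principal in the member accounts they are attached to, this kind of guardrail holds even when an account administrator grants broad EC2 permissions.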
Outcome
Rackspace Technology transformed and modernized McGraw Hill's custom AMI build process by creating a cloud-native solution with end-to-end automation, robust security controls, and reusable components that support future expansion. Enabling McGraw Hill to focus on delivering world-class applications and educational services rather than on managing its custom AMI build process is just one of the ways the Rackspace Elastic Engineering team creates value for our clients.