7 Critical AWS Architecture Risks: How to Assess and Remediate Security Gaps

By Louis Cadier, Managed Public Cloud Specialist, Rackspace Technology

Overview

This article examines seven common AWS security and operational gaps and compares in-house remediation with managed service provider (MSP) solutions.

Your AWS environment is only as strong as the architecture behind it. From hundreds of production environments analyzed by our Rackspace Elastic Engineering team, seven recurring risk patterns stand out. Each represents a gap between AWS best practices and the way environments are actually implemented across industries.

In this article, you’ll see each risk broken down into three parts: the technical problem, what your internal team can do to address it, and how a managed service provider can accelerate resolution. The goal is to give you a practical framework for recognizing and closing the security gaps that matter most.

And because these risks are widespread and often urgent, Rackspace Technology is offering limited-time incentives to help organizations address them quickly. Details are included at the end of this post.

1. Untagged IAM resources:

The technical problem

IAM resources without proper tagging create blind spots in your AWS environment. When users, roles and policies lack consistent metadata, your team loses the ability to track ownership, allocate costs and enforce security boundaries. What looks like a small administrative gap quickly cascades into operational challenges.

The root issue comes from AWS’s flexible IAM model. By default, users can create resources without tags, leading to thousands of items with no clear owner or purpose. Tags are critical for attribute-based access control (ABAC), cost allocation and governance. Without them, you can’t effectively enforce access policies or maintain visibility into resource usage.

Security and operational risks

Untagged IAM resources open multiple paths for security and operational failure. Without proper tags, your team cannot apply ABAC consistently, leaving overly permissive access in place. Budget tracking and cost optimization become much harder. And during a security incident, you can’t quickly identify resource ownership, slowing response times and amplifying breach impact.

In-house resolution approach

Most internal teams start with manual auditing and custom automation. That includes identifying untagged resources through IAM APIs, building a tagging taxonomy and using AWS Organizations service control policies (SCPs) and tag policies to prevent new untagged resources. Teams often create custom scripts to retroactively tag existing resources and add tagging requirements to CI/CD pipelines.
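
As a rough illustration of the audit and prevention steps above, the Python sketch below checks a resource's tags against a required set and builds an SCP document that denies creating IAM users or roles without an owner tag. The tag keys (Owner, CostCenter) and the function names are assumptions for illustration, not a recommended taxonomy. The tag list uses the Key/Value shape that IAM calls such as list_role_tags return.

```python
import json

# Hypothetical tagging taxonomy; substitute the keys your organization agrees on.
REQUIRED_TAG_KEYS = ("Owner", "CostCenter")


def missing_tag_keys(tags, required=REQUIRED_TAG_KEYS):
    """Return the required keys that are absent or blank in an IAM-style tag list."""
    present = {t["Key"] for t in tags if t.get("Value", "").strip()}
    return [key for key in required if key not in present]


def build_require_tag_scp(tag_key="Owner"):
    """Build an SCP that denies creating IAM users or roles without the given tag."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUntaggedIamPrincipals",
                "Effect": "Deny",
                "Action": ["iam:CreateUser", "iam:CreateRole"],
                "Resource": "*",
                # The Null operator with "true" matches requests that omit the tag.
                "Condition": {"Null": {f"aws:RequestTag/{tag_key}": "true"}},
            }
        ],
    }


if __name__ == "__main__":
    role_tags = [{"Key": "Owner", "Value": "platform-team"}]
    print("Missing tags:", missing_tag_keys(role_tags))
    print(json.dumps(build_require_tag_scp(), indent=2))
```

An SCP like this only blocks new untagged principals. Existing resources still need the retroactive pass described above.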

This in-house approach to tagging demands dedicated time, specialized skills and constant oversight. The hardest part is retroactive tagging, especially in large environments where ownership structures are complex.

Managed service provider solution

Managed service providers bring pre-built automation and established governance frameworks. They define comprehensive tagging policies at the organizational unit level, apply automated remediation through AWS Config rules and deliver cost allocation reporting via AWS Cost and Usage Reports.

The managed approach typically starts with emergency triage to tag critical resources, followed by automated enforcement and ongoing governance. Providers use proven templates and methodologies to accelerate implementation while minimizing business disruption. With this model, organization-wide tagging can usually be completed in weeks rather than months, with built-in compliance monitoring and automated drift detection to keep governance intact over time.

2. Disabled secret rotation:

The technical problem

AWS Secrets Manager without automated rotation leaves a critical gap in your security posture. When secrets remain unchanged for long periods, the likelihood of compromise rises. Many teams store essential credentials in Secrets Manager but don’t enable rotation, leaving databases, APIs and services exposed to static credentials that may persist for months or even years.

The challenge lies in coordinating rotations across distributed systems. Applications must handle credential updates gracefully, and database connections must survive the transition from old to new credentials without service interruption.

Security and operational risks

Static credentials are a persistent attack vector. Regular rotation reduces the blast radius of a breach by limiting the window of exposure. It also supports compliance with security frameworks such as SOC 2, ISO 27001 and PCI DSS. Without rotation, you face compliance findings, larger incident impact and potential credential sprawl as teams work around inflexible static keys.

In-house resolution approach

Implementing secret rotation internally requires building Lambda functions for each secret type, configuring database-specific rotation logic and adding application-side retry mechanisms to handle rotation windows. For secrets that don’t support managed rotation, you need custom functions to update both the secret and the target service.
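
To make the shape of a custom rotation function more concrete, here is a minimal Python sketch of the dispatch logic. Secrets Manager invokes a rotation Lambda once per step (createSecret, setSecret, testSecret, finishSecret) and passes the step name, secret ID and request token in the event. The make_rotation_handler helper and the step callables are hypothetical names, and the service-specific work inside each step is deliberately left out.

```python
# The four steps Secrets Manager passes to a rotation Lambda, in order.
ROTATION_STEPS = ("createSecret", "setSecret", "testSecret", "finishSecret")


def make_rotation_handler(step_functions):
    """Return a Lambda-style handler that dispatches each rotation step.

    step_functions maps a step name to a callable taking (secret_id, token).
    The callables hold the service-specific logic, which this sketch omits.
    """
    missing = [step for step in ROTATION_STEPS if step not in step_functions]
    if missing:
        raise ValueError(f"missing rotation steps: {missing}")

    def handler(event, context=None):
        step = event["Step"]
        if step not in step_functions:
            raise ValueError(f"unknown rotation step: {step}")
        return step_functions[step](event["SecretId"], event["ClientRequestToken"])

    return handler


if __name__ == "__main__":
    # Stub steps that only log; real steps would update the secret and the target service.
    stubs = {
        name: (lambda secret_id, token, name=name: print(name, secret_id, token))
        for name in ROTATION_STEPS
    }
    handler = make_rotation_handler(stubs)
    handler({"Step": "createSecret", "SecretId": "example/db", "ClientRequestToken": "token-1"})
```

Keeping the dispatcher separate from the step logic is one way to reuse the same skeleton across secret types while swapping in database-specific or API-specific steps.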

Your team must design schedules that minimize application impact, run extensive testing for rotation failures and build monitoring to detect issues quickly. The complexity multiplies with each service type because each one uses different authentication methods and has unique failure modes. Many internal teams struggle with testing, particularly achieving seamless transitions and handling edge cases such as timeouts or concurrent access.
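
On the application side, one common pattern for surviving a rotation window is to re-read the secret and retry when a connection is rejected. The sketch below assumes two callables you would supply, fetch_credentials and connect, and uses a placeholder AuthenticationError in place of whatever exception your database driver raises. The retry counts and delays are illustrative only.

```python
import time


class AuthenticationError(Exception):
    """Placeholder for the driver-specific error raised on rejected credentials."""


def connect_with_refresh(fetch_credentials, connect, attempts=3, delay_seconds=0.5):
    """Connect with freshly fetched credentials, refetching after an auth failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        # Re-read the secret on every attempt so a rotated value is picked up.
        credentials = fetch_credentials()
        try:
            return connect(credentials)
        except AuthenticationError as error:
            last_error = error
            if attempt < attempts - 1:
                # Simple exponential backoff between attempts.
                time.sleep(delay_seconds * (2 ** attempt))
    raise last_error


if __name__ == "__main__":
    values = iter(["old-password", "new-password"])

    def fake_connect(password):
        if password == "old-password":
            raise AuthenticationError("credentials rejected")
        return f"connected with {password}"

    print(connect_with_refresh(lambda: next(values), fake_connect, delay_seconds=0))
```

The same wrapper is useful in rotation tests: a fake connect function that rejects the old value lets you exercise the failure path without a live database.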

Managed service provider solution

Managed providers offer standardized rotation frameworks that support multiple service types out of the box. They design rotation schedules around best practices, implement blue-green testing strategies and provide 24x7x365 monitoring for rotation health.

The managed approach includes pre-built Lambda functions for common services, automated rollback for failed rotations and monitoring integration to detect problems before they affect applications. Providers also manage the complexity of rotation windows, scheduling updates during low-traffic periods and coordinating with your applications to avoid disruption.

3. CloudTrail not enabled:

The technical problem

AWS CloudTrail records actions taken by users, roles or AWS services as events, providing auditing, governance and compliance visibility. When CloudTrail is disabled or misconfigured, you lose visibility into who performed what actions across your AWS environment.

The complexity lies in building the right architecture. You need to configure multi-region trails, set S3 bucket policies for secure log storage and implement log analysis systems that can process high volumes of API activity. For comprehensive coverage, CloudTrail should log events across all AWS regions.

Security and operational risks

Without CloudTrail, your team has limited forensic capability during security incidents. Many compliance frameworks, including PCI DSS, FedRAMP and HIPAA, expect centralized activity logging, and auditors often look for CloudTrail as the mechanism. Without it, you cannot reliably track unauthorized access, investigate configuration changes or determine the scope of a potential breach.

Operationally, the absence of CloudTrail makes troubleshooting harder, leaves no change history for audits and slows incident response due to missing logs.

In-house resolution approach

Implementing CloudTrail internally means designing a multi-region trail architecture, configuring S3 bucket policies with encryption and access controls, and building log analysis systems with services like Amazon Athena or third-party SIEM platforms.
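
A first audit pass can be as simple as inspecting the trail settings you already have. The Python sketch below reviews entries shaped like the trailList items returned by CloudTrail's describe_trails call and flags trails that are single-region, lack log file validation or have no KMS key configured. The function names and finding messages are assumptions for illustration.

```python
def trail_findings(trail):
    """Return configuration gaps for one describe_trails `trailList` entry."""
    findings = []
    if not trail.get("IsMultiRegionTrail", False):
        findings.append("not multi-region")
    if not trail.get("LogFileValidationEnabled", False):
        findings.append("log file validation disabled")
    if not trail.get("KmsKeyId"):
        findings.append("no KMS key configured for log encryption")
    return findings


def audit_trails(trails):
    """Map each trail name to its findings, omitting trails with none."""
    if not trails:
        return {"<account>": ["no trails configured"]}
    report = {}
    for trail in trails:
        findings = trail_findings(trail)
        if findings:
            report[trail.get("Name", "<unnamed>")] = findings
    return report


if __name__ == "__main__":
    sample = [
        {"Name": "org-trail", "IsMultiRegionTrail": True,
         "LogFileValidationEnabled": True, "KmsKeyId": "alias/example-trail-key"},
        {"Name": "legacy-trail", "IsMultiRegionTrail": False},
    ]
    print(audit_trails(sample))
```

These settings do not show whether a trail is actively logging. That status comes from a separate call (get_trail_status), which this sketch leaves out.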

Best practice is to create dedicated S3 buckets for CloudTrail logs with KMS encryption, set retention policies, configure alerts for unusual activity and integrate CloudTrail data into your security workflows. The challenge is scale: CloudTrail generates massive log volumes, and it takes specialized expertise to build effective analysis that separates real threats from routine activity.
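
For the dedicated log bucket, the sketch below generates a bucket policy along the lines of the one AWS documents for CloudTrail: it lets the CloudTrail service check the bucket ACL and write log objects, and adds a statement that denies non-HTTPS access. The bucket name and account ID in the example are placeholders, and the build_cloudtrail_bucket_policy helper is a hypothetical name. AWS also recommends scoping the service statements with an aws:SourceArn condition, omitted here for brevity.

```python
import json


def build_cloudtrail_bucket_policy(bucket_name, account_id):
    """Build a bucket policy for a dedicated CloudTrail log bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AWSCloudTrailAclCheck",
                "Effect": "Allow",
                "Principal": {"Service": "cloudtrail.amazonaws.com"},
                "Action": "s3:GetBucketAcl",
                "Resource": bucket_arn,
            },
            {
                "Sid": "AWSCloudTrailWrite",
                "Effect": "Allow",
                "Principal": {"Service": "cloudtrail.amazonaws.com"},
                "Action": "s3:PutObject",
                "Resource": f"{bucket_arn}/AWSLogs/{account_id}/*",
                "Condition": {
                    "StringEquals": {"s3:x-amz-acl": "bucket-owner-full-control"}
                },
            },
            {
                # Reject any request to the bucket that is not made over TLS.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
        ],
    }


if __name__ == "__main__":
    policy = build_cloudtrail_bucket_policy("example-trail-logs", "111122223333")
    print(json.dumps(policy, indent=2))
```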

Managed service provider solution

Managed providers deliver enterprise-grade CloudTrail configurations with optimized log analysis pipelines. They deploy preconfigured monitoring rules, enable automated threat detection with machine learning and provide expert review of security events.

The managed approach includes full trail setup across regions and accounts, SIEM integration and 24x7x365 security operations monitoring. Providers also handle log retention optimization, cost management for large-scale environments and automated compliance reporting.

4. Direct IAM users instead of roles:

The technical problem

AWS recommends federated access with temporary credentials instead of direct IAM users with permanent credentials. Relying on IAM users introduces security risks because long-term access keys do not expire on their own and are not revoked automatically when someone leaves or changes roles.

The technical challenge is shifting from user-based to role-based access while keeping business operations running. IAM users hold long-term credentials, while assuming an IAM role returns temporary credentials that expire automatically.

Security and operational risks

Direct IAM users create permanent credential exposure. Keys can end up in code repositories, get shared between team members or remain active even after employees leave. Without federated access, it’s harder to enforce multi-factor authentication and apply conditional access policies at scale.

Operational risks include complex access management, limited ability to implement just-in-time access and credential rotation procedures that are often overlooked.

In-house resolution approach

Migrating to federated access internally requires integrating an identity provider, designing role-based access patterns and moving existing users to federated identities. Federation can be established with SAML 2.0 or OpenID Connect tied to your existing identity provider.
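
For SAML 2.0 federation, each role that federated users assume needs a trust policy naming the identity provider. The Python sketch below builds one. The account ID and provider name in the example are placeholders, build_saml_trust_policy is a hypothetical helper, and the ARN assumes the standard aws partition.

```python
import json


def build_saml_trust_policy(account_id, provider_name):
    """Build a role trust policy that allows sign-in through a SAML provider."""
    provider_arn = f"arn:aws:iam::{account_id}:saml-provider/{provider_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": provider_arn},
                "Action": "sts:AssumeRoleWithSAML",
                # Restrict assertions to those issued for the AWS sign-in endpoint.
                "Condition": {
                    "StringEquals": {"SAML:aud": "https://signin.aws.amazon.com/saml"}
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_saml_trust_policy("111122223333", "ExampleIdP"), indent=2))
```

An OpenID Connect setup follows the same idea with a different principal and action, which this sketch does not cover.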

Your team must audit current IAM users, map their permissions to roles, implement the identity federation infrastructure and migrate users gradually to avoid service disruption. This process requires close coordination with application teams to update how services authenticate to AWS. Internal teams often struggle with mapping existing permissions to new role structures and managing the transition period when direct users and federated access coexist.
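
The audit step often starts with finding long-lived access keys. The sketch below filters key metadata shaped like the AccessKeyMetadata entries from IAM's list_access_keys call and returns active keys older than a threshold. The 90-day cutoff and the sample key IDs are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def stale_access_keys(key_metadata, max_age_days=90, now=None):
    """Return (user, key id, age in days) for active keys older than the cutoff."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for key in key_metadata:
        # CreateDate is expected to be a timezone-aware datetime.
        if key["Status"] == "Active" and key["CreateDate"] < cutoff:
            age_days = (now - key["CreateDate"]).days
            stale.append((key["UserName"], key["AccessKeyId"], age_days))
    return stale


if __name__ == "__main__":
    reference = datetime(2025, 6, 1, tzinfo=timezone.utc)
    keys = [
        {"UserName": "ci-deploy", "AccessKeyId": "AKIAEXAMPLEOLDKEY",
         "Status": "Active", "CreateDate": datetime(2023, 1, 15, tzinfo=timezone.utc)},
        {"UserName": "analyst", "AccessKeyId": "AKIAEXAMPLENEWKEY",
         "Status": "Active", "CreateDate": datetime(2025, 5, 20, tzinfo=timezone.utc)},
    ]
    for user, key_id, age in stale_access_keys(keys, now=reference):
        print(f"{user}: {key_id} is {age} days old")
```

A list like this gives you a starting order for migration: the oldest active keys usually belong to the users and services that most need a role-based replacement.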

Managed service provider solution

Managed providers deploy federated access using proven patterns and established integrations with leading identity providers. They design role hierarchies around your organizational structure, automate user lifecycle management and deliver smooth migrations from direct IAM users.

The managed approach includes comprehensive identity mapping, just-in-time access implementation and integration with your identity management systems. Providers take on the complexity of permission mapping and deliver zero-downtime migration through phased rollout strategies.

5. Unencrypted SNS topics:

The technical problem

SNS topics without encryption at rest expose sensitive data and create compliance risks. While AWS provides encryption options, many teams leave the default configuration in place, which keeps server-side encryption turned off.

With server-side encryption, SNS uses AWS KMS keys to encrypt messages as soon as they’re received, storing them in encrypted form and decrypting only when delivered to subscribers. Without this protection, messages containing PII, financial data or health records are stored in clear text at rest and remain vulnerable to exposure.

Security and operational risks

Unencrypted SNS topics create multiple compliance challenges. If you are subject to frameworks such as HIPAA, PCI DSS or FedRAMP, you’re expected to encrypt message data at rest and enforce secure transport in transit. Without both controls, sensitive application data can be exposed at rest or intercepted in transit, and audit findings become likely.

In-house resolution approach

Encrypting SNS topics internally requires auditing all existing topics, enabling server-side encryption with the right KMS keys and updating applications to work with encrypted topics, which usually means granting publishers permission to use the KMS key. You also need to add the aws:SecureTransport condition to topic access policies to enforce HTTPS connections.
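
The audit and transport steps can be sketched in a few lines of Python. The first function takes topic attributes shaped like the Attributes map from SNS's get_topic_attributes call and lists topics with no KmsMasterKeyId set. The second builds a deny statement using aws:SecureTransport, shown here as a topic access policy statement, which is one place the condition can be attached. The function names and the sample ARNs are assumptions for illustration.

```python
import json


def unencrypted_topics(topic_attributes):
    """Return ARNs of topics whose attributes have no KMS key configured.

    topic_attributes maps a topic ARN to its get_topic_attributes `Attributes` dict.
    """
    return sorted(
        arn for arn, attrs in topic_attributes.items()
        if not attrs.get("KmsMasterKeyId")
    )


def deny_insecure_publish_statement(topic_arn):
    """Build a policy statement that rejects publishes not sent over TLS."""
    return {
        "Sid": "DenyPublishWithoutTls",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "sns:Publish",
        "Resource": topic_arn,
        "Condition": {"Bool": {"aws:SecureTransport": "false"}},
    }


if __name__ == "__main__":
    topics = {
        "arn:aws:sns:us-east-1:111122223333:orders": {"KmsMasterKeyId": "alias/example-sns-key"},
        "arn:aws:sns:us-east-1:111122223333:alerts": {"DisplayName": "alerts"},
    }
    for arn in unencrypted_topics(topics):
        print("Needs encryption:", arn)
        print(json.dumps(deny_insecure_publish_statement(arn), indent=2))
```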

Your team must test every dependent application to confirm compatibility with encrypted topics and watch for performance impacts. Many teams underestimate the complexity of this testing, especially validating that downstream systems can handle encrypted messages without disruption.

Managed service provider solution

Managed providers deliver comprehensive SNS encryption strategies with enterprise-grade key management. They audit all existing topics, deploy encryption with minimal service disruption and verify that all integration points continue working as expected.

The managed approach includes automated encryption rollouts, full testing of message flows and optimized KMS key policies that balance performance with security. Providers also implement monitoring so that encryption settings remain consistent and compliant over time.

6. Missing AMI build automation:

The technical problem

Relying on manual AMI creation creates operational and security challenges. Manual processes are time-consuming, error-prone and require frequent re-creation and re-snapshotting of images. Without automated pipelines, your team cannot maintain consistent patching, security hardening or configuration management across the EC2 fleet.

EC2 Image Builder offers an automation framework for creating secure AMIs with built-in testing and validation, but many organizations still lack systematic image management processes.

Security and operational risks

Manual AMI processes lead to configuration drift, inconsistent security patching and longer vulnerability exposure windows. When incidents occur, you cannot quickly rebuild affected systems because image creation requires manual intervention and lacks repeatability.

Operationally, manual image management slows deployment cycles, raises the risk of human error and makes it difficult to apply consistent security hardening across infrastructure.

In-house resolution approach

Implementing automated builds internally requires creating EC2 Image Builder pipelines with the right components, testing phases and distribution strategies. You can configure dependency updates to automatically rebuild images when base AMIs or components change.

This implementation requires designing image recipes for different workload types, building automated testing components, setting up distribution across multiple regions and accounts, and integrating with CI/CD pipelines. Teams must also manage cascading pipelines for workload-specific images built from standardized base images.
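
As one possible starting point, the sketch below assembles the parameters for an EC2 Image Builder pipeline, the kind of dictionary you might pass to the SDK's create_image_pipeline call. The ARNs, the pipeline name, the weekly schedule and the test timeout are placeholders. The start condition shown asks Image Builder to run on schedule only when dependency updates are available. Check the schedule expression format against the Image Builder documentation before relying on it.

```python
import json


def build_pipeline_request(name, recipe_arn, infrastructure_arn, distribution_arn,
                           schedule_expression="cron(0 0 ? * sun *)"):
    """Assemble create_image_pipeline parameters for a scheduled, tested build."""
    return {
        "name": name,
        "imageRecipeArn": recipe_arn,
        "infrastructureConfigurationArn": infrastructure_arn,
        "distributionConfigurationArn": distribution_arn,
        # Run the recipe's test components before an image is marked available.
        "imageTestsConfiguration": {"imageTestsEnabled": True, "timeoutMinutes": 90},
        "schedule": {
            "scheduleExpression": schedule_expression,
            # Skip scheduled runs when neither the base image nor components changed.
            "pipelineExecutionStartCondition":
                "EXPRESSION_MATCH_AND_DEPENDENCY_UPDATES_AVAILABLE",
        },
        "status": "ENABLED",
    }


if __name__ == "__main__":
    request = build_pipeline_request(
        name="example-base-linux",
        recipe_arn="arn:aws:imagebuilder:us-east-1:111122223333:image-recipe/example-base/1.0.0",
        infrastructure_arn="arn:aws:imagebuilder:us-east-1:111122223333:infrastructure-configuration/example",
        distribution_arn="arn:aws:imagebuilder:us-east-1:111122223333:distribution-configuration/example",
    )
    print(json.dumps(request, indent=2))
```

Building the request as plain data keeps it easy to review in a pull request and to reuse across workload-specific pipelines that share a base image.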

Internal teams often struggle with the complexity of testing automation and verifying that new builds don’t introduce regressions into production workloads.

Managed service provider solution

Managed providers deliver comprehensive image management strategies using proven EC2 Image Builder configurations. They create standardized base images with built-in security hardening, implement automated testing suites and design distribution strategies that support disaster recovery requirements.

The managed approach includes preconfigured security baselines, automated vulnerability scanning and testing frameworks that validate both functionality and security compliance before image deployment.

7. Single-region secret storage:

The technical problem

Storing secrets in a single AWS region creates a single point of failure that can lead to application outages during regional disruptions. While AWS Secrets Manager supports cross-region replication, many teams have not enabled it, leaving disaster recovery plans vulnerable to secrets being unavailable.

Secret replication keeps primary and replica secrets synchronized, with the same ARN structure across regions for simplified integration. Without replication, your applications cannot fail over to backup regions because the required credentials remain inaccessible.

Security and operational risks

Single-region secret storage undermines disaster recovery strategies. A regional outage can prevent applications from authenticating to databases or external services, extend recovery times and even cause service-wide downtime.

Operational impacts include failed regional failovers, prolonged downtime during AWS disruptions and gaps in meeting business continuity requirements that call for cross-region redundancy.

In-house resolution approach

Implementing replication internally requires configuring AWS Secrets Manager’s native replication features or building custom automation. You need to set up replication for each secret, configure cross-region KMS key policies, adapt application logic to support regional failover and test disaster recovery procedures to confirm secret availability during outages.
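
The application-side change can be small. The Python sketch below tries each region in order and returns the first secret value it can read. It relies on replica secrets keeping the same name as the primary. The region list, the secret name and the helper names are assumptions, and boto3 is imported lazily so the failover logic can be tested with a fake fetch function.

```python
def get_secret_with_failover(secret_id, regions, fetch):
    """Return (region, value) from the first region where fetch succeeds.

    fetch is a callable taking (region, secret_id) and returning the secret value.
    """
    failed = []
    for region in regions:
        try:
            return region, fetch(region, secret_id)
        except Exception:  # broad on purpose in this sketch; narrow it in real code
            failed.append(region)
    raise RuntimeError(f"secret {secret_id!r} unavailable in regions: {failed}")


def boto3_fetch(region, secret_id):
    """Read a secret from one region with the AWS SDK (requires boto3)."""
    import boto3  # imported here so the sketch loads without the SDK installed

    client = boto3.client("secretsmanager", region_name=region)
    return client.get_secret_value(SecretId=secret_id)["SecretString"]


if __name__ == "__main__":
    def fake_fetch(region, secret_id):
        # Simulate an outage in the primary region.
        if region == "us-east-1":
            raise ConnectionError("primary region unreachable")
        return "example-value"

    print(get_secret_with_failover("example/db", ["us-east-1", "us-west-2"], fake_fetch))
```

In real use you would pass boto3_fetch as the fetch argument and add timeouts so a slow primary region does not delay failover.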

Internal teams often underestimate the complexity of disaster recovery testing and validating that replicated secrets stay synchronized during normal operations.
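
One lightweight check for synchronization is to read the ReplicationStatus list that Secrets Manager's describe_secret call returns and compare it with the regions your disaster recovery plan requires. The sketch below does that on a response-shaped dictionary. The required regions and the function name are assumptions for illustration.

```python
def replication_gaps(describe_secret_response, required_regions):
    """Map each required region to a problem description, if it has one."""
    statuses = {
        entry["Region"]: entry.get("Status")
        for entry in describe_secret_response.get("ReplicationStatus", [])
    }
    gaps = {}
    for region in required_regions:
        status = statuses.get(region)
        if status is None:
            gaps[region] = "no replica configured"
        elif status != "InSync":
            gaps[region] = f"replica status is {status}"
    return gaps


if __name__ == "__main__":
    response = {
        "Name": "example/db",
        "ReplicationStatus": [
            {"Region": "us-west-2", "Status": "InSync"},
            {"Region": "eu-west-1", "Status": "Failed"},
        ],
    }
    print(replication_gaps(response, ["us-west-2", "eu-west-1", "ap-southeast-2"]))
```

Running a check like this on a schedule, and alerting on any non-empty result, covers the normal-operations half of the problem. It does not replace a full failover test.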

Managed service provider solution

Managed providers deliver comprehensive secret replication strategies with automated failover capabilities. They audit existing secrets, configure replication targets aligned to disaster recovery requirements and deploy monitoring to track replication health.

The managed approach includes automated replication setup, full disaster recovery testing and integration with broader business continuity strategies. Providers also implement failover logic that shifts applications to replica regions automatically, without manual intervention.

Key decision framework: build vs. buy

When in-house resolution makes sense

If your organization has strong internal AWS expertise, dedicated DevOps teams and enough time to invest, you may be able to address these risks on your own. This path works best when:

  • Your teams have deep AWS security expertise
  • Your organization has mature change management processes
  • Deadline pressure is low
  • Building skills internally is a priority
  • Your requirements are highly customized and don’t align with standard solutions

When managed services provide superior value

For most organizations, managed services offer clear advantages when tackling these architectural risks:

  • Speed of implementation: Providers can deploy solutions in weeks rather than months using proven methodologies and prebuilt automation
  • Expertise and experience: Providers bring specialized knowledge from working across many environments, helping you avoid common pitfalls
  • Ongoing management: Managed solutions include continuous monitoring, updates and optimization that internal teams may find difficult to sustain
  • Risk mitigation: Providers typically offer service level agreements and take on responsibility for implementation success, reducing your organizational risk
  • Cost effectiveness: When you account for team time, learning curves and opportunity costs, managed solutions often deliver a stronger total cost of ownership

Act now to protect your AWS environment

The seven risks outlined above highlight how difficult it is to maintain secure, compliant cloud environments on AWS. Rackspace Elastic Engineering and Rackspace Modern Operations directly address these challenges with proven methodologies and automation. Rackspace Elastic Engineering gives you dedicated pods of AWS-certified experts who can implement strategies such as enterprise-wide tagging frameworks and automated AMI pipelines. Rackspace Modern Operations provides 24x7x365 monitoring and incident response to keep controls in place, from CloudTrail configuration to secret rotation to encryption standards.

To help you move quickly, Rackspace Technology is offering limited-time incentives to accelerate your AWS security transformation. Until December 31, 2025, new AWS infrastructure resell customers can receive up to 3 months free on both Elastic Engineering and Modern Operations, available through AWS Marketplace for streamlined procurement. This offer gives you immediate access to managed service capabilities while your internal teams stay focused on business innovation.

Contact us to get started