VPC Lattice: Cross-Cluster Service Discovery and Service Mesh
By Masoom Tulsiani, Cloud Solution Architect, Rackspace Technology
Introduction
As organizations move toward microservices and distributed application architectures, managing secure, scalable and reliable communication between services becomes a growing challenge. Traditional networking models often fall short in large, dynamic environments, especially when services span multiple AWS accounts and VPCs.
To address these limitations, AWS introduced VPC Lattice, a fully managed application networking service that simplifies service-to-service communication across cloud-native environments. VPC Lattice brings together application-layer routing, service discovery, authentication, and observability — all without requiring developers to manage complex network configurations or sidecar-based service meshes.
This technical blog explores how VPC Lattice can help modernize service connectivity and simplify the implementation of service mesh patterns across Amazon EKS, ECS and Lambda. You'll learn about the core components of VPC Lattice, common design patterns, practical use cases and implementation guidance for building secure, scalable and efficient microservices architectures.
Understanding service mesh architecture in the cloud
Service mesh architecture addresses many of the operational and networking challenges developers face in distributed or microservices-based systems. At its core, a service mesh helps manage how services communicate with each other, handling routing, discovery, security, observability and traffic policies without requiring changes to application code.
One of the key building blocks is service discovery, which allows services to dynamically locate and connect with one another. In a typical cloud-native service mesh, features often include:
- Dynamic service discovery
- Load balancing
- TLS termination
- HTTP/2 and gRPC proxying
- Circuit breakers
- Health checks
- Traffic shaping (e.g., canary deployments or traffic splitting)
- Fault injection for testing
- Rich metrics and observability
Enter VPC Lattice
VPC Lattice is AWS’s fully managed Layer 7 application networking and service mesh solution. It enables service-to-service communication across VPCs, accounts and compute platforms such as EKS, ECS, EC2 and Lambda, without requiring sidecar proxies or manual network configuration.
Evolving Kubernetes networking: from Ingress to Gateway API
In Kubernetes environments, the Ingress resource has long been the standard for exposing services to external clients. However, as networking use cases have evolved, limitations in the Ingress API have become more apparent.
To address this, the Kubernetes community introduced the Gateway API, which represents the next generation of L4/L7 networking resources for Kubernetes. It enhances and eventually aims to replace the Ingress API by supporting more flexible and expressive routing capabilities.
AWS’s Gateway API controller for VPC Lattice takes advantage of this evolution. When a gateway is created and associated with a Kubernetes cluster VPC, the controller automatically provisions a service network. This service network acts as the communication fabric, enabling secure, scalable and observable traffic flow between services.
The Gateway API itself is delivered as a set of Kubernetes CustomResourceDefinitions (CRDs). When CRDs are installed, the Kubernetes API server exposes new RESTful resource paths for them, which is what allows resources such as Gateway and HTTPRoute to be managed declaratively and mapped to VPC Lattice.
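As a rough illustration of the Gateway resource described above, the following sketch builds a manifest as a plain Python dictionary and prints it as JSON. The gateway name "my-service-network" is hypothetical, and the GatewayClass name "amazon-vpc-lattice" and the v1 API version are assumptions that should be checked against the controller release you deploy.

```python
import json

# Assumption: the AWS Gateway API Controller registers a GatewayClass with
# this name; verify it against the controller version you install.
LATTICE_GATEWAY_CLASS = "amazon-vpc-lattice"


def build_gateway(name, namespace="default"):
    """Return a Gateway API `Gateway` manifest as a plain dict.

    The controller is expected to map a Gateway of this class to a
    VPC Lattice service network with the same name.
    """
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "Gateway",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "gatewayClassName": LATTICE_GATEWAY_CLASS,
            "listeners": [
                {"name": "http", "protocol": "HTTP", "port": 80},
            ],
        },
    }


# "my-service-network" is a hypothetical name used only for illustration.
gateway = build_gateway("my-service-network")
print(json.dumps(gateway, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed manifest could be saved to a file and applied like any other Kubernetes resource.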
Challenges with traditional service mesh architectures
As microservices environments grow in complexity, managing service-to-service communication — especially across accounts, regions, and platforms — becomes increasingly difficult. While service mesh architectures aim to address these issues, their implementation in public cloud environments introduces a new set of challenges.
Complexity in setup and management
Traditional service meshes can be difficult to deploy and operate, particularly in public cloud environments where developers are already managing a variety of tools and services.
- Deployment complexity: Configuring service meshes like Istio or Linkerd often involves managing sidecar proxies, control planes, certificate management and integrating with services like AWS IAM or Google Cloud Identity.
- Ongoing maintenance: Updating mesh components, managing policies and monitoring service behavior at scale increases the risk of misconfiguration, which can lead to degraded performance or security vulnerabilities.
Latency and performance overhead
The sidecar proxy model, central to many service meshes, can introduce unnecessary overhead in environments where every millisecond and compute cycle counts.
- Increased latency: Service-to-service communication is routed through sidecars, adding extra hops and potentially slowing response times, especially for latency-sensitive applications.
- Higher cost: Sidecars consume additional CPU and memory, and the increase in network traffic between services drives up public cloud egress and ingress costs.
Multicloud and hybrid cloud challenges
Modern organizations frequently operate across multiple public clouds and on-premises environments, making consistent service mesh implementation more difficult.
- Interoperability issues: Each platform has its own networking model and identity management system, complicating efforts to create a unified service mesh.
- Policy inconsistency: Enforcing consistent traffic management, security and monitoring policies across heterogeneous environments requires custom tooling or manual effort.
Security complexity
Although service meshes offer built-in security features such as mutual TLS (mTLS) and fine-grained authorization, managing these at scale can be complex.
- Operational burden: Setting up and rotating mTLS certificates across hundreds of services is time-consuming and error-prone, increasing the risk of outages or security gaps.
- Overlapping security models: Native cloud security mechanisms, like IAM roles, security groups, and VPC configurations, can conflict with mesh-based policies, leading to redundant or inconsistent configurations.
Observability and monitoring complexity
One of the key advantages of a service mesh is improved visibility, but capturing and integrating telemetry data comes with its own challenges.
- High data volume: Traces, metrics and logs generated by the mesh can result in significant storage and bandwidth costs in cloud environments.
- Tooling integration: Combining mesh-native observability tools (e.g., Prometheus, Jaeger) with platform-native tools (e.g., CloudWatch, Stackdriver) requires additional configuration and operational expertise.
Scaling challenges in large clusters
As service mesh deployments grow across thousands of services and multiple clusters, scalability concerns emerge, particularly in Kubernetes environments.
- Resource drain: Sidecar proxies on every pod increase node-level resource consumption, reducing overall cluster efficiency.
- Control plane bottlenecks: Centralized mesh components (like Istio’s Pilot) can become performance chokepoints, especially in multi-cloud or high-throughput environments.
Transitioning from App Mesh to VPC Lattice
AWS has historically offered multiple service mesh solutions, including App Mesh and VPC Lattice.
App Mesh was designed for workloads running on ECS, EKS and EC2, providing client-side service mesh capabilities such as traffic resiliency (e.g., retries, timeouts, connection pooling) and mutual TLS (mTLS) encryption. However, it relied on Envoy-based sidecar proxies, which introduced configuration complexity and additional infrastructure overhead.
AWS has announced that App Mesh will be deprecated. After September 30, 2026, customers will no longer have access to the App Mesh console or resources, signaling a broader shift toward more integrated, simplified service networking.
In contrast, VPC Lattice is AWS’s modern, fully managed Layer 7 application networking and service mesh solution. It bridges traditional networking and application-layer communication, allowing organizations to connect microservices running across Kubernetes, EC2, Auto Scaling groups, Lambda, and Fargate.
VPC Lattice is ideal for teams that want to automate service discovery, traffic management, authentication, authorization, and observability—without the complexity of sidecar-based service meshes. It’s particularly well-suited for users who want to deploy modern application architectures without deep expertise in VPC networking or managing mesh infrastructure.
Solution overview
AWS VPC Lattice provides a fully managed application networking and service mesh solution designed to simplify service-to-service communication across AWS environments. It enables connectivity between services running on Amazon EKS, ECS, EC2, Lambda and more — without requiring sidecar proxies or complex network configurations.
The main benefits of VPC Lattice include:
- Traffic management and load balancing: Leverages Kubernetes-native Gateway APIs to define traffic routing rules and load balancing across endpoints in multiple AWS clusters.
- Observability: Helps monitor and troubleshoot service-to-service communication, including request types, traffic volumes, errors and response times.
- Service discovery and large-scale connectivity: Connects thousands of services across VPCs and accounts without increasing network complexity.
- Authentication and authorization with IAM policies: Uses IAMAuthPolicy, attached to VPC Lattice service networks or individual services, to control access to services. These policies can be directly associated with Gateway API resources.
- Granular access control: Enhances service-to-service security with centralized access permissions that support zero trust architectures through context-specific authentication and authorization.
- Advanced traffic controls: Supports fine-grained routing features like request-level routing and weighted targets, enabling rollout strategies such as blue/green and canary deployments.
- Abstracted network configuration: Simplifies service discovery and routing across ECS tasks running in different clusters by removing the need for VPC peering or transit gateways.
- Efficient cross-region traffic management: Manages traffic across regions and environments, supporting use cases such as real-time fraud analysis and alerting.
- Support for routing resilience: Enables automatic routing and failover capabilities to help maintain application availability, even if parts of the system encounter issues.
VPC Lattice automatically handles service-to-service communication, cross-VPC and cross-account connectivity and advanced traffic control, making it easier to deploy and operate modern distributed applications at scale.
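To make the weighted-target idea more concrete, here is a minimal sketch of a canary split expressed as a Gateway API HTTPRoute. The route, gateway and service names are hypothetical, and the helper functions are illustrative rather than part of any AWS library; only the `backendRefs` and `weight` fields come from the Gateway API itself.

```python
def build_canary_route(route_name, gateway_name, stable_svc, canary_svc,
                       canary_percent, port=80):
    """Build an HTTPRoute dict that splits traffic between two Services."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": route_name},
        "spec": {
            "parentRefs": [{"name": gateway_name}],
            "rules": [{
                "backendRefs": [
                    {"name": stable_svc, "kind": "Service", "port": port,
                     "weight": 100 - canary_percent},
                    {"name": canary_svc, "kind": "Service", "port": port,
                     "weight": canary_percent},
                ],
            }],
        },
    }


def traffic_share(route):
    """Return each backend's expected share of requests (weights are relative)."""
    refs = route["spec"]["rules"][0]["backendRefs"]
    total = sum(ref["weight"] for ref in refs)
    return {ref["name"]: ref["weight"] / total for ref in refs}


# Hypothetical names: send 10% of requests to the new version.
route = build_canary_route("checkout", "my-service-network",
                           "checkout-v1", "checkout-v2", canary_percent=10)
print(traffic_share(route))
```

Raising `canary_percent` step by step and reapplying the route is one way to run a gradual rollout; setting it to 100 completes a blue/green style cutover.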
Figure: VPC Lattice Components and Service Networks
Detailed solution description
VPC Lattice abstracts away the complexities of underlying network infrastructure, allowing teams to focus on secure, scalable service-to-service communication. Instead of relying on traditional networking constructs like VPC peering or transit gateways, VPC Lattice introduces a service mesh–like architecture that simplifies how microservices interact across AWS environments. Key components include:
- Service network: At the heart of VPC Lattice is the Service Network, which acts as a logical boundary for grouping services. Within a service network, you can define and enforce routing rules, apply security policies, and manage traffic flow across services. These networks support both VPC-based and non-VPC-based resources, offering flexibility in hybrid and serverless environments.
- Access management: VPC Lattice integrates with AWS Identity and Access Management (IAM) to control which services can communicate with one another. Fine-grained IAM policies help define access at the service or service network level, supporting secure and context-aware interactions between microservices.
- Cross-region and cross-account communication: VPC Lattice removes the need for complex cross-region VPC peering or account-level endpoint management. Services can communicate securely across AWS accounts and regions, without requiring custom networking setup or manual configuration.
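For the access management component, an auth policy is an IAM policy document attached to a service or service network. The sketch below generates one that allows a set of caller roles to invoke services. The account ID, role name and organization ID are placeholders, and the helper function is hypothetical; treat the output as a starting point rather than a reviewed policy.

```python
import json


def build_auth_policy(allowed_role_arns, org_id=None):
    """Return a VPC Lattice auth policy document as a dict.

    allowed_role_arns: IAM role ARNs permitted to call the service.
    org_id: optional AWS Organizations ID used as an extra condition.
    """
    statement = {
        "Effect": "Allow",
        "Principal": {"AWS": sorted(allowed_role_arns)},
        "Action": "vpc-lattice-svcs:Invoke",
        "Resource": "*",
    }
    if org_id is not None:
        # Only accept callers that belong to the given organization.
        statement["Condition"] = {
            "StringEquals": {"aws:PrincipalOrgID": org_id}
        }
    return {"Version": "2012-10-17", "Statement": [statement]}


# Placeholder ARN and organization ID, for illustration only.
policy = build_auth_policy(
    ["arn:aws:iam::111122223333:role/checkout-client"],
    org_id="o-exampleorgid",
)
print(json.dumps(policy, indent=2))
```

An auth policy only takes effect when the service or service network uses IAM as its auth type, and callers then need to sign their requests with SigV4.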
The following section illustrates a practical use case: cross-cluster service-to-service communication using VPC Lattice.
Figure: Service-to-service communication across Amazon EKS Clusters using Amazon VPC Lattice
Two foundational concepts in VPC Lattice are the Service Network and the Service. Within a Service, AWS provides familiar constructs, similar to those used in Application Load Balancers (ALBs) — including listeners and target groups. These target groups can include resources such as Amazon EKS pods or ECS tasks. For the purposes of this blog, we’ll focus specifically on examples involving Amazon EKS and Amazon ECS.
VPC Lattice traffic flow
VPC Lattice offers a powerful and flexible approach to managing traffic flow and API calls between services deployed across various AWS resources, including Auto Scaling groups and Lambda functions.
To connect an Amazon EC2 Auto Scaling group to a VPC Lattice service, follow these steps:
- Create a target group that routes requests to EC2 instances, identified by instance ID.
- Configure a listener on the VPC Lattice service to forward incoming requests to this target group.
- Attach the target group to your Auto Scaling group.
Once attached, Amazon EC2 Auto Scaling automatically manages the target registration lifecycle. It registers new instances with the target group as they launch and deregisters them when they are scheduled for termination, ensuring consistent and accurate routing.
The target group becomes the entry point for all inbound traffic to your Auto Scaling group. Incoming requests are routed based on listener rules defined within the VPC Lattice service, allowing precise control over how traffic is distributed across instances.
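The three steps above could be scripted roughly as follows. This is a sketch, not a production script: it assumes clients that behave like `boto3.client("vpc-lattice")` and `boto3.client("autoscaling")`, all names and identifiers are supplied by the caller, and it skips error handling and waiting for the target group to become active.

```python
def connect_asg_to_lattice(lattice, autoscaling, *, service_id, vpc_id,
                           asg_name, tg_name, listener_name, port=80):
    """Wire an EC2 Auto Scaling group to a VPC Lattice service.

    `lattice` and `autoscaling` are boto3-style clients passed in by the
    caller, which keeps this function easy to exercise with test doubles.
    Returns the ARN of the target group that was created.
    """
    # Step 1: target group that addresses EC2 instances by instance ID.
    target_group = lattice.create_target_group(
        name=tg_name,
        type="INSTANCE",
        config={"port": port, "protocol": "HTTP", "vpcIdentifier": vpc_id},
    )
    tg_arn = target_group["arn"]

    # Step 2: listener that forwards every request to that target group.
    lattice.create_listener(
        serviceIdentifier=service_id,
        name=listener_name,
        protocol="HTTP",
        port=port,
        defaultAction={
            "forward": {
                "targetGroups": [
                    {"targetGroupIdentifier": tg_arn, "weight": 100}
                ]
            }
        },
    )

    # Step 3: attach the target group to the Auto Scaling group so that
    # instances are registered and deregistered automatically.
    autoscaling.attach_traffic_sources(
        AutoScalingGroupName=asg_name,
        TrafficSources=[{"Identifier": tg_arn, "Type": "vpc-lattice"}],
    )
    return tg_arn
```

With boto3 installed and credentials configured, the function would be called with `boto3.client("vpc-lattice")` and `boto3.client("autoscaling")` as the first two arguments.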
Strategies, design patterns and implementation guide
In a multi-account AWS environment, VPC Lattice provides a secure and scalable way to enable service-to-service communication across VPCs and AWS accounts. It achieves this by establishing service networks and service associations, which simplify connectivity while maintaining strong security boundaries.
Consider the following architecture:
- Consumer Account A and Consumer Account B need to communicate with services hosted in Provider Account A and Provider Account B, respectively.
- In Provider Account A, a VPC Lattice Service Network 1 is associated with VPC Lattice Service 1. This setup allows EC2 instances in Consumer Account A to connect via DNS resolvers configured with the appropriate VPC associations.
- In Provider Account B, Service Network 2 is associated with VPC Lattice Service 2. A Lambda function in Consumer Account B uses an Elastic Network Interface (ENI) to connect, enabling secure and dynamic service-to-service communication.
- Meanwhile, Provider Account B hosts EC2 instances in an Auto Scaling group, ensuring elasticity and high availability to support varying workloads from both consumer accounts.
This architecture greatly simplifies inter-account networking, removing the need for complex point-to-point configurations. By decoupling service connectivity from underlying network infrastructure, VPC Lattice supports:
- Fine-grained access control
- Application-layer traffic routing
- Scalable, secure communication across diverse compute environments
The result is reduced operational overhead, improved observability, and enhanced architectural flexibility for organizations managing services across multiple AWS accounts and VPCs.
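One way to reason about this architecture is as a set of associations: a client can reach a service when its VPC and that service are both associated with the same service network. The toy model below captures only that rule. The class, function and resource names are hypothetical, and the model deliberately ignores auth policies, security groups and cross-account sharing.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceNetwork:
    name: str
    services: set = field(default_factory=set)  # service associations
    vpcs: set = field(default_factory=set)      # VPC associations


def can_reach(networks, client_vpc, service):
    """True if some service network links the client's VPC to the service."""
    return any(
        client_vpc in network.vpcs and service in network.services
        for network in networks
    )


# Mirrors the example above with hypothetical resource names.
networks = [
    ServiceNetwork("service-network-1",
                   services={"service-1"}, vpcs={"consumer-a-vpc"}),
    ServiceNetwork("service-network-2",
                   services={"service-2"}, vpcs={"consumer-b-vpc"}),
]

print(can_reach(networks, "consumer-a-vpc", "service-1"))
print(can_reach(networks, "consumer-a-vpc", "service-2"))
```

In this model, giving Consumer Account A access to Service 2 means adding one association, not building a new network path.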
Figure: Illustration of multiple clusters/VPCs service-to-service communication
Example: EKS cross-cluster communication
In this example, Amazon VPC Lattice is used to enable secure, scalable communication between services running in separate EKS clusters.
Figure: HTTPS Traffic going through VPC Lattice
- The Gateway API Controller manages the creation of VPC Lattice resources based on HTTPRoute and IAMAuthPolicy definitions.
- External-DNS is responsible for generating DNS records in a Route 53 Private Hosted Zone. It uses the same HTTPRoute objects that define custom domain names for services.
- When a Gateway API HTTPRoute includes a custom domain, Amazon VPC Lattice automatically creates a corresponding DNS endpoint for that entry.
- For authorization, the solution uses IAM auth policies in combination with EKS Pod Identity. This approach simplifies the process for Kubernetes administrators to grant IAM permissions to applications running within the cluster.
This setup allows services across clusters to communicate securely using familiar Kubernetes-native tooling and AWS-managed service mesh infrastructure.
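The pairing of an HTTPRoute with a custom domain and an IAMAuthPolicy might look like the sketch below. The hostname, route, gateway and service names are hypothetical. The IAMAuthPolicy API group, version and field names are assumptions based on the controller's custom resources and should be verified against the installed CRDs.

```python
import json


def build_route_with_auth(route_name, gateway_name, hostname, service_name,
                          port, policy_document):
    """Return (HTTPRoute, IAMAuthPolicy) manifests as plain dicts."""
    route = {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": route_name},
        "spec": {
            # Custom domain that External-DNS can publish in Route 53.
            "hostnames": [hostname],
            "parentRefs": [{"name": gateway_name}],
            "rules": [{
                "backendRefs": [
                    {"name": service_name, "kind": "Service", "port": port}
                ],
            }],
        },
    }
    auth_policy = {
        # Assumed API group/version for the controller's IAMAuthPolicy CRD.
        "apiVersion": "application-networking.k8s.aws/v1alpha1",
        "kind": "IAMAuthPolicy",
        "metadata": {"name": f"{route_name}-auth"},
        "spec": {
            "targetRef": {
                "group": "gateway.networking.k8s.io",
                "kind": "HTTPRoute",
                "name": route_name,
            },
            # The policy is carried as a JSON string.
            "policy": json.dumps(policy_document),
        },
    }
    return route, auth_policy


# Placeholder policy and names, for illustration only.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:role/orders-client"},
        "Action": "vpc-lattice-svcs:Invoke",
        "Resource": "*",
    }],
}
route, auth = build_route_with_auth(
    "orders", "my-service-network", "orders.internal.example.com",
    "orders-svc", 80, example_policy,
)
print(json.dumps([route, auth], indent=2))
```

The role named in the policy would typically be the one granted to the calling pods through EKS Pod Identity.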
New feature: Cross-VPC resource access with PrivateLink for VPC Lattice
As announced at AWS re:Invent 2024 on December 1, 2024, PrivateLink now supports native cross-region connectivity.
AWS PrivateLink is a highly available, scalable technology that establishes private, unidirectional connectivity from your VPCs to VPC endpoint services. This connectivity extends to supported AWS services and now directly to VPC resources. Previously, sharing was limited to services fronted by a Network Load Balancer or Gateway Load Balancer, and interface VPC endpoints only supported connectivity to VPC endpoint services in the same region.
The feature brings several benefits: VPC-to-VPC connectivity through PrivateLink or VPC Lattice, multi-account sharing via AWS Resource Access Manager (RAM), scalable traffic management that inherits the strengths of both VPC Lattice and PrivateLink, and stronger security through VPC Lattice security mechanisms and resource configurations.
Sharing Integrated with RAM
With this enhancement, customers can now share any VPC resource through AWS Resource Access Manager (AWS RAM). The shared resource may be an AWS-native resource, such as an Amazon RDS database, a domain name, or an IP address, located in another VPC or on-premises environment. Once shared, authorized users can access these resources privately via VPC endpoints.
Event-driven architecture integrations
Customers can share AWS resources such as Amazon Elastic Compute Cloud (EC2) instances, Elastic Container Service (ECS) and Elastic Kubernetes Service (EKS) container services, and their own HTTPS services across Amazon VPC and AWS account boundaries, then use them to build event-driven applications with EventBridge and orchestrate workflows with AWS Step Functions.
Customers can now create and use Resource Gateways and Resource Configurations. EventBridge and Step Functions now work hand-in-hand with PrivateLink and VPC Lattice to enable integration of public and private HTTPS-based applications into event-driven architectures and workflows.
VPC Endpoints
AWS PrivateLink customers can now use VPC endpoints to securely and privately access VPC resources. These resources, which include but are not limited to databases and clusters, may reside within your VPC or an on-premises environment and do not need to sit behind a load balancer.
Customers may configure a resource VPC endpoint to target a single resource or aggregate multiple resources within an Amazon VPC Lattice service network, and subsequently access the service network using a dedicated service network VPC endpoint.
Case studies and examples
1. Real-time fraud detection in financial services
A financial services organization runs a real-time fraud detection system to monitor customer transactions and identify suspicious behavior. The application is built on Amazon ECS Fargate and includes multiple microservices, such as transaction monitoring, rule-based scoring, and alerting.
To operate effectively, the application requires secure, low-latency communication across multiple clusters and integration with both cloud-native and on-premises systems, including:
- On-premises data center – Hosts historical data, including legacy transactions and customer records, which are critical to accurate fraud analysis.
- Amazon RDS for PostgreSQL – Stores recent transaction data, fraud alerts, and metadata used for analysis and reporting.
Previously, the organization had implemented Cloud Map and experimented with App Mesh to support service connectivity. However, as scalability and compliance requirements grew, so did the need for a more streamlined, manageable solution.
By adopting VPC Lattice as a managed service mesh, the company enabled secure, cross-cluster connectivity across ECS Fargate services—spanning multiple AWS accounts and VPCs, as well as on-premises infrastructure.
Key elements of the solution include:
1. Cross-cluster service discovery and routing
- ECS Fargate tasks in different clusters communicate without requiring complex network configurations.
- Services such as transaction monitoring are registered with VPC Lattice, which routes requests based on defined traffic rules. For example, flagged transactions are routed from monitoring to scoring services for evaluation.
2. On-premises data access
- AWS Direct Connect links the on-premises data center with AWS VPCs.
- VPC Lattice securely routes requests between ECS tasks and on-premises databases, enforcing IAM-based access controls and preventing unauthorized access.
- This allows the fraud detection engine to retrieve historical data on demand, enriching real-time analysis.
3. Real-time access to Amazon RDS for PostgreSQL
- ECS tasks interact with Amazon RDS to query and store recent transaction records.
- VPC Lattice facilitates direct access to the RDS instance, enabling distributed ECS services to communicate with a central database, even across clusters.
- The architecture scales seamlessly with increased transaction volume.
4. Unified security and observability
- The organization enforces consistent security policies, including encryption in transit and fine-grained IAM controls.
- VPC Lattice provides visibility into service-to-service traffic, including metrics such as latency, request volume, and error rates.
- This supports regulatory compliance by offering a complete audit trail of cross-service interactions.
2. Travel and e-commerce application using ECS and Lambda
A travel and e-commerce company needed to securely connect applications built on Amazon ECS and AWS Lambda. The platform included multiple services that required consistent connectivity and traffic management across different compute types.
In this setup:
- The front-end travel booking application runs on Amazon ECS and is exposed to internet users via a public Application Load Balancer (ALB).
- The checkout service also runs on ECS.
- The booking service runs on AWS Lambda.
Both the checkout and booking services are published as Amazon VPC Lattice services and are accessible within the same VPC Lattice service network.
To support Lambda integration, VPC Lattice was configured to use Lambda as the target group type, with a designated function specified for invocation.
This solution provided seamless application layer load balancing and network connectivity across services. It allowed the platform team to focus on application development instead of managing the complexities of underlying infrastructure, such as configuring VPCs, managing inter-account networking, or dealing with overlapping IP address ranges.
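On the Lambda side, the booking function receives each request as an event and returns a response object that VPC Lattice turns back into HTTP. The handler below is a minimal sketch: the `/bookings` path and the `trip` field are hypothetical, the event keys assume the newer event structure (with a fallback to `raw_path`), and the body is assumed to be plain JSON rather than base64-encoded.

```python
import json


def handler(event, context):
    """Minimal Lambda handler for requests forwarded by VPC Lattice."""
    method = event.get("method", "GET")
    # Assumption: newer events carry "path"; older ones use "raw_path".
    path = event.get("path") or event.get("raw_path", "/")

    if method == "POST" and path.startswith("/bookings"):
        booking = json.loads(event.get("body") or "{}")
        status = 201
        payload = {"status": "confirmed", "trip": booking.get("trip")}
    else:
        status = 404
        payload = {"error": f"no route for {method} {path}"}

    return {
        "statusCode": status,
        "headers": {"content-type": "application/json"},
        "body": json.dumps(payload),
        "isBase64Encoded": False,
    }


# Local invocation with a hand-written event, for illustration only.
sample_event = {
    "method": "POST",
    "path": "/bookings",
    "body": json.dumps({"trip": "LIS-NYC"}),
}
print(handler(sample_event, None))
```

Keeping the handler free of AWS SDK calls makes it easy to exercise locally before registering the function in a Lambda target group.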
Challenges and considerations
While VPC Lattice addresses many of the traditional pain points associated with service mesh deployments, it's important to understand the broader challenges and trade-offs that may arise, particularly in multi-cluster, hybrid and managed service mesh scenarios.
Multi-cluster networking challenges
One common issue in multi-cluster environments is overlapping IP address ranges, which can complicate routing and connectivity between services in different VPCs or accounts.
VPC Lattice helps mitigate these challenges by:
- Supporting Kubernetes Gateway API natively, eliminating the need for custom integrations.
- Relying on standard Gateway API resources for most service configuration, rather than a large set of mesh-specific Custom Resource Definitions (CRDs).
- Automatically managing network connectivity between VPCs and accounts, including network address translation (NAT) between IPv4 and IPv6, and across overlapping IP ranges.
This built-in support significantly reduces the operational complexity of multi-cluster deployments.
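To see why overlapping ranges matter, it helps to check a set of VPC CIDR blocks for conflicts, since overlaps like these are what block VPC peering or plain routed connectivity. The sketch below uses Python's standard `ipaddress` module; the cluster names and CIDR blocks are made up for illustration.

```python
from ipaddress import ip_network
from itertools import combinations


def overlapping_cidrs(vpc_cidrs):
    """Return pairs of VPC names whose CIDR blocks overlap.

    vpc_cidrs: mapping of VPC name to CIDR string.
    """
    nets = {name: ip_network(cidr) for name, cidr in vpc_cidrs.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        # Only compare networks of the same IP version.
        if nets[a].version == nets[b].version and nets[a].overlaps(nets[b])
    ]


# Hypothetical cluster VPCs: two of them were created with the same range.
vpcs = {
    "cluster-a": "10.0.0.0/16",
    "cluster-b": "10.0.0.0/16",
    "cluster-c": "10.1.0.0/16",
}
print(overlapping_cidrs(vpcs))
```

Any pair reported here would normally need re-addressing or NAT before the VPCs could be peered, which is the work VPC Lattice takes on for service-to-service traffic.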
Vendor lock-in and customization constraints
Managed service mesh solutions are often tightly coupled with their respective cloud platforms, which can limit portability and customization.
- Cloud-native mesh solutions like AWS App Mesh or Google’s Cloud Service Mesh offer deep integration with platform services, but this tight coupling can make it harder to move workloads to other providers or on-premises environments.
- Customization restrictions may also arise. Public cloud–managed meshes typically offer less flexibility compared to open-source projects like Istio, particularly for organizations with unique traffic policies or advanced observability needs.
These considerations are important when evaluating whether a managed solution like VPC Lattice aligns with your long-term architectural strategy.
Why VPC Lattice should be part of your AWS networking strategy
AWS VPC Lattice simplifies how organizations connect, secure and monitor services across accounts, VPCs and regions. By automating service discovery, traffic management and security policy enforcement, it reduces operational complexity while supporting scalable, resilient application architectures.
Benefits of using VPC Lattice in a fraud detection scenario
- Reduced network complexity – Abstracts the networking configuration needed for cross-cluster and on-premises communication, simplifying service discovery and routing between ECS tasks.
- Scalable, low-latency performance – Efficiently manages traffic across regions and environments to support real-time fraud detection and alerting with minimal latency.
- Enhanced security and compliance – Uses IAM policies, encryption, and centralized access controls to protect sensitive data in transit and meet regulatory requirements.
- High availability and resilience – Supports automatic routing and failover, helping critical systems remain available even if parts of the architecture experience issues.
Whether you're building real-time fraud detection workflows, modernizing legacy systems, or scaling across multiple AWS accounts, VPC Lattice provides the service mesh capabilities and application-layer networking tools needed to simplify your architecture, without the burden of managing traditional service mesh infrastructure.
References:
AWS Documentation: VPC Lattice Overview
Blog post: Build secure application networks with VPC Lattice, Amazon ECS, and AWS Lambda
