The Evolution of Building a Multi-Regional Network Hub to Connect Several Remote Locations
by Ruban Suthan, Senior Cloud Engineer, Rackspace Technology
An organization based in Australia needed to connect several remote sites, spread as far afield as Canada and Africa, that had limited internet connectivity. To establish connectivity, the company had been using satellite internet service providers like Starlink.
Rackspace Technology® proposed a simple virtual private cloud (VPC) connecting all remote sites via a site-to-site virtual private network (VPN). We determined that the company’s gateway peer could support the default behavior of the Amazon Web Services (AWS) side of the connection.
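As a rough illustration of that first design, the sketch below (Python with boto3; the region, ASN, IP address and all resource IDs are placeholder assumptions, not the customer’s actual values) creates a customer gateway for a remote site and terminates a site-to-site VPN on a virtual private gateway attached to the VPC:

```python
import boto3

ec2 = boto3.client("ec2", region_name="ap-southeast-2")  # assumed home region

# Customer gateway representing a remote site's public endpoint
# (203.0.113.10 is a documentation placeholder address)
cgw = ec2.create_customer_gateway(
    BgpAsn=65000,               # assumed ASN of the remote gateway peer
    PublicIp="203.0.113.10",
    Type="ipsec.1",
)["CustomerGateway"]

# Virtual private gateway attached to the single hub VPC
vgw = ec2.create_vpn_gateway(Type="ipsec.1")["VpnGateway"]
ec2.attach_vpn_gateway(
    VpnGatewayId=vgw["VpnGatewayId"],
    VpcId="vpc-0123456789abcdef0",      # placeholder VPC ID
)

# Site-to-site VPN left on AWS default tunnel options, since the
# customer's gateway peer supported the AWS-side defaults
ec2.create_vpn_connection(
    CustomerGatewayId=cgw["CustomerGatewayId"],
    VpnGatewayId=vgw["VpnGatewayId"],
    Type="ipsec.1",
)
```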
Our team then took the organization’s basic network setup and grew it from a single VPC connected to a data center into a full-fledged, multi-regional hub-and-spoke architecture connecting remote sites across regions.
The graphic below illustrates the initial design.
We used a transit gateway (TGW) to establish connectivity between VPCs spread across accounts and to terminate the site-to-site connections at the TGW instead of at a virtual private gateway (VGW).
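A minimal sketch of that hub, again in boto3 with placeholder IDs: create the TGW, attach a spoke VPC, and terminate the site-to-site VPN on the TGW rather than a VGW.

```python
import boto3

ec2 = boto3.client("ec2", region_name="ap-southeast-2")  # assumed home region

# Transit gateway acting as the regional network hub
tgw = ec2.create_transit_gateway(
    Description="Regional network hub",
    Options={"AmazonSideAsn": 64512, "AutoAcceptSharedAttachments": "enable"},
)["TransitGateway"]

# Attach one spoke VPC; in practice this is repeated per VPC, with the
# TGW shared across accounts through AWS Resource Access Manager (RAM)
ec2.create_transit_gateway_vpc_attachment(
    TransitGatewayId=tgw["TransitGatewayId"],
    VpcId="vpc-0123456789abcdef0",             # placeholder spoke VPC
    SubnetIds=["subnet-0123456789abcdef0"],    # placeholder subnet
)

# The site-to-site VPN now terminates on the TGW instead of a VGW
ec2.create_vpn_connection(
    CustomerGatewayId="cgw-0123456789abcdef0",  # placeholder
    TransitGatewayId=tgw["TransitGatewayId"],
    Type="ipsec.1",
)
```

In real code you would wait for the TGW and its attachments to leave the pending state before creating the VPN; the waits are omitted here for brevity.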
Because the remote sites were located even farther from the home VPC’s region, they relied on 3G/4G/satellite internet connectivity. We suggested enabling acceleration on the site-to-site VPNs so that AWS Global Accelerator would route their traffic through the nearest AWS point of presence.
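Enabling acceleration is a single option on the VPN connection, as this sketch shows (placeholder IDs; note that accelerated VPNs must terminate on a transit gateway, not a VGW):

```python
import boto3

ec2 = boto3.client("ec2", region_name="ap-southeast-2")

# Accelerated site-to-site VPN: AWS provisions Global Accelerator
# endpoints so remote-site traffic enters the AWS network at the
# nearest edge location instead of traversing the public internet
# all the way to the home region
ec2.create_vpn_connection(
    CustomerGatewayId="cgw-0123456789abcdef0",  # placeholder
    TransitGatewayId="tgw-0123456789abcdef0",   # placeholder hub TGW
    Type="ipsec.1",
    Options={"EnableAcceleration": True},
)
```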
With the inclusion of the Global Accelerator, the design evolved as depicted in the following illustration.
Imagine you’re a customer using a cloud service for important tasks. Like most companies, you would not want all your vital services to rely solely on one provider, because that creates a single point of failure. To avoid that problem, you can distribute the risk by using more than one cloud vendor.
This was the case with our customer, who chose Palo Alto Networks’ Prisma Cloud to provide a middle layer between the remote sites and AWS. The customer also used the nearest point of presence provided by Palo Alto Networks to stay closer to its remote sites. This is shown in the diagram below:
At this point in the design, we had ensured high availability by using multiple Availability Zones (so the architecture could withstand zonal failures), and we had avoided reliance on a single cloud vendor.
However, one concern still bothered the organization: What if an AWS Region, or a Palo Alto regional construct, failed? Would it pull down the rest of the services with it?
The simple answer is yes. So, how could we circumvent this risk? Our team decided that the best solution was to build redundancy across multiple AWS Regions. This approach greatly enhanced the architecture, providing the ability to handle regional faults and ensuring overall stability.
In this setup, each region had its own network hub, just like in the initial design, and maintained the same number of VPCs across AWS accounts. The network hubs interconnect in a mesh pattern, and each site-to-site connection originating from Palo Alto Prisma terminates at one of the regional network hubs.
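The inter-hub mesh is built from TGW peering attachments. The sketch below (placeholder IDs, account number and CIDR; regions assumed for illustration) peers two regional hubs and adds a static route, since TGW peering does not propagate routes automatically:

```python
import boto3

# Peer the home-region hub with a second regional hub (regions assumed)
ec2_home = boto3.client("ec2", region_name="ap-southeast-2")
ec2_peer = boto3.client("ec2", region_name="ca-central-1")

peering = ec2_home.create_transit_gateway_peering_attachment(
    TransitGatewayId="tgw-0aaaaaaaaaaaaaaaa",       # local hub (placeholder)
    PeerTransitGatewayId="tgw-0bbbbbbbbbbbbbbbb",   # remote hub (placeholder)
    PeerAccountId="111122223333",                   # placeholder account ID
    PeerRegion="ca-central-1",
)["TransitGatewayPeeringAttachment"]

# The remote side must accept the peering before it becomes available
ec2_peer.accept_transit_gateway_peering_attachment(
    TransitGatewayAttachmentId=peering["TransitGatewayAttachmentId"],
)

# TGW peering does not propagate routes, so each hub's route table
# needs a static route toward the other region's CIDR range
ec2_home.create_transit_gateway_route(
    DestinationCidrBlock="10.20.0.0/16",            # assumed remote CIDR
    TransitGatewayRouteTableId="tgw-rtb-0cccccccccccccccc",  # placeholder
    TransitGatewayAttachmentId=peering["TransitGatewayAttachmentId"],
)
```

In a full mesh, this pairing is repeated for every pair of regional hubs.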
Initially, every redundant site-to-site VPN connection was connected to a single network hub as shown below.
Once the rest of the regional network hubs were ready, the redundant connections were moved to their intended regional hubs, as shown below.
For the multi-regional network hub implementation, we chose AWS Network Orchestrator as the serverless automation tool, deployed using Customizations for AWS Control Tower from the master account. The Network Orchestrator executed seamlessly, applying attachments and associations against the TGW and its route tables. Even the peering between regional TGWs was applied through the Network Orchestrator.
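Under the hood, the per-attachment work the orchestrator automates boils down to calls like these (a simplified sketch with placeholder IDs, not the solution’s actual code):

```python
import boto3

ec2 = boto3.client("ec2", region_name="ap-southeast-2")

# For each spoke VPC attachment: associate it with the hub's TGW route
# table, then propagate its CIDR so other attachments can reach it
ec2.associate_transit_gateway_route_table(
    TransitGatewayRouteTableId="tgw-rtb-0cccccccccccccccc",    # placeholder
    TransitGatewayAttachmentId="tgw-attach-0ddddddddddddddd0",
)
ec2.enable_transit_gateway_route_table_propagation(
    TransitGatewayRouteTableId="tgw-rtb-0cccccccccccccccc",
    TransitGatewayAttachmentId="tgw-attach-0ddddddddddddddd0",
)
```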
Conquering an accessibility challenge
As we were setting up the Network Orchestrator in a brand-new AWS Region, we encountered a situation where certain services were not yet available there, which prevented us from using the Network Orchestrator’s complete range of features. Our workaround involved programmatically excluding those services by making changes to the code. This way, we could still proceed with the implementation without those specific features.
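The idea behind the exclusion, as a rough sketch (the region and service names here are illustrative, not the actual exclusions we made): check which services are known to be launched in the target region and skip the features that depend on the rest.

```python
import boto3

REGION = "ap-southeast-3"    # example of a newer region, for illustration
REQUIRED = ["sns", "resourcegroupstaggingapi"]  # illustrative service names

session = boto3.session.Session()

# get_available_regions() reflects botocore's endpoint data, which can
# lag behind actual service launches in brand-new regions
unavailable = [
    svc for svc in REQUIRED
    if REGION not in session.get_available_regions(svc)
]
for svc in unavailable:
    print(f"Skipping features that depend on '{svc}' in {REGION}")
```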
Due to security and cost optimization considerations, we introduced an egress VPC in each regional network hub to direct all outbound internet traffic from the regional spoke VPCs through it. We also implemented interface endpoints, each with its own private hosted zone, in the same egress VPC so they could be shared across the network hub.
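A sketch of that shared-endpoint pattern (the service name, zone name and all IDs are placeholder assumptions): create an interface endpoint in the egress VPC with private DNS disabled, then publish a private hosted zone that the spoke VPCs can be associated with.

```python
import boto3

ec2 = boto3.client("ec2", region_name="ap-southeast-2")
r53 = boto3.client("route53")

# Interface endpoint in the shared egress VPC; private DNS is disabled
# because the hub publishes its own private hosted zone instead
ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    ServiceName="com.amazonaws.ap-southeast-2.ssm",  # example service
    VpcId="vpc-0eeeeeeeeeeeeeeee",                   # egress VPC (placeholder)
    SubnetIds=["subnet-0fffffffffffffff0"],          # placeholder subnet
    PrivateDnsEnabled=False,
)

# Private hosted zone for the service's DNS name; associating it with
# each spoke VPC lets the whole hub resolve to the one shared endpoint
r53.create_hosted_zone(
    Name="ssm.ap-southeast-2.amazonaws.com",
    CallerReference="egress-ssm-endpoint-001",       # must be unique per call
    VPC={"VPCRegion": "ap-southeast-2", "VPCId": "vpc-0eeeeeeeeeeeeeeee"},
    HostedZoneConfig={"PrivateZone": True},
)
```

In practice you would also add an alias record in the zone pointing at the endpoint’s DNS name; that step is omitted here for brevity.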
In simpler terms, we added an extra layer of reliability to the organization’s cloud setup by bringing on another cloud provider between the remote sites and AWS. We also opted for AWS regional redundancy through a multi-regional network hub, making the design more reliable, fault-tolerant and redundant. This way, if one part of the system fails, it will not bring down everything else with it.
It’s like having an extra backup plan. In the end, this design transformation helped our client gain a smoother and safer cloud experience overall, from accessibility to reliability.