Building a Google Cloud Landing Zone in a Scalable, Repeatable and Secure Way (Part 2)
By Jasbir Singh, Staff Consulting Architect, Rackspace Technology
Introduction
This is part two of a two-part series on building a Google Cloud landing zone in a scalable, repeatable and secure fashion. In this post, I share step-by-step guidance for setting up a new landing zone on Google Cloud using Google’s open-source Fabric FAST, which is part of its Cloud Foundation Fabric. In part one, I walked through several manual setup tasks, following the guided steps in the Google Cloud setup checklist, and we completed steps one and two from the checklist.
From here onwards, we’ll be using Fabric FAST to complete all the remaining activities using infrastructure-as-code.
In this part of the series, we will complete the following steps:
- Run FAST stages/0-bootstrap: Configure automation, billing and log export projects, custom roles, service accounts, organization-level logging and workload identity federation support.
- Migrate Terraform state to a GCS bucket.
- Run FAST stages/1-resman: Provide shared capability folders, like networking and security, and apply a set of top-down organization-level security policies.
- Optionally run FAST stages-multitenant/0-bootstrap-tenant: Provide optional top-level tenant folders that can be treated like separate organizations.
Fabric FAST stages
Fabric FAST is split into multiple stages. The stages are layered, meaning that a given stage is dependent on having run the preceding stages.
Summary of different stages:
- Bootstrap: Provides a foundational organization-level configuration to enable steps that depend on broad administrative permissions, preparing the prerequisites needed to enable automation in this and future stages.
- Resource management: Creates the base resource hierarchy (folders) and the automation resources required later to delegate deployment of each part of the hierarchy to separate stages. It also sets organization policies on the organization, along with any exceptions required on specific folders.
- Networking: Provides centralized networking resources, using a hub-and-spoke design.
- Security: Implements security configuration, adhering to best practices.
- Google Cloud Project Factory: Creates a YAML-based factory that allows projects to be created for any given tenant (e.g., an application or team). This should be run once per environment (e.g., for dev, QA, staging, production).
- Data platform: An optional stage intended for the deployment of components that would be needed to build a data platform, such as BigQuery, Pub/Sub, Dataflow and Cloud Composer.
- GKE multi-tenant: An optional stage for building a shared GKE environment that can host multiple tenant applications.
Each FAST stage has its own folder in the repo. Each of these folders represents a Terraform root module, a folder where we can run terraform commands to execute a given stage in the overall process.
Multi-tenant top level: One useful feature of FAST is the ability to bootstrap so that the top level of the FAST resource hierarchy is a folder under the organization, rather than the organization itself. This is useful if you might want more than one FAST hierarchy deployed in your organization.
Prerequisite
Most of the prerequisites were completed in part one. However, you will need an environment with all three of the following installed: Google Cloud CLI, Terraform and git.
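Before starting, it can help to confirm that the required tools are actually on your PATH. The following is a minimal sketch; the helper name missing_tools is illustrative and not part of FAST.

```shell
#!/bin/sh
# Print the name of every command passed as an argument that is not on PATH.
# `missing_tools` is an illustrative helper name, not part of FAST.
missing_tools() {
  for tool in "$@"; do
    # `command -v` succeeds only if the shell can resolve the command
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage: prints nothing when all three prerequisites are installed
#   missing_tools gcloud terraform git
```

Any name printed by the function needs to be installed before you continue.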
Stage 0: Bootstrap
FAST starts with the 0-bootstrap stage. Open this folder in your cloned repo. You’ll see a detailed description of this stage. In your development environment, make sure you are in the fast/stages/0-bootstrap folder.
Overview
Here is a quick overview of what this stage achieves:
- Creates custom roles that allow you to set IAM policies at the organization level, allowing the resource management service account to grant a specific set of roles.
- Creates a service account for the next stage.
- Creates an automation project for running subsequent FAST tasks.
- Creates GCS buckets to act as the remote backend for the Terraform state and to store FAST stage output files.
- Creates a billing export project.
- Creates a log export project.
- Sets up organization-level logging. Org-level log sinks are created to ensure a proper audit trail right from the start. By default, FAST provides log filters to capture Cloud Audit Logs, VPC Service Controls violations and Workspace logs into a BigQuery dataset in a newly created top-level audit project.
- Enables Workload Identity Federation support with external providers. This allows non-Google external identities (such as Active Directory or OIDC identity providers, like Ping Identity or Okta) to be used to access Google Cloud resources without having to obtain and share a service account key.
Steps
1. Login: Launch a Google Cloud console browser session, using your Org Admin user. If you want to use the Cloud Shell, simply launch it from the console.
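Otherwise, authenticate the gcloud CLI from your terminal (for example, with gcloud auth login). If the CLI cannot open a browser itself, you will be given a link to open in your browser. Open the link and, when prompted to choose an account, make sure you select your Org Admin account. Google will give you a verification code, which you can paste back into the gcloud CLI prompt.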
2. Grant roles: Now we will self-grant the required roles to our Org Admin account:
- Billing account administrator (roles/billing.admin), either on the organization or on the billing account
- Logging admin (roles/logging.admin)
- Organization role administrator (roles/iam.organizationRoleAdmin)
- Organization administrator (roles/resourcemanager.organizationAdmin)
- Project creator (roles/resourcemanager.projectCreator)
# set variable for current logged-in user
export FAST_BU=$(gcloud config list --format 'value(core.account)')
# find and set your org id
gcloud organizations list
export FAST_ORG_ID=<your org ID>
# set needed roles
export FAST_ROLES="roles/billing.admin roles/logging.admin \
  roles/iam.organizationRoleAdmin \
  roles/resourcemanager.projectCreator"
for role in $FAST_ROLES; do
  gcloud organizations add-iam-policy-binding $FAST_ORG_ID \
    --member user:$FAST_BU --role $role
done
# find your billing account ID
gcloud alpha billing accounts list
export FAST_BILLING_ACCOUNT_ID=<your billing account id>
gcloud beta billing accounts add-iam-policy-binding $FAST_BILLING_ACCOUNT_ID \
  --member user:$FAST_BU --role roles/billing.admin
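Once the bindings are in place, you may want to check that nothing was missed. The sketch below compares a list of expected roles against the roles actually granted; the helper name missing_roles is hypothetical, and the gcloud pipeline in the usage comment is one possible way to feed it, assuming your user is allowed to read the organization IAM policy.

```shell
#!/bin/sh
# Print each expected role that does not appear in the granted list.
#   $1:    whitespace-separated expected roles (e.g. "$FAST_ROLES")
#   stdin: granted roles, one per line
# `missing_roles` is an illustrative helper name, not part of FAST or gcloud.
missing_roles() {
  granted=$(cat)
  for role in $1; do
    # -x matches whole lines only, -F treats the role as a fixed string
    printf '%s\n' "$granted" | grep -qxF "$role" || printf '%s\n' "$role"
  done
}

# Usage (requires an authenticated gcloud session):
#   gcloud organizations get-iam-policy "$FAST_ORG_ID" \
#     --flatten="bindings[].members" \
#     --filter="bindings.members:user:$FAST_BU" \
#     --format="value(bindings.role)" | missing_roles "$FAST_ROLES"
```

An empty result suggests every role in the list is bound to your user at the organization level.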
3. Create Terraform variables: Create a terraform.tfvars file in the 0-bootstrap folder. This will be used to supply the stage with the necessary variables for your environment. Note that by default, this terraform.tfvars file will not be under git source control, since it’s excluded in the .gitignore file. This is to prevent you from checking in potentially sensitive environment-specific information.
Note: Project IDs in Google Cloud need to be globally unique. So, think carefully about the prefix you want to use.
# use `gcloud beta billing accounts list`
# if you have too many accounts, check the Cloud Console :)
billing_account = {
  id = "012345-67890A-BCDEF0"
}
# use `gcloud organizations list`
organization = {
  domain = "gcpnerdy.org"
  id = 1234567890
  customer_id = "C000001"
}
outputs_location = "~/fast-config"
# use something unique and no longer than 9 characters
prefix = "ccoe"
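Because the prefix ends up inside project IDs, it is worth validating before the first apply. The following is a rough sketch: the 9-character limit comes from the comment above, while the character rule (a lowercase letter first, then lowercase letters, digits or hyphens) is my assumption based on Google Cloud project ID naming. The helper name check_prefix is hypothetical.

```shell
#!/bin/sh
# Return 0 if the prefix is non-empty, at most 9 characters long, and uses
# only characters that are valid in a project ID (assumed rule, see above).
# `check_prefix` is an illustrative helper name, not part of FAST.
# The body runs in a subshell, so LC_ALL and `exit` do not affect the caller.
check_prefix() (
  LC_ALL=C                      # make [a-z] match ASCII lowercase only
  prefix=$1
  [ -n "$prefix" ] || exit 1
  [ "${#prefix}" -le 9 ] || exit 1
  case $prefix in
    [a-z]*) ;;                  # must start with a lowercase letter
    *) exit 1 ;;
  esac
  case $prefix in
    *[!a-z0-9-]*) exit 1 ;;     # reject any other character
  esac
  exit 0
)

# Usage:
#   check_prefix "ccoe" && echo "prefix looks usable"
```

This only checks the shape of the prefix; global uniqueness of the resulting project IDs can only be confirmed when the projects are created.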
4. Run the Terraform
# Setup application default credentials (ADC)
gcloud auth application-default login
terraform init
# optional - if you want to see what the apply command will do
terraform plan
# apply the configuration
terraform apply -var bootstrap_user=$(gcloud config list --format 'value(core.account)')
It will take a couple of minutes to run. Afterwards, you’ll see that three new projects have been created and that new log sinks are visible.
In the billing project, we can see a BigQuery billing export dataset has been created. In the iac project, three new GCS buckets have been created.
Output variables
The 0-bootstrap stage creates several output variables. By default, it writes them to the folder ~/fast-config. This can be changed in the .tfvars file we supplied earlier.
Switching state to remote backend
When we first run terraform init, Terraform stores its state in a local file. However, it’s a good idea to move this state into a so-called remote backend, i.e., a storage location that is accessible over a network and can be used collaboratively. The general process of migrating Terraform state to a Google Cloud Storage bucket is documented separately, but FAST facilitates this process for us. The 0-bootstrap guidance tells us the steps we need to take to migrate the Terraform state, including:
# to copy the FAST output files from the local fast-config directory:
../../stage-links.sh ~/fast-config
# alternatively, we can copy the FAST output files from the GCS bucket:
export PREFIX=<your prefix>
../../stage-links.sh gs://$PREFIX-prod-iac-core-outputs-0
# If you have any "bad interpreter" errors, you may have issues with your end-of-line chars
# If so, fix with this line, then try again
sed -i -e 's/\r$//' ../../stage-links.sh
The command will return an ln command that is specific to this stage. Running it creates a symbolic link in the current working directory that points to the newly created 0-bootstrap-providers.tf file. Again, this file will not be checked in to source control, as it’s excluded in .gitignore.
Now we can migrate the state to the GCS bucket:
terraform init -migrate-state
terraform apply
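One way to confirm the migration took effect is to inspect the backend metadata that Terraform keeps locally. The sketch below assumes that, after terraform init, the file .terraform/terraform.tfstate in the stage folder records the active backend with a "type" field; the helper name backend_is_gcs is hypothetical.

```shell
#!/bin/sh
# Return 0 if the given backend metadata file mentions a "gcs" backend type.
#   $1: path to the local metadata file
#       (assumed to be .terraform/terraform.tfstate in the stage folder)
# `backend_is_gcs` is an illustrative helper name, not a Terraform command.
backend_is_gcs() {
  grep -Eq '"type"[[:space:]]*:[[:space:]]*"gcs"' "$1"
}

# Usage, from the fast/stages/0-bootstrap folder:
#   backend_is_gcs .terraform/terraform.tfstate && echo "state is remote"
```

This is a plain text match rather than a JSON parse, so treat it as a quick sanity check only.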
Collaborative working
Having now migrated the state, if you want to continue the FAST process from another machine, you can. To resume TF steps on another machine, you will need to:
# Ensure pre-reqs: we have installed gcloud CLI, terraform, git
# Ensure terraform.tfvars matches what was set on the other machine
# Login with gcloud CLI, and set ADC
gcloud auth login
gcloud auth application-default login
# Change to: fast/stages/0-bootstrap folder
# Copy the external backend state providers.tf file to the local directory
gsutil cp gs://ccoe-prod-iac-core-outputs-0/providers/0-bootstrap-providers.tf ./
# Initialize Terraform; the state will be pulled from the GCS bucket
terraform init
terraform apply
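The gsutil command above hardcodes the ccoe prefix and the stage name. If you expect to repeat this for other prefixes or stages, a small helper can build the path for you. This is only a sketch: the bucket and path layout simply mirror the command above, and the function name fast_providers_url is hypothetical.

```shell
#!/bin/sh
# Build the GCS URL of a stage's generated providers file.
#   $1: FAST prefix (e.g. ccoe)
#   $2: stage name  (e.g. 0-bootstrap)
# `fast_providers_url` is an illustrative helper name, not part of FAST.
fast_providers_url() {
  printf 'gs://%s-prod-iac-core-outputs-0/providers/%s-providers.tf\n' "$1" "$2"
}

# Usage:
#   gsutil cp "$(fast_providers_url "$PREFIX" 0-bootstrap)" ./
```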
Stage 1: Resource hierarchy
FAST continues with the 1-resman stage. Open this folder in your cloned repo. You’ll see a detailed description of this stage. In your development environment, make sure you are in the fast/stages/1-resman folder.
Overview
Here is a quick overview of what this stage achieves:
- Creates the top-level hierarchy of folders and associated resources.
- Sets organization policies and any exceptions required on specific folders.
The diagram below shows the resource hierarchy that’s created.
Notes:
- Separate folders are created for shared capabilities, like networking and security.
- A separate sandbox folder is created. The intent of this folder is for experimentation with resources that are not subject to the same levels of IaC-enforced control as the rest of the hierarchy.
- A top-level Teams folder is created, which contains separate folders for each application/team tenant.
- Subsequent shared capabilities (e.g., a multi-tenant GKE cluster) could be added without compromising this design.
- This stage is a prerequisite for provisioning top-level tenants. (See below.)
Which organization policies?
You can view the policies that FAST sets in the folder stages/1-resman/data/org-policies. These include, but are not limited to:
- Enforce compute.requireOsLogin
- Enforce compute.skipDefaultNetworkCreation
- Deny compute.vmExternalIpAccess
- Enforce iam.disableServiceAccountKeyCreation
- Enforce sql.restrictPublicIp
- Enforce storage.uniformBucketLevelAccess
Of course, you are free to edit these files, to configure the org policies you want to apply.
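To get a quick inventory of what is configured before editing, you could list the constraint names found in that folder. The sketch below assumes each .yaml file uses constraint names as top-level keys (a key starting in the first column and ending with a colon); check this against the files in your checkout, as the layout may differ between FAST releases. The helper name list_constraints is hypothetical.

```shell
#!/bin/sh
# Print the top-level YAML keys found in every .yaml file of a directory.
#   $1: directory containing the org policy files
# `list_constraints` is an illustrative helper name, not part of FAST.
list_constraints() {
  for f in "$1"/*.yaml; do
    [ -f "$f" ] || continue     # skip when the glob matches nothing
    # a top-level key starts in column one and is followed only by a colon
    sed -n 's/^\([A-Za-z][A-Za-z0-9._/-]*\):[[:space:]]*$/\1/p' "$f"
  done
}

# Usage, from the fast/stages/1-resman folder:
#   list_constraints data/org-policies
```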
Steps
Add a terraform.tfvars file, e.g.:
team_folders = {
  team-ct = {
    descriptive_name = "Team CT"
    group_iam = {
      "team-ct@gcpnerdy.org" = [
        "roles/viewer"
      ]
    }
    impersonation_groups = ["team-ct-admins@gcpnerdy.org"]
  }
}
outputs_location = "~/fast-config"
We run the stage-links.sh script to obtain the commands to copy output from the previous stage.
# either
../../stage-links.sh ~/fast-config
# or
../../stage-links.sh gs://$PREFIX-prod-iac-core-outputs-0
As per the instructions, copy the commands and run them. Now run the Terraform:
# Initialize Terraform; the state will be pulled from the GCS bucket
terraform init
terraform apply
For me, this creates 63 new resources and takes a couple of minutes to run. You’ll see that several new projects have been created.
Configuring a tenant top-level folder: 0-bootstrap-tenant
Now, we can optionally proceed to the 0-bootstrap-tenant stage. Open the stages-multitenant/0-bootstrap-tenant folder in your cloned repo and look at the guidance. In your development environment, make sure you are in the fast/stages-multitenant/0-bootstrap-tenant folder. Remember that we need to have run both the 0-bootstrap and 1-resman stages before running this stage.
Overview
Here we create a top-level tenant folder under the organization node. This stage creates service accounts for all tenant stages, such that billing and Organization Policy Administration bindings can be set, leveraging permissions of the org-level resman service account, which is used to run this stage. This avoids the need to grant broad scoped permissions on the organization to tenant-level service accounts, thus decoupling the tenant from the organization.
Steps
# either
../../stage-links.sh ~/fast-config
# or
../../stage-links.sh gs://$PREFIX-prod-iac-core-outputs-0
As per the guidance:
- Copy all the commands, paste them and run them.
- Edit the 0-bootstrap-tenant-providers.tf file, and supply the name of your top-level tenant, as the variable “prefix.”
terraform {
  backend "gcs" {
    bucket = "ccoe-prod-iac-core-resman-0"
    impersonate_service_account = "ccoe-prod-resman-0@ccoe-prod-iac-core-0.iam.gserviceaccount.com"
    # remove the newline between quotes and set the tenant name as prefix
    prefix = "dazbo-lz"
  }
}
provider "google" {
  impersonate_service_account = "ccoe-prod-resman-0@ccoe-prod-iac-core-0.iam.gserviceaccount.com"
}
provider "google-beta" {
  impersonate_service_account = "ccoe-prod-resman-0@ccoe-prod-iac-core-0.iam.gserviceaccount.com"
}
# end provider.tf for bootstrap-tenant
Now provide a terraform.tfvars file, and the required config:
tenant_config = {
  # used for the top-level folder name
  descriptive_name = "CCOE Tenant A"
  # tenant-specific groups, only the admin group is required
  # the organization domain is automatically added after the group name
  groups = {
    gcp-admins = "tnta-admins"
    # gcp-devops = "tnta-devops"
    # gcp-network-admins = "tnta-networking"
    # gcp-security-admins = "tnta-security"
  }
  # the 3 or 4 letter acronym or abbreviation used in resource names
  short_name = "tnta"
  # optional CI/CD configuration, refer to the org-level stages for information
  # cicd = {
  #   branch = null
  #   identity_provider = "foo-provider"
  #   name = "myorg/tnta-bootstrap"
  #   type = "github"
  # }
  # optional group-level IAM bindings to add to the top-level folder
  # group_iam = {
  #   tnta-support = ["roles/viewer"]
  # }
  # optional IAM bindings to add to the top-level folder
  # iam = {
  #   "roles/logging.admin" = [
  #     "serviceAccount:foo@myprj.iam.gserviceaccount.com"
  #   ]
  # }
  # optional location overrides to global locations
  # locations = {
  #   bq = null
  #   gcs = null
  #   logging = null
  #   pubsub = null
  # }
  # optional folder ids for automation and logging project folders, typically
  # added in later stages and entered here once created
  # project_parent_ids = {
  #   automation = "folders/012345678"
  #   logging = "folders/0123456789"
  # }
}
outputs_location = "~/fast-config"
One important thing to note is that we’ve defined a new admin Google group called tnta-admins. This group needs to exist before we can apply the Terraform. So, in the Admin Console, add this new group.
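Since FAST appends the organization domain to the group name, the full address it will look up can be worked out in advance. The sketch below just joins the two values from the examples in this post; the helper name group_email is hypothetical, and the gcloud command in the usage comment is one possible existence check, assuming your user can read Cloud Identity groups.

```shell
#!/bin/sh
# Build the full group address from a group name and the organization domain.
#   $1: group name as used in tenant_config (e.g. tnta-admins)
#   $2: organization domain                 (e.g. gcpnerdy.org)
# `group_email` is an illustrative helper name, not part of FAST.
group_email() {
  printf '%s@%s\n' "$1" "$2"
}

# Usage (requires an authenticated gcloud session):
#   gcloud identity groups describe "$(group_email tnta-admins gcpnerdy.org)"
```

If the lookup fails, create the group in the Admin Console before running the apply.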
Now run Terraform, as usual:
# Initialize Terraform; the state will be pulled from the GCS bucket
terraform init
terraform apply
Finally, we can see that our top-level tenant folder has been created under the organization.
Note that this stage also creates tenant-specific staging buckets.
In summary
In this second part of the process of building a Google Cloud landing zone, we have completed these steps:
- Executed FAST stages/0-bootstrap: To configure automation, billing and log export projects, custom roles, service accounts, organization-level logging and workload identity federation support.
- Migrated Terraform state to a GCS bucket.
- Executed FAST stages/1-resman: To provide shared capability folders, like networking and security, and to apply a set of top-down organization-level security policies, so that we’re secure from the start.
- Executed FAST stages-multitenant/0-bootstrap-tenant: To provide optional top-level tenant folders.
- Executed FAST stages-multitenant/1-resman-tenant: To create networking and security shared folders under the top-level tenancy.