Account Factory for Terraform (AFT) Thoughts


I’ve worked with AWS and Terraform for 3-4 years at this point. When AWS initially announced Account Factory for Terraform (AFT), I didn’t take too close a look, because the company I worked for already had an account-generation solution that we would not be changing any time soon (for many reasons).
After I moved jobs, I worked for a company that wanted to build its AWS estate with Terraform and was evaluating its account provisioning process, including the option of using AFT.

What I saw appalled me

The whole thing reeks of one of AWS’s <sarcasm>solutions</sarcasm>.
It’s over-engineered, brittle, hard to maintain, and unnecessarily expensive.

What is AFT?

AWS Account Factory for Terraform is an Infrastructure as Code (IaC) solution for provisioning AWS accounts and configuring a landing zone in them in an automated way.
It allows you to define a list of accounts, Terraform resources that should be applied to all of them, and Terraform resources that should be applied to some of them.

Seems like a good idea. Having a method of building a new account and its landing zone quickly and repeatably will help many organisations meet the demands of their customers or internal teams while maintaining a compliant and serviceable base in each account.

The cracks start to show almost right away

AFT is deployed by Terraform as a module. AWS has already written all the code to deploy it; the only thing they have not considered is how or where to deploy it from. Beyond “Deploy this module”, the question of how to get started, and how to bootstrap the whole thing, is left unanswered.

Speaking of getting started, there are prerequisites to setting up AFT.
There are a number of accounts that need to already exist (and be configured), including the organization’s root (management) account, the Audit and Logging accounts, and a dedicated account for AFT to reside in.
How these accounts get provisioned and configured in the first place is ignored.
Further, these accounts are not managed by AFT in any way. When you deploy AFT into its own account, it only deploys the resources AFT itself needs, and none of the landing-zone stuff all further accounts will have.
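For reference, deploying AFT yourself looks roughly like this. A sketch only: the input names are from my memory of the aws-ia module and may have changed, and all IDs are placeholders; check the module’s README before copying anything.

```hcl
module "aft" {
  source = "github.com/aws-ia/terraform-aws-control_tower_account_factory"

  # The four prerequisite accounts, which must already exist
  # and which AFT itself will not manage:
  ct_management_account_id  = "111111111111"
  log_archive_account_id    = "222222222222"
  audit_account_id          = "333333333333"
  aft_management_account_id = "444444444444"

  ct_home_region              = "eu-west-2"
  tf_backend_secondary_region = "eu-west-1"

  # Source control: only CodeCommit (the default) or GitHub cloud
  vcs_provider = "github"
}
```

Everything above the module call (the backend, the provider pointing at the AFT account, the credentials to run it with) is the bootstrapping problem AWS leaves to you.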

Accounts and their resources are deployed using IaC: you write the accounts as resources in Terraform code. They are then deployed using GitOps, such that a push or merge into the master branch of the Git repository triggers the deployment pipeline.
AFT only supports a small selection of code repositories it can draw from: CodeCommit (obviously) or GitHub (cloud). No GitHub Enterprise, Bitbucket, or GitLab. Other than CodeCommit, it uses CodeStar integrations. CodeStar came out in 2017 (5 years ago); how does it still not support any other source code provider?!
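To make the “accounts as Terraform resources” part concrete, an account request looks roughly like this (field names from my recollection of AFT’s account-request module; all values are placeholders):

```hcl
module "sandbox_account" {
  # The account-request module shipped alongside AFT
  source = "./modules/aft-account-request"

  # Parameters handed through to Control Tower's account factory
  control_tower_parameters = {
    AccountEmail              = "aws+sandbox@example.com"
    AccountName               = "sandbox"
    ManagedOrganizationalUnit = "Sandbox"
    SSOUserEmail              = "aws+sandbox@example.com"
    SSOUserFirstName          = "Jane"
    SSOUserLastName           = "Doe"
  }

  account_tags = {
    "team" = "platform"
  }

  # Note: only ONE account customisation can be linked per account
  account_customizations_name = "sandbox-baseline"
}
```

Merging a file like this to the tracked branch is what kicks off the provisioning pipeline.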

One thing I found at this stage is that there is no Terraform plan or approval step that shows what is going to change. By default it uses open-source Terraform, but you can use TF Cloud or Enterprise (unless air-gapped). Those options could enforce an approval step, but as we’ll see when we dig deeper, it is of limited use.

Now on to some of the good stuff (kinda)

There is a feature called ‘API Helpers’ which allows for running arbitrary scripts both before and after the deployment process.
This is kinda cool and I’m sure there are endless possibilities for its use. The problem is that Terraform already has a mechanism for similar script-running, in a few ways actually.
The example AWS gave is that it could call an API to get a new VPC CIDR.
Terraform has null resources with local-exec provisioners, as well as the http data source and various other data sources, which between them cover basically all the use-cases I can think of.
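Here is AWS’s own “fetch a new VPC CIDR from an API” example done in plain Terraform, plus an arbitrary-script hook, the two things API helpers exist for. The IPAM endpoint and script path are hypothetical:

```hcl
# Fetch the next free CIDR from a (hypothetical) IPAM service
data "http" "next_cidr" {
  url = "https://ipam.example.com/next-cidr"
}

resource "aws_vpc" "main" {
  cidr_block = jsondecode(data.http.next_cidr.response_body).cidr
}

# Run an arbitrary script before/after other resources,
# which is essentially what API helpers do
resource "null_resource" "pre_step" {
  provisioner "local-exec" {
    command = "./scripts/pre-deploy.sh" # hypothetical script
  }
}
```

Nothing here needs an AFT-specific mechanism; it is ordinary provider functionality.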

‘Global Customisations’ is a set of Terraform code that will be deployed to all accounts AFT manages. This strongly reminds me of CloudFormation StackSets.
Well, actually it’s only applied automatically to new accounts. To make changes and update previously AFT-created accounts requires a manual triggering of the pipeline.
You can be selective about which accounts you want it to apply the updates to.
You can select All, a list of specific OUs, a list of accounts with a specific tag, or a list of account IDs. You can do this with both an include and exclude list.
I find it surprising that ‘global’ does not mean ‘all’ but more flexibility is not something I’m going to argue against.
To trigger the pipeline, you need to manually trigger a specific Step Function and pass in a hand-crafted JSON that details the accounts to apply the updates to.
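The hand-crafted input looks roughly like this (the shape is from my reading of the AFT docs and may not be exact; the OU names, tag, and account ID are placeholders):

```json
{
  "include": [
    { "type": "ous", "target_value": ["Sandbox", "Workloads"] },
    { "type": "tags", "target_value": [{ "team": "platform" }] }
  ],
  "exclude": [
    { "type": "accounts", "target_value": ["111111111111"] }
  ]
}
```

You paste something like this into a Step Functions execution by hand. Compare that to `terraform apply` rolling a change out everywhere automatically.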

‘Account Customisations’ are similar, in that they are a set of Terraform code that gets deployed to any of the accounts that link to it. These sets of TF code can be reused and applied to multiple accounts.
I feel it undermines the point of Global customisations somewhat, in particular the fact that ‘global’ does not mean ‘all’.
A big problem I have with Account Customisations is that you can only apply one per account, thus they are considerably less flexible and reusable than normal Terraform modules.

Taking a deeper look at the internals

I have not done a total deep dive into how the whole thing works; I don’t really want to. In fact, the way AWS markets it, nobody outside of AWS is expected to look at or understand the internals; AWS Support is supposed to be trained and expected to handle this new ‘service’.

But I did look enough into how it provisions new accounts to see a few skeletons.

When you push the changes, it triggers the pipeline and a part of that is a Step Function that runs a Lambda Function which in turn calls Service Catalog to create an account using Control Tower (Remember what I said about over-engineered?).

Using Control Tower does have the advantage of its feature flags, but they are still very limited, and Control Tower is developed very slowly, only recently getting support for nested OUs.

While you can use TF Cloud or Enterprise (it needs to be publicly accessible), and this can be used to enforce an approval / plan step, the changes that TF is making are not what you think they are.
For example: When provisioning an account, you define the account as a TF resource by deploying the AWS-provided Account module. The problem is that the Account module does not actually instruct TF to create an account; instead, all it does is create an item in DynamoDB that triggers the Step Function (via DynamoDB Streams?). So the plan step shown in TF Cloud would be near worthless.
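In other words, what the plan you would be approving boils down to is something like this. A simplified sketch of the mechanism, not AFT’s actual resource or table names:

```hcl
# What 'terraform plan' effectively shows: a DynamoDB item, not an account.
resource "aws_dynamodb_table_item" "account_request" {
  table_name = "aft-request" # table name assumed for illustration
  hash_key   = "id"

  item = jsonencode({
    id = { S = "sandbox" }
    # the serialised account request the Step Function picks up
    control_tower_parameters = { S = "{...}" }
  })
}
```

A reviewer approving “create one DynamoDB item” has no visibility into the account creation, OU placement, and customisation runs that item will actually trigger.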

The worst horror I saw, though, was that the Lambda functions that call the Service Catalog API run in a VPC behind a managed NAT Gateway.
The VPC, NAT, and more are created when you deploy AFT, but they are wholly unnecessary.
Lambda functions can call the AWS APIs and reach the internet just fine without needing to be in a VPC, so having a NAT Gateway that runs constantly regardless of how often you actually use AFT is just bleeding you money.
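The fix is simply to not attach the function to a VPC at all. A minimal sketch (names and paths are placeholders):

```hcl
resource "aws_lambda_function" "provisioner" {
  function_name = "account-provisioner" # hypothetical
  role          = aws_iam_role.provisioner.arn
  handler       = "main.handler"
  runtime       = "python3.12"
  filename      = "build/provisioner.zip"

  # No vpc_config block: the function runs on Lambda's managed
  # network and can reach public AWS API endpoints directly,
  # so no VPC, subnets, or NAT Gateway are needed.
}
```

A NAT Gateway bills per hour whether or not any traffic flows, so for a tool you might touch a few times a month that is pure waste.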

Closing Thoughts

The whole time I was wondering why this wasn’t a managed AWS service or built into Control Tower in the first place.

It’s built out of AWS services in your own account that you need to maintain and keep up to date. AWS advises against forking the code because they claim they will be updating it often, but you are still responsible for ‘re-bootstrapping’ whenever they update.

This may be of some use to resellers and managed services (e.g. DXC and Rackspace) or those that need to spin-up loads of accounts often, but at that point, they likely have built much better solutions internally.

If you already have the foundational accounts and see a need to deploy very similar accounts for workloads and you don’t mind that it is another process and set of resources to keep track of, and update, and pay for… look elsewhere or good luck.

Kieran Goldsworthy
Cloud Engineer and Architect
