A more mature take on stateless Terraform

Published Oct 7, 2023 by Ricard Bejarano

Before we begin...

Opinions expressed here are solely my own and do not express the views or opinions of my current or past employers.

A year and a half ago, I wrote this piece on why “Terraform should have remained stateless”.

It was a critique of Terraform’s stated reasons for requiring state and of how said state makes using Terraform painful, along with my proposal for a “none” state backend that would let Terraform operate just as well without state.

Since then, I have tried to back my words with code, and gained a deeper understanding of Terraform’s need for state and of how we could make it stateless without significant changes to its design.

What you are about to read is my self-critique of that piece, along with a revised proposal for a stateless Terraform.

What I got wrong: Terraform needs state

I was wrong: Terraform needs some form of state.

These are the reasons Terraform’s documentation quotes:

Mapping to the real world

You do not really need state to do this.

Most infrastructure resources have one or more attributes that can perform as “primary key” of such resources.

Said attributes need only meet these two requirements: they must be user-defined before creation, and they must be unique.

Some resources have simple attributes that meet these requirements, like an AWS S3 bucket’s name: it is both user-defined before creation and globally unique.

Other resources do not: an AWS EC2 instance’s name is user-defined before creation, but not unique; whereas its ID is globally unique, but only known after creation. For most of these, however, we have things like tags that can be enforced by policy to be unique.
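A minimal sketch of that lookup, assuming a provider can list its resources along with their attributes and tags. The Declared model, the find_remote helper, the "tag:tf_key" tag name and the sample inventory are all hypothetical, made up for illustration; none of this is Terraform's or AWS's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Declared:
    """A resource as declared in code (hypothetical model, not Terraform's)."""
    type: str
    key_attr: str   # attribute acting as the "primary key"
    key_value: str  # its user-defined, unique value


def find_remote(declared, remote_resources):
    """Return the single remote resource matching the declared key, or None."""
    matches = [
        r for r in remote_resources
        if r["type"] == declared.type
        and r.get(declared.key_attr) == declared.key_value
    ]
    if len(matches) > 1:
        # The key is not unique, so it cannot serve as a primary key.
        raise ValueError(f"{declared.key_attr}={declared.key_value!r} is ambiguous")
    return matches[0] if matches else None


# Stand-in for what a provider API would list; all values are made up.
remote = [
    {"type": "aws_s3_bucket", "bucket": "acme-logs"},
    {"type": "aws_instance", "id": "i-0abc", "tag:tf_key": "web-1"},
    {"type": "aws_instance", "id": "i-0def", "tag:tf_key": "web-2"},
]

bucket = find_remote(Declared("aws_s3_bucket", "bucket", "acme-logs"), remote)
web = find_remote(Declared("aws_instance", "tag:tf_key", "web-2"), remote)
missing = find_remote(Declared("aws_instance", "tag:tf_key", "web-3"), remote)
```

The bucket is found by its name and the instance by its policy-enforced tag, with no stored mapping involved. A declared resource with no match (missing above) is simply one that has yet to be created.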

Terraform’s documentation will tell you that since “not all resources support” said attributes, Terraform needs state.

And here it comes down to opinion.

This makes Terraform significantly worse for the majority of its users (users of public cloud providers with APIs whose resources all have such attributes) so the remaining 1% (or less) can use it.

The way I see it, if a given provider’s resources do not have such attributes, it is not up to Terraform to work around that problem.

I can understand, however, that a newly-released Terraform optimizing for adoption would have conceded this compromise at the time.

Metadata

Truth is, I did not understand this one when I first read it. It was not until I tried making Terraform stateless that I understood what this (in my defense, vaguely named) reason meant.

For any already-existing resource whose code has been removed, Terraform still needs a way to determine its dependents, so that everything can be destroyed in the right order.

For this you absolutely need some form of state.

My old proposal did not solve this, but the revised one will.
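To make the ordering problem concrete, here is a minimal sketch, assuming the old dependency graph is available from somewhere. The resource addresses are made up, and this uses Python's standard graphlib rather than anything from Terraform itself.

```python
from graphlib import TopologicalSorter

# Old dependency graph: each resource maps to the resources it depends on.
# Addresses are hypothetical examples.
old_graph = {
    "aws_vpc.main": set(),
    "aws_subnet.a": {"aws_vpc.main"},
    "aws_instance.web": {"aws_subnet.a"},
}


def destroy_order(graph):
    """Dependents first: the reverse of a valid creation order."""
    # static_order() yields dependencies before the resources that need them.
    creation_order = list(TopologicalSorter(graph).static_order())
    return list(reversed(creation_order))


order = destroy_order(old_graph)
```

Here the instance goes first, then the subnet, then the VPC. If the code for all three is deleted at once, nothing in the new code says the instance lived in that subnet, which is why the old graph has to be stored somewhere.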

Performance

This is admittedly optional.

Syncing

This one I also did not fully understand at the time.

What this means is you need a way of preventing simultaneous changes to the same resource set.

Contrary to what I said, this is not just a problem of state: during apply you are modifying not only the state, but also the resource set itself.

However, the fact that state backends are not required to support locking tells us this is somewhat optional too.

What I got right: Terraform’s state is not necessary

As we have seen, if you are in the vast majority of users whose Terraform providers’ resources all have user-defined, unique attributes (such as those from AWS or Kubernetes), your need for state boils down to being able to calculate the old resource dependency tree.

While Terraform’s state does solve this, we can very much do this with the same tool we all use to work with Terraform together: Git!

My revised proposal

A git state backend.

Have Terraform plug into your Git history, take the Terraform code from the previous commit, and use that to calculate the old tree.

A pre-init script (à la Stacks) on top of the default local state backend could do that too, without the need to extend Terraform.
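A rough sketch of the idea, under loud assumptions: read_at_revision shells out to the real "git show REV:PATH" command, while plan and the resource addresses are hypothetical stand-ins. Turning the old code into a list of addresses (that is, parsing HCL) is deliberately left out.

```python
import subprocess


def read_at_revision(repo_dir, revision, path):
    """Return a file's content as of a Git revision, via `git show REV:PATH`."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "show", f"{revision}:{path}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def plan(old_addresses, new_addresses):
    """Diff two sets of resource addresses into create/destroy/keep."""
    old, new = set(old_addresses), set(new_addresses)
    return {
        "create": sorted(new - old),
        "destroy": sorted(old - new),
        "keep": sorted(old & new),
    }


# Addresses as they might be extracted from the previous and current commits.
result = plan(
    old_addresses=["aws_s3_bucket.logs", "aws_instance.web"],
    new_addresses=["aws_s3_bucket.logs", "aws_instance.api"],
)
```

Something like read_at_revision(".", "HEAD~1", "main.tf") would fetch the previous commit's code. Anything present there but gone from the current code ends up under "destroy", and its dependencies can be read from the old code rather than from a state file.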

If you really need locking, we could decouple the notion of locking from state in Terraform, since it is not just the state we are locking but the resource set as a whole. You could then lock with whichever mechanism suits you best. Alternatively, you can lock through your Terraform automation solution of choice, à la Atlantis.
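As one illustration of locking that is independent of state, here is a minimal sketch using an atomically created lock file. The resource_set_lock helper and the "prod.lock" name are hypothetical. A local file only protects runs that share a filesystem and does nothing about stale locks, so treat it as a stand-in for whatever mechanism you would really use.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def resource_set_lock(lock_path):
    """Hold an exclusive lock by atomically creating a lock file."""
    # O_CREAT | O_EXCL raises FileExistsError if the file already exists.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        os.write(fd, str(os.getpid()).encode())
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


lock_file = os.path.join(tempfile.mkdtemp(), "prod.lock")

with resource_set_lock(lock_file):
    # A second apply against the same resource set is refused.
    try:
        with resource_set_lock(lock_file):
            second_acquired = True
    except FileExistsError:
        second_acquired = False

released = not os.path.exists(lock_file)
```

The lock is keyed on the resource set (one lock file per root module or environment, say), not on any state file, which is the decoupling described above.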

Now, this assumes you use Git with some Terraform automation on top of it. While this may not be the case for all Terraform users, I think it’s pretty accurate to say it is for the vast majority of us. Furthermore, this proposal is non-exclusive: the git state backend can coexist with all other state backends, if you need them.

The implementation of this proposal would make moving resources on/off and between Terraform root modules extremely easy. State surgery would cease to be a thing!

But is this really “stateless” Terraform?

Yes, and no.

It is “stateless” as in it does not need its own state, but it still requires some form of state storage for the old resource set. Git, in this case.

I will leave that up to you for interpretation.

So, should Terraform have remained stateless?

The title of my first piece was the (arrogant) statement that Terraform should have remained stateless.

Now that I have a deeper understanding of how we could have kept Terraform stateless, I can see how Terraform could have preferred not to depend on Git (or any other Terraform-external code storage) and instead remain a self-contained solution.

Given that, and seeing it through the lens of initially optimizing for adoption and maximizing support, I am toning my statement down to:

Terraform could have remained stateless.
But it deliberately and respectably chose not to.
