Good Environments == Good Security

Published in Tom Harrison’s Blog

4 min read · Jun 8, 2023

My former company had what might be called state-of-the-art environments, a fully functional CI/CD pipeline, and all of the coolest bits and parts that the geekiest geek could want. But of all the environments we created, the one leading most surely to excellent security was the “infra” environment — credit to the brilliant Raf for ensuring that the team was protected long enough to build that.

Even though we had suitably fancy dev => staging => prod environments, they were oriented around the software process. While we did have a reasonable Terraform-based dev-ops workflow for developers to create new resources, it only let developers confirm that their software worked at that moment. Devs were told to be sure their software was secure … usually after we discovered some case that had been overlooked. (Don’t get me wrong: we did an exceptional job with security in general.)

However, almost the first thing you’ll find if you look for “things you can do to improve security” is simple and obvious: keep your code versions updated. We were not so excellent on that front.

Some languages are better than others, but all have dependencies, and eventually some library you have used for years will stop getting updates and have a vulnerability that needs patching. Somehow it seems that Go just makes it easy to stay up to date, while Node, Ruby, and Python make it hard. But hard or not, it’s just a requirement these days. It’s not “something you’ll do in your spare time”; it’s critical that you have a means to confirm that version upgrades work. Software environments, supported by reasonable test coverage, give you that means.

We talk a lot about CI/CD: continuous integration and continuous delivery. Perhaps we should talk about CU, for continuous upgrading. Another brilliant leader at my last company, Dean, just made it a simple practice to upgrade to the latest version as part of any new PR. And Raf implemented Dependabot on the assumption that if it were crazy-easy to merge a PR, the engineers would do so. Turns out, fear is a stronger motivator than doing the right thing, and Dependabot just became noise on teams whose managers didn’t seem to understand.
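To make the “CU” idea a little more concrete, here is a minimal sketch of a check that reports how far pinned dependencies have drifted and decides whether that drift should block a merge. It is not what we ran, and it is not how Dependabot works. The names `find_drift` and `should_block_merge`, the `max_majors_behind` threshold, and the assumption that versions are plain dotted numbers are all hypothetical.

```python
from typing import NamedTuple


class Drift(NamedTuple):
    name: str
    pinned: str
    latest: str
    majors_behind: int


def parse_version(version: str) -> tuple[int, ...]:
    """Parse a plain dotted version such as '1.24.3' into a tuple of ints.

    Assumption: no pre-release or build suffixes; real version schemes
    are messier than this.
    """
    return tuple(int(part) for part in version.split("."))


def find_drift(pinned: dict[str, str], latest: dict[str, str]) -> list[Drift]:
    """Return every dependency whose pinned version is older than the latest known one."""
    report = []
    for name, pinned_version in sorted(pinned.items()):
        latest_version = latest.get(name)
        if latest_version is None:
            continue  # no upstream information, so nothing to compare against
        old, new = parse_version(pinned_version), parse_version(latest_version)
        if old < new:
            report.append(Drift(name, pinned_version, latest_version, new[0] - old[0]))
    return report


def should_block_merge(report: list[Drift], max_majors_behind: int = 0) -> bool:
    """Hypothetical policy: block when any dependency lags by more major versions than allowed."""
    return any(item.majors_behind > max_majors_behind for item in report)
```

Something like this could run on every PR, which is roughly Dean’s habit turned into a gate: the upgrade stops being optional noise and becomes part of getting the change merged.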

But all of this doesn’t address the much more significant security risk: an old infrastructure also is built on old code. If you can’t upgrade your infrastructure, you have a far greater security risk than if you can’t upgrade your software. (Both are bad, and both are risks…). When I started in 2020 we had an old Kubernetes version from years before that was not upgradeable anymore without massive resources and effort.

To the credit of our CTO, we decided instead to just develop a new platform and migrate our existing code. He estimated this as a 12-week effort. It started in late 2020, and was finally completed, mainly, in March of 2023. Turns out there’s a point in a company’s life when even the most magnificent magic man cannot pull off the feat alone. (In fairness, I was hired to help said magic man, and that just made his life immeasurably less magical, as they forgot to ask if I was also a most magnificent magic man, or something else, like an ~muggle~ engineer.) Still, at least they didn’t pull the project, and it did get done.

It was truly magnificent when Raf joined and brought another actual engineer, Luc, along with him. They created a fully automated process for building the entire EKS cluster from scratch in a new account, which was the Infra environment. We had largely done the right thing in building the new EKS environment, which is to say we built it in code: in our case, Terraform and ArgoCD. But building components and bootstrapping an entire cluster in a repeatable way are different things, and Luc made it happen. Luc and Raf are both Belgian, and I learned several new Belgian curses. But it happened.

Why is an Infra environment important? Because our dev, staging, and production environments were all part of a full CI/CD pipeline — if we couldn’t deploy to dev, we couldn’t deploy — in short, our dev => staging => production pipeline had become production. The Infra environment was isolated, so no one cared when it broke — the Infra environment was the “dev” environment for the “dev” environment!

Once we had an Infra environment, we could test new Kubernetes versions as soon as they were available. We could test new databases, new versions of ingress or egress, new everything. Suddenly, the static and increasingly frozen-in-place world that had characterized the platform we had had to abandon became dynamic, frequently updated, stable, and feature rich.
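The ordering is the whole point, and it can be sketched in a few lines. This is an illustration only, not our actual tooling: the environment names come from this post, but `promote` and the `deploy_and_verify` callback are hypothetical stand-ins for whatever really deploys a version and runs the checks.

```python
from typing import Callable

# Assumed promotion order: the isolated Infra environment goes first, so a bad
# platform upgrade breaks there instead of in the dev => staging => production path.
ENVIRONMENTS = ["infra", "dev", "staging", "production"]


def promote(version: str, deploy_and_verify: Callable[[str, str], bool]) -> list[str]:
    """Roll `version` through each environment in order, stopping at the first failure.

    `deploy_and_verify(env, version)` is a hypothetical callback that deploys the
    version to `env` and returns True only if the environment is healthy afterwards.
    Returns the list of environments that accepted the version.
    """
    accepted = []
    for env in ENVIRONMENTS:
        if not deploy_and_verify(env, version):
            break  # nothing downstream ever sees the broken version
        accepted.append(env)
    return accepted
```

If the callback fails in `infra`, the returned list is empty and dev, staging, and production are never touched, which is exactly why no one cared when Infra broke.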

At its core, the Infra environment allowed us to upgrade our most vulnerable and aging software easily and quickly and keep it that way, and because there were no more limitations, the software itself should be easy to upgrade too — all the parts are there, they just need the kind of engineering mindset that Raf and others brought.

I am no longer there, so I don’t know if that will happen. Many exceptional engineers and others in the company were victims of what the company described as “economic headwinds”. I was part of the recent very cold downdraft from those headwinds, known as a reduction in force.

I was very lucky to find new employment and am thrilled … and yes, almost the first thing I am doing: creating a proper set of environments that allow us to keep all of our software up to date!

Good Environments == Good Security

Written by Tom Harrison