Using compose files as a universal infrastructure interface

Lucas da Costa on March 22, 2023


The ability to improve a design occurs primarily at the interfaces. This is also the prime location for screwing it up. — Shea's Law

At Ergomake, we believe that docker-compose.yml is a perfectly fine format to describe most applications. It's so good we could use it as a universal infrastructure language, even for Kubernetes.

In fact, compose files are so good they "solve" platform engineering.

In this blog post, we'll explain what we mean by a universal interface, why compose files are a much better alternative to Kubernetes manifests for most developers, and how they can "solve" platform engineering.

We give the wrong abstractions to the wrong people

An abstraction is a way to hide unnecessary detail. When I tell you that dogs bark, for example, I don't have to tell you whether I'm talking about a chihuahua or a golden retriever. "Dog" is a label that encompasses both and is precise enough for the sake of what I'm trying to say.

By the same token, not all chihuahuas or golden retrievers are the same. Still, I can use those abstractions to say that chihuahuas are tiny and golden retrievers are large.

The word "dog" abstracts away all the unnecessary detail for the first phrase, but it's insufficiently specific for the second example. To say that a "dog" is small or large, I must use a more detailed abstraction. In that case, I chose the abstractions "chihuahua" and "golden retriever." The more control you need, the less you can abstract away.

If you're a veterinarian, for example, you'll need to know more than a dog's breed when performing surgery or prescribing medicine. Conversely, most people don't need a veterinary degree if they just want to adopt a Shiba Inu and call it Elon.

Requiring people to get a degree before adopting a pet would be incredibly wasteful. Still, that's precisely what we do when we force developers to write Kubernetes manifests. Developers shouldn't have to study ingress controllers to deploy a Rails app, for example.

Kubernetes manifests are an unnecessarily detailed abstraction for most software developers. Developers don't need as much control. Therefore, they need higher-level abstractions, which hide more detail.

There's a huge mismatch between the amount of detail Kubernetes manifests demand and the amount of detail software developers need.

Notice it's not the abstraction itself that is problematic. It's the combination of who uses the abstraction and the amount of detail the abstraction leaks. SREs and DevOps folks, for example, do need more control over how traffic gets into the cluster and how pods talk to one another. In that case, lower-level abstractions are helpful.

The problem here is clear: developers don't need as much detail, but SREs and DevOps folks do. So how do you solve the role and abstraction mismatch without making developers slow or taking control out of the hands of SREs and DevOps professionals?

Solving the role and abstraction mismatch

Using different abstractions for different teams is the only way to help developers ship faster without taking control away from infrastructure folks. Ideally, we should give higher-level abstractions to developers and lower-level abstractions to infrastructure professionals.

For that, we must work on defining a clear interface between these two groups.

Interface — Definition: an interface is the point at which two systems interact.

That's already what most companies try to do when they create platform engineering teams. Those teams are responsible for developing the platform on top of which developers will deploy applications.

As these teams work on those platforms, they expose APIs for developers to interface with the platform. Those APIs hide complexity from developers without taking control out of the hands of SREs.

Platform teams tend to work well because an organization's systems tend to mimic the structure of the organization itself. Separating the "platform" team from the "applications" team adds friction to the flow of information from one side to the other. Therefore, the teams must define clear boundaries and interfaces to interact efficiently.

Once those interfaces are in place, infra people can keep control, while developers can keep things simple.

This strategy is known as "the inverse Conway maneuver". It consists of turning Conway's law on its head and mirroring the system's structure into the organization, and not the other way round.

The current problems with platform engineering

There are two problems with the way companies currently do platform engineering.

The first problem is that each platform team is building its own custom APIs and proprietary implementations. There isn't a standard for an interface between developers and infrastructure, so platform engineers make the same flawed APIs and implementations over and over again, in-house, at multiple companies.

Consider React, for example. Before React came along and pretty much all of us agreed it would be "the standard," we kept reinventing the wheel and creating new JavaScript frameworks every week.

Now that there's a "standard," we can focus on building value on top of React instead of building frameworks to replace it. Additionally, engineers add value much more quickly because they've used React before and don't have to study some weird arcane framework created within the company.

The second problem with platform engineering teams is that the interfaces given to developers are usually at the wrong level of abstraction. These interfaces expose either too much or too little detail. In the first case, developers get excessive control at the expense of excessive complexity. In the second, they get too little control in the name of simplicity.

Interfaces are difficult to get right. Fortunately, we already have one. It's called docker-compose.yml, and we've been ignoring it for a while.

How docker-compose.yml "solves" platform engineering

The docker-compose.yml file "solves" platform engineering because it's a well-known "standard" written at the right level of abstraction.

Notice I'm not saying we should all be running docker-compose up in production. I'm just saying that we could reuse the same docker-compose.yml file that already works on our machine as an interface for specifying applications that run on other platforms, including Kubernetes.

Think about an ingress, for example.

When you want to get traffic into your cluster, you'll write a yml file describing your ingress resource. Then, you'll "submit" it to your cluster's API server using kubectl. Once the request with the manifest gets there, it'll be stored in etcd. Finally, your chosen implementation of an ingress controller will notice a new ingress definition and provision the cluster with something like an nginx configuration.

The beauty of Kubernetes' ingresses is that they're simply an interface. It's the ingress-controller implementation that does all the magic to actually "make the ingress happen."

Suppose you want to use Kong's API gateway instead of nginx. In that case, you could just replace your nginx-ingress-controller with Kong's ingress-controller, and you'd still be able to submit the same ingress manifest to your cluster.

In other words, ingress is just an interface, and there are multiple implementations of operators to handle it. Even though these operators will provision the cluster with different pieces of software (like kong or nginx), their input is the same: an ingress resource.
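
To make the "one interface, many implementations" idea concrete, here's a minimal Python sketch. It is not how any real ingress controller works: the IngressController protocol, the two controller classes, and the strings they return are all hypothetical stand-ins. The only realistic part is the shape of the ingress resource, which both implementations receive unchanged.

```python
from typing import Protocol


class IngressController(Protocol):
    """Hypothetical interface: anything that can act on an ingress resource."""

    def reconcile(self, ingress: dict) -> str: ...


class NginxController:
    """Stand-in for an nginx-based implementation (output is illustrative)."""

    def reconcile(self, ingress: dict) -> str:
        host = ingress["spec"]["rules"][0]["host"]
        return f"nginx: server_name {host};"


class KongController:
    """Stand-in for a Kong-based implementation (output is illustrative)."""

    def reconcile(self, ingress: dict) -> str:
        host = ingress["spec"]["rules"][0]["host"]
        return f"kong: route hosts=[{host}]"


# The same ingress resource is the input for both implementations.
ingress = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "Ingress",
    "metadata": {"name": "web"},
    "spec": {"rules": [{"host": "app.example.com"}]},
}


def apply(controller: IngressController, resource: dict) -> str:
    """Hand the resource to whichever implementation the cluster runs."""
    return controller.reconcile(resource)


nginx_result = apply(NginxController(), ingress)
kong_result = apply(KongController(), ingress)
```

Swapping NginxController for KongController changes what gets provisioned, but the caller and the resource stay exactly the same.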

In the same way we have multiple implementations for operators that handle ingress resources, we could have multiple implementations for operators that handle a docker-compose representation.

Ideally, developers should be able to write a docker-compose.yml file and submit it to the "platform." Then, the platform team's operator would transform that docker-compose file's contents into actual pods, services, and ingresses, for example.
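
As a rough illustration of what such an operator's core could look like, here's a sketch that turns an already-parsed compose file into Deployment and Service manifests. Everything in it is an assumption made for the example: the function names are made up, it only understands image and short-syntax ports entries, and it picks one possible mapping (published port becomes the Service port, container port becomes the target port). A real operator would handle far more.

```python
def parse_port(entry) -> tuple:
    """Return (published, target) from short syntax like "8080:80" or "80".

    Port ranges and the long mapping syntax are not handled in this sketch.
    """
    parts = str(entry).split("/")[0].split(":")
    target = int(parts[-1])
    published = int(parts[-2]) if len(parts) > 1 else target
    return published, target


def compose_to_manifests(compose: dict) -> list:
    """Translate each compose service into a Deployment and, if it
    publishes ports, a Service that selects the Deployment's pods."""
    manifests = []
    for name, svc in compose.get("services", {}).items():
        labels = {"app": name}
        ports = [parse_port(p) for p in svc.get("ports", [])]
        manifests.append({
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "metadata": {"name": name},
            "spec": {
                "replicas": 1,
                "selector": {"matchLabels": labels},
                "template": {
                    "metadata": {"labels": labels},
                    "spec": {"containers": [{
                        "name": name,
                        "image": svc["image"],
                        "ports": [{"containerPort": t} for _, t in ports],
                    }]},
                },
            },
        })
        if ports:
            manifests.append({
                "apiVersion": "v1",
                "kind": "Service",
                "metadata": {"name": name},
                "spec": {
                    "selector": labels,
                    "ports": [
                        {"name": f"port-{p}", "port": p, "targetPort": t}
                        for p, t in ports
                    ],
                },
            })
    return manifests


# A compose file as it would look after YAML parsing.
compose = {
    "services": {
        "web": {"image": "nginx:1.25", "ports": ["8080:80"]},
        "db": {"image": "postgres:16"},
    }
}
manifests = compose_to_manifests(compose)
kinds = [(m["kind"], m["metadata"]["name"]) for m in manifests]
```

The developer only ever sees the compose input. How it maps to cluster resources is a decision the platform team can change without touching anyone's repository.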

Thanks to that well-known interface, the compose file, we can more easily separate teams' responsibilities and design loosely coupled systems by taking advantage of Conway's law.

If multiple companies adopt docker-compose as a universal interface for platform engineering, we'll accelerate innovation. In that case, companies would focus on creating value instead of reinventing the wheel on their own, like they were doing with web frameworks before React came along.

In a world where docker-compose is the "standard" interface, companies can either implement their own "compose-controller" implementations or adopt open ones built by individual open-source wizards or pioneering open-source corporations.

Furthermore, extending docker-compose for esoteric use cases is not difficult either. For that, compose-controller implementations could simply leverage labels, similarly to what a Kubernetes ingress-controller would do with annotations.
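
Here's a sketch of how that could work. The label key compose-controller.example/host is invented for this example, as are the function names. The idea is simply that a service opts into being exposed by carrying a label, and the controller reacts by emitting an ingress resource. Compose accepts labels either as a mapping or as a list of "key=value" strings, so the sketch normalizes both.

```python
from typing import Optional

# Hypothetical label key; a real controller would document its own.
EXPOSE_LABEL = "compose-controller.example/host"


def normalize_labels(labels) -> dict:
    """Accept compose labels as a mapping or a list of "key=value" strings."""
    if isinstance(labels, dict):
        return dict(labels)
    return dict(item.split("=", 1) for item in labels or [])


def ingress_for(name: str, svc: dict, port: int) -> Optional[dict]:
    """Return an Ingress manifest if the service asks to be exposed."""
    host = normalize_labels(svc.get("labels")).get(EXPOSE_LABEL)
    if host is None:
        return None  # no label, nothing extra to provision
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name},
        "spec": {"rules": [{
            "host": host,
            "http": {"paths": [{
                "path": "/",
                "pathType": "Prefix",
                "backend": {"service": {"name": name, "port": {"number": port}}},
            }]},
        }]},
    }


web = {
    "image": "nginx:1.25",
    "labels": ["compose-controller.example/host=app.example.com"],
}
web_ingress = ingress_for("web", web, 80)
db_ingress = ingress_for("db", {"image": "postgres:16"}, 5432)
```

Services without the label behave exactly as before, so the extension stays invisible to developers who don't need it.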

Still, even labels wouldn't be necessary in most cases. The compose file standard is way more flexible than most people think and includes features like specifying replicas, resources, and update policies.
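
To give a feel for it, here's a sketch of how a controller might read the deploy section of a compose service and derive the matching Deployment settings. The function name and the mapping are assumptions, not a reference implementation: it treats a "start-first" update order as a surge and anything else as "stop-first", assumes parallelism is a positive number, and leaves memory limits out to keep the sketch short.

```python
def deployment_overrides(svc: dict) -> dict:
    """Derive Deployment settings from a compose service's deploy section."""
    deploy = svc.get("deploy", {})
    spec = {"replicas": deploy.get("replicas", 1)}

    update = deploy.get("update_config")
    if update:
        step = update.get("parallelism", 1)  # assumed positive
        start_first = update.get("order", "stop-first") == "start-first"
        spec["strategy"] = {
            "type": "RollingUpdate",
            "rollingUpdate": {
                # start-first: bring new pods up before removing old ones
                "maxSurge": step if start_first else 0,
                # stop-first: take old pods down before starting new ones
                "maxUnavailable": 0 if start_first else step,
            },
        }

    cpus = deploy.get("resources", {}).get("limits", {}).get("cpus")
    container = {"resources": {"limits": {"cpu": str(cpus)}}} if cpus else {}
    return {"spec": spec, "container": container}


api = {
    "image": "example/api:1.0",  # hypothetical image name
    "deploy": {
        "replicas": 3,
        "update_config": {"parallelism": 2, "order": "start-first"},
        "resources": {"limits": {"cpus": "0.50"}},
    },
}
overrides = deployment_overrides(api)
```

The returned values would be merged into the Deployment the controller generates for that service.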

But why docker-compose.yml and not just anything else?

A few readers may think the same principle applies to any other standardized YAML format. For example, we could reinvent docker-compose.yml, call it super-application.yml and still implement the same controller pattern — like many people do with other types of custom resources. So why docker-compose and not something else?

Saying we should settle for docker-compose is like saying we should build more electric cars. We have already paved so many miles of highways that it's better to reuse the "car" abstraction and create an "electric" implementation than to create a new abstraction altogether.

Almost every developer already knows how to write a docker-compose file. Therefore, your engineers will carry less cognitive load and deliver more value more quickly.

Additionally, Docker's format succeeds at abstracting useless details away. For developers, docker-compose is simply a way to describe a bunch of containers that talk to one another, and that's as much as they need to know, most of the time.

Finally, developers can also run docker-compose.yml locally, which reduces the number of files they have to deal with. Instead of having helm charts, manifests, and a thousand other configurations in their repos, developers can simply commit docker-compose.yml and rely on their platform teams to implement operators to handle it.

In this world, DevOps, GitOps, and all other trends remain valid and valuable.

Isn't that Docker Swarm?

No. Docker Swarm can use compose files as an interface but does not allow users to provide their own implementation. As a result, even though programmers understand the abstractions well, those abstractions fail to expose as much detail as an infrastructure team needs.

In a way, Docker Swarm failed because it conflated an interface with an implementation. Compose files are an excellent interface built on sound abstractions. Still, binding that interface to a single implementation is a bad idea. Allowing users to bring their own implementation is much better.

That way, teams can pick the best implementations that suit their needs and fine-tune them for their organization's constraints without pushing complexity onto developers.

In that world, will platform engineers become "unemployed"?

Yes and no. In that world, there will be fewer platform engineers, but that's because those people will build value on top of these standardized APIs instead of rebuilding them repeatedly.

Think about how many engineers work on web frameworks today, for example. There aren't nearly as many as 10 or 20 years ago. That's because the engineers that used to build platforms started building value on top of existing ones — like React.

So, to summarise, decoupling interfaces from implementations will free engineers to build value instead of creating different versions of the same infrastructure many times.

How can I help build that future?

We're a two-person startup pushing things that way.

At Ergomake, we're implementing ephemeral environments that use docker-compose as an interface. For now, they work only within our cluster, and we're productizing features on top of it.

Soon, we'll open-source the operator we've been building. Eventually, we want it to be everywhere, and we want it to be the standard for ephemeral environments so that folks can do more with it.

If you're interested in helping build that future or want to talk about it, whether as an engineer, investor, or curious person, I'd love to chat. Please book a slot with me here.

Alternatively, you can send me a tweet or DM @thewizardlucas or an email at
