Microservices Were a Mistake for ML Systems

The industry cargo-culted microservice architecture into ML platforms and created distributed systems nightmares. Monoliths are the answer.

How We Got Here

Around 2020, someone at a conference said "ML systems should be microservices" and the entire industry nodded along without thinking. The logic seemed sound: separate services for feature computation, model training, inference, and monitoring. Clean boundaries. Independent scaling. DevOps best practices.

It was a disaster.

I've spent the last two years migrating ML systems from microservice architectures back to monoliths at three different companies. Every time, the team's velocity increased 3-5x and infrastructure costs dropped 40-60%.

Why Microservices Fail for ML

ML workloads are fundamentally different from web services:

  1. Data locality matters. ML operations are data-intensive. Shipping gigabytes of feature data across network boundaries for every inference call is insane.
  2. Tight coupling is inherent. Your feature computation, model, and post-processing are intimately coupled. Pretending they're independent services doesn't make them so.
  3. Debugging distributed inference is a nightmare. When your model output is wrong, is it the feature service? The serialization? The model? The post-processing? With microservices, answering this takes hours. With a monolith, it takes minutes.
  4. Cold start kills latency. When Kubernetes has to spin up a fresh pod for a separate inference container, the requests waiting on it can see 5-30 seconds of added latency that no user will tolerate.
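
To make the first and third points concrete, here is a minimal sketch that runs the same three steps two ways: once as plain in-process function calls, and once with a JSON round trip at every boundary to stand in for a network hop. Everything in it is hypothetical (the function names, the toy "model", the choice of JSON as the wire format); it only counts the bytes and hops that the split version adds, and it is not a benchmark.

```python
import json


def compute_features(raw):
    # Hypothetical feature step: scale each raw value.
    return [x / 10.0 for x in raw]


def predict(features):
    # Hypothetical "model": a fixed weighted sum.
    return sum(0.5 * f for f in features)


def postprocess(score):
    # Hypothetical post-processing: clamp the score to [0, 1].
    return max(0.0, min(1.0, score))


def monolith_call(raw):
    """All three steps in one process; data is passed by reference."""
    return postprocess(predict(compute_features(raw)))


def hop(payload, counter):
    """Simulate one service boundary: serialize, count bytes, deserialize."""
    wire = json.dumps(payload).encode("utf-8")
    counter["bytes"] += len(wire)
    counter["hops"] += 1
    return json.loads(wire)


def microservice_call(raw, counter):
    """Same steps, but each boundary pays a serialization round trip."""
    raw_in = hop(raw, counter)                             # client -> feature service
    features = hop(compute_features(raw_in), counter)      # feature service -> model service
    score = hop(predict(features), counter)                # model service -> post-processing
    return postprocess(score)


raw = [1.0, 2.0, 3.0, 4.0]
counter = {"bytes": 0, "hops": 0}
in_process = monolith_call(raw)
across_hops = microservice_call(raw, counter)
print(in_process == across_hops, counter)
```

Both paths return the same answer; the split version simply adds three places where bytes are copied and where a wrong output could have been introduced. With real feature payloads the byte count grows with the data, while the in-process path stays at zero.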

The Majestic ML Monolith

Here's what a well-designed ML monolith looks like:

  • One service that handles feature computation, inference, and post-processing
  • Horizontal scaling at the service level (not the component level)
  • Model files loaded at startup, hot-swapped in memory
  • Feature computation done in-process with vectorized operations
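
A rough sketch of that shape, under loud assumptions: `Model` and `InferenceService` are hypothetical names, the "model" is just a weight vector, and feature computation is plain Python where a real service would likely use a vectorized library such as NumPy. The part worth noticing is the hot swap: the new model is built completely first, then installed with a single reference assignment, and each request takes a snapshot of the current model so it never mixes versions.

```python
import threading


class Model:
    """Hypothetical stand-in for a loaded model: weights plus a version tag."""

    def __init__(self, version, weights):
        self.version = version
        self.weights = weights

    def predict(self, features):
        return sum(w * f for w, f in zip(self.weights, features))


class InferenceService:
    """One process that does features, inference, and post-processing."""

    def __init__(self, model):
        self._model = model  # loaded once at startup
        self._swap_lock = threading.Lock()  # serializes concurrent swaps

    def swap_model(self, new_model):
        # The caller builds new_model fully before calling this, so the
        # swap itself is only a reference assignment.
        with self._swap_lock:
            self._model = new_model

    def handle(self, raw):
        model = self._model  # snapshot: one request, one model version
        features = [x / 10.0 for x in raw]      # in-process feature step
        score = model.predict(features)         # in-process inference
        clamped = max(0.0, min(1.0, score))     # in-process post-processing
        return {"version": model.version, "score": clamped}


service = InferenceService(Model("v1", [0.5, 0.5]))
before = service.handle([2.0, 4.0])
service.swap_model(Model("v2", [1.0, 1.0]))
after = service.handle([2.0, 4.0])
print(before, after)
```

Scaling this horizontally means running more copies of the whole process behind a load balancer, each with its own in-memory model, rather than scaling the feature step and the model step separately.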

It's boring. It's simple. It works. And your team can actually debug it without a PhD in distributed systems.
