How We Got Here
Around 2020, someone at a conference said "ML systems should be microservices" and the entire industry nodded along without thinking. The logic seemed sound: separate services for feature computation, model training, inference, and monitoring. Clean boundaries. Independent scaling. DevOps best practices.
It was a disaster.
I've spent the last two years migrating ML systems from microservice architectures back to monoliths at three different companies. Every time, the team's velocity increased 3-5x and infrastructure costs dropped 40-60%.
Why Microservices Fail for ML
ML workloads are fundamentally different from web services:
- Data locality matters. ML operations are data-intensive. Shipping gigabytes of feature data across network boundaries for every inference call is insane.
- Tight coupling is inherent. Your feature computation, model, and post-processing are intimately coupled. Pretending they're independent services doesn't make them so.
- Debugging distributed inference is a nightmare. When your model output is wrong, is it the feature service? The serialization? The model? The post-processing? With microservices, answering this takes hours. With a monolith, it takes minutes.
- Cold start kills latency. Spinning up separate inference containers in Kubernetes pods adds 5-30 seconds of latency that no user will tolerate.
The Majestic ML Monolith
Here's what a well-designed ML monolith looks like:
- One service that handles feature computation, inference, and post-processing
- Horizontal scaling at the service level (not the component level)
- Model files loaded at startup, hot-swapped in memory
- Feature computation done in-process with vectorized operations
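The shape described above can be sketched in a few dozen lines. This is a minimal illustration, not a production service: the class name, the lock-based hot swap, and the toy linear "model" are all assumptions made for the example, standing in for whatever model runtime you actually use.

```python
import threading
import numpy as np

class MonolithModelService:
    """One process: feature computation, inference, and post-processing together."""

    def __init__(self, weights: np.ndarray):
        self._lock = threading.Lock()
        self._weights = weights  # model loaded once at startup

    def hot_swap(self, new_weights: np.ndarray) -> None:
        # Atomically replace the in-memory model: no pod restart, no cold start.
        with self._lock:
            self._weights = new_weights

    def _features(self, raw: np.ndarray) -> np.ndarray:
        # In-process, vectorized feature computation -- no network hop,
        # no serialization boundary to debug. (Toy example: row-standardize.)
        mean = raw.mean(axis=1, keepdims=True)
        std = raw.std(axis=1, keepdims=True)
        return (raw - mean) / (std + 1e-8)

    def predict(self, raw: np.ndarray) -> np.ndarray:
        feats = self._features(raw)
        with self._lock:
            w = self._weights
        scores = feats @ w                 # inference (toy linear model)
        return 1.0 / (1.0 + np.exp(-scores))  # post-processing: sigmoid

# Usage: one object, one process, end to end.
service = MonolithModelService(weights=np.ones(4))
out = service.predict(np.array([[1.0, 2.0, 3.0, 4.0]]))
```

Scaling this is just running more copies of the whole process behind a load balancer, which is exactly the "service-level, not component-level" scaling the list above calls for.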
It's boring. It's simple. It works. And your team can actually debug it without a PhD in distributed systems.
