harisrid Tech News

Spanning many domains – data, systems, algorithms, and personal

TOTW/5 – Avoid the Dreaded Death Spiral – with API Endpoint Separation and Versioning!!

“The best leaders anticipate future scenarios.” – a senior staff engineer

A Primer

Alright, let me talk about one of the most exciting scenarios I ran into at work. I was amazed at the foresight and forward-thinking my staff engineer demonstrated in a single 30-minute meeting. He recognized how to avoid the metastable failure problem and the consequent runtime crashes that occur under high traffic loads.

During feature development, my junior engineer and I deliberated on how to incorporate a new business workflow into an existing API endpoint. The workflow entailed additional steps: write I/O operations to a local PostgreSQL database, write I/O operations to a data storage tier owned by another team, and complex business rule chains.

These changes entailed performance degradation across the board: CPU, memory, and disk utilization. They also added latency and processing delays, since the synchronous, blocking I/O operations to persist to both data layers happen in sequence.
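
To make that latency cost concrete, here's a minimal Python sketch. Everything in it – function names, delay values, the record shape – is an invented stand-in, not our actual service code; it just shows that two blocking writes serialize, so request latency is at least the sum of both write latencies.

```python
import time

# Hypothetical sketch: two synchronous, blocking writes in sequence.
# Names and delays are invented for illustration only.

def write_local_postgres(record):
    time.sleep(0.02)   # stand-in for a blocking local PostgreSQL write
    return "local-ok"

def write_remote_tier(record):
    time.sleep(0.05)   # stand-in for a blocking write to another team's tier
    return "remote-ok"

def handle_request(record):
    start = time.monotonic()
    write_local_postgres(record)   # blocks the request thread
    write_remote_tier(record)      # blocks it again
    return time.monotonic() - start

elapsed = handle_request({"id": 1})
print(f"blocking pipeline took ~{elapsed:.3f}s")  # at least 0.02 + 0.05
```

Every millisecond a request spends blocked is a millisecond a pod's worker capacity is tied up – which is exactly what primes the failure mode below.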

If a K8S pod had to manage both (or, egads, multiple) API endpoints, then we could run into a death-spiral-esque issue. Let's imagine we're in the first, stable state, where our traffic hasn't hit high-scale loads. Initially, the world seems nice.

But traffic load is increasing – we're entering a vulnerable state, dangling on a precipice. This state can arise in a thundering-herd scenario, wherein customer requests inundate a K8S pod (or other compute resource). The pod would naturally die out, and requests would be forced to route to other pods. But then those pods would also shut down.

In a futile but bold attempt at system recovery, the cluster would spin up a new pod, but alas, there are still too many requests – the new pods quickly die out. Now we've entered a metastable failure state: a state whose recovery is contingent on manual human intervention.
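
The dynamic can be captured in a toy discrete-time model (all parameters below are invented). Queued clients keep retrying while they wait, so retry traffic grows with the backlog; once a spike pushes the backlog past a tipping point, offered load stays above capacity even after the spike ends:

```python
# Toy metastable-failure model. Each tick: offered load = new arrivals plus
# retry traffic proportional to the backlog; anything over capacity queues up.
# Parameters (capacity, rates, spike window) are invented for illustration.

def simulate(retry_rate, capacity=100.0, base=60.0, spike=150.0, ticks=30):
    backlog = 0.0
    history = []
    for t in range(ticks):
        arrivals = spike if 10 <= t < 15 else base   # 5-tick traffic spike
        offered = arrivals + retry_rate * backlog    # retries amplify load
        backlog = max(0.0, backlog + offered - capacity)
        history.append(backlog)
    return history

no_retries = simulate(retry_rate=0.0)
with_retries = simulate(retry_rate=0.8)

print("final backlog without retries:", no_retries[-1])   # drains back to 0
print("final backlog with retries:   ", with_retries[-1])  # keeps growing
```

Without retry amplification the backlog drains once the spike passes; with it, the backlog grows without bound even though arrivals returned to normal – recovery requires intervention (shedding load, pausing retries), which is exactly the "manual human intervention" property of a metastable failure.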

But with API endpoint separation, there'd be benefits. We'd have two API endpoints – one for the version before the new feature development, and one after. And instead of having one K8S pod dedicated to servicing both API endpoints, we could introduce two distinct K8S clusters – each with their own pods – one per API endpoint. Let's call them Cluster 1 (for /v1) and Cluster 2 (for /v2).
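
One way to picture the separation – a hypothetical sketch, with invented pod names and routing logic standing in for what an ingress rule would do for you – is a router that dispatches by path prefix, so a /v2 overload can only exhaust Cluster 2's pods while /v1 keeps serving:

```python
# Hypothetical path-prefix router: each prefix owns its own pod pool,
# so traffic to one endpoint can't consume the other's capacity.
# Pod names and pool sizes are invented for illustration.

CLUSTERS = {
    "/v1": ["c1-pod-a", "c1-pod-b"],                           # Cluster 1
    "/v2": ["c2-pod-a", "c2-pod-b", "c2-pod-c", "c2-pod-d"],   # Cluster 2
}

_counters = {prefix: 0 for prefix in CLUSTERS}

def route(path):
    """Round-robin a request to a pod in the cluster owning its prefix."""
    for prefix, pods in CLUSTERS.items():
        if path.startswith(prefix):
            pod = pods[_counters[prefix] % len(pods)]
            _counters[prefix] += 1
            return pod
    raise ValueError(f"no cluster owns {path}")

print(route("/v2/orders"))  # c2-pod-a
print(route("/v1/orders"))  # c1-pod-a
print(route("/v2/orders"))  # c2-pod-b
```

In real Kubernetes this is the job of an Ingress (or service mesh) routing rule; the point is the blast radius – each endpoint's death spiral stays inside its own cluster.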

Resource allocation for Cluster 1 (/v1) can be a scalar multiple less than the resource allocation for Cluster 2 (/v2), since /v2's compute needs exceed /v1's.
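
A back-of-the-envelope way to pick that multiplier (all numbers invented for illustration): if /v2's per-request cost is roughly 4x /v1's at similar traffic, split the pod budget in that ratio.

```python
# Hypothetical pod-budget split: give Cluster 2 a multiple of Cluster 1's
# share, proportional to /v2's heavier per-request cost. Numbers invented.

def split_pods(total_pods, v2_cost_multiplier):
    v1 = max(1, round(total_pods / (1 + v2_cost_multiplier)))
    return v1, total_pods - v1

v1_pods, v2_pods = split_pods(total_pods=20, v2_cost_multiplier=4)
print(v1_pods, v2_pods)  # 4 16
```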

Benefits of API Endpoint Separation (and Versioning)

  • Pre-empt cascading failures and scalability issues before they arise
  • Follows industry best practices
  • Single Responsibility Principle: each API endpoint can represent a distinct compute process
  • Minimize resource footprint
  • Allocate optimized resource groups tailored to endpoints – e.g. specialized K8S pods and separate auto-scaling actions for /v1 versus /v2
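
On that last point, the two clusters can scale independently. The sketch below mirrors the standard Kubernetes HPA formula, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), applied per cluster; the utilization numbers are invented for illustration.

```python
import math

# Per-cluster scaling decision mirroring the Kubernetes HPA formula:
# desired = ceil(current * observed / target). Utilizations are invented.

def desired_replicas(current, observed_cpu, target_cpu=0.6, max_replicas=50):
    ratio = round(observed_cpu / target_cpu, 6)  # round guards float noise
    desired = math.ceil(current * ratio)
    return min(max_replicas, max(1, desired))

# /v2 (the heavy endpoint) is hot while /v1 is idle -- each cluster's
# autoscaler reacts only to its own metrics.
print(desired_replicas(current=8, observed_cpu=0.9))   # /v2 scales up: 12
print(desired_replicas(current=2, observed_cpu=0.15))  # /v1 scales down: 1
```

With a single shared deployment, /v1's quiet traffic and /v2's hot traffic would be averaged into one scaling signal; separated, each endpoint gets the replica count it actually needs.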

Cons of API Endpoint Separation

  • Up-front time investment in creating separate code, mocks, tests, and deployment pipelines
  • Extra configuration management overhead

Footnotes

  • Names, PII, sensitive data elements, and other corporate details have been redacted or modified to respect journalistic integrity and prevent the disclosure of highly sensitive work information.
  • Shoutout to the rock star engineers at my company for working through this distributed systems topic!
