A retail platform needed to leave an overpriced region before contract renewal. Leadership wanted a single Saturday cutover; engineering pushed back with a phased plan instead.
Phase one: stateless frontends
They duplicated load balancers and autoscaling groups in the target region, lowered DNS TTLs a week ahead, and shifted 10% of traffic with weighted records. Errors showed up within minutes, not after the whole stack had moved.
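The 10% shift can be pictured with a small simulation. This is only a sketch of how weighted records split lookups, not the team's actual DNS configuration: the record names "old-region" and "new-region" and both function names are made up for illustration.

```python
import random


def weighted_records(new_region_percent: int) -> dict[str, int]:
    """Return hypothetical record weights for a given share of traffic to the new region."""
    if not 0 <= new_region_percent <= 100:
        raise ValueError("new_region_percent must be between 0 and 100")
    # Weights are relative: 90/10 sends roughly one lookup in ten to the new region.
    return {"old-region": 100 - new_region_percent, "new-region": new_region_percent}


def resolve(records: dict[str, int], rng: random.Random) -> str:
    """Pick one record at random, in proportion to its weight."""
    names = list(records)
    weights = [records[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


if __name__ == "__main__":
    records = weighted_records(10)
    rng = random.Random(0)  # fixed seed so the simulation is repeatable
    sample = [resolve(records, rng) for _ in range(10_000)]
    share = sample.count("new-region") / len(sample)
    print(f"share of lookups sent to new region: {share:.1%}")
```

In real DNS the split only holds once cached answers expire, which is why the TTLs were lowered a week before the weights changed.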
Phase two: data paths
Stateful services stayed read-replicated for two weeks. Writes still hit the old region until replication lag stayed under an agreed threshold for seven consecutive days.
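The write-cutover rule is easy to state as a check. The sketch below assumes one worst-case lag reading per day, in seconds; the function name, the input shape, and the threshold value in the example are illustrative, since the post does not say what the agreed threshold was.

```python
def ready_for_write_cutover(
    daily_max_lag_seconds: list[float],
    threshold_seconds: float,
    required_days: int = 7,
) -> bool:
    """Return True if the most recent `required_days` readings are all under the threshold.

    `daily_max_lag_seconds` is ordered oldest to newest, one worst-case
    reading per day (an assumption of this sketch).
    """
    streak = 0
    for lag in daily_max_lag_seconds:
        # Any day at or over the threshold resets the count of consecutive good days.
        streak = streak + 1 if lag < threshold_seconds else 0
    return streak >= required_days


if __name__ == "__main__":
    # Hypothetical readings: one bad day, then seven good ones.
    readings = [9.0, 1.2, 0.8, 1.1, 0.9, 1.0, 0.7, 1.3]
    print(ready_for_write_cutover(readings, threshold_seconds=5.0))
```

Counting only the trailing streak means one bad day restarts the clock, which matches "seven consecutive days" more closely than an average would.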
What almost went wrong
A hard-coded region name in a background job sent exports to the wrong bucket for a weekend. The job had never been on the traffic map because it was "internal only."
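A plain text search for the old region's name would likely have caught this job before cutover. Here is a minimal sketch, assuming the source files are already loaded into a dictionary of path to contents; the function name, the file paths, and the region name "region-a" are all hypothetical.

```python
import re


def find_hardcoded_regions(
    source_by_path: dict[str, str], old_region: str
) -> list[tuple[str, int, str]]:
    """Return (path, line number, stripped line) for every line that mentions old_region."""
    pattern = re.compile(re.escape(old_region))  # match the name literally
    hits = []
    for path, text in source_by_path.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((path, lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Hypothetical repository contents, including an "internal only" job.
    sources = {
        "jobs/export.py": 'BUCKET = "exports-region-a"\nrun_export(BUCKET)\n',
        "web/app.py": 'REGION = os.environ["REGION"]\n',
    }
    for path, lineno, line in find_hardcoded_regions(sources, "region-a"):
        print(f"{path}:{lineno}: {line}")
```

A scan like this covers code that never appears on a traffic map, because it looks at the source instead of at request paths.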
Takeaway: migrations are sequencing problems. Dates on a slide are less useful than ordered traffic shifts and honest TTL planning.