From Signal to Smart Beds: How an AWS Outage Disrupted Global Services



Amazon has officially revealed the cause of the recent hours-long AWS (Amazon Web Services) outage that disrupted thousands of websites and applications around the world. The company confirmed that the global disruption was triggered by a bug in its automation software, which caused a major failure in AWS’s internal systems.

According to a detailed report published on Thursday, the problem started with a latent defect in DynamoDB’s automated DNS (Domain Name System) management system. DynamoDB, one of AWS’s key database services, relies on that system to manage hundreds of thousands of DNS records. The defect left an empty DNS record for the service in the Northern Virginia “US-East-1” region. Normally, AWS’s automated tools would detect and repair such an inconsistency, but this time they failed to do so, and AWS engineers had to intervene manually to fix the problem.
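To make the failure mode concrete, here is a minimal Python sketch of one safeguard against it: refusing to publish a DNS update that would leave any endpoint with no addresses. AWS has not published its code, so every name here (`DnsPlan`, `UnsafePlanError`, `apply_plan`, the example hostname) is hypothetical, and the check illustrates the idea only, not the fix AWS is implementing.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DnsPlan:
    """Hypothetical DNS plan: maps each endpoint name to its IP addresses."""
    records: Dict[str, List[str]] = field(default_factory=dict)


class UnsafePlanError(Exception):
    """Raised when a plan would leave an endpoint unresolvable."""


def validate_plan(plan: DnsPlan) -> None:
    # A plan with no endpoints at all would wipe every record.
    if not plan.records:
        raise UnsafePlanError("plan contains no endpoints")
    # An endpoint with an empty address list is an "empty DNS record".
    empty = sorted(name for name, ips in plan.records.items() if not ips)
    if empty:
        raise UnsafePlanError("empty record for: " + ", ".join(empty))


def apply_plan(live: Dict[str, List[str]], plan: DnsPlan) -> None:
    # Validate first, so a bad plan never touches the live records.
    validate_plan(plan)
    live.clear()
    live.update({name: list(ips) for name, ips in plan.records.items()})
```

In this sketch a rejected plan leaves the live records untouched, so the service keeps resolving to its last known good addresses while engineers investigate.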

AWS confirmed that it has temporarily disabled the DNS Planner and DNS Enactor automation for DynamoDB worldwide and is working to harden the process to prevent similar incidents in the future.

The outage had widespread effects. Major platforms such as Signal, Snapchat, Roblox, Duolingo, and Ring were among more than 2,000 services affected. According to the monitoring site Downdetector, over 8.1 million users globally reported problems accessing apps, websites, or online services during the downtime.

Even smart devices were not spared. Users of Eight Sleep, a company whose smart beds connect to the internet, were unable to control their beds’ temperature or incline through the mobile app during the outage. The company’s CEO, Matteo Franceschetti, apologized to customers and released an update allowing critical bed functions to be controlled via Bluetooth in case of future network disruptions.
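The design idea behind such an update, trying the cloud first and falling back to a local link when it is unreachable, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Eight Sleep’s implementation: `CloudUnavailable`, `set_temperature`, and the two sender callbacks are hypothetical names standing in for a real cloud API client and a real Bluetooth stack.

```python
from typing import Callable


class CloudUnavailable(Exception):
    """Hypothetical error raised when the cloud backend cannot be reached."""


def set_temperature(
    target_celsius: float,
    send_via_cloud: Callable[[float], None],
    send_via_bluetooth: Callable[[float], None],
) -> str:
    """Send a temperature command and report which path delivered it."""
    try:
        send_via_cloud(target_celsius)
        return "cloud"
    except CloudUnavailable:
        # The cloud is down: deliver the command over the local link instead.
        send_via_bluetooth(target_celsius)
        return "bluetooth"
```

The point of the pattern is that a critical function keeps working even when the vendor’s cloud backend does not.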

Experts say the incident serves as a warning about the world’s growing reliance on a few massive cloud providers. Dr. Suelette Dreyfus, a computing and information systems lecturer at the University of Melbourne, noted that the outage highlights the risks of centralization.
“The internet was originally designed to be resilient, with multiple pathways to reroute traffic,” she explained. “But we’ve lost some of that resilience by depending so heavily on just a few giant tech companies for data storage and essential services.”
