A bug in Fastly’s software caused yesterday’s global internet outage, which knocked many well-known sites offline, the company said.
A Bug in Software Caused Global Internet Outage
Fastly’s issue took out high-traffic sites of many kinds. These included news portals like The Guardian and The New York Times, some British government sites, social platforms like Reddit and Stack Overflow, and e-commerce sites like Amazon.com.
Nick Rockwell, Senior Engineering and Infrastructure Executive at Fastly, Inc, said, “This outage was broad and severe, and we’re truly sorry for the impact to our customers and everyone who relies on them.” He also said that the company should have anticipated the problem. Fastly operates strategically placed servers across the globe, and those servers support a large number of well-known websites.
In a blog post, the company also shared a timeline of events. Further, it promised to examine and explain why it had failed to detect the software bug during its own testing process.
The content delivery network (CDN) provider said that the bug was triggered when a customer changed their settings. It said the bug was in a software update shipped to customers on May 12. When that customer changed their settings, 85% of Fastly’s network returned errors. To its credit, Fastly noted the outage within a minute, at 0947 GMT. Moreover, within 45 minutes, engineers had worked out the cause.
The company’s network recovered quickly once the engineers had pinpointed the cause and disabled the settings that triggered the problem. “Within 49 minutes, 95% of our network was operating as normal,” the company said.
Fastly added that its network had fully recovered by 1235 GMT. By the end of the day, it had rolled out a permanent software fix.
Tuesday’s outage showed the risks of relying on a handful of companies for critical internet infrastructure.