Fastly failure: Why a slew of major websites went offline

The hourlong Fastly Inc. outage was a reminder of how exposed the world’s biggest websites are to the impact of disruptions ranging from simple human error to coordinated cyberattack.

The failure at Fastly, which helps websites load their pages faster, sent vast swaths of the web offline on Tuesday. News websites including CNN, The New York Times and Bloomberg News, services such as Shopify Inc. and Stripe Inc., plus sites as large as Spotify and Reddit all went offline. U.K. government digital services were also unavailable for a period.

Fastly quickly identified an issue with its content delivery network and announced that it was rolling out a fix just 46 minutes after acknowledging there was a problem. Sites began to spring back to life soon afterward.

Nevertheless, the cascade of failures across the web turned a mere “service configuration” issue into a global outage that hit large companies and small users alike.

What does Fastly actually do?

Fastly is one of a number of high-level website and application hosting services that large enterprises use to serve content to millions of users simultaneously.

Rather than hosting all website content on a single set of servers in one location, Fastly’s so-called “edge computing” model puts servers in dozens of locations, allowing websites to serve pages to users from physical locations closest to them. This cuts lag time, speeding up page-loading and spreading the burden on individual servers.
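The idea of serving each user from the closest location can be illustrated with a minimal Python sketch. It is not Fastly's routing logic: the three edge locations, their coordinates and the function names are hypothetical, and real content delivery networks typically rely on network-level signals such as measured latency rather than straight-line distance.

```python
import math

# Hypothetical edge locations as (latitude, longitude) pairs.
# These are illustrative only, not any provider's real sites.
EDGE_LOCATIONS = {
    "london": (51.5074, -0.1278),
    "new_york": (40.7128, -74.0060),
    "tokyo": (35.6762, 139.6503),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Return the great-circle distance between two points in kilometers."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_edge(user_lat, user_lon, edges=EDGE_LOCATIONS):
    """Pick the edge location with the smallest distance to the user."""
    return min(
        edges,
        key=lambda name: haversine_km(user_lat, user_lon, *edges[name]),
    )


# A user in Paris would be served from the London location in this toy setup.
print(nearest_edge(48.8566, 2.3522))
```

Geographic distance is used here only because it is easy to compute; it stands in for whatever "closest" means to a given network.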

These vast and complex setups are run by just a few companies, such as Fastly, Cloudflare Inc. and Akamai Technologies Inc. The global edge computing market was valued at $4.68 billion in 2020 and is expected to expand at a compound annual growth rate of 38.4% from 2021 to 2028, according to a recent analysis by Grand View Research.
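To see what that growth rate implies, the compounding can be worked through in a few lines of Python. This is a rough sketch, not a figure from the Grand View Research analysis: it assumes the $4.68 billion 2020 value is the base and that the 38.4% rate compounds once a year for eight years through 2028. The function name is hypothetical.

```python
def project_market_size(base_billion, cagr, periods):
    """Compound a starting value at a fixed annual growth rate."""
    return base_billion * (1 + cagr) ** periods


BASE_2020_BILLION = 4.68  # 2020 valuation cited in the article
CAGR = 0.384              # 38.4% compound annual growth rate
PERIODS = 8               # assumption: annual compounding from 2020 to 2028

projected_2028 = project_market_size(BASE_2020_BILLION, CAGR, PERIODS)
print(f"Implied 2028 market size: ${projected_2028:.1f} billion")
```

Under those assumptions the market would reach roughly $63 billion by 2028, about 13 times its 2020 size. Counting only seven compounding periods instead would give a noticeably lower figure, so the result depends on which year is treated as the base.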

While these setups usually work perfectly, their complexity means that even a simple error in a configuration file can trigger chain reactions of outages. For users, most of whom rarely need to think about how the internet works, that can come as a shock.

“People believe that somehow things don’t break. At the end of the day it’s a computer sitting in a server room with different components that can malfunction,” said Mehdi Daoudi, co-founder and chief executive officer of Catchpoint, a technology platform that monitors website performance.

“The way networks are built, an outage can quickly cascade. It’s a domino effect.”

How have such services evolved over time?

It wasn’t always this way: In earlier iterations of the internet, a basic website consisted of a few pages of text and accompanying images, all of which lived on a single web server with an IP address all to itself. To access that site, an internet service provider directed a user request to that specific computer.

That setup still works, but the rapid, exponential increase in digital content makes delivering it vastly more complicated for large businesses. Research published by analyst group IDC last May suggested that more data will be generated in the next three years than was generated over the past three decades combined.

Digital content today lives on multiple identical servers dotted all around the world; some are basic, designed to serve up static content such as text, while others are packed with solid-state drives to pump out video files, or filled with fast memory to maintain live conference calls with hundreds of participants.

The biggest content providers, such as Netflix Inc., connect their servers directly to those of an ISP to reduce the demand placed on networks, or install their servers within another network operator’s infrastructure.

Content distribution networks began taking shape in the 1990s as the internet outgrew its early infrastructure.

“They solved two problems: capacity and performance. But they’re not perfect. Today it was Fastly, but these outages can happen to anyone,” said Daoudi.

Large websites are kept online by experienced system administrators. They know that occasional outages are inevitable and that most last no more than a few minutes. But failures that take globally renowned websites offline never go unnoticed, and they cause a stir on social media.

But the short-term chaos online — which can result in furious tweets, failed transactions or canceled subscriptions — is often worse than the longer-term impact. Even outages that last longer, for several hours or more, are so uncommon that their business fallout is considered minor.

While Fastly is one of only a few companies that provide this service, many investors turned on the stock after the company last year lost its largest customer, the Chinese internet giant and TikTok owner ByteDance Ltd. After rising 350% in the second half of last year alone, the shares are down more than 40% this year. The stock gained Tuesday after service was restored.

Was this a hack?

There is no evidence to suggest Fastly’s issues on Tuesday were the result of a malicious cyberattack. But widespread outages are often the result of hackers, and are not always the fault of the companies hosting content.

For instance, in 2016, millions of internet users lost access to some of the world’s most popular websites after hackers compromised domain name system service provider Dyn Inc. That knocked offline sites including Twitter, Spotify, Reddit, CNN, Etsy and The New York Times.

Users often see no difference between a distributed-denial-of-service attack and a content delivery network failure. Either can mean the user sees a “server not found” error or a blank page, leaving them unable to access the site. More malicious hacks hijack websites in an attempt to extort users with ransomware.
