
The Facebook Outage Wasn’t a DDoS Attack, but It Shines a Light on Digital Resilience Planning

Yesterday’s far-from-subtle multi-property outage at Facebook, Instagram and WhatsApp, and its cascading effect on other properties and plug-ins, reminded us that digital resilience is core to a successful online presence. It is also a reminder to every organization, including the service provider networks and the colocation, data center and hosting providers that increasingly house critical applications and infrastructure.

The outage lasted around six hours on Monday, 4 October, and was also reported to have affected internal systems that depended on the same infrastructure, including access to Facebook’s physical properties. Adam Mosseri, the head of Instagram, likened the situation to a “snow day” for Facebook employees, who effectively couldn’t work.

The business impact was clear and illustrates the cost of a lack of resilience. Reported impacts included:

  • Morningstar reported that Facebook stock fell 4.9 percent, wiping out over $40 billion in market cap
  • The same article translated the downtime into roughly $164,000 a minute in lost revenue
  • Loss of brand equity and user confidence (and potential gains for competitors such as Twitter)
  • DownDetector logged over 14 million problem reports due to the cascading effects
  • The broader economic impact beyond Facebook itself was obviously far greater

Outages can be driven by many things. One of the initial conversations within A10, and a question raised by external parties as well, was whether this was a DDoS attack. The sites were down with no response from the servers, not even a failure page, so the suspicion was reasonable. However, the A10 Security Research team saw no unusual activity from our honeypots or other monitoring systems, though it did note the DNS and BGP issues. That pointed to core infrastructure problems as the cause, which Facebook confirmed late yesterday:

“Our engineering teams have learned that configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted this communication. This disruption to network traffic had a cascading effect on the way our data centers communicate, bringing our services to a halt.”

If you want to read more, ThousandEyes provided a technical article on the outage, covering the DNS and BGP details, while KrebsOnSecurity also offered a detailed summary.
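One detail that helped rule out a DDoS attack was where the failure sat: Facebook’s names stopped resolving at all once the routes to its authoritative DNS servers were withdrawn, which looks different from a flooded site that still resolves but won’t answer. As a rough illustration only (this is not A10’s tooling; it is a standard-library Python sketch with example hostnames), an external check can separate those two failure modes:

```python
# Illustrative triage sketch: distinguish "the name won't resolve" from
# "the name resolves, but the servers don't answer". Hostnames and ports
# are examples only; uses just the Python standard library.
import socket

def triage(hostname: str, port: int = 443, timeout: float = 5.0) -> str:
    # Step 1: can we resolve the name at all?
    try:
        addresses = {info[4][0] for info in socket.getaddrinfo(hostname, port)}
    except socket.gaierror:
        return f"{hostname}: DNS resolution failed (no answer from name servers)"

    # Step 2: the name resolves -- do any of the addresses accept a connection?
    for address in addresses:
        try:
            with socket.create_connection((address, port), timeout=timeout):
                return f"{hostname}: resolves to {address} and accepts connections"
        except OSError:
            continue
    return f"{hostname}: resolves ({', '.join(sorted(addresses))}) but no server is answering"

if __name__ == "__main__":
    for name in ("facebook.com", "www.instagram.com"):
        print(triage(name))
```

During the outage, a check along these lines would have failed at the resolution step for Facebook’s domains, pointing toward DNS and routing problems rather than servers overwhelmed by attack traffic.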

Outages will happen, no matter how much we plan; they are a fact of life IT professionals will always have to deal with. The challenge is how to mitigate that risk as much as possible and how to respond in times of crisis. While not all of these are specific to the Facebook outage, some best practices include:

  • Make deliberate choices, and plans, to mitigate the biggest risks
  • Know what cyber security services, such as DDoS protection, your data center provider offers
  • Have an internal plan for how and whom to involve and notify, as well as an external communications plan
  • Develop fail-over plans for all infrastructure and eliminate single points of failure, for example, with global server load balancing and other techniques
  • Ensure security systems are in place, both to monitor for anomalies and to mitigate nefarious activity
  • Eliminate chances for human error through automation and process-oriented checks and balances (see the sketch after this list)
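
On the last point, Facebook’s own statement attributes the outage to a configuration change on its backbone routers, exactly the kind of human-driven change that automated pre-deployment checks are meant to catch. As a purely illustrative sketch (the change format, field names and thresholds below are invented for this post, not Facebook’s or A10’s process), a validation gate might refuse any change that would withdraw all advertised routes or leave no authoritative name servers reachable:

```python
# Hypothetical pre-change validation gate: block a proposed change that would
# leave a site with no advertised routes or no reachable name servers.
# The "change" structure and thresholds are invented for illustration.
from dataclasses import dataclass

@dataclass
class ProposedChange:
    site: str
    routes_before: int      # prefixes advertised before the change
    routes_after: int       # prefixes advertised after the change
    nameservers_after: int  # authoritative name servers reachable after the change

def validate(change: ProposedChange, max_route_drop: float = 0.5) -> list[str]:
    """Return a list of blocking errors; an empty list means the change may proceed."""
    errors = []
    if change.routes_after == 0:
        errors.append(f"{change.site}: change withdraws ALL advertised routes")
    elif change.routes_before and (
        1 - change.routes_after / change.routes_before
    ) > max_route_drop:
        errors.append(f"{change.site}: change drops more than {max_route_drop:.0%} of routes")
    if change.nameservers_after == 0:
        errors.append(f"{change.site}: no authoritative name servers would remain reachable")
    return errors

if __name__ == "__main__":
    risky = ProposedChange(site="dc-backbone", routes_before=120, routes_after=0, nameservers_after=0)
    for problem in validate(risky):
        print("BLOCKED:", problem)
```

The point is not these specific rules, but the principle: the riskiest changes should have to pass an automated sanity check before anyone can push them to production.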

The emphasis on digital resilience, in both technology and planning, is becoming more important, and examples like the Facebook outage only amplify it. They serve as a visceral reminder of the impact of downtime.

I am sure the internal Facebook team tasked with fixing this outage was not having a “snow day” yesterday.