Why The Internet Keeps Going Down Frequently, Blocking Access To Your Favourite Websites
Yesterday, a large chunk of the Internet went down, including the likes of Facebook, Instagram, and WhatsApp. They weren't fully offline; they just had trouble loading anything in their interfaces.
The reason why Facebook, Instagram and WhatsApp went down recently, though, is a little more complex than you'd think, because there could be more than one culprit.
There are many reasons why websites go down, and they can come from very different sources. Content delivery network (CDN) systems, web hosting providers, and even the very backbone of the Internet can act up and wreak havoc.
Let's try to understand how and why websites and online services go down. CDNs are almost always among the first culprits to be blamed, and one of the biggest CDNs is Cloudflare.
What is Cloudflare?
Cloudflare is a US company that offers services to websites, such as a content delivery network, DDoS protection, online security, and distributed domain name server (DNS) services. They're basically the middleman between Internet users like you and the hosting providers of the websites they serve.
According to Cloudflare's own reports, they had about 12 million websites as customers in 2017, with about 20,000 new ones joining every day. So when they go down, a huge part of the web goes down with them.
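To make the "middleman" role concrete, here is a minimal sketch of what a CDN edge node does at its simplest: answer repeat requests from its own cache and only contact the website's origin server on a miss. The `EdgeCache` class and the `fetch_from_origin` callback are hypothetical names for illustration; real CDNs also handle cache expiry, security filtering and much more. Because every visitor's request passes through `get()`, a failure in this layer is felt by every site behind it.

```python
from typing import Callable, Dict


class EdgeCache:
    """Hypothetical CDN edge node: serves cached pages, falls back to the origin."""

    def __init__(self, fetch_from_origin: Callable[[str], str]) -> None:
        # Callback standing in for a request to the website's own hosting provider
        self.fetch_from_origin = fetch_from_origin
        self.cache: Dict[str, str] = {}
        self.origin_requests = 0  # how often we had to contact the origin

    def get(self, url: str) -> str:
        # Cache miss: fetch once from the origin and remember the answer
        if url not in self.cache:
            self.origin_requests += 1
            self.cache[url] = self.fetch_from_origin(url)
        # Cache hit (or freshly filled): answer without touching the origin
        return self.cache[url]


if __name__ == "__main__":
    edge = EdgeCache(lambda url: f"<html>page at {url}</html>")
    for _ in range(3):
        edge.get("/home")
    print(edge.origin_requests)  # 1: two of the three requests came from the cache
```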
So what happened?
And that's exactly what happened a couple of days ago: Cloudflare went down. It wasn't even an attack this time; they just messed up. They decided to push an update to their Web Application Firewall (WAF), which helps protect websites. Instead of trying new rules in an isolated test environment, they usually deploy them in a test mode, where a rule inspects real traffic but only logs what it would have blocked, so it shouldn't have been a problem.
However, one of the new rules they were adding had a flaw that drove CPU usage to 100 percent. The second problem was that the update went out to the whole world at once, instead of to the small group of users they usually try things out with. So CPU usage spiked on their machines worldwide, and that caused the "502 Bad Gateway" errors people were seeing on the websites they visited.
It took Cloudflare 20 minutes just to figure out what went wrong, and roughly another half hour to roll back the update. And yet hours later, websites were still trying to get back on their feet.
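The "small group of users first" idea is usually called a staged or canary rollout. Below is a minimal sketch, assuming a fleet of identical servers and a health check based only on CPU usage: the new rule goes to 1 percent of machines first, then 10, 50 and 100 percent, and it is rolled back everywhere as soon as any updated machine crosses a CPU threshold. The `Server` class, the `staged_rollout` function, the stage sizes and the 80 percent threshold are all hypothetical and do not describe Cloudflare's actual tooling. Note that a rule which only logs still has to inspect every request, so a test mode alone does not protect the CPU.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class Server:
    name: str
    baseline_cpu: float = 20.0  # hypothetical normal load, in percent
    # rule ID -> extra CPU percent that rule costs on this machine
    rules: Dict[str, float] = field(default_factory=dict)

    @property
    def cpu_percent(self) -> float:
        return min(100.0, self.baseline_cpu + sum(self.rules.values()))


def staged_rollout(
    servers: List[Server],
    rule_id: str,
    rule_cpu_cost: float,
    stages_percent: Sequence[int] = (1, 10, 50, 100),
    max_cpu: float = 80.0,
) -> Tuple[bool, int]:
    """Deploy a rule in widening stages; roll back if any updated server overheats.

    Returns (succeeded, number of servers that got the rule before stopping).
    """
    deployed: List[Server] = []
    for pct in stages_percent:
        target = max(1, len(servers) * pct // 100)
        for server in servers[len(deployed):target]:
            server.rules[rule_id] = rule_cpu_cost
            deployed.append(server)
        # Health check: only the servers updated so far can be affected
        if any(s.cpu_percent > max_cpu for s in deployed):
            for s in deployed:
                del s.rules[rule_id]  # roll back
            return False, len(deployed)
    return True, len(deployed)


if __name__ == "__main__":
    fleet = [Server(f"edge-{i}") for i in range(1000)]
    print(staged_rollout(fleet, "bad-rule", rule_cpu_cost=80.0))  # (False, 10)
    print(staged_rollout(fleet, "good-rule", rule_cpu_cost=5.0))  # (True, 1000)
```

With this design, a bad rule only ever reaches the first 1 percent of machines before being pulled, instead of every machine at once.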
But there's a bigger problem at work here
This isn't the first time something like this has happened, and it won't be the last. That's because of something called the Border Gateway Protocol (BGP).
You see, because the Internet isn't run from one central place, computers need a way to reach websites and services around the world. Basically, data needs to flow between networks without a single controlling entity. BGP is the neutral traffic cop that makes this possible, whether you're sending an email, loading a website, or browsing Facebook. And when it messes up, everyone is affected.
Just last week, Verizon took down a large portion of the Internet when it messed up its BGP routing. Essentially, it accidentally made a small company in Northern Pennsylvania the preferred path for many Internet routes. That's the equivalent of Uber telling all its drivers on a major freeway that their best route to any destination is through one particular market gully. As you can guess, no one could get anywhere.
Similarly, back in November last year, Google also suffered a major outage linked to a BGP reroute. And though this one wasn't officially considered a malicious hijack, it was suspicious because China Telecom and Russia-based TransTelecom were the first to improperly accept the wrongly declared routes, effectively sending a large chunk of Google-related traffic through those countries.
BGP has also been deliberately hijacked. Back in 2014, a hacker rerouted traffic on 51 networks belonging to 19 ISPs. He was redirecting cryptocurrency miners to a mining pool he controlled, effectively siphoning off the profits their PCs were working to collect. And there have been many more cases since.
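Why would routers suddenly send traffic towards the wrong network? One common reason a leaked or hijacked route wins, and reportedly a factor in the Verizon incident, is that it is more specific: routers forward each packet along the longest matching prefix, regardless of who announced it. The sketch below uses Python's standard `ipaddress` module; the prefixes come from the ranges reserved for documentation, and the next-hop names are made up. It is a simplified lookup, not how real router hardware stores its tables.

```python
import ipaddress
from typing import List, Optional, Tuple

RoutingTable = List[Tuple[ipaddress.IPv4Network, str]]  # (prefix, next hop)


def lookup(table: RoutingTable, destination: str) -> Optional[str]:
    """Return the next hop of the most specific (longest) prefix containing destination."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in table if addr in net]
    if not matches:
        return None
    _, best_hop = max(matches, key=lambda m: m[0].prefixlen)
    return best_hop


if __name__ == "__main__":
    table: RoutingTable = [(ipaddress.ip_network("203.0.113.0/24"), "big-transit-provider")]
    print(lookup(table, "203.0.113.10"))   # big-transit-provider

    # A leaked, more specific route covering half of the same block
    table.append((ipaddress.ip_network("203.0.113.0/25"), "small-isp"))
    print(lookup(table, "203.0.113.10"))   # small-isp: the /25 beats the /24
    print(lookup(table, "203.0.113.200"))  # big-transit-provider: .200 is outside the /25
```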
OK, so what is BGP then?
BGP was originally sketched out in 1989 on two napkins by Kirk Lougheed of Cisco and Yakov Rekhter of IBM, over lunch at an Internet engineering conference. And though the version we use today, BGP-4, was introduced in 1994, over 25 years ago, it has remained largely unchanged.
The protocol works the way GPS does for us on the roads. It's a map that lets our computers move data across the Internet, which is essentially just a large network of networks. Each of these networks is run by an organization such as an ISP, and each controls a set of IP addresses and the routes to them. They have to "announce" these routes to the world so Internet traffic can flow through.
Think of moving to a new city and starting a job the very next day. You'd probably want to plan a route beforehand, based on where you live and where your office is. But if your GPS gives you the wrong directions, you might end up at the mall or at a dead end instead of your workplace, wondering what the hell happened.
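In slightly more technical terms, each network (an "autonomous system", identified by a number) tells its neighbours which address blocks it can reach, along with the list of networks the traffic would pass through, called the AS path. Each neighbour picks a best path, adds its own number to the front, and passes the announcement on. The sketch below is heavily simplified, assuming routers choose purely by shortest AS path; real BGP applies local policy and several other tie-breakers first. The AS numbers and prefixes come from ranges reserved for documentation, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Announcement = Tuple[str, List[int]]  # (prefix, AS path; the last entry is the origin)


def choose_best(my_asn: int, announcements: List[Announcement]) -> Dict[str, List[int]]:
    """Pick one path per prefix: reject loops, then prefer the shortest AS path."""
    best: Dict[str, List[int]] = {}
    for prefix, as_path in announcements:
        if my_asn in as_path:
            continue  # our own AS is already in the path: accepting it would create a loop
        current = best.get(prefix)
        if current is None or len(as_path) < len(current):
            best[prefix] = as_path
    return best


def announce(my_asn: int, best: Dict[str, List[int]]) -> List[Announcement]:
    """Re-announce our chosen routes to neighbours, with our AS number prepended."""
    return [(prefix, [my_asn] + path) for prefix, path in best.items()]


if __name__ == "__main__":
    received = [
        ("203.0.113.0/24", [64500, 64501, 64510]),  # via two intermediate networks
        ("203.0.113.0/24", [64502, 64510]),         # via one intermediate network
        ("198.51.100.0/24", [64503, 64496]),        # would loop back through us
    ]
    best = choose_best(64496, received)
    print(best)                   # {'203.0.113.0/24': [64502, 64510]}
    print(announce(64496, best))  # [('203.0.113.0/24', [64496, 64502, 64510])]
```

Nothing in `choose_best` checks whether an announcement is true; it simply believes what neighbours say.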
Why does BGP suck so much?
The problem with this 30-year-old protocol? It relies on trust. BGP was never designed to independently verify the routes that individual networks announce; it has no built-in authentication, such as the cryptographic keys that encryption relies on. So if these systems accidentally announce bad routes or, worse, are hijacked to do so intentionally, BGP has no way of knowing.
It would be like someone holding people hostage in an air traffic control tower and forcing them to misdirect traffic. The pilots wouldn't know until they were seconds away from a head-on collision. And BGP wasn't designed to stop any of this.
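Here is what "relies on trust" looks like in practice. In the sketch below, `accept_plain_bgp` stands in for classic BGP, which believes any well-formed announcement. `accept_with_origin_check` adds one safeguard, assuming a hypothetical registry that records which network is allowed to originate which address block. This is similar in spirit to RPKI (Resource Public Key Infrastructure) route origin validation, but far simpler: it ignores signatures and maximum prefix lengths, and because it only checks the origin, it would not catch every kind of bad route.

```python
import ipaddress
from typing import Dict, List

# Hypothetical registry: address block -> the only AS number allowed to originate it
AUTHORIZED_ORIGINS: Dict[ipaddress.IPv4Network, int] = {
    ipaddress.ip_network("203.0.113.0/24"): 64510,
}


def accept_plain_bgp(prefix: str, as_path: List[int]) -> bool:
    # Classic BGP has no way to check the claim, so every announcement is believed
    return True


def accept_with_origin_check(prefix: str, as_path: List[int]) -> bool:
    net = ipaddress.ip_network(prefix)
    origin = as_path[-1]  # the network claiming to own this prefix
    for authorized_net, allowed_asn in AUTHORIZED_ORIGINS.items():
        if net.subnet_of(authorized_net):
            return origin == allowed_asn  # registered block: origin must match
    return True  # block not in the registry: nothing to compare against, so accept


if __name__ == "__main__":
    hijack = ("203.0.113.0/25", [64500, 64511])  # AS 64511 falsely claims part of the block
    print(accept_plain_bgp(*hijack))          # True
    print(accept_with_origin_check(*hijack))  # False
```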
Factors other than BGP can knock out Internet access too
But even a single company screwing up can affect vast swathes of the Internet. Just last month, the us-east1 region of the Google Cloud Platform went offline. It wasn't hacked, and this wasn't even a software problem. Instead, a maintenance event physically damaged some of the fiber bundles linking Google's cloud servers. To work around that, Google had to reroute some of its cloud traffic, creating a gridlock that increased latency for users of all the sites running on the platform. That meant YouTube, Shopify, Snapchat, and many more.
That's a problem because Google is one of the top cloud hosting providers in the world, alongside Amazon Web Services (AWS) and Microsoft Azure. So when one of these cloud hosting services fails, all its clients are dragged down with it. For instance, AWS supports the likes of Lyft, Airbnb, Comcast, Vodafone, and of course Netflix. Azure, meanwhile, is behind services like eBay, Flipkart, and Bank of America.
We gotta be doing something about it, right?
There are thousands of these BGP routing incidents each year, most of them accidental and minor in effect. But there are plenty of the malicious and majorly disruptive kind too. And though governments have known about this for decades, we've made little progress on a fix, despite how much of a national security issue it is.
After all, a compromised Border Gateway Protocol makes it easy to route an entire nation's traffic through a different country, or even take down their Internet access entirely.
We're making a little headway, though not nearly fast enough. Spurred by hackers targeting BGP, a consortium of network operators has been working with the Internet Society since 2014 to codify and promote a set of BGP best practices. An international committee of US and UK government officials and Internet experts has also been researching an anti-hijack defense framework for BGP, which was published last year.
The problem? No matter how sound these new systems may be, it's not easy to get every ISP to implement them. And even one weak link breaks the chain, leaving the Internet open to another outage.