It was a cable break, but not in the way you’d expect. Service provider notifications brought it to the attention of Daily Maverick’s news desk, and then some excellent Downdetector sleuthing by MyBroadband confirmed it.
This time, the culprit wasn’t a wayward ship anchor snagging a subsea cable somewhere in the Indian Ocean, or even an underwater earthquake and rock slide.
The break was terrestrial and came as a devastating duo, severing the fibre connections between the primary Cape Town data centres of Teraco, the biggest data centre operator in the country.
When redundancy becomes a single point of failure
The drama began at 3.27am when Teraco detected the first fibre break in Elsies River. Their systems did exactly what they’re supposed to do: automatically failed over to the secondary route. Problem solved, or so they thought.
For seven hours, South Africa’s internet traffic hummed along on a single data highway, in what engineers call an “N-1 state” – one route down, one route operational. Not ideal, but manageable. Nobody noticed because nobody needed to.
Then, at 10.29am, the other shoe dropped. An unrelated fibre break on Settlers Way in Mowbray severed the backup route. Suddenly, Teraco’s CT1 data centre in Rondebosch and its CT2 facility in Brackenfell (two critical nodes in South Africa’s internet infrastructure) were completely cut off from each other.
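If you want to see that failure mode in miniature, here is a rough Python sketch. The route names and logic are purely illustrative, not Teraco’s actual systems: a controller keeps shifting traffic to whichever path is still healthy, until a second, unrelated break leaves it with nothing to shift to.

```python
# Purely illustrative sketch of N-1 failover; route names are hypothetical labels,
# not Teraco's actual systems.

routes = {"elsies_river_primary": True, "settlers_way_secondary": True}

def active_route():
    """Return the first healthy route, or None when every path is down."""
    for name, healthy in routes.items():
        if healthy:
            return name
    return None

# 3.27am - first break: traffic fails over to the secondary (the "N-1 state").
routes["elsies_river_primary"] = False
print(active_route())  # -> settlers_way_secondary; nobody notices

# 10.29am - second, unrelated break: no healthy path left between the sites.
routes["settlers_way_secondary"] = False
print(active_route())  # -> None; CT1 and CT2 are cut off from each other
```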
A cable fault enters the chat
Seven minutes earlier, at 10.22am, Seacom’s network operations centre had begun detecting what it would later describe as a “dual outage”.
By 10.30am, Downdetector was lighting up with reports. Discord went dark. OpenAI stopped responding. YouTube buffered endlessly. The internet, for a significant chunk of local users, had effectively turned into a dial-up experience.
“Our warehouse said they were down, but I thought you had a lot on your plate, so I didn’t bother you,” my wife said on a call later in the afternoon – she was also unaffected on her mobile data.
In a statement shared with Daily Maverick, Teraco confirmed the timeline and apologised for the disruption. “Maintaining network resilience and reliability for our clients is our top priority,” said Carla Sanderson, head of marketing at Teraco Data Environments.
“Our teams have been working around the clock to restore full service. We appreciate the patience and understanding of our clients and partners as we finalise repairs.”
The statement emphasised what the company characterised as the improbability of the event: “Teraco has diverse fibre routes connecting its CT1 and CT2 data centres. Both routes [were] impacted on the same day by independent events.”
Independent events. Same day. Same outcome. The kind of coincidence that makes network engineers wake up in a cold sweat.
The vendor diversity trap
Here’s where the story gets interesting for the network architecture nerds (look, I just installed a new UniFi access point on my home network, so I count myself as one).
Seacom, a major fibre and subsea cable operator, had done what you’re supposed to do: purchased diverse routes for redundancy. Two separate fibre paths, different physical routes, proper disaster planning.
The catch? Both routes were leased from the same vendor – Teraco.
When both routes failed simultaneously, Seacom’s carefully constructed redundancy collapsed. Its high-value Layer 2 services connecting Isando to Brackenfell, Isando to Rondebosch and, crucially, Rondebosch to Brackenfell were all degraded.
Traffic had to be rerouted through available backbone capacity, resulting in what Seacom delicately described to MyBroadband as “degraded service due to network congestion”.
Some engineers are now calling what happened “digital load shedding”.
Teraco managed to restore the primary Elsies River route by 2.30pm, ending the immediate crisis. But the incident has prompted a strategic rethink. The company announced it would implement a third diverse fibre route between its Cape Town sites “to provide additional resilience and diversity” — essentially admitting that two routes, no matter how diverse, aren’t enough.
What have we learnt from this?
In May, Cybersmart’s network collapsed not because of a physical cable break, but because old Cisco 6500 routers couldn’t handle a global routing table explosion – that was a Layer 3 logical failure.
Today’s outage was a Layer 1 physical failure. Two different problems, same result: large chunks of the internet going offline.
The subsea cables that everyone obsesses over? Often not the weakest link. It’s the terrestrial backhaul, the unglamorous fibre routes connecting data centres on land, that frequently becomes the chokepoint.
When these inter-data centre links fail, they expose what network engineers call “layered dependency traps”: sophisticated services running on top of infrastructure owned by a single vendor that, despite best efforts, can’t always protect both diverse paths.
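A back-of-the-envelope way to spot that trap, sketched here in Python under deliberately simplified assumptions (the attribute names are made up and this is not how any provider actually audits its network), is to compare what two “diverse” paths have in common rather than simply counting them:

```python
# Illustrative only: a toy shared-risk check between two supposedly diverse paths.
# The attribute values mirror this incident but are not any provider's real records.

paths = {
    "path_a": {"vendor": "Teraco", "route": "Elsies River"},
    "path_b": {"vendor": "Teraco", "route": "Settlers Way"},
}

def shared_risks(a: dict, b: dict) -> dict:
    """Return every attribute two 'diverse' paths actually have in common."""
    return {key: a[key] for key in a if a.get(key) == b.get(key)}

overlap = shared_risks(paths["path_a"], paths["path_b"])
if overlap:
    # Paths that share a vendor (or a duct, trench or suburb) are not truly diverse:
    # one organisation's bad day can take both of them down at once.
    print("Shared risk factors:", overlap)  # -> {'vendor': 'Teraco'}
```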
The outage for thee, not for me
While Seacom customers were experiencing the joy of buffering wheels and timeout errors, Openserve customers (like me) carried on as if nothing had happened.
Why? Openserve and Seacom are direct infrastructure competitors. Openserve doesn’t use Seacom’s terrestrial backhaul or West Coast routes. It operates its own massive national fibre network, routes traffic through its own cable landing stations, and most critically, owns capacity on multiple competing subsea cables on both coasts.
An example is Openserve’s dedicated fibre pair on Google’s Equiano cable, the state-of-the-art subsea system that lands at Melkbosstrand; that fibre pair alone gives Openserve 12 terabits per second of capacity.
Add to that stakes in WACS and SAT-3/SAFE on the West Coast, and EASSy on the East Coast, and you have what amounts to a digital load shedding buffer, enough redundant capacity that losing an entire route doesn’t degrade service.
The expensive lesson
True network resilience is phenomenally expensive. It requires owning physical infrastructure, maintaining diverse vendor relationships and building enough redundant capacity that most of it sits idle most of the time, waiting for a day like today.
Seacom has learnt that path diversity and vendor diversity are not the same thing. Teraco has learnt that even diverse routes can fail simultaneously when they’re in the same geographic area.
And South African internet users have learnt, once again, that the invisible infrastructure holding up our digital economy is both more complex and more fragile than we’d like to believe. DM
A Teraco and Seacom outage took down critical South African internet service on Tuesday. (Photo: iStock)