Back in December, O2, the Telefonica operating company in the United Kingdom, had a massive outage. According to O2, about 25 million subscribers were affected by a small software issue. The company will need to reimburse customers for the outage, which lasted an entire day. According to an article in The Independent, O2 announced: “Pay monthly customers will be given two free days on their contract. And pay as you go customers will get 10 percent of their credit for free.”
The outage had an impact far beyond O2 mobile subscribers. Softbank Trading, Tesco Mobile, O2 wholesale customers and public transit services, including train and bus scheduling and routing, were all affected, with the impact reaching businesses across the globe. For O2, a company whose number one objective is to provide fast and reliable connectivity, it was an obvious crisis.
O2 later announced the outage was caused by an Ericsson software upgrade. But where was the resiliency? Being able to spot and rectify issues and outages like this one is business-critical in today’s world. So how did this happen?
In large network environments, issues arise every day and often go unnoticed by consumers. Network operations teams have well-documented processes to work through and resolve these issues as they unfold. What are the steps to resolving issues quickly?
- Identify the outage impact
Prioritise each outage based on its customer and financial impact. When outages affect customers or services, there is usually a financial impact to the business, and understanding that impact helps to determine which issue to fix first.
- Resolve the short-term problem
Finding a short-term fix requires real-time visibility across the entire end-to-end network environment, so that outages or issues can be identified quickly.
- Ensure the problem does not happen again
Put new policies and practices in place so that the same outages or issues do not occur again.
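The first step, ranking open issues by business impact, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Outage` record, its fields, the `cost_per_customer_hour` parameter and the scoring formula are all hypothetical and not taken from O2 or any vendor tool. A real operations team would plug in its own impact model.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    """Hypothetical record describing one open outage or issue."""
    name: str
    customers_affected: int
    revenue_at_risk_per_hour: float  # in the operator's own currency


def prioritise(outages, cost_per_customer_hour=0.0):
    """Return outages sorted with the highest estimated hourly impact first.

    Assumed, simplified impact model: direct revenue at risk plus an
    optional per-customer cost (for example, expected compensation).
    """
    def hourly_impact(outage):
        return (outage.revenue_at_risk_per_hour
                + outage.customers_affected * cost_per_customer_hour)

    return sorted(outages, key=hourly_impact, reverse=True)


# Illustrative values only, not real incident data.
open_issues = [
    Outage("billing portal slow", customers_affected=1_000, revenue_at_risk_per_hour=500.0),
    Outage("regional data outage", customers_affected=50_000, revenue_at_risk_per_hour=200.0),
    Outage("enterprise link down", customers_affected=10, revenue_at_risk_per_hour=1_000.0),
]

for issue in prioritise(open_issues, cost_per_customer_hour=0.1):
    print(issue.name)
```

Raising or lowering the assumed per-customer cost changes the ranking, which mirrors the point above: the order in which issues are fixed depends on how the business weighs customer impact against direct financial impact.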
As we move to 5G and networks become faster and more policy-driven, what solutions and processes still need to be put in place to prevent outages like this in the future?
The 5G network is likely to have more in common with enterprise application environments than with traditional networks. This is driven by widespread virtualization and the use of container technologies in delivering network functions. This, in turn, allows the use of “DevOps” principles in the design and deployment of 5G. The continuous integration and testing enabled by DevOps have the potential to reduce outages like the one O2 experienced.
That said, software changes are likely to be more frequent in a 5G environment, so monitoring data will need to be collected quickly and at scale to keep 5G networks up and running.
To learn more about 5G management and monitoring, read SevOne’s whitepaper Keys to 5G Success for Carriers and Service Providers.