Why bank outages are now a systemic business risk – not just an IT failure

5 months ago

Jonathan Dedman, Director at Cloudhouse

In 2025, the disruption caused by bank outages was starkly illustrated. In January, a Barclays IT outage quickly escalated into a full-blown crisis, with essential payments being blocked: customers were stranded at supermarkets, tax deadlines were in limbo, and one family was even trapped outside their new home.

Then, in March, statistics revealing the true extent of the issue were released. A Treasury Committee investigation uncovered that major UK banks accumulated at least 803 hours – more than 33 days’ worth – “of unplanned tech and systems outages in the last two years [2023-2025]”. This was a sector-wide challenge.
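As a quick check on that conversion, the committee's 803 hours can be turned into days with a few lines of Python. The function name is illustrative; only the 803-hour figure comes from the investigation.

```python
HOURS_PER_DAY = 24


def hours_to_days(hours: float) -> float:
    """Convert a number of hours into (fractional) days."""
    return hours / HOURS_PER_DAY


# Figure quoted from the Treasury Committee investigation.
total_outage_hours = 803

outage_days = hours_to_days(total_outage_hours)
print(f"{total_outage_hours} hours is about {outage_days:.1f} days")
# prints "803 hours is about 33.5 days"
```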

The rest of the year was dominated by other major outages, which showed that this is a modern-day challenge felt across sectors. But in November, an indication of the monetary cost of bank outages emerged. A survey found UK investment banks lost £2.4m in 12 months, with outages lasting over an hour on average and costing the affected bank £600,000.

The picture of bank outages as mere ‘IT failures’ has been repainted as one of systemic business risk. The chaos caused by a single outage can be vast. But what are the underlying causes behind these outages? Why are they so common? And how can they be mitigated?

The underlying reasons behind bank outages

As part of its investigation, the committee asked banks about the causes of their IT outages. In response, Barclays UK’s CEO, Vim Maru, stated that their “January outage was caused by a software problem with their system, and the incident was not due to a cyber-attack ‘or any other malicious activity’”.

“A software problem” is rather vague. But several industry patterns point to systemic risks that indicate what the cause of the outage could have been. Large banks like Barclays often depend on outdated systems running on decades-old infrastructure and codebases. Integrating newer technologies with these systems, even through a routine upgrade, can therefore cause severe failures. In the case of Barclays, HMRC’s tax deadlines could have led to a surge in transactions that exceeded the peak load its already brittle systems could manage.

Driven by regulatory demands, consumer expectations and an intensely competitive market, banks are under pressure to innovate and implement changes rapidly, which increases the likelihood of such failures. At the same time, holding back upgrades means banks keep their existing systems and accumulate technical debt, which raises the risk of outages and exposure to cyberattacks.

The problem also extends beyond internally managed systems. Critical system functions are often outsourced to third-party vendors. Many institutions become so dependent on these partners that they are locked in, and are therefore exposed to any risks contained in these external technologies. The AWS outage last year showed just how dependent companies across sectors are on major cloud providers.

How to responsibly accelerate modernisation

The scale and frequency of outages have created an imperative for financial institutions to accelerate modernisation. Specifically, banks need to make the transition from older architectures with monolithic mainframes at their core to dynamic, flexible and scalable cloud-native architectures. But how do you integrate changes without a major system overhaul or the risk of system failure?

One of the most effective ways banks can achieve this is to migrate existing applications onto modern, supported operating systems without changing the code or behaviour of the applications themselves. To do this, they need specialist migration software that can isolate the OS-specific dependencies these existing apps contain. Once redeployed in the new environment, the apps can be maintained, secured, updated and connected to newer tools.
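One generic way to picture what "isolating OS-specific dependencies" can mean is path redirection: the application keeps asking for the locations it was written against, and a thin layer maps them to an isolated package directory on the new system. The sketch below is a toy illustration of that idea only, not a description of how any particular migration product works. The application name, directory layout and `resolve` function are all hypothetical.

```python
from pathlib import PureWindowsPath
from typing import Dict

# Hypothetical redirect table: legacy locations the unchanged application
# still refers to, mapped to an isolated package directory on the new OS.
REDIRECTS: Dict[PureWindowsPath, PureWindowsPath] = {
    PureWindowsPath(r"C:\Program Files\PaymentsApp"):
        PureWindowsPath(r"D:\Packages\PaymentsApp\App"),
    PureWindowsPath(r"C:\Program Files\PaymentsApp\logs"):
        PureWindowsPath(r"D:\Packages\PaymentsApp\Logs"),
}


def resolve(legacy_path: str,
            redirects: Dict[PureWindowsPath, PureWindowsPath]) -> PureWindowsPath:
    """Map a path requested by the legacy app to its redirected location.

    The most specific (longest) matching root wins. Paths that match no
    root are returned unchanged.
    """
    path = PureWindowsPath(legacy_path)
    # Check longer roots first so nested redirects take precedence.
    for old_root in sorted(redirects, key=lambda p: len(p.parts), reverse=True):
        try:
            relative = path.relative_to(old_root)
        except ValueError:
            continue  # path is not under this root
        return redirects[old_root] / relative
    return path


config_path = resolve(r"C:\Program Files\PaymentsApp\conf\settings.ini", REDIRECTS)
log_path = resolve(r"C:\Program Files\PaymentsApp\logs\today.log", REDIRECTS)
other_path = resolve(r"C:\Temp\scratch.tmp", REDIRECTS)

print(config_path)
print(log_path)
print(other_path)
```

The application code never changes in this picture; only the table that says where its files really live does.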

Crucially, organisations can avoid vendor lock-in, as apps are still managed internally and can leverage the capabilities of different partners/providers. Systems should also be constantly stress-tested for peaks in activity and monitored for any configuration changes with the use of automation tools. Barclays, for example, could simulate tax deadlines to spot load thresholds and then make the necessary adjustments.
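To make the idea of spotting a load threshold concrete, here is a minimal sketch that steps up a simulated surge in transactions until the backlog exceeds an acceptable limit. Every number in it (capacity, surge length, backlog limit, tested rates) is a made-up assumption for illustration, and the simple queue model stands in for a real load-testing tool run against a real system.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class LoadResult:
    arrivals_per_second: int
    max_backlog: int
    passed: bool


def simulate_surge(arrivals_per_second: int, capacity_per_second: int,
                   surge_seconds: int, backlog_limit: int) -> LoadResult:
    """Simulate a surge second by second and track the queued backlog."""
    backlog = 0
    max_backlog = 0
    for _ in range(surge_seconds):
        backlog += arrivals_per_second                # new transactions arrive
        backlog -= min(backlog, capacity_per_second)  # system processes what it can
        max_backlog = max(max_backlog, backlog)
    return LoadResult(arrivals_per_second, max_backlog,
                      max_backlog <= backlog_limit)


def find_load_threshold(rates: Iterable[int], capacity_per_second: int,
                        surge_seconds: int, backlog_limit: int) -> Optional[int]:
    """Return the highest tested rate whose backlog stays within the limit."""
    threshold = None
    for rate in sorted(rates):
        result = simulate_surge(rate, capacity_per_second,
                                surge_seconds, backlog_limit)
        if not result.passed:
            break  # higher rates will only make the backlog worse
        threshold = rate
    return threshold


# Hypothetical figures: a system that clears 1,200 transactions a second,
# a 10-second surge, and a tolerable backlog of 5,000 queued transactions.
threshold = find_load_threshold(
    rates=[500, 1000, 1500, 2000, 2500],
    capacity_per_second=1200,
    surge_seconds=10,
    backlog_limit=5000,
)
print(f"Highest tested rate within limits: {threshold} transactions/second")
```

With these assumed figures the system tolerates a short surge somewhat above its steady capacity, and the first tested rate that breaches the limit marks where adjustments would be needed before a real deadline.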

Cross-sector operational resilience and transparency

Operational resilience has moved from an aspiration to a compliance obligation. The EU’s Digital Operational Resilience Act (DORA), for example, was introduced last year and makes operational resilience testing a mandatory requirement for financial institutions. It goes beyond internal systems alone and also requires a continuous audit of the resilience of all third-party suppliers and vendors. But operational resilience must go hand in hand with transparency.

Outages are a sector-wide issue. So, instead of masking the reasons behind them, institutions need to be far more transparent and less secretive about their systems. Barclays has yet to publicly disclose a full, detailed root cause analysis (RCA) for its outage in the manner that some industry experts advocated. Elsewhere, CrowdStrike’s detailed RCA set a benchmark for how to respond, as did the British Library’s review after its cyber attack and, more recently, AWS’s.

Hiding failures due to embarrassment actually increases the risk of future outages for both the bank in question and the sector at large. But when institutions share the lessons learned, they benefit the sector as a whole. Such approaches help to rebuild trust and prevent future incidents.

Will lessons be learnt in 2026?

An outage doesn’t just lead to major losses for the bank; it also substantially affects businesses and people across the country. The real-world examples and financial impact of last year’s incidents show that outages represent a systemic risk to financial institutions and can’t be brushed off as IT failures. Decades-old infrastructure, rushed modernisation and vendor dependency across the sector all heighten the risk of more outages.

By seeing IT outages as business risks instead of IT failures, firms are more likely to adopt an ongoing strategy of small, incremental changes that maintain operational resilience. Specialist migration software can modernise and monitor existing applications, while prioritising operational resilience and transparency about incidents can build a more robust sector.

Ultimately, learning and sharing lessons from outages will be a critical step to preventing them this year.