Resolved -
The incident has been resolved.
Incident Summary:
As part of a planned security update at 00:00 BST on Monday 20th April, the Omnea Security team determined that a maintenance period was needed at 00:30 BST. To keep this maintenance window outside business hours, and because no impact on availability was anticipated, the maintenance was scheduled to begin immediately, at 00:40 BST.
The maintenance upgrade initially appeared successful, and early checks indicated normal operation. Manual tests carried out at 01:30 BST showed the service was still accessible to users. However, a fault was already propagating at this time. It would begin to affect a small subset of users first, depending on which servers their requests were routed to.
One component of the change took significantly longer than expected to apply fully. During this window, the platform continued to serve some traffic normally while other requests failed, depending on which internal services handled them. This partial state masked the underlying issue. When the delayed component eventually completed its shutdown, further database connectivity was lost, causing broader service impact.
A subset of users therefore experienced intermittent errors and degraded performance. The issue was caused by this unexpected delay in the propagation of underlying infrastructure changes. The delay left database connectivity in an unstable state that our monitoring systems did not immediately detect.
At 06:33 BST, Omnea Support identified elevated error rates as a larger volume of users began to log in, and alerted "Out of Hours" support. The root cause was identified at 06:58 BST. A fix was then applied, and full service was restored at 07:04 BST.
Root Cause Analysis:
Network route tables were removed and not recreated across the VPC. This gradually broke VPC-to-VPC connectivity with Hasura Cloud as individual instances lost their connection to the database.
Apr 20, 07:04 BST
Monitoring -
The environments experiencing login issues have returned to normal behaviour. We will update this log with a full root cause analysis once it has been reviewed.
Apr 20, 07:03 BST
Update -
We are continuing to investigate this issue. A possible root cause has been identified and a fix is being applied.
Apr 20, 06:58 BST
Investigating -
We’re aware of reports of login issues affecting some users and are currently investigating.
Apr 20, 06:33 BST