Production Issue Summary
In certain cases, a protocol switch triggers a database error (timelock error), which prevents the switch from being applied. For example, organisations changed a program's frequency, but the change was not reflected at the patient level. The issue was initially treated as general performance degradation affecting specific customers rather than as a platform-wide issue, and communication with customers about the specific issue was limited.
Timeline of Events
- First reported: January
- Most recent related fix deployed: 2 weeks ago
- Lead time (time to identify): 2 hours
- Workaround implementation time: 1 hour
- Time to full resolution: 1 working day
SLA was met
Impact
Scope:
- Time period affected: since protocol switching was introduced, though only recently observed under higher system load
- Number of organizations affected: N/A
- Number of programs affected per organization: N/A
- Number of individual patients affected per program: none (manual recalculation was used to prevent impact)
Clinical Severity:
- Total number of missed alerts per patient: N/A
- Number of patients who were not monitored by a healthcare professional during the issue: N/A
Classification: Blocker
Workaround
A daily check was implemented to identify occurrences of the issue. For affected patients, a manual recalculation was triggered to correct the data.
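The daily check described above could look roughly like the sketch below. It is illustrative only: the record type `PatientProgramState`, the function `find_patients_needing_recalculation`, and the idea of comparing the frequency configured on a program with the frequency applied to each patient are assumptions made for this example, not the actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientProgramState:
    """Hypothetical snapshot of the protocol settings applied to one patient."""
    patient_id: str
    program_id: str
    applied_frequency: str


def find_patients_needing_recalculation(program_frequencies, patient_states):
    """Return IDs of patients whose applied frequency differs from the
    frequency configured on their program (assumed detection rule)."""
    stale = []
    for state in patient_states:
        expected = program_frequencies.get(state.program_id)
        # Skip programs with no known configuration instead of guessing.
        if expected is not None and state.applied_frequency != expected:
            stale.append(state.patient_id)
    return stale


if __name__ == "__main__":
    programs = {"prog-1": "daily", "prog-2": "weekly"}
    patients = [
        PatientProgramState("pat-a", "prog-1", "daily"),
        PatientProgramState("pat-b", "prog-1", "weekly"),  # switch not applied
        PatientProgramState("pat-c", "prog-2", "weekly"),
    ]
    for patient_id in find_patients_needing_recalculation(programs, patients):
        print(f"manual recalculation needed for {patient_id}")
```

In this sketch the check only reports affected patients; triggering the recalculation is left as a separate manual step, matching the workaround as described.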
Cause
- System performance degradation
- Increased usage and system demand
- The system was not designed for the current level of load
- Protocol recalculation logic was not optimized for high-load situations
Solution
Optimisation of protocol recalculation processes to handle increased system load more effectively.
Communication and Documentation
External
- Customers who reached out were informed of the issue and the fix
- General performance degradation noted on InStatus (status.luscii.com)
- One complaint was received and has been handled
Internal
- Communication shared with Platform Circle, Patient Circle, Support team and in the performance improvement Slack channel
Improvements
Product
- Inform users when asynchronous processes are initiated so they understand that changes may not be immediately visible (CAPA project: clarity on async processes for users)
- Implement event logging for the status of asynchronous processes
- Continue stress and performance testing (ongoing CAPA projects)
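As an illustration of the event-logging improvement listed above, the sketch below records a status event when an asynchronous job starts, succeeds, or fails. All names (`JobStatus`, `status_event`, `run_with_status_events`, and the `emit` callback) are hypothetical and chosen for this example; the real event schema and storage are not specified in this report.

```python
from datetime import datetime, timezone
from enum import Enum


class JobStatus(Enum):
    STARTED = "started"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


def status_event(job_id, status, detail=""):
    """Build one status event as a plain dict (assumed schema)."""
    return {
        "job_id": job_id,
        "status": status.value,
        "detail": detail,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def run_with_status_events(job_id, task, emit):
    """Run `task` and pass a status event to `emit` at each state change."""
    emit(status_event(job_id, JobStatus.STARTED))
    try:
        result = task()
    except Exception as exc:
        # Record the failure before re-raising so it is never silent.
        emit(status_event(job_id, JobStatus.FAILED, detail=str(exc)))
        raise
    emit(status_event(job_id, JobStatus.SUCCEEDED))
    return result


if __name__ == "__main__":
    events = []
    run_with_status_events("recalc-1", lambda: "done", events.append)
    for event in events:
        print(event["job_id"], event["status"])
```

With events like these, a protocol switch that fails partway through would leave a "failed" record that support staff or a scheduled check could query, rather than relying on a customer report.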
Communication
- Create separate incident tickets even if the root cause appears to be the same
- Ensure the Customer Liaison is always involved in relevant issues
- Be more specific and transparent in external communications