Post-Mortems on production issues

Workflow/protocol changes fail with a timelock error and are not applied

Written by Daan Klomp | 2 Jun 2025 13:15:18
Production Issue Summary

In certain cases, a protocol switch triggers a database error (timelock error) that prevents the switch from being applied. For example, organisations changed a program's frequency, but the change was not reflected at the patient level. Communication with customers about this specific issue was limited. Initially, the issue was treated as general performance degradation, and it was seen as affecting specific customers rather than the whole platform.


Timeline of Events
  • First reported: January

  • Most recent related fix deployed: 2 weeks ago

  • Lead time (time to identify): 2 hours

  • Workaround implementation time: 1 hour

  • Time to full resolution: 1 working day

SLA was met


Impact

Scope:

 

  • Time period affected: since protocol switching was introduced, though only recently observed under higher system load

  • Number of organizations affected: N/A

  • Number of programs affected per organization: N/A

  • Number of individual patients affected per program: none; manual recalculation was used to prevent impact

Clinical Severity:

 

  • Total number of missed alerts per patient: N/A

  • Number of patients who were not monitored by a healthcare professional during the issue: N/A

Classification: Blocker


Workaround

A daily check was implemented to identify occurrences of the issue. For affected patients, a manual recalculation was triggered to correct the data.
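The workaround above can be sketched roughly as follows. This is a minimal illustration, not the check that was actually deployed: the `PatientProgram` record, the `expected_by_program` mapping and the `recalculate` callback are all hypothetical names, and it assumes an unapplied switch can be detected by comparing the frequency configured on the program with the frequency in use at patient level.

```python
from dataclasses import dataclass


@dataclass
class PatientProgram:
    # Hypothetical record: one patient's enrolment in one program.
    patient_id: str
    program_id: str
    applied_frequency: str  # frequency currently in use at patient level


def find_unapplied_switches(expected_by_program, patient_programs):
    """Return records whose applied frequency differs from the program's configured one."""
    return [
        p for p in patient_programs
        if p.program_id in expected_by_program
        and p.applied_frequency != expected_by_program[p.program_id]
    ]


def run_daily_check(expected_by_program, patient_programs, recalculate):
    """Trigger a recalculation for every mismatch and return the affected patient IDs."""
    affected = find_unapplied_switches(expected_by_program, patient_programs)
    for p in affected:
        # `recalculate` stands in for the manual recalculation trigger (assumption).
        recalculate(p)
    return [p.patient_id for p in affected]
```

Returning the affected patient IDs gives each daily run a simple record of how often the issue occurred.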


Cause
  • System performance degradation

  • Increased usage and system demand

  • The system was not designed for the current level of load

  • Protocol recalculation logic was not optimized for high-load situations


Solution

Optimisation of protocol recalculation processes to handle increased system load more effectively.


Communication and Documentation

External

 

  • Customers who reached out were informed of the issue and the fix

  • General performance degradation noted on InStatus (status.luscii.com)

  • One complaint was received and has been handled

Internal

 

  • Updates were shared with the Platform Circle, Patient Circle and Support team, and in the performance improvement Slack channel


Improvements

Product

 

  • Inform users when asynchronous processes are initiated, so they understand that changes may not be immediately visible (CAPA project: clarity on async processes for users)

  • Implement event logging for the status of asynchronous processes

  • Continue stress and performance testing (ongoing CAPA projects)
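As an illustration of the event-logging improvement listed above, the sketch below wraps an asynchronous task so that every status transition is recorded. It is one possible shape under stated assumptions, not the platform's design: `run_with_status_events`, the `Status` values and the in-memory `events` list are hypothetical, and a real implementation would write to a durable store.

```python
import logging
from enum import Enum

logger = logging.getLogger("async_process")


class Status(Enum):
    STARTED = "started"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


def run_with_status_events(process_name, entity_id, task, events):
    """Run task() and append one event per status transition to `events`."""

    def emit(status, detail=""):
        event = {
            "process": process_name,
            "entity_id": entity_id,
            "status": status.value,
            "detail": detail,
        }
        events.append(event)  # stand-in for a durable event store (assumption)
        logger.info("async process event: %s", event)

    emit(Status.STARTED)
    try:
        result = task()
    except Exception as exc:
        # Record the failure (for example a database error) before re-raising.
        emit(Status.FAILED, detail=str(exc))
        raise
    emit(Status.SUCCEEDED)
    return result
```

With events like these, a protocol switch that fails silently shows up as a "started" event followed by "failed", or with no terminal event at all, rather than only being found by a later check.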

Communication

 

  • Create separate incident tickets even if the root cause appears to be the same

  • Ensure the Customer Liaison is always involved in relevant issues

  • Be more specific and transparent in external communications