Production Issue Summary
We discovered that, since September 2023, 0.05% of active patients had not been synced correctly with the Clinical Engine. As a result, the Clinical Engine did not generate alerts for these patients, even though they continued to submit measurements regularly.
Timeline of Events
Lead time: 0.75h
Workaround time: N/A
Correction time: 3h
Impact
Scope:
- Period during which the issue occurred: September 2023 to April 2024
- Number of organisations affected: 7
Clinical severity:
- Number of patients with missed alerts who were not monitored by a healthcare professional (HCP) during the issue period: 15
Classification: Blocker (highest)
Workaround
N/A
Cause
A race condition during patient onboarding caused some patients to be created in the main application but not in the Clinical Engine. A safety mechanism of 2 seconds was built in to prevent this timing issue, but in rare circumstances it was not sufficient.
Solution
- Direct correction was done by resyncing the affected patients.
- Improved the safety mechanism to use a transaction hook, ensuring that database events happen in the correct order.
- Improved monitoring of the syncing between the Clinical Engine and the main application (Drift Detector). The previous drift detector was in use, but it generated false positives and was therefore often ignored.
- The current Drift Detector now runs every day as monitoring, with results directly visible to the developers. Accountability for handling drift has been clarified.
- Monitoring over the last few days shows no new drift.
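
The transaction-hook idea in the list above can be sketched as follows. This is a minimal in-memory illustration, not the actual implementation: the `Transaction` class, `main_db`, `clinical_engine`, `sync_patient`, and `onboard_patient` are all hypothetical names, and the sketch assumes the race came from the sync reading a patient before the creating transaction had committed.

```python
from typing import Callable, List


class Transaction:
    """Toy transaction (hypothetical): hooks run only after a successful commit."""

    def __init__(self, store: dict) -> None:
        self._store = store
        self._pending: dict = {}
        self._hooks: List[Callable[[], None]] = []

    def insert(self, key: str, value: dict) -> None:
        # Staged write: not visible to other readers until commit().
        self._pending[key] = value

    def on_commit(self, hook: Callable[[], None]) -> None:
        # Register work that must only happen once the data is committed.
        self._hooks.append(hook)

    def commit(self) -> None:
        # Make staged writes visible first, then run the registered hooks.
        self._store.update(self._pending)
        self._pending.clear()
        for hook in self._hooks:
            hook()
        self._hooks.clear()


main_db: dict = {}          # stands in for the main application database
clinical_engine: dict = {}  # stands in for the Clinical Engine


def sync_patient(patient_id: str) -> bool:
    """Copy a committed patient to the engine; return False if it is not visible yet."""
    patient = main_db.get(patient_id)
    if patient is None:
        return False
    clinical_engine[patient_id] = dict(patient)
    return True


def onboard_patient(patient_id: str, name: str) -> None:
    """Create the patient and sync it via a commit hook instead of a fixed delay."""
    tx = Transaction(main_db)
    tx.insert(patient_id, {"name": name})
    tx.on_commit(lambda: sync_patient(patient_id))
    tx.commit()
```

With a hook, the sync is ordered after the commit by construction, so it does not depend on how long the commit takes. Some frameworks expose this pattern directly (for example, Django's `transaction.on_commit`); whether the application uses such a framework is not stated here.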
Communication and documentation
External
- Affected organisations were contacted directly.
- inStatus updates were posted to share the status of the system.
- Post mortem was written. You are reading it.
Internal
- Regular internal communication channels were used. SLA was met.
Further Improvements
Several improvements have been carried out already. See Solution for an overview.
- Add extra monitoring of specific drift errors whenever a measurement is sent in. (CAPA-1)
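
A drift check of the kind described in CAPA-1 could look roughly like the sketch below. It is an illustration under assumptions, not the planned design: `DriftReport`, `find_drift`, and `check_on_measurement` are hypothetical names, patient IDs are assumed to be strings, and the alert channel is modelled as a plain list.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class DriftReport:
    """Patients present on one side of the sync but not the other (hypothetical)."""

    missing_in_engine: Set[str]
    missing_in_main: Set[str]

    @property
    def has_drift(self) -> bool:
        return bool(self.missing_in_engine or self.missing_in_main)


def find_drift(main_ids: Set[str], engine_ids: Set[str]) -> DriftReport:
    """Compare the patient IDs known to the main application and to the engine."""
    return DriftReport(
        missing_in_engine=main_ids - engine_ids,
        missing_in_main=engine_ids - main_ids,
    )


def check_on_measurement(patient_id: str, engine_ids: Set[str], alerts: List[str]) -> bool:
    """On each incoming measurement, flag patients the engine does not know about."""
    if patient_id in engine_ids:
        return True
    alerts.append(
        f"drift: patient {patient_id} sent a measurement "
        "but is unknown to the clinical engine"
    )
    return False
```

`find_drift` corresponds to a periodic full comparison, while `check_on_measurement` raises a specific error at the moment a measurement arrives, so a missing patient would surface with the first measurement after onboarding rather than at the next scheduled run.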