Eptura Workplace Detailed Root Cause Analysis – Severity 2 – January 22, 2024
Floor Plan Tiles Not Loading
We are truly grateful for your continued support and loyalty. We value your feedback and appreciate your patience as we worked to resolve this incident.
Description:
On Friday, January 19, 2024, at approximately 9:00am EST, a report came in to support advising that floor plan tiles were not loading. After an initial investigation by the support team, an internal ticket was created to investigate the incident further. All customers were made aware of the Severity 2 incident through the Eptura Workplace Status Page.
Type of Event:
Performance Degradation in the Space Module
Services/Modules Impacted:
Production / Space Module
Timeline:
On Friday, January 19, 2024, at approximately 9:00am EST, an initial report came in to support advising that floor plan tiles were not loading. After an initial investigation by the support team, an internal ticket was created to investigate the incident further. All customers were made aware of the Severity 2 incident through the Eptura Workplace Status Page at 4:16pm EST. Investigation continued through the evening. Our Engineering team identified a resolution, and the fix was implemented on Saturday, January 20, 2024, at approximately 8:40am EST. Monitoring continued for the rest of that day and on Sunday. All customers were notified that the incident had moved into a resolved phase on Monday, January 22, 2024, at 8:35am EST.
Total Duration of Event:
16 hrs 24 mins
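The stated duration appears consistent with measuring from the Status Page notification (4:16pm EST on January 19) to the implementation of the fix (approximately 8:40am EST on January 20). That choice of endpoints is an inference from the timeline, not something the report states. A minimal Python sketch of the arithmetic, treating EST as a fixed UTC-5 offset:

```python
from datetime import datetime, timedelta, timezone

# EST modeled as a fixed UTC-5 offset (no daylight saving in January).
EST = timezone(timedelta(hours=-5))

# Assumed endpoints, taken from the timeline above.
notified = datetime(2024, 1, 19, 16, 16, tzinfo=EST)  # Status Page notification
fixed = datetime(2024, 1, 20, 8, 40, tzinfo=EST)      # fix implemented (approximate)

duration = fixed - notified
hours, remainder = divmod(int(duration.total_seconds()), 3600)
minutes = remainder // 60
print(f"{hours}hrs {minutes}mins")
```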
Remediation:
The team reverted the recent changes that had updated a third-party library, and then re-populated the image cache.
Root Cause Analysis:
After a third-party library was updated, our image processing service began crashing under heavy load. These crashes corrupted or removed some of the images stored in our local image cache.
Preventative Action:
We changed the code to make it more resilient to missing images by regenerating them on demand. We are also adding tests to verify that tiles still load even if the cache doesn’t exist.
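The on-demand regeneration described above can be illustrated with a minimal Python sketch. This is not the actual Eptura implementation: the function name `load_tile`, the `render` callback, the `.png` file layout, and the write-to-temporary-file-then-rename step are all hypothetical choices made for illustration.

```python
from pathlib import Path
from typing import Callable


def load_tile(cache_dir: Path, tile_id: str, render: Callable[[str], bytes]) -> bytes:
    """Return tile bytes, regenerating the tile if the cached copy is missing or empty.

    `render` is a hypothetical callback standing in for the image processing service.
    """
    tile_path = cache_dir / f"{tile_id}.png"
    try:
        data = tile_path.read_bytes()
        if data:  # an empty file is treated as a damaged cache entry
            return data
    except FileNotFoundError:
        pass  # the tile, or the whole cache directory, does not exist

    # Cache miss: regenerate on demand instead of failing the request.
    data = render(tile_id)
    cache_dir.mkdir(parents=True, exist_ok=True)

    # Write to a temporary file and rename it into place, so that a crash
    # mid-write is less likely to leave a truncated tile in the cache.
    tmp_path = tile_path.with_suffix(".tmp")
    tmp_path.write_bytes(data)
    tmp_path.replace(tile_path)
    return data
```

A test in the spirit of the second preventative action would call `load_tile` against a cache directory that does not exist and assert that the tile is still returned. Note that this sketch only detects missing or empty files; detecting a corrupted but non-empty image would need an additional validation step.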