iOFFICE Detailed Root Cause Analysis – Severity 2 – June 8, 2023
Space - Unable to Load Large Floor Plans
Description:
On June 8, 2023, at approximately 11:46am MDT, the Support team received reports that customers were unable to upload large floor plans in Space through SoftSpace or Admin Space.
Type of Event:
Service Disruption
Services/Modules Impacted:
Production – SWF Drawing Service
Timeline:
Thursday June 8, 2023
At approximately 11:46am MDT, the Support team received reports that customers were unable to upload large floor plans in Space through SoftSpace or Admin Space. The issue was escalated to Engineering, who acknowledged it and began investigating. At 12:13pm MDT, all customers were alerted through our Status Page that the incident was in the Investigation Phase. At 7:28pm MDT, the DevOps team notified Support that a possible resolution was ready for testing.
Friday June 9, 2023
At 6:15am MDT, the Support team began testing the possible fix, and the Status Page was updated to the Monitoring Phase. At 7:13am MDT, Support confirmed that the disruption was ongoing. The Engineering team continued to investigate, and customers were notified that the incident had moved back into the Investigation Phase, which lasted until approximately 3:15pm MDT. At that point, the Engineering team had found a temporary fix, upgrading internal systems so that customers could upload floor plans through the weekend. Over the weekend, the Engineering team worked on a permanent fix and monitored the temporary fix for any issues. Customers were alerted that monitoring would continue through the weekend.
Monday June 12, 2023
Investigation and monitoring continued.
Tuesday June 13, 2023
At 8:09am MDT, the Engineering team confirmed that they had identified a more permanent solution for the issue and that a hotfix would be released as soon as it passed QA. At 10:51pm MDT, DevOps confirmed that the hotfix had been released to production.
Wednesday June 14, 2023
At 8:55am MDT, the Support and Drafting teams confirmed the fix, and at 9:16am MDT, Support confirmed that no additional reports about this issue had been received since June 9. All customers confirmed that the issue was resolved, and the Status Page was moved to the Resolved Phase.
Total Duration of Event:
5 days, 21 hours, 30 mins
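The total duration appears to be measured from the first customer report (June 8, 11:46am MDT) to the final confirmation at 9:16am MDT on Wednesday, June 14, when the Status Page was moved to the Resolved Phase. The endpoints are an assumption inferred from the timeline, since the report does not state them explicitly. The following minimal Python sketch reproduces the figure under that assumption:

```python
from datetime import datetime

# Both timestamps are MDT; naive datetimes are sufficient because no
# daylight-saving transition falls inside this window.
first_report = datetime(2023, 6, 8, 11, 46)  # Support receives first reports (assumed start)
resolved = datetime(2023, 6, 14, 9, 16)      # Status Page moved to Resolved (assumed end)

duration = resolved - first_report
hours, remainder = divmod(duration.seconds, 3600)
minutes = remainder // 60
summary = f"{duration.days} days, {hours} hours, {minutes} mins"
print(summary)  # prints "5 days, 21 hours, 30 mins"
```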
Remediation:
We upgraded the Java SWF drawing service to its latest version. We also increased the memory and CPU resources allocated to the cluster.