Eptura Workplace Detailed Root Cause Analysis | Severity 1 | August 23, 2023
Inability to Access Eptura Workplace
We are truly grateful for your continued support and loyalty. We value your feedback and appreciate your patience as we worked to resolve this incident.
Description:
On August 23, 2023, at approximately 11:19am EDT, customer support began receiving reports of users who could not access their Eptura Workplace platform. When attempting to log in, they were presented with an error.
Type of Event:
Outage
Services/Modules Impacted:
All of Production
Timeline:
On August 23, 2023, at approximately 11:19am EDT, customer support received an initial report of users who could not access their Eptura Workplace platform. When attempting to log in, they were presented with an error. At 11:30am EDT, additional customers reported the issue. All customers were notified via the status page that we were investigating a Severity 1 issue with users' inability to access Eptura Workplace. Engineering began to investigate and notified support at 12:26pm EDT that they had identified the issue's root cause. The status page was then updated from investigating to identified. After thorough testing in QA environments, engineering confirmed that the fix was successful in our staging environments and would soon be deployed to production. At 12:48pm EDT, the engineering team confirmed that the fix had been successfully deployed to production. The status page was updated from identified to monitoring. As support monitored, customers began to confirm that they could access their Eptura Workplace instances. No additional reports were made to support, and at 3:28pm EDT, the status page was updated from monitoring to resolved.
Total Duration of Event:
1 hour 20 minutes
Root Cause:
On August 23, 2023, at 08:06 Mountain time, a configuration change overriding the default Redis timeout was added, and the ioffice-connect application automatically restarted in response. The value in the configuration was numerically correct, but it was supplied as a string instead of an integer. This type mismatch caused the issue.
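As an illustration only, the following Python sketch shows how a numerically correct value can still fail when it arrives as a string. It is not the ioffice-connect code, which is not shown in this report; the function name `remaining_budget` and the example values are hypothetical.

```python
# Hypothetical illustration; names and values are assumptions, not taken
# from the ioffice-connect application.

def remaining_budget(timeout_ms, elapsed_ms):
    """Return how many milliseconds remain before the timeout expires."""
    return timeout_ms - elapsed_ms

# An integer timeout behaves as expected.
ok = remaining_budget(5000, 1200)

# The same digits supplied as a string fail at the point of use.
try:
    remaining_budget("5000", 1200)
    failed = False
except TypeError:
    failed = True
```

In a sketch like this, the error surfaces when the value is first used rather than when the configuration is written, which is why a value that looks correct on inspection can still take a service down after a restart.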
Remediation:
We have removed the configuration override for the time being, reverting to the default Redis timeout.
Preventative Action:
We have a change coming that will better convert these configuration values to the expected type.
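One general way to handle this kind of conversion is to normalize and validate the value when the configuration is loaded. The Python sketch below is a minimal example under that assumption; it does not describe the planned change itself, and the helper name `coerce_timeout` is hypothetical.

```python
# Hypothetical helper; a sketch of load-time coercion, not the actual fix.

def coerce_timeout(raw):
    """Convert a configured timeout to a positive int, or raise ValueError."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(raw, bool):
        raise ValueError(f"timeout must be a whole number, got {raw!r}")
    if isinstance(raw, int):
        value = raw
    elif isinstance(raw, str):
        try:
            value = int(raw.strip())
        except ValueError:
            raise ValueError(
                f"timeout must be a whole number, got {raw!r}"
            ) from None
    else:
        raise ValueError(f"timeout must be a whole number, got {raw!r}")
    if value <= 0:
        raise ValueError(f"timeout must be positive, got {value}")
    return value
```

With a check like this, `coerce_timeout("5000")` and `coerce_timeout(5000)` both yield the integer 5000, while a malformed value such as `"5s"` is rejected with a clear message when the configuration is read.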