Eptura Workplace detailed Root Cause Analysis | 05/08/2024
S2 - Eptura Workplace SSO Timeouts
We are truly grateful for your continued support and loyalty. We value your feedback and appreciate your patience as we worked to resolve this incident.
Description:
When accessing Eptura Workplace, both internal and external users received the following error: 502 A timeout occurred on federation.api.iofficeconnect.com.
Type of Event:
Outage for Customers who use Single Sign On (SSO)
Services/Modules impacted:
Production / SSO
Timeline (Reported in MST):
11:30am – Multiple customers reported the inability to access Eptura Workplace.
11:59am – After initial investigation, the support team escalated a ticket to CloudOps for further troubleshooting. All customers were made aware of the S2 incident via the Status Page.
12:16pm – The CloudOps team identified the issue and began working on a resolution.
12:59pm – The fix was released to production. The Status Page was updated from Investigating to Monitoring.
1:59pm – During monitoring, no additional reports were received, and customers began to confirm the fix.
Total Duration of Event:
2hrs 29mins
Root Cause:
Mesos Marathon experienced temporary downtime during a recent restart of the NGINX server.
Remediation:
Mesos Marathon was restored, and the NGINX server was promptly restarted once the recovery was complete.
Preventative Action:
The proxy settings that previously pointed to the Mesos Marathon service, which has since been decommissioned, have been updated. This change is already in place to ensure continued service efficiency.