We are investigating an issue with Eptura Workplace
Incident Report for Eptura Workplace
Postmortem

Eptura Workplace detailed Root Cause Analysis | July 10, 2024

S1 – New Calendar Service using Exchange Unavailable

We are truly grateful for your continued support and loyalty. We value your feedback and appreciate your patience as we worked to resolve this incident.

 

Description:

On the afternoon of July 10, 2024, our internal team at Eptura Workplace discovered that the new calendar service using Exchange had been inadvertently removed from our production environment. This issue affected only customers using the new calendar service.

 

Type of Event:

Eptura Workplace experienced a Severity 1 outage of the new calendar service that uses Exchange.

 

Services/Modules impacted:

This disruption impacted our Production environment for the new calendar service.

 

Timeline: (Reported in US Central Daylight Time, UTC-5)

3:34pm: Our Support and Engineering teams discovered the disruption and alerted the Eptura-Workplace-Fire-Alarm channel, making internal teams aware that the new calendar service was unavailable.

3:38pm: The Cloud Operations team and their leadership were notified.

3:46pm: PagerDuty was triggered to alert other internal team members. Customers were made aware of the incident through the Status Page, and internal teams began investigating the issue.

5:01pm: The Cloud Operations team confirmed that the new calendar service release from the prior week had been reinstated and that the service was up and running, and asked Engineering to confirm.

5:06pm: Engineering confirmed that the new calendar service that uses Exchange was up and running again.

5:08pm: Product asked Engineering to prepare for data recovery and restoration of missed events for customers.

5:11pm: The Status Page was updated from Investigating to Monitoring.

5:40pm: Product confirmed that Engineering had manually run the event sync looper for the list of impacted customer sites. The next steps were to monitor for and report any missing activity, and for Support to reach out to customers.

6:05pm: The Status Page was marked as resolved.

 

Total Duration of Event:

2 Hours 31 Minutes
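
The duration above can be checked against the timeline with a few lines of arithmetic. The sketch below assumes the timeline times are UTC-5, which is what the UTC timestamps on the Status Page updates imply; the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: timeline times are UTC-5 (US Central Daylight Time).
CENTRAL_DAYLIGHT = timezone(timedelta(hours=-5))

# First and last timeline entries: discovery and Status Page resolution.
detected = datetime(2024, 7, 10, 15, 34, tzinfo=CENTRAL_DAYLIGHT)
resolved = datetime(2024, 7, 10, 18, 5, tzinfo=CENTRAL_DAYLIGHT)

duration = resolved - detected
hours, remainder = divmod(int(duration.total_seconds()), 3600)
minutes = remainder // 60
print(f"{hours} hours {minutes} minutes")

# Convert the resolution time to UTC for comparison with the Status Page.
resolved_utc = resolved.astimezone(timezone.utc)
print(resolved_utc.strftime("%b %d, %Y - %H:%M UTC"))
```

Under that assumption, the 6:05pm resolution corresponds to the 23:05 UTC "Resolved" update on the Status Page.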

 

Root Cause:

A manual error occurred when a member of our Cloud Operations team inadvertently merged a pull request into the production environment. This action removed the new calendar service that uses Exchange from production. Due to insufficient monitoring, no alert was raised, delaying our awareness of and response to the issue.

 

Remediation:

The new calendar service was restored promptly by rolling back the pull request. We understand the criticality of this service and took immediate action to rectify the situation.

 

Preventative Action:

To prevent this incident from recurring, we have implemented and successfully tested uptime alerts for the new calendar service. These alerts will ensure that any disruptions are promptly detected and addressed, minimizing impact on our customers.
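
As a rough illustration of what an uptime alert does, the sketch below raises a single alert after several consecutive failed health probes. It is not Eptura's implementation: the `UptimeMonitor` class, the `probe` and `alert` callables, and the threshold of three failures are all hypothetical, and the probe results are simulated rather than taken from a real service.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class UptimeMonitor:
    """Hypothetical monitor: alerts after `threshold` consecutive failed probes."""
    probe: Callable[[], bool]      # returns True when the service responds
    alert: Callable[[str], None]   # e.g. page the on-call engineer
    threshold: int = 3             # assumed value, chosen for illustration
    failures: int = 0
    alerting: bool = False

    def tick(self) -> None:
        """Run one probe; alert once when the failure streak reaches the threshold."""
        try:
            healthy = self.probe()
        except Exception:
            healthy = False  # a probe that raises counts as a failure
        if healthy:
            self.failures = 0
            self.alerting = False
            return
        self.failures += 1
        if self.failures >= self.threshold and not self.alerting:
            self.alerting = True
            self.alert(f"service down: {self.failures} consecutive failed probes")


# Simulated probe results: one success, four failures, then recovery.
results = iter([True, False, False, False, False, True])
alerts: list[str] = []
monitor = UptimeMonitor(probe=lambda: next(results), alert=alerts.append)
for _ in range(6):
    monitor.tick()
print(alerts)
```

Requiring a streak of failures rather than a single one is a common way to avoid paging on transient blips, at the cost of a slightly later alert.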

Posted Jul 23, 2024 - 17:03 UTC

Resolved
This incident has been resolved.
Posted Jul 10, 2024 - 23:05 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Jul 10, 2024 - 22:10 UTC
Investigating
We are currently investigating an issue with Eptura Workplace. We will update you when we have more information.
Posted Jul 10, 2024 - 20:47 UTC