S2 - Reservations - floor plan and calendar views not loading existing data and certain workflows are failing
Incident Report for Eptura Workplace
Postmortem

Eptura Workplace detailed Root Cause Analysis | 09/10/2024 

 

We are truly grateful for your continued support and loyalty. We value your feedback and appreciate your patience as we worked to resolve this incident. 

 

Description: 

On September 10th, our Engineering team identified a Severity 2 issue with the Reservations module, affecting data loading and workflows. This was tied to our September 5th release of recurring reservations. Our dedicated team rolled back the release, fixed the coding bug, and redeployed successfully. Additionally, our Operations team increased Redis memory to handle higher demands. 

 

Type of Event: 

Functionality Issue 

 

Services/Modules impacted: 

Production / Reservation Modules

 

Timeline:

09/10/2024 (Reported MDT) 

  • 12:30 PM: After multiple reports that users could not check in to reservations and that floor plan views were not displaying correctly in space availability and the Hummingbird application, the engineering and product teams raised an S2 incident. All customers were informed via the status page that we were investigating the issue.
  • 1:44 PM: The engineering team identified the release deployed on September 5, 2024 as the root cause of the disruption. All customers were notified via the status page and told that we were closely monitoring the situation.
  • 11:11 PM: The rollback of the release is in progress. 

  

09/11/2024 (Reported MDT) 

  • 3:21 AM: The status page was updated from Identified to Monitoring, and customers were informed that the rollback had been completed. Monitoring would continue throughout the day.
  • 4:05 AM: Detailed testing of the application showed that the floor plan and calendar views were working again, but errors were still occurring when checking in to reservations. The team continued to investigate this issue.
  • 11:06 AM: A communication was sent to customers covering the rollback, what had been resolved, and what we were still working on: "Following the update on September 5, we encountered some performance issues with the Reservations feature and the Hummingbird app. To ensure service continuity, we have reverted to the previous software version. This action has successfully resolved errors related to Reservation floor plans and connectivity issues with the Hummingbird app. Our team is committed to fully resolving the check-in errors and restoring optimal functionality to the Reservations feature. We are making steady progress and will continue to keep you informed with the latest updates as we enhance the system."
  • 3:22 PM: Customers were informed that the check-in errors previously identified have been successfully resolved without the need for a code release. Monitoring will continue into the next day to ensure stability. 

  

09/12/2024 (Reported MDT) 

  • 12:01 PM: Customers have been informed that check-in errors previously identified have been largely resolved. Most functionality has been restored. However, we are aware of some residual issues with the Hummingbird Application that may still be affecting a few users. 
  • 4:23 PM: Monitoring continues through 9/16/2024 for some users experiencing issues accessing the Hummingbird application.

  

9/16/2024 (Reported MDT) 

  • 9:04 AM: The status page was updated from Monitoring back to Investigating. Customers were informed that our engineering team had worked diligently through the weekend and was continuing to investigate the intermittent accessibility issue with the Hummingbird application.
  • 1:14 PM: The status page was updated from Investigating to Identified, and customers were informed that the engineering team was working on a resolution for the intermittent accessibility issue with the Hummingbird application. The engineering team continued to work on a resolution through 9/18.

  

9/18/2024 (Reported MDT) 

  • 8:20 AM: The engineering team has developed a fix, which is going through QA testing. The team plans to release a hotfix for this issue on 9/23/2024.

  

9/19/2024 (Reported MDT) 

  • 10:34 AM: The status page message was edited to change the hotfix date from 9/23/2024 to 9/24/2024.

  

9/25/2024 (Reported MDT)

  • 9:02 AM: Customers were informed of the following as the hotfix was not deployed: To ensure we deliver the best experience, our product and QA teams are thoroughly testing the hotfix initially planned for September 24, 2024. They've identified an issue that requires additional attention, so we're taking extra time to make sure everything is perfect. Our teams are working diligently to resolve this and will share an update as soon as possible. We appreciate your understanding and patience. 

  

9/27/2024 (Reported MDT) 

  • 11:32 AM: The status page remains in the Identified phase. Customers are informed that the hotfix will be deployed on Thursday, 10/3/2024 at 10 PM CDT.

  

10/04/2024 (Reported MDT) 

  • 2:30 PM: The status page is updated from Identified to Monitoring. With the hotfix deployed, customers begin to confirm the resolution, and monitoring continues.
  • 4:36 PM: The status page has been marked as Resolved, and customers were made aware of an issue that was discovered during release testing of the hotfix. The communication explained the impact of the discovered issue, what to expect, and when we anticipate a resolution.

 

Total Duration of Event: 

24 Days 4 Hours 6 Minutes

 

Root Cause:  

An issue in Eptura Workplace led to production errors, originating from a release deployed on September 5, 2024. The investigation revealed three primary root causes: a coding error where a variable was declared both locally and globally, a QA environment that did not accurately replicate production data and systems, and a Redis instance that encountered an "out of memory" error due to insufficient capacity and lack of monitoring. These findings provide valuable insights for enhancing our processes and ensuring a more robust system moving forward. 
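For readers unfamiliar with the first root cause, the sketch below illustrates how a variable declared both globally and locally can silently change behavior. This report does not show the affected code or name its language, so the example is purely hypothetical: it is written in Python, and the names `reservation_window_days`, `window_buggy`, and `window_fixed` are invented for illustration.

```python
# Hypothetical illustration only; this is not Eptura's code.
reservation_window_days = 14  # module-level (global) default


def window_buggy(override_days=None):
    # Bug: this assignment declares a local variable with the same name,
    # so the global default of 14 is never seen inside this function.
    reservation_window_days = 0
    if override_days is not None:
        reservation_window_days = override_days
    return reservation_window_days


def window_fixed(override_days=None):
    # Fix: use a distinct local name and fall back to the global default.
    window = reservation_window_days if override_days is None else override_days
    return window
```

In this sketch both functions agree whenever an override is supplied, which is one way such a defect could pass tests that always supply one and only surface on the default path.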

 

Remediation: 

To swiftly address the recent incident, our Engineering team rolled back the affected release, identified and fixed the coding bug, and successfully tested and deployed the fixed release to production. Additionally, our Operations team increased the memory allocation for Redis to handle higher capacity demands. 

 

Prevention: 

To prevent future incidents, our Engineering team will address technical debt by reviewing and refactoring similar coding issues. Our Quality Assurance team will allocate more time for comprehensive testing, enhance test cases to better simulate production environments, and collaborate with Operations to improve the QA environment. Additionally, our Operations team will implement enhanced monitoring and alerting for Redis memory capacity to proactively address potential issues. These proactive measures will ensure a more reliable experience for our users.
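As a rough idea of what a Redis memory capacity check could look like, the sketch below parses the text reply of the Redis `INFO memory` command and returns a warning once usage crosses a threshold. It is an assumption-laden illustration, not the monitoring our Operations team will deploy: the function names and the 80% threshold are hypothetical, and only the `used_memory` and `maxmemory` fields come from Redis itself.

```python
from typing import Dict, Optional


def parse_info_memory(info_text: str) -> Dict[str, str]:
    """Parse the 'key:value' lines of a Redis INFO reply into a dict."""
    fields: Dict[str, str] = {}
    for line in info_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers such as "# Memory"
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def memory_alert(info_text: str, warn_ratio: float = 0.8) -> Optional[str]:
    """Return a warning when used_memory / maxmemory >= warn_ratio, else None.

    warn_ratio is a hypothetical threshold chosen for this example.
    """
    fields = parse_info_memory(info_text)
    used = int(fields["used_memory"])
    limit = int(fields.get("maxmemory", "0"))
    if limit == 0:
        # maxmemory of 0 means no limit is configured in Redis itself,
        # so this ratio-based check does not apply.
        return None
    ratio = used / limit
    if ratio >= warn_ratio:
        return f"Redis memory at {ratio:.0%} of maxmemory ({used}/{limit} bytes)"
    return None
```

A scheduled job could pass the `INFO memory` reply to `memory_alert` and forward any returned message to an alerting channel, so that capacity is raised before an "out of memory" error occurs.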

Posted Nov 21, 2024 - 17:16 UTC

Resolved
We are pleased to inform you that the issue with S2 - Reservations - floor plan and calendar views not loading existing data and certain workflows are failing has been resolved. Our Engineering team has completed the necessary actions and verified that the service is now functioning normally.

A Root Cause Analysis (RCA) will be conducted to understand the incident in detail and will be made available on our Status Page within 10 days.

We wanted to provide an update regarding the hotfix we recently released. During our thorough release testing, we discovered a small issue. After careful consideration, we decided to proceed with the release as planned, as the benefits significantly outweigh the impact of the issue.

Here's what you need to know about the issue’s impact:

For customers using Microsoft Exchange Web service or Google Calendar service, there is a minor issue in the Reservation Center affecting spaces not integrated with Exchange/GCal where the “split multi-day” preference is set to “true.” In this scenario, multi-day events may not split as expected. Rest assured, all other functionalities (creation, edit, delete, and notifications) will work normally as expected.
Our team has already identified the root cause and is developing a solution, which will be included in our next release.

We appreciate your patience as we continue to enhance the system.
Posted Oct 04, 2024 - 22:35 UTC
Monitoring
We have implemented a solution for the issue affecting Reservations - floor plan and calendar views not loading existing data and certain workflows are failing and are currently monitoring the situation to ensure stability and performance. Our Engineering team is overseeing the process to confirm that the issue has been fully resolved.

Next Update 5:30 PM CST
Posted Oct 04, 2024 - 20:30 UTC
Update
We are committed to keeping you informed about our efforts to resolve the ongoing issue. Our dedicated teams have been working diligently to ensure the upcoming hotfix meets the highest standards of quality, and we're pleased to share that it has now passed our QA testing.

We are now preparing for the release of the hotfix on Thursday, October 3rd at 10 PM CDT. Our QA team will begin comprehensive regression testing starting tomorrow to ensure everything is functioning as expected.

We appreciate your ongoing understanding and support as we strive to deliver a reliable and effective solution.
Posted Sep 27, 2024 - 17:32 UTC
Identified
To ensure we deliver the best experience, our product and QA teams are thoroughly testing the hotfix initially planned for September 24, 2024. They've identified an issue that requires additional attention, so we're taking extra time to make sure everything is perfect. Our teams are working diligently to resolve this and will share an update as soon as possible. We appreciate your understanding and patience.
Posted Sep 25, 2024 - 15:02 UTC
Monitoring
A fix was implemented on the system and we are currently monitoring the situation.
Posted Sep 25, 2024 - 11:30 UTC
Update
Our internal team has developed a fix and is testing it. Once testing is completed, it will be rolled out to the Production environment.

Next update: 9 AM Central time on Tuesday, September 24, 2024.
Posted Sep 18, 2024 - 14:20 UTC
Update
We are continuing to work on implementing a fix for this issue. Our next update will be at 9 AM Central Time.
Posted Sep 18, 2024 - 10:19 UTC
Update
We are continuing to work on implementing a fix for this issue. Our next update will be at 3AM CST
Posted Sep 18, 2024 - 03:57 UTC
Update
We are continuing to work on implementing a fix for this issue. Our next update will be at 11PM CST
Posted Sep 17, 2024 - 23:14 UTC
Update
We are continuing to work on implementing a fix for this issue. Our next update will be at 6 PM CST
Posted Sep 17, 2024 - 18:02 UTC
Update
We are continuing to work on implementing a fix for this issue. Our next update will be at 1 PM CST
Posted Sep 17, 2024 - 13:11 UTC
Update
We are continuing to work on implementing a fix for this issue. Our next update will be at 8 AM CST
Posted Sep 17, 2024 - 08:48 UTC
Update
We are continuing to work on implementing a fix for this issue.
Our next update will be at 4 AM CST
Posted Sep 17, 2024 - 05:15 UTC
Identified
The issue with the Hummingbird Application and the intermittent accessibility issue has been identified and a fix is being implemented. We will post another update at 6:15 CST or sooner.
Posted Sep 16, 2024 - 19:14 UTC
Investigating
Our engineering team has worked diligently through the weekend and continues to investigate the intermittent accessibility issue with the Hummingbird Application. We will be moving back into an investigation phase and will continue to keep you updated. We appreciate your patience as we work through this.

Next Update: 1 PM MDT
Posted Sep 16, 2024 - 15:04 UTC
Update
We are continuing to monitor the accessibility issue with the Hummingbird Application for some users throughout the day. We will provide our next update on Monday the 16th at 9 AM MST.
Posted Sep 14, 2024 - 01:16 UTC
Update
We are continuing to monitor the accessibility issue with the Hummingbird Application for some users throughout the day.

Next Update: 5 PM CST
Posted Sep 13, 2024 - 18:04 UTC
Update
We are continuing to monitor the accessibility issue with the Hummingbird Application for some users throughout the morning.

Next Update: 1 PM CST
Posted Sep 13, 2024 - 13:05 UTC
Update
We will continue to monitor the accessibility issue with the Hummingbird Application for some users throughout the evening. Meanwhile, all reservation workflows are working as expected.

Next Update: 8 AM CST
Posted Sep 12, 2024 - 22:04 UTC
Update
We are pleased to inform you that the check-in errors previously identified have been largely resolved. Most functionality has been restored. However, we are aware of some residual issues with the Hummingbird Application that may still be affecting a few users. Our Support teams are actively monitoring the situation and working to resolve these remaining problems to ensure full stability.

Next Update: 5pm CST
Posted Sep 12, 2024 - 18:00 UTC
Update
Our support team will continue to monitor check-in errors throughout the day.

Next update: 1 PM CST
Posted Sep 12, 2024 - 13:00 UTC
Update
Our Support team is continuing to diligently monitor the check-in errors. We will maintain close observation overnight to ensure stability.

Next update: 8 AM CST
Posted Sep 12, 2024 - 00:55 UTC
Update
We are happy to report that the check-in errors previously identified have been successfully resolved without the need for a code release. Expected functionality has been restored. Our Support teams are diligently monitoring the situation to ensure continued stability.

Next update: 8 PM CST
Posted Sep 11, 2024 - 21:22 UTC
Update
We appreciate your patience as we actively address the recent challenges with the Workplace Reservations feature during check-ins.

Following the update on September 5, we encountered some performance issues with the Reservations feature and the Hummingbird app. To ensure service continuity, we have reverted to the previous software version. This action has successfully resolved the errors related to Reservation floor plans and connectivity issues with the Hummingbird app.

Our team is committed to fully resolving the check-in errors and restoring optimal functionality to the Reservations feature. We are making steady progress and will continue to keep you informed with the latest updates as we enhance the system.

Should you encounter any further difficulties, please do not hesitate to reach out to our support team for immediate assistance. We value your cooperation and are here to support you. Thank you.

Next update - 4 PM CDT
Posted Sep 11, 2024 - 17:06 UTC
Update
Observation continues for the errors. Our team is investigating the issue and we will keep you updated on the progress.

Next update - 1 PM CDT
Posted Sep 11, 2024 - 14:03 UTC
Update
During our detailed testing of the application, we noticed that the floor plan and calendar views are now working; however, we are still observing errors when checking in to reservations. Our team is investigating the issue and we will keep you updated on the progress.

Next update - 9 AM CDT
Posted Sep 11, 2024 - 10:05 UTC
Monitoring
Our team has completed the rollback of the recent release and has also completed sanity testing. Functionality should now be restored, and we are closely monitoring the situation.

Next update 5:30 AM CDT
Posted Sep 11, 2024 - 09:21 UTC
Update
The rollback is currently still in progress.
We will provide another update by 4 AM CDT.
Posted Sep 11, 2024 - 05:10 UTC
Update
We are continuing to investigate a rollback and fix for this issue.
Please note that this issue may also impact check-ins.

Our next update will be by 12AM CDT.
Posted Sep 11, 2024 - 01:08 UTC
Identified
Our Engineering team has identified the root cause of the disruption affecting Reservations, where floor plan and calendar views are not loading existing data and some workflows are failing.

We are actively working on a solution to restore full functionality. Part of this resolution involves rolling back the recent release deployed on September 5th.

We are closely monitoring the situation and will provide the next update by 6 PM CST.

Thank you for your continued patience as we work to resolve the issue.
Posted Sep 10, 2024 - 19:44 UTC
Update
We are currently investigating an issue with Reservations - floor plan and calendar views not loading existing data and certain workflows are failing.

Our Engineering team is actively working to determine the root cause of the disruption and assess its impact.

We will provide our next update by 6:23pm CST.

Thank you for your patience as we work to resolve this issue.
Posted Sep 10, 2024 - 19:24 UTC
Investigating
We are currently investigating an issue with Eptura Workplace. We will update you when we have more information.
Posted Sep 10, 2024 - 19:20 UTC
This incident affected: Apps (Hummingbird App) and Eptura Workplace Modules (Reservation Module).