On July 9th, we experienced an unexpected service disruption in our US region, which led to temporary downtime of the platform and subsequent performance degradation in key features, including Apps, API access, and automations.
A failure at a critical third-party service provider introduced instability across one of our tier 1 services, leading to a widespread outage in key features.
Upon identifying the issue, our engineering teams acted swiftly to investigate the root cause and mitigate the disruption. We immediately replaced the affected instance with a healthy one and scaled up our systems to ensure the platform's stability. The platform was back online within about 50 minutes, and we then restored the remaining functionality gradually: Apps returned to normal by 15:00 and automations by 15:10.
We continuously monitored the system during the recovery process to ensure that there were no further interruptions. We also deployed additional resources to handle any spikes in demand during the restoration period.
We’re implementing additional safeguards to detect anomalies earlier, improving system scalability, strengthening internal processes to prevent similar incidents in the future, and reducing our dependency on the affected instance.
Full Downtime - 13:38–14:28 (~50m)
Apps - 14:28–15:00 (~32m)
Automations delay - 14:28–15:10 (~42m)
Although the disruption lasted for an extended period, no data was lost during the incident, and the integrity of your information was preserved.
We sincerely apologize for the inconvenience and appreciate your continued trust. Please reach out if you have any questions.
The monday.com team