Yesterday, April 16th, at 05:10 AM EST, we started to experience platform interruptions that intermittently prevented customers on our US server from accessing the platform. Access to our API and automation services was also limited.
There was no data loss or security risk, and our EMEA and APAC servers were not impacted by the downtime.
The issue was caused by a new monitoring service, designed to flag large or complex queries, that we had been gradually adding to our infrastructure. The service ran smoothly at first but later caused a lock within the infrastructure.
As part of our investigation, we turned the new service off and reverted to an older version, which had been running for many months without any issues.
The platform was unavailable during the following times (EST):
05:10-05:30 AM (20 minutes)
11:06-11:14 AM (8 minutes)
11:35-11:42 AM (7 minutes)
12:00-12:04 PM (4 minutes)
The platform was fully recovered at 12:04 PM (EST) yesterday, and we are developing both short- and long-term plans to improve recovery time, resilience against similar issues, and overall platform stability.
Our teams are working tirelessly to restore total platform stability and the trust you place in monday.com as your chosen Work OS. We apologize for any inconvenience or impact to your workflows and teams.
In the meantime, our support team is available via the help center for any help you might need with your account.