Investigating reports of connectivity issues across accounts
Incident Report for monday.com
Postmortem

Yesterday, April 16th, at 05:10 AM EST, we started to experience platform interruptions that intermittently prevented customers on our US server from accessing the platform. There was also limited access to our API and automation services.

There was no data loss or security risk, and our EMEA and APAC servers were not impacted by the downtime.

The issue was caused by a new monitoring service that we gradually added to our infrastructure. The service was designed to flag large or complex queries. It ran smoothly at first but later caused a lock within the infrastructure.

As part of our investigation, we turned the service off and reverted to an older version, which had been running for many months without any issues.

The platform was unavailable during the following times (EST):

05:10-05:30 AM (20 minutes)
11:06-11:14 AM (8 minutes)
11:35-11:42 AM (7 minutes)
12:00-12:04 PM (4 minutes)
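
For readers who want to check the arithmetic, the four windows listed above add up to 39 minutes of unavailability. The short Python sketch below recomputes each duration and the total from the listed start and end times. The names `WINDOWS` and `minutes_between` are illustrative only and are not part of any monday.com tooling.

```python
from datetime import datetime

# Outage windows exactly as listed in this report (April 16th, times as stated).
WINDOWS = [
    ("05:10 AM", "05:30 AM"),
    ("11:06 AM", "11:14 AM"),
    ("11:35 AM", "11:42 AM"),
    ("12:00 PM", "12:04 PM"),
]


def minutes_between(start: str, end: str) -> int:
    """Return the whole minutes between two same-day 12-hour clock times."""
    fmt = "%I:%M %p"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


# Per-window durations, then the combined downtime.
durations = [minutes_between(start, end) for start, end in WINDOWS]
total_minutes = sum(durations)

print(durations)      # [20, 8, 7, 4]
print(total_minutes)  # 39
```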

The platform was fully recovered at 12:04 PM (EST) yesterday, and we are developing both short- and long-term plans that will improve recovery time, resilience against similar issues, and overall platform stability.

Our teams are working tirelessly to restore total platform stability and the trust you place in monday.com as your chosen Work OS. We apologize for any inconvenience or impact caused to your workflow and teams.

Meanwhile, our support team is available for any help you might need with your account via the help center.

Posted Apr 17, 2024 - 16:52 UTC

Resolved
The platform is now back to regular service. Please refresh your browser to access the platform. Thank you for your patience.
Posted Apr 16, 2024 - 17:12 UTC
Update
We are continuing to monitor results to restore full system stability. Automations and API are now operational.
Posted Apr 16, 2024 - 16:50 UTC
Update
We are continuing to monitor results to restore full system stability. Automations are now operational and API may experience delays.
Posted Apr 16, 2024 - 16:43 UTC
Monitoring
Another fix has been implemented and we are monitoring results. Please note API and automations will experience delays.
Posted Apr 16, 2024 - 16:10 UTC
Investigating
After monitoring the fix, we are now investigating reports of connectivity issues across the platform. Our team is working to resolve this promptly.
Posted Apr 16, 2024 - 16:04 UTC
Update
We are continuing to monitor results to restore full system stability. Automations and API may experience delays.
Posted Apr 16, 2024 - 15:27 UTC
Monitoring
A fix has been implemented and we are monitoring results to restore full system stability. API and automations will still experience issues during this time.
Posted Apr 16, 2024 - 15:22 UTC
Update
Our team is still investigating reports of connectivity issues. We are actively working on this and will keep you updated with further details as soon as they become available.
Posted Apr 16, 2024 - 15:16 UTC
Update
We are currently investigating reports of connectivity issues across the platform. Our team is working to resolve this promptly.
Posted Apr 16, 2024 - 15:13 UTC
Investigating
We are currently investigating reports of connectivity issues across the platform. Our team is looking into this and will provide you with a response shortly.
Posted Apr 16, 2024 - 15:09 UTC
This incident affected: US (Platform).