Investigating reports of connectivity issues across accounts

Incident Report for monday.com

Postmortem

Yesterday, April 16th, at 05:10 AM EST, we began experiencing platform interruptions that intermittently prevented customers on our US server from accessing the platform. Access to our API and automation services was also limited.

There was no data loss or security risk, and our EMEA and APAC servers were not impacted by the downtime.

The issue was caused by a new monitoring service that we had been gradually adding to our infrastructure. The service, which was designed to flag large or complex queries, ran smoothly at first but later caused a lock within the infrastructure.

As part of our investigation, we turned the new service off and reverted to an older version that had been running for many months without any issues.

The platform was unavailable during the following times (EST):

05:10-05:30 AM (20 minutes)
11:06-11:14 AM (8 minutes)
11:35-11:42 AM (7 minutes)
12:00-12:04 PM (4 minutes)
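For readers who want the cumulative figure, the windows above can be summed with a short script. This is only an illustrative sketch: the `WINDOWS` list and the `minutes_between` helper are names introduced here, not part of any monday.com tooling, and the times are taken as stated in the report, assuming all four windows fall on the same day.

```python
from datetime import datetime

# Outage windows copied from the report (24-hour form, same day, as stated).
WINDOWS = [
    ("05:10", "05:30"),
    ("11:06", "11:14"),
    ("11:35", "11:42"),
    ("12:00", "12:04"),
]


def minutes_between(start: str, end: str) -> int:
    """Return the whole minutes between two HH:MM times on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


# Per-window durations and their sum.
durations = [minutes_between(start, end) for start, end in WINDOWS]
total_minutes = sum(durations)

print(durations)      # [20, 8, 7, 4]
print(total_minutes)  # 39
```

The per-window durations match the figures listed in the report, and together they come to 39 minutes of unavailability.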

The platform fully recovered at 12:04 PM (EST) yesterday, and we are developing both short- and long-term plans to improve recovery time, resilience against similar issues, and overall platform stability.

Our teams are working tirelessly to restore total platform stability and the trust you place in monday.com as your chosen Work OS. We apologize for any inconvenience or impact caused to your workflow and teams.

Meanwhile, our support team is available for any help you might need with your account via the help center.

Posted 11 months ago. Apr 17, 2024 - 16:52 UTC

Resolved

The platform is now back to regular service. Please refresh your browser to access the platform. Thank you for your patience.
Posted 11 months ago. Apr 16, 2024 - 17:12 UTC

Update

We are continuing to monitor results to restore full system stability. Automations and API are now operational.
Posted 11 months ago. Apr 16, 2024 - 16:50 UTC

Update

We are continuing to monitor results to restore full system stability. Automations are now operational and API may experience delays.
Posted 11 months ago. Apr 16, 2024 - 16:43 UTC

Monitoring

Another fix has been implemented and we are monitoring results. Please note API and automations will experience delays.
Posted 11 months ago. Apr 16, 2024 - 16:10 UTC

Investigating

After monitoring the fix, we are now investigating reports of connectivity issues across the platform. Our team is working to resolve this promptly.
Posted 11 months ago. Apr 16, 2024 - 16:04 UTC

Update

We are continuing to monitor results to restore full system stability. Automations and API may experience delays.
Posted 11 months ago. Apr 16, 2024 - 15:27 UTC

Monitoring

A fix has been implemented and we are monitoring results to restore full system stability. API and automations will still experience issues during this time.
Posted 11 months ago. Apr 16, 2024 - 15:22 UTC

Update

Our team is still investigating reports of connectivity issues. We are actively working on this and will keep you updated with further details as soon as they become available.
Posted 11 months ago. Apr 16, 2024 - 15:16 UTC

Update

We are currently investigating reports of connectivity issues across the platform. Our team is working to resolve this promptly.
Posted 11 months ago. Apr 16, 2024 - 15:13 UTC

Investigating

We are currently investigating reports of connectivity issues across the platform. Our team is looking into this and will provide you with a response shortly.
Posted 11 months ago. Apr 16, 2024 - 15:09 UTC
This incident affected: US (Platform).