Investigating connectivity issues across accounts and devices
Incident Report for monday.com
Postmortem

What Happened:

On July 29th (09:04 AM EST), our US customers experienced a 23-minute incident, including approximately 16 minutes of difficulty loading boards and a subsequent 7-minute system downtime due to an application-level bug.

What caused it:

A recent code change intended to optimize our infrastructure's speed and scale inadvertently introduced a bug affecting data retrieval.

How we responded:

Our monitoring system promptly identified the issue, triggering an immediate response from our engineering team. By activating failover mechanisms and reverting the problematic code change, we restored normal service by 09:27 AM EST.

What we're doing to prevent recurrence:

To enhance system resilience, we are implementing additional layers of automated testing and protection, including strengthening our error detection and recovery processes.

We sincerely apologize for any disruption this caused to your workflow.

Thank you for your understanding as we work to continuously improve our platform's reliability.

Your team at monday.com

Posted Aug 01, 2024 - 06:37 UTC

Resolved
The platform is now back to regular service. Please refresh your browser to access the platform. Thank you for your patience
Posted Jul 29, 2024 - 14:03 UTC
Update
The platform is operational, and the API is now working; automations are still slow, and we're working to resolve it ASAP.
Posted Jul 29, 2024 - 13:57 UTC
Monitoring
A fix has been implemented and we are monitoring results to restore full system stability
Posted Jul 29, 2024 - 13:34 UTC
Investigating
We are currently investigating reports of connectivity issues across the platform. Our team is working to resolve this promptly.
Posted Jul 29, 2024 - 13:14 UTC
This incident affected: US (Platform, Automations, Integrations, API).