Investigating connectivity issues on US servers
Incident Report for monday.com
Postmortem

To our monday community, we’d like to share an update about a brief service interruption that occurred on our US servers on Monday, July 1st, 2024.

What Happened:

  • At 2:04 AM EDT (06:04 UTC), a bug in a third-party provider's software caused a configuration error during a database update. This led to a service interruption of approximately 15 minutes on our US servers.
  • Some customers may have experienced slowness on the platform leading up to the interruption.
  • There was no data loss, security risk, or impact on our EMEA and APAC servers.

Our Response:

  • Our monitoring system immediately alerted our R&D teams.
  • We promptly opened a support channel with the third-party provider while initiating internal troubleshooting.
  • By 2:19 AM EDT (06:19 UTC), service was fully restored, and the platform was functioning as expected.
  • Our teams continued to actively monitor performance as a precaution.

What We're Doing to Prevent Similar Issues:

  • Our team is working closely with the third-party provider to investigate the root cause of the bug and improve our review processes.
  • To mitigate future issues, we are developing a dedicated backup instance for these third-party components. 

We sincerely apologize for any inconvenience this interruption caused. 

Our top priority is to ensure a stable and exceptional user experience on our platform.


Posted Jul 02, 2024 - 15:10 UTC

Resolved
The issue has been successfully resolved. Please refresh your browser to access the platform. Thank you for your patience!
Posted Jul 01, 2024 - 06:40 UTC
Monitoring
A fix has been implemented and we are monitoring results to ensure full stability with automations.
Posted Jul 01, 2024 - 06:26 UTC
Identified
Our team has identified the root cause of the issue affecting automations and is working to restore normal service. We will continue to provide updates on our progress.
Posted Jul 01, 2024 - 06:12 UTC
Monitoring
A fix has been implemented and we are monitoring results to ensure full system stability.
Posted Jul 01, 2024 - 06:11 UTC
Update
We are currently investigating reports of connectivity issues across the platform. Our team is working to resolve this promptly.
Posted Jul 01, 2024 - 06:05 UTC
Update
We are currently experiencing connectivity issues on our US servers. Our dedicated team is working to resolve this as quickly as possible.
Posted Jul 01, 2024 - 05:54 UTC
Update
We are continuing to investigate this issue.
Posted Jul 01, 2024 - 05:47 UTC
Investigating
We are currently experiencing slowness on the platform on our US servers. Our dedicated team is working to resolve this as quickly as possible.
Posted Jul 01, 2024 - 05:30 UTC
This incident affected: US (Automations) and EU (Automations).