Investigating connectivity issues across accounts and devices due to an AWS outage

Incident Report for monday.com

Postmortem

What happened:

On October 20th, starting at around 03:00 AM ET, customers on our US server experienced approximately 250 minutes of disruption, including about 165 minutes of full system downtime, followed by 522 minutes of degraded performance affecting automations and integrations.

The issue was caused by an outage at our third-party provider, AWS, which primarily affected the US region but also had downstream effects globally. As a result, some customers in our EU and AU regions experienced degraded performance with the API, automations, and integrations, lasting up to 150 minutes.

What caused it:

This incident was caused by a major outage at our third-party provider, AWS (Amazon Web Services), in one of their US data center regions. This outage led to widespread disruptions across many platforms, including monday.com.

This issue began with errors and delays within AWS's core systems, which affected how our servers handled data and automations. While AWS carried out mitigation efforts to restore functionality, our team worked to maintain stability and support recovery as services gradually returned to normal.

How we responded:

Our engineering and development teams acted immediately to stabilize the platform once the issue was detected. To protect performance and prevent further disruption, we temporarily paused automations and API activity connected to the affected region while we worked on restoring service.

We kept customers informed through regular updates on our Status Page and made configuration changes to reduce the impact on other parts of the platform. As AWS began to recover, we restored platform functionality in stages to ensure everything remained stable. 

Full functionality, including automations, integrations, and the API, was restored once reliability was confirmed in all regions.

We sincerely apologize for any disruption this caused to your workflow.

Thank you for your patience and understanding as we continue strengthening our systems to ensure greater resilience and reliability. 

Your team at monday.com

Posted Oct 21, 2025 - 07:30 UTC

Resolved

The platform is now back to regular service. Please refresh your browser to access the platform. Thank you for your patience.
Posted Oct 20, 2025 - 21:18 UTC

Update

We are continuing to monitor for any further issues.
Posted Oct 20, 2025 - 21:03 UTC

Update

Availability and latency for all platform components have returned to normal. We are no longer observing issues coming from AWS infrastructure.
Our teams will continue to monitor platform availability. Automations are working again, but we expect delays in automation completion.
Posted Oct 20, 2025 - 20:23 UTC

Update

All systems are working and being monitored, except for automations and the API on the US server, which are still affected.
Posted Oct 20, 2025 - 18:17 UTC

Update

A fix has been implemented. All systems are working and being monitored, except for automations on the US server, which are still affected.
Posted Oct 20, 2025 - 17:04 UTC

Update

We are monitoring the results as we work to restore full system stability on the platform, including automations on the US server, and to resolve issues connecting customers with our Customer Support Team. We will continue to provide updates until the issue is fully resolved!
Posted Oct 20, 2025 - 14:59 UTC

Update

Automations are experiencing degraded performance on the US server. We are still experiencing issues connecting customers with our Customer Support Team. We will continue to provide updates until the issue is fully resolved!
Posted Oct 20, 2025 - 14:45 UTC

Update

Automations are experiencing degraded performance on the US server. We will continue to provide updates until the issue is fully resolved!
Posted Oct 20, 2025 - 14:43 UTC

Update

We are currently experiencing issues connecting customers with our Customer Support Team. We are continuing to monitor for any platform-related issues.
Posted Oct 20, 2025 - 13:52 UTC

Update

We are still monitoring the incident.
Posted Oct 20, 2025 - 13:00 UTC

Update

Automations are experiencing degraded performance on the US server. We're working to fully resolve the issue as soon as possible!
Posted Oct 20, 2025 - 11:19 UTC

Update

Automations and integrations are experiencing degraded performance on the US server. We will continue to provide updates until the issue is fully resolved!
Posted Oct 20, 2025 - 11:12 UTC

Update

We're still working to fully resolve the issue as soon as possible!
Posted Oct 20, 2025 - 10:47 UTC

Monitoring

The EU and AU servers are fully functional, and the platform is partly functional on the US server. We're working to fully resolve the issue as soon as possible.
Posted Oct 20, 2025 - 10:18 UTC

Update

The platform is starting to recover. We will update again soon.
Posted Oct 20, 2025 - 10:05 UTC

Update

Automations and integrations are working again on the AU server but may experience delays.
Posted Oct 20, 2025 - 08:55 UTC

Update

Automations and integrations are working again on the EU server but may experience delays.
Posted Oct 20, 2025 - 08:29 UTC

Update

We are continuing to work on a fix for this issue.
Posted Oct 20, 2025 - 08:05 UTC

Update

Our automations and integrations are also experiencing issues on the EU and AU servers. We are working to fix the issue.
Posted Oct 20, 2025 - 08:00 UTC

Identified

Our team has identified that the root cause of the issue lies with our third-party provider, which is working to restore regular service. Additionally, the API is partially working on the EU and AU servers.
We will continue to provide updates on their progress.
Posted Oct 20, 2025 - 07:52 UTC

Update

Our API is also experiencing connectivity issues on our EU and AU servers. We are investigating the issue.
Posted Oct 20, 2025 - 07:30 UTC

Update

We are continuing to investigate this issue.
Posted Oct 20, 2025 - 07:17 UTC

Investigating

We are currently investigating reports of connectivity issues across the platform, which are caused by an AWS outage, as reported at https://health.aws.amazon.com/health/status. Our team is working to resolve this promptly.
Posted Oct 20, 2025 - 07:12 UTC
This incident affected: US (Platform) and EU (Platform).