What happened:
Starting at 14:48 UTC on March 26, 2025, our US customers experienced an incident that lasted 9 hours and 14 minutes. Because of architectural issues, messages began accumulating in our Simple Queue Service (SQS) queue. This slowed email processing and delayed the logging of emails within the Emails & Activities framework.
What caused it:
A few accounts received a very large number of emails, and processing those emails took up the entire processing capacity.
How we responded:
We routed the problematic accounts to an additional queue that we have in production, which relieved traffic on the main queue.
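For illustration only, the routing step described above could be sketched as follows. This is a minimal in-memory sketch, not the production implementation: the names EmailRouter, MAIN_QUEUE, ISOLATION_QUEUE, and the account IDs are hypothetical, and real queues would be SQS queues rather than Python deques.

```python
from collections import deque

MAIN_QUEUE = "main"           # hypothetical name for the shared queue
ISOLATION_QUEUE = "isolated"  # hypothetical name for the additional queue


class EmailRouter:
    """Routes each account's email messages to one of two in-memory queues."""

    def __init__(self):
        self.queues = {MAIN_QUEUE: deque(), ISOLATION_QUEUE: deque()}
        self.isolated_accounts = set()

    def isolate(self, account_id):
        """Send all future messages for this account to the isolation queue."""
        self.isolated_accounts.add(account_id)

    def queue_for(self, account_id):
        """Return the name of the queue this account's messages go to."""
        if account_id in self.isolated_accounts:
            return ISOLATION_QUEUE
        return MAIN_QUEUE

    def enqueue(self, account_id, message):
        """Append the message to the account's queue and return the queue name."""
        name = self.queue_for(account_id)
        self.queues[name].append((account_id, message))
        return name


router = EmailRouter()
router.isolate("account-heavy")                    # hypothetical high-volume account
router.enqueue("account-heavy", "email 1")         # goes to the isolation queue
router.enqueue("account-small", "email 2")         # stays on the main queue
```

Keeping the routing decision in one place means a high-volume account can be moved without touching the consumers of the main queue.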
What we're doing to prevent recurrence:
We have implemented metric-based alerts that trigger PagerDuty notifications early, and we plan to update the architecture soon.
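As a rough illustration of a metric-based alert, the check below decides whether a queue backlog should page the on-call engineer. It is a sketch under stated assumptions: the two signals (visible message count and age of the oldest message), the threshold values, and the names QueueSnapshot and should_page are all hypothetical, and the actual PagerDuty integration is not shown.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
MAX_VISIBLE_MESSAGES = 10_000
MAX_OLDEST_AGE_SECONDS = 300


@dataclass
class QueueSnapshot:
    """A point-in-time reading of two queue backlog metrics."""
    visible_messages: int
    oldest_message_age_seconds: float


def should_page(snapshot,
                max_visible=MAX_VISIBLE_MESSAGES,
                max_age=MAX_OLDEST_AGE_SECONDS):
    """Return True when either backlog metric exceeds its threshold."""
    return (snapshot.visible_messages > max_visible
            or snapshot.oldest_message_age_seconds > max_age)


healthy = QueueSnapshot(visible_messages=120, oldest_message_age_seconds=4.0)
backlogged = QueueSnapshot(visible_messages=250_000, oldest_message_age_seconds=1800.0)
```

Alerting on the age of the oldest message as well as the message count helps catch a slow backlog early, before the queue depth alone looks alarming.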
We sincerely apologize for any disruption this caused to your workflow.
Thank you for your understanding as we work to continuously improve our platform's reliability.
Your team at monday.com