Outage in TSG Global

Inbound SMS to webhook customers stopped

Resolved Major
June 16, 2023 - Started about 2 years ago


Outage Details

Latest Updates (most recent first)
RESOLVED about 2 years ago - at 06/16/2023 07:00PM

Overview
Inbound SMS traffic towards webhook customers partially stopped due to the database read replica lag.

What happened
Due to an increase in our database read replica lag, inbound traffic towards webhook customers stopped. Our SMS application was unable to fetch messages from the database since those messages were not yet available in the read replica due to the lag spike.

Resolution
As soon as the issue was identified, we deployed a hotfix that reconfigured all applications to read from the writer instance, since that was the quickest temporary solution. Later, a second hotfix was implemented so that applications fall back to the writer instance whenever a record is not found in the reader instance, which keeps messages flowing if the lag ever increases again.
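
The fallback behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the actual hotfix: the function name `fetch_with_writer_fallback` and the two reader/writer callables are hypothetical, and plain dictionaries stand in for the database instances.

```python
from typing import Callable, Optional

# Hypothetical type: a lookup that returns a record, or None if it is not found.
Fetch = Callable[[str], Optional[dict]]


def fetch_with_writer_fallback(
    message_id: str,
    read_from_reader: Fetch,
    read_from_writer: Fetch,
) -> Optional[dict]:
    """Try the reader instance first; fall back to the writer on a miss."""
    record = read_from_reader(message_id)
    if record is not None:
        return record
    # The record may exist but not have replicated yet, so ask the writer.
    return read_from_writer(message_id)


# Stand-ins for the two instances: the reader is lagging and is missing "m2".
reader = {"m1": {"id": "m1", "body": "hello"}}
writer = {
    "m1": {"id": "m1", "body": "hello"},
    "m2": {"id": "m2", "body": "world"},
}

found = fetch_with_writer_fallback("m2", reader.get, writer.get)
missing = fetch_with_writer_fallback("m3", reader.get, writer.get)
```

With this shape, the writer instance only receives extra reads for records the reader cannot find, rather than all read traffic as in the first temporary fix.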

Root Causes
The root cause was an increase in database read replica lag. Applications were processing messages faster than records were propagated to the read replica. When applications tried to fetch messages that were not yet available on the replica, those messages went into the retry queue and were delivered with a long delay.
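
The timing condition behind this failure can be expressed as a small check. This is a simplified model under stated assumptions: the function name and the example timings are illustrative only and are not taken from the incident data.

```python
def read_misses_replica(
    write_time_s: float, read_time_s: float, replica_lag_s: float
) -> bool:
    """Return True when a read reaches the replica before the write is visible there."""
    visible_on_replica_at = write_time_s + replica_lag_s
    return read_time_s < visible_on_replica_at


# Assumed numbers: the application reads 50 ms after the write.
normal_lag_miss = read_misses_replica(write_time_s=0.0, read_time_s=0.05, replica_lag_s=0.01)
spiked_lag_miss = read_misses_replica(write_time_s=0.0, read_time_s=0.05, replica_lag_s=2.0)
```

In this model, a read only succeeds on the replica when the application's processing delay exceeds the replication lag, which is why a lag spike turns otherwise healthy reads into misses.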

Impact
Some HTTP webhook inbound traffic was delayed in the evening/early AM hours PST between 6/15/23 and 6/16/23.

What did we learn?
Since the outage was only partial, our existing metrics/alarms did not catch the issue and escalate it appropriately. We have added additional metrics and new alarms to alert for this kind of issue to prevent it from occurring again. We will also be performing some database maintenance in the near future to address the root cause.
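
One way an alarm could catch a partial outage of this kind is to watch the share of messages diverted to the retry queue instead of waiting for traffic to stop entirely. The sketch below is an assumption about what such a check might look like; the function name, the counters, and the 5% threshold are hypothetical and do not describe the alarms that were actually added.

```python
def should_alert(
    delivered_first_try: int,
    sent_to_retry: int,
    max_retry_ratio: float = 0.05,  # hypothetical threshold
) -> bool:
    """Flag a window in which too large a share of messages needed a retry."""
    total = delivered_first_try + sent_to_retry
    if total == 0:
        # No traffic in the window; a separate "no traffic" alarm would cover this.
        return False
    return sent_to_retry / total > max_retry_ratio


healthy_window = should_alert(delivered_first_try=99, sent_to_retry=1)
degraded_window = should_alert(delivered_first_try=90, sent_to_retry=10)
```

A ratio-based check like this fires even while most traffic is still being delivered, which is the situation an alarm based on total delivery volume can miss.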

