Post-Mortem: Authentication Service Disruption
Resolved
Minor
February 02, 2026
Incident Report
Date: January 29, 2026
Duration: 53 minutes (18:15 – 19:08 CET)
Executive Summary
On Thursday, Jan 29, Trengo experienced a service disruption that prevented users from logging into the platform. The issue was traced to a failure in our internal authentication token renewal process. A fix was deployed at 18:45, and full service was restored by 19:08.
What Happened?
The disruption began at 18:15 when our system-level bearer token, used to communicate with our authentication provider (Stytch), expired. While our automated cron job had successfully requested a new token, a caching error caused the system to store the new token in an incorrect location within our backend cache.
As a result, even though the authentication provider was operational, Trengo’s backend continued attempting to use the expired token, leading to failed login attempts for all users.
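The failure mode described above can be illustrated with a minimal sketch. It assumes a simple key-value cache, modeled here as a dict; the key names and function names are hypothetical, since the report does not state the real ones. The refresh succeeds, but because it writes to a different key than the one the backend reads, the backend keeps sending the expired token.

```python
# Hypothetical cache keys; the real key names are not given in the report.
READ_KEY = "auth:provider:bearer_token"   # key the backend reads on every request
WRONG_WRITE_KEY = "auth:bearer_token"     # key the refresh job mistakenly wrote to


def refresh_token(cache: dict, new_token: str, write_key: str) -> None:
    """Store a freshly issued token under the given cache key."""
    cache[write_key] = new_token


def token_for_request(cache: dict):
    """Return the token the backend would send to the authentication provider."""
    return cache.get(READ_KEY)


# Before the refresh, the cache holds a token that is about to expire.
cache = {READ_KEY: "expired-token"}

# The refresh job obtains a new token but stores it in the wrong place.
refresh_token(cache, "fresh-token", WRONG_WRITE_KEY)

# The backend still reads the old entry, so provider calls keep failing.
token_in_use = token_for_request(cache)
```

In this sketch the new token exists in the cache, which is why the refresh job itself would report success while logins continue to fail.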
Timeline of Events (CET)
18:15: Initial reports of users being logged out and of login failures.
18:21: Triage initiated. Verified that external providers and recent deployments were stable.
18:35: Root cause identified: The 60-day system bearer token was refreshed, but the new token was written to the wrong location in the cache.
18:45: Fix deployed to refresh the token using more robust cache pathing logic.
19:08: System-wide recovery confirmed; all users able to log in.
Corrective Actions
To prevent a recurrence, we are implementing the following:
Cache Validation: Automated verification that each refreshed token is stored under the correct key and can be retrieved from it.
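One way such a check could look is a read-after-write verification in the refresh job. This is a minimal sketch and not Trengo's actual implementation: the cache is modeled as a dict, and `TOKEN_KEY`, `TokenCacheMismatch`, and `store_and_verify` are hypothetical names introduced for illustration.

```python
# Hypothetical key the backend reads the provider token from.
TOKEN_KEY = "auth:provider:bearer_token"


class TokenCacheMismatch(RuntimeError):
    """Raised when a refreshed token cannot be read back from the expected key."""


def store_and_verify(cache: dict, token: str,
                     write_key: str = TOKEN_KEY,
                     read_key: str = TOKEN_KEY) -> str:
    """Write the token, then read it back through the key the backend uses."""
    cache[write_key] = token
    stored = cache.get(read_key)
    if stored != token:
        # Fail loudly so the refresh job alerts instead of silently
        # leaving the old token in place until it expires.
        raise TokenCacheMismatch(
            f"token written to {write_key!r} is not readable from {read_key!r}"
        )
    return stored


# Usage: a correct write passes, a misdirected write is caught immediately.
cache = {}
verified = store_and_verify(cache, "fresh-token")

try:
    store_and_verify({}, "fresh-token", write_key="auth:bearer_token")
    misdirected_write_detected = False
except TokenCacheMismatch:
    misdirected_write_detected = True
```

The point of reading back through the backend's own key is that the check fails at refresh time, while the previous token is still valid, rather than when the old token expires.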