Outage in Treasure Data

Elevated error rate and performance degradation for personalization API

Resolved Major
January 30, 2025 - Started 11 months ago - Lasted about 5 hours

Incident Report

We detected degraded performance of the personalization API and an increased error rate. We are currently investigating this issue.

Latest Updates (most recent first)
RESOLVED 11 months ago - at 01/30/2025 03:43PM

We implemented fundamental isolation of a problematic configuration at 14:42 UTC. The remediation reduced the cluster workload from 60% to 1%. On Friday, we implemented write-access isolation for the problematic configuration, which stopped the cluster workload from growing. Today, we implemented read-access isolation, which restored the cluster workload to its previous level.

The system is now operating normally, and we are closing the incident. We acknowledge that further action is needed to prevent a similar configuration from causing the same incident again. We will post a postmortem when it is ready.

MONITORING 11 months ago - at 01/30/2025 02:18PM

We are still monitoring the service.

Between 10:00 UTC and 11:05 UTC on Thursday, 30 Jan 2025, customers experienced elevated error rates and higher latency for Profiles API lookups. The cluster workload has since calmed down and is operating normally.

Our response team is ready to provision additional processing capacity, and we are closely monitoring the service status to avoid further downtime during peak times. In addition, we are working on isolating the problematic accesses from the service.

We will keep the status page open and update you on the progress.

MONITORING 11 months ago - at 01/30/2025 12:31PM

We are continuing to monitor for any further issues.

MONITORING 11 months ago - at 01/30/2025 11:38AM

We are currently observing that the performance degradation and error rate have improved.
We continue to closely monitor the metrics.

INVESTIGATING 11 months ago - at 01/30/2025 10:54AM

We detected degraded performance of the personalization API and an increased error rate.
We are currently investigating this issue.
