Outage in Treasure Data

Elevated error rate and performance degradation for personalization API

Resolved Major
January 30, 2025 - Started 6 days ago - Lasted about 5 hours

Outage Details

We detected degraded performance of the personalization API and an increased error rate. We are currently investigating this issue.
Latest Updates (sorted most recent first)
RESOLVED 6 days ago - at 01/30/2025 03:43PM

We implemented fundamental isolation of a problematic configuration at 14:42 UTC. The remediation caused the cluster workload to drop from 60% to 1%. On Friday, we implemented write access isolation for the problematic configuration, which stopped the cluster workload from growing. Today, we implemented read access isolation, which restored the cluster workload to its previous level.

The system is now operating normally, and we are closing the incident. We acknowledge that further action is needed to prevent a similar configuration from causing the same incident again. We will post a postmortem when it is ready.

MONITORING 6 days ago - at 01/30/2025 02:18PM

We are still monitoring the service.

Between 10:00 UTC and 11:05 UTC on Thursday, 30 Jan 2025, customers experienced elevated error rates and higher latency for Profiles API lookups. The cluster workload has since calmed down, and the cluster is operating normally.

Our response team is ready to provision additional processing capacity, and we are closely monitoring the service status to avoid further downtime during peak times. In addition, we are working on isolating problematic accesses from the service.

We will keep the status page open and update you on the progress.

MONITORING 6 days ago - at 01/30/2025 12:31PM

We are continuing to monitor for any further issues.

MONITORING 6 days ago - at 01/30/2025 11:38AM

The performance degradation and error rate have improved.
We continue to monitor the metrics closely.

INVESTIGATING 6 days ago - at 01/30/2025 10:54AM

We detected degraded performance of the personalization API and an increased error rate.
We are currently investigating this issue.
