Outage in LiftIgniter

Datastore issues in US West region causing degraded recommendation quality

Resolved Minor
Started November 04, 2020

Incident Report

This is similar to http://status.liftigniter.com/incidents/1z9fqwpckkyk. We experienced a hardware issue affecting multiple nodes of our datastore in US West [EDIT: Our hosting provider, Google Cloud, believes that this was actually a software issue with their virtualization software, and not a real hardware issue]. Capacity has been restored (it was limited between 8:30 UTC and 9:41 UTC). During the period of limited capacity, we experienced increased latency and substantially degraded recommendation quality. Now that capacity has been restored, the ongoing challenge is that, because of the large number of node failures, the datastore system in US West does not have all the data that it should have, causing degraded recommendation quality in some cases. We are working to restore from backups.

Latest Updates (most recent first)
RESOLVED - 11/05/2020 01:50PM

Data restoration is now complete and all metrics are back to normal. We will continue to keep an eye out for any data inconsistencies created by the restore process, but nothing seems off as of now.

IDENTIFIED - 11/04/2020 09:55AM

This is similar to http://status.liftigniter.com/incidents/1z9fqwpckkyk.

We experienced a hardware issue affecting multiple nodes of our datastore in US West [EDIT: Our hosting provider, Google Cloud, believes that this was actually a software issue with their virtualization software, and not a real hardware issue]. Capacity has been restored (it was limited between 8:30 UTC and 9:41 UTC). During the period of limited capacity, we experienced increased latency and substantially degraded recommendation quality.

Now that capacity has been restored, the ongoing challenge is that, because of the large number of node failures, the datastore system in US West does not have all the data that it should have, causing degraded recommendation quality in some cases. We are working to restore from backups.
