Outage in LiftIgniter

Datastore issues in US West region causing degraded recommendation quality

Resolved Minor

November 04, 2020 - Started over 5 years ago
Official incident page

Incident Report

This is similar to http://status.liftigniter.com/incidents/1z9fqwpckkyk

We experienced a hardware issue affecting multiple nodes of our datastore in US West [EDIT: Our hosting provider, Google Cloud, believes that this was actually a software issue with their virtualization software, and not a real hardware issue]. Capacity has been restored (it was limited between 8:30 UTC and 9:41 UTC). During the period of limited capacity, we experienced increased latency and substantially degraded recommendation quality.

Now that capacity has been restored, the ongoing challenge is that, due to the large number of node failures, the datastore system in US West does not have all the data that it should have, causing degraded recommendation quality in some cases. We are working to restore from backups.

Latest Updates (most recent first)

RESOLVED over 5 years ago - at 11/05/2020 01:50PM

Data restoration is now complete and all metrics are back to normal. We will continue to keep an eye out for any data inconsistencies created by the restore process, but nothing seems off as of now.

IDENTIFIED over 5 years ago - at 11/04/2020 09:55AM

This is similar to http://status.liftigniter.com/incidents/1z9fqwpckkyk

We experienced a hardware issue affecting multiple nodes of our datastore in US West [EDIT: Our hosting provider, Google Cloud, believes that this was actually a software issue with their virtualization software, and not a real hardware issue]. Capacity has been restored (it was limited between 8:30 UTC and 9:41 UTC). During the period of limited capacity, we experienced increased latency and substantially degraded recommendation quality.

Now that capacity has been restored, the ongoing challenge is that, due to the large number of node failures, the datastore system in US West does not have all the data that it should have, causing degraded recommendation quality in some cases. We are working to restore from backups.

Latest LiftIgniter outages

Issues with inventory API and related background processing - about 2 years ago

Issues with email service (and another component that is not end-user-facing) likely due to Google Cloud infrastructure issues - about 2 years ago

Capacity issues in US East region due to Google Cloud infrastructure issues (they are unable to provision new instances) - about 3 years ago

Issues with console and the API usage - over 3 years ago

Issues with services in US East due to capacity issues with cloud provider - about 4 years ago
