
Outage in Clockwork

Background job slowdown

Resolved Major
March 22, 2023 - Started almost 3 years ago - Lasted 1 day


Latest Updates (most recent first)
RESOLVED almost 3 years ago - at 03/23/2023 08:54PM

Incident Details:
Background jobs slowed down, which delayed notifications, indexing, reports, and export processing.

Root Cause Analysis:
A firm requested an export of all of its people data (around 200K records). For every such request we create a background job that carries the list of people IDs to be exported.

Because the job was so large (200K * 40 =~ 8MB), fetching other records from the background jobs table also slowed down as the database sort buffer filled up. As a result, other job workers, such as those for indexing and notifications, slowed down as well.
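
The size estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes the factor of 40 in the report means roughly 40 bytes of serialized payload per people ID, and the function name `estimate_payload_bytes` is hypothetical, not part of Clockwork's code.

```python
def estimate_payload_bytes(record_count: int, bytes_per_id: int = 40) -> int:
    """Rough size of a job payload that embeds one ID per record.

    bytes_per_id=40 is an assumption read from the report's "200k * 40".
    """
    return record_count * bytes_per_id


# The export that triggered the incident: about 200K people IDs in one job.
incident_payload = estimate_payload_bytes(200_000)
print(f"{incident_payload / 1_000_000:.1f} MB")  # prints "8.0 MB"
```

Under the same assumption, a job capped at 50K records would carry about a quarter of that payload.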

Immediate Resolution:
- The problematic export jobs were stashed, and extra workers were spawned to clear the backlog.
- A limit of 50K records was added for data exports.
- The Excel and CSV exports were split into two separate jobs to simplify processing.
- Checks were added to prevent the creation of unnecessary jobs.
- An index was added on the background jobs table for faster access within a queue, to limit the impact of such huge jobs on other background processing.
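
Two of the fixes above, the 50K record limit and the checks against unnecessary jobs, could be sketched roughly as follows. The report does not describe how Clockwork implemented them, so every name here (`MAX_EXPORT_RECORDS`, `ExportJob`, `build_export_job`, `ExportTooLargeError`) is hypothetical, and treating an empty export as an "unnecessary job" is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

# Limit stated in the report; the constant name is hypothetical.
MAX_EXPORT_RECORDS = 50_000
# One job type per format, mirroring the Excel/CSV split in the report.
SUPPORTED_FORMATS = ("csv", "excel")


class ExportTooLargeError(ValueError):
    """Raised when an export request exceeds the record limit."""


@dataclass(frozen=True)
class ExportJob:
    export_format: str
    people_ids: Tuple[int, ...]


def build_export_job(people_ids: Iterable[int], export_format: str) -> Optional[ExportJob]:
    """Validate an export request before a background job is created."""
    if export_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported export format: {export_format!r}")
    ids = tuple(people_ids)
    if not ids:
        # Assumption: an export with nothing to export needs no job at all.
        return None
    if len(ids) > MAX_EXPORT_RECORDS:
        raise ExportTooLargeError(
            f"{len(ids)} records requested, limit is {MAX_EXPORT_RECORDS}"
        )
    return ExportJob(export_format=export_format, people_ids=ids)
```

Rejecting the request before the job row is written keeps oversized payloads out of the background jobs table entirely, which is the table whose reads slowed down during the incident.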

