Mercatus experienced intermittent update failures affecting a subset of data integration workflows in both Production and UAT environments due to lookup-based persistence mechanisms reaching capacity thresholds as data volumes scaled over time. The issue impacted specific workflows while others continued operating normally, with affected UAT data ingestion paused to prevent inconsistent updates. The incident was resolved after 17.4 hours by migrating the affected persistence systems to a higher-capacity data storage approach better suited for sustained data growth.
The previously reported intermittent update failures have been resolved.
The affected lookup‑based persistence has been migrated to a higher‑capacity data storage approach in both Production and UAT. This includes the reference cache and related staging structures used across ingestion workflows.
Given the breadth of this change across ingestion processes, we are closely monitoring runs to ensure continued stability and data consistency.
Please contact support if you observe any unexpected behavior.
Root Cause: As data volumes scaled over time, certain lookup‑based persistence mechanisms reached practical capacity thresholds, leading to intermittent update failures for a subset of workflows. This behavior surfaced gradually and only under specific data growth patterns, which is why many jobs continue to run successfully while others are impacted.
Impact: A defined set of workflows in UAT is affected, with similar exposure identified in Production. Data ingestion for the impacted workflows is currently paused to prevent inconsistent updates.
Current Actions: We are actively working with our third‑party service provider to validate system behavior and confirm whether any recent platform changes or capacity characteristics could be contributing to this pattern. In parallel, we are migrating affected workflows to a higher‑capacity persistence design better suited for sustained data growth and long‑term scalability.
Given the number of impacted workflows, this migration will be executed in phases to ensure stability and a controlled rollout.
Status: Remediation is in progress. We will share a more detailed timeline once the alternative persistence approach has been fully validated.
We are currently observing intermittent update failures in both Production and UAT affecting a subset of data updates backed by a lookup/persistence mechanism.
There have been no recent configuration or deployment changes on our side, and the same workflows have been operating successfully for an extended period. At the same time, many update operations continue to complete as expected, which points to a possible platform‑level constraint or behavioral limitation.
We have reached out to our third‑party service provider to confirm whether any recent changes or limit adjustments on their side could be contributing to this behavior. In parallel, we are investigating:
- Whether there is any ongoing service degradation or instability
- Whether recent changes to lookup or persistence limits could be contributing to this behavior
As a precautionary measure, we are also evaluating a move to a more scalable third‑party persistence solution to ensure long‑term reliability and consistency of updates.
Further updates will be shared as more information becomes available.
Thanks,
CRPM Support Team