Cognite has deployed a fix for the issue causing intermittent 5xx errors on the Data Modeling Service (DMS) in the az-eastus-1 cluster. All related SLO alerts have cleared and background tasks have resumed normal operation. We are continuing to observe the cluster closely before resolving the incident.
We will post a further update once monitoring confirms full stability.
Cognite has reviewed deployment history, logs, and network dependencies. An internal process was found to be blocking operations, and Cognite has stopped that process. The rate of 500 errors has dropped, and the responsible engineering teams will resume the investigation tomorrow.
Engineers have identified a mitigation action and are working on a temporary solution while continuing to monitor the situation.
The engineers have identified the root cause and are working on a solution.
Problem: The datamodelstorage service in az-eastus-1 is experiencing periodic spikes to 100% 5xx error rates, resulting in major outages for users.
Impact: Multiple customers are receiving repeated 503 errors. All users on this cluster are likely affected.
Sev A declared.
Problem: The datamodelstorage service in az-eastus-1 is experiencing periodic spikes to 100% 5xx error rates, resulting in major outages for users.
Impact: Multiple customers are unable to use their digital applications and are receiving repeated 503 errors. All users on this cluster are likely affected.
Sev A declared.