This incident is resolved, and the time series and sequences services are now operating at normal performance levels and with the resiliency they were designed for.
The engineering team is still not able to lift the rate limits that have been configured to protect the backend storage during the ongoing recovery and optimization work. Users will see 429 response codes from the API when the rate limits kick in, and will see higher rates of 429s between 04:00 and 10:00 UTC due to ongoing work to improve redundancy.
The engineering team is still working on stabilizing the performance of the time series and sequences services. The rate limiting in place to reduce the load on the system has been adjusted. End users will receive a 429 response code if their request rate exceeds the rate limits. We are considering relaxing the rate limits further, and a new update will be posted here if and when this happens.
The engineering team is still working on stabilizing the performance of the time series and sequences services. Rate limiting is in place to reduce the load on the system, and end users will receive a 429 response code if their request rate exceeds the rate limits. We are adding more resources to the backend systems, but we cannot lift the rate limits before the processing of the backlog is complete; this will still take a few hours.
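For client applications, the practical effect of these limits is that requests can intermittently fail with 429 until the backlog is processed. Below is a minimal sketch of how a client might retry such requests with exponential backoff; the endpoint URL, project name, and reliance on a Retry-After header are illustrative assumptions, not confirmed details of the Cognite API.

```python
import time
import requests

# Hypothetical endpoint and token for illustration only; substitute your
# real CDF project URL and credentials.
URL = "https://api.cognitedata.com/api/v1/projects/my-project/timeseries/data/list"
HEADERS = {"Authorization": "Bearer <token>"}

def post_with_backoff(payload, max_attempts=6):
    """POST a request, backing off exponentially while the API returns 429."""
    delay = 1.0
    for attempt in range(max_attempts):
        resp = requests.post(URL, json=payload, headers=HEADERS, timeout=30)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        # Honor Retry-After if the server sends a numeric value,
        # otherwise fall back to our own exponential backoff.
        retry_after = resp.headers.get("Retry-After")
        try:
            wait = float(retry_after) if retry_after else delay
        except ValueError:
            wait = delay
        time.sleep(wait)
        delay = min(delay * 2, 60)
    raise RuntimeError("Rate limited on every attempt; giving up.")
```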
The storage backend has now recovered completely and is running with the desired number of replicas. Data loss is no longer a risk in this incident. There is a processing backlog that now needs to be addressed, and Cognite engineering is working on an issue with query performance degradation related to the high load.
The engineering team has fixed the problems related to replication in the backend database, and we are currently running with a normal level of resiliency. However, we have not yet lifted the rate limiting, as we want to observe the system for a while longer before opening up for full load.
The engineering team is still working on resolving this incident. There have been two low-level storage failures in the storage backend. There is redundancy in the system, but not all replicas are fully operational, so Cognite is now bringing up a restore cluster to mitigate the chance of data loss. Incoming traffic is still being rate limited to protect the service and the storage backend from excessive load while the incident is being contained and eradicated.
The engineering team is continuing to investigate how to improve the performance of the backend for time series and sequences. To prevent data loss, the team has configured rate limiting for these services, and users will see 429 HTTP responses if the new rate limits are exceeded.
Cognite Engineering is working on an incident in which the backend datastores for time series and sequences have performance problems that require throttling of incoming load and occasionally result in 5xx responses due to system overload. The engineering team is working on improving the storage system's performance. A new update will be posted when the end-user experience is expected to change.
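Because an overloaded service can also return 5xx responses, as described above, a client that cannot pause traffic entirely may want automatic retries for both 429 and transient 5xx codes. The sketch below configures this at the session level with requests and urllib3; the retry budget, backoff factor, and status list are assumptions to tune for your own workload, and POST should only be retried if your requests are idempotent.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry on rate limiting (429) and transient overload responses (5xx).
# allowed_methods requires urllib3 1.26+ (older versions used method_whitelist).
retry = Retry(
    total=5,                          # overall retry budget (assumed; tune as needed)
    backoff_factor=2,                 # exponential backoff between retries
    status_forcelist=[429, 500, 502, 503, 504],
    allowed_methods=["GET", "POST"],  # only include POST if your requests are idempotent
    respect_retry_after_header=True,
)

session = requests.Session()
session.mount("https://", HTTPAdapter(max_retries=retry))

# Example call; the URL and payload are placeholders, not confirmed endpoints.
# resp = session.post("https://api.cognitedata.com/...", json={...}, timeout=30)
```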