Outage in Cognite Service

Service incident caused by ongoing issue in Microsoft Azure

Resolved Minor
January 25, 2023 - Started almost 2 years ago - Lasted about 2 hours

Outage Details

Cognite Engineering is working on an issue caused by a Service Incident in Azure. Microsoft has resolved the incident related to networking in Azure, and Cognite Engineering is monitoring our systems to ensure that services are recovering.

Latest Updates (sorted most recent first)
RESOLVED almost 2 years ago - at 01/25/2023 01:34PM

Cognite Engineering has concluded that the services are now healthy after the Azure service incident, which impacted network functionality in most of our Azure subscriptions. The incident is now resolved.

###################################################
For reference, this is the information Microsoft posted about the incident (copied from https://status.azure.com/en-gb/status/history/):

Azure Networking - Multiple regions - Mitigated (Tracking ID VSG1-B90)
Summary of Impact: Between 07:05 UTC and 09:45 UTC on 25 January 2023, customers experienced issues with networking connectivity, manifesting as network latency and/or timeouts when attempting to connect to Azure resources in Public Azure regions, as well as other Microsoft services including M365 and PowerBI.

Preliminary Root Cause: We determined that a change made to the Microsoft Wide Area Network (WAN) impacted connectivity between clients on the internet to Azure, connectivity between services within regions, as well as ExpressRoute connections.

Mitigation: We identified a recent change to WAN as the underlying cause and have rolled back this change. Networking telemetry shows recovery from 09:00 UTC onwards across all regions and services, with the final networking equipment recovering at 09:35 UTC. Most impacted Microsoft services automatically recovered once network connectivity was restored, and we worked to recover the remaining impacted services.

Next Steps: We will follow up in 3 days with a preliminary Post Incident Report (PIR), which will cover the initial root cause and repair items. We'll follow that up 14 days later with a final PIR where we will share a deep dive into the incident.

You can stay informed about Azure service issues, maintenance events, or advisories by creating custom service health alerts (https://aka.ms/ash-videos for video tutorials and https://aka.ms/ash-alerts for how-to documentation) and you will be notified via your preferred communication channel(s).
#######################################################
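
The UTC timestamps in the Microsoft summary above can be turned into durations with a few lines of Python. This is only an illustrative sketch: the variable names and the `utc` helper are made up for this example, and it covers the Azure-side timeline only, since the Cognite update times on this page carry no time zone.

```python
from datetime import datetime, timedelta, timezone


def utc(hhmm: str) -> datetime:
    """Parse an HH:MM time on 25 January 2023 as a UTC datetime (hypothetical helper)."""
    hour, minute = map(int, hhmm.split(":"))
    return datetime(2023, 1, 25, hour, minute, tzinfo=timezone.utc)


# Timestamps quoted in the Microsoft summary (Tracking ID VSG1-B90).
impact_start = utc("07:05")              # start of customer impact
recovery_start = utc("09:00")            # telemetry shows recovery from here
last_equipment_recovered = utc("09:35")  # final networking equipment recovers
impact_end = utc("09:45")                # end of customer impact

impact_window = impact_end - impact_start
time_to_first_recovery = recovery_start - impact_start
recovery_tail = impact_end - recovery_start

print("Impact window:         ", impact_window)
print("Time to first recovery:", time_to_first_recovery)
print("Recovery tail:         ", recovery_tail)
```

Under these inputs the impact window is 2 hours 40 minutes, of which the first 1 hour 55 minutes passed before telemetry showed any recovery.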

IDENTIFIED almost 2 years ago - at 01/25/2023 11:35AM

Cognite Engineering is working on an issue caused by a Service Incident in Azure. Microsoft has resolved the incident related to networking in Azure, and Cognite Engineering is monitoring our systems to ensure that services are recovering.
