Outage in Bentley Systems

Microsoft Azure - Southeast Asia - Virtual Machines are down

Resolved Minor
March 19, 2024 - Lasted about 3 hours

Outage Details

The Microsoft Azure team is currently investigating an issue with Virtual Machines being down. Some users may have trouble or experience connection failures when trying to access some Virtual Machines using certain features. The Microsoft team is working to identify the root cause of the problem and implement a solution. We will provide an update as we learn more. In the meantime, we apologize for any inconvenience this may cause and appreciate your patience and understanding.
Latest Updates (sorted most recent first)
RESOLVED at 03/19/2024 01:31PM

Microsoft Azure updated and resolved with the following:

Impacted region(s)
Southeast Asia
Impacted subscription(s)
Connection Center Production EUS (839f01a6-2994-46b7-b4a2-c52840d935c6)
Last update (2024-03-19T13:16:47.6046358Z)
What happened?

Between 09:51 UTC and 11:30 UTC on 19 Mar 2024, customers using Virtual Machines in Southeast Asia may have experienced connection failures when trying to access some Virtual Machines hosted in the region.

What do we know so far?

We determined that the cause of the incident was a sudden increase in traffic at a single storage scale unit, which resulted in degraded performance and increased latency for a subset of customers hosting their resources in the Southeast Asia region.

How did we respond?

09:48 UTC on 19 March 2024: Internal monitoring thresholds were met, alerting us to this issue and prompting us to start our investigation.
09:50 UTC on 19 March 2024: Our platform detected service errors and customers started experiencing impact.
Through investigation, we determined that the spike in requests was the root cause of the errors and latency that customers experienced. In response, the requests were automatically re-balanced to include additional clusters.
11:30 UTC on 19 March 2024: We confirmed that resources had become healthy and failure rates had decreased to regular levels. After further monitoring, our telemetry confirmed the issue was mitigated and full service functionality was restored.
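
As a rough illustration of the timeline above, the following Python sketch computes how long the impact lasted and how long it took from the first internal alert to mitigation. The timestamps are copied from the entries above; the helper names (`parse_utc`, `minutes_between`) are hypothetical and not part of any Azure tooling. Month-name parsing assumes an English locale.

```python
from datetime import datetime, timezone

# Format of the timeline stamps used in the incident notice.
STAMP_FORMAT = "%H:%M UTC on %d %B %Y"


def parse_utc(stamp: str) -> datetime:
    """Parse a stamp such as '09:48 UTC on 19 March 2024' as a UTC datetime."""
    return datetime.strptime(stamp, STAMP_FORMAT).replace(tzinfo=timezone.utc)


def minutes_between(start: datetime, end: datetime) -> int:
    """Whole minutes elapsed from start to end."""
    return int((end - start).total_seconds() // 60)


# Values taken from the timeline entries in the notice.
alert_raised = parse_utc("09:48 UTC on 19 March 2024")
impact_started = parse_utc("09:50 UTC on 19 March 2024")
mitigated = parse_utc("11:30 UTC on 19 March 2024")

impact_minutes = minutes_between(impact_started, mitigated)
alert_to_mitigation_minutes = minutes_between(alert_raised, mitigated)

print(f"Customer impact: {impact_minutes} min")
print(f"Alert to mitigation: {alert_to_mitigation_minutes} min")
```

Note that this uses the 09:50 UTC impact time from the timeline; the summary under "What happened?" gives 09:51 UTC, which would shorten the impact window by one minute.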

What happens next?

Our team will be completing an internal retrospective to understand the incident in more detail. Once that is completed, generally within 14 days, we will publish a Post Incident Review (PIR) to all impacted customers.
To get notified when that happens, and/or to stay informed about future Azure service issues, make sure that you configure and maintain Azure Service Health alerts – these can trigger emails, SMS, push notifications, webhooks, and more: https://aka.ms/ash-alerts .
For more information on Post Incident Reviews, refer to https://aka.ms/AzurePIRs .
Finally, for broader guidance on preparing for cloud incidents, refer to https://aka.ms/incidentreadiness .

INVESTIGATING at 03/19/2024 11:51AM

The Virtual Machines have been restored to health but Microsoft continues to investigate connection failures and unexpected restarts of Virtual Machines in the region.

INVESTIGATING at 03/19/2024 11:03AM

The Microsoft Azure team is currently investigating an issue with Virtual Machines being down. Some users may have trouble or experience connection failures when trying to access some Virtual Machines using certain features.
The Microsoft team is working to identify the root cause of the problem and implement a solution. We will provide an update as we learn more.
In the meantime, we apologize for any inconvenience this may cause and appreciate your patience and understanding.