Outage in Workspot

Failure in Azure East US and East US 2 regions

Minor
September 10, 2025 - Started 2 days ago


Outage Details

We are aware that some customers are experiencing issues with VM provisioning and resuming operations in the East US and East US 2 Azure regions. Our team has already engaged Microsoft Azure Support with a Severity 1 ticket. According to Microsoft, the issue is related to unhealthy dependencies that are causing failures. They are actively investigating to determine the root cause and identify possible mitigation steps. We will share further updates as soon as Microsoft provides more information.
Components affected
Workspot Cloud
Latest Updates (sorted most recent first)
MONITORING 1 day ago - at 09/11/2025 03:49PM

A fix has been implemented and we are monitoring the results.

IDENTIFIED 2 days ago - at 09/10/2025 10:26PM

According to Microsoft, this issue has been mitigated. Here is the incident summary shared by Microsoft:

SUMMARY OF IMPACT:
What happened?
Between 09:12 UTC and 18:50 UTC on 10 September 2025, a platform issue impacted multiple Azure services in the East US 2 region, specifically in two zones (Az02 and Az03). Impacted customers may have received error notifications when performing service management operations (such as create, delete, update, scale, start, or stop) on resources hosted in this region. The primary affected services were Virtual Machines and Virtual Machine Scale Sets, which in turn caused issues for services that depend on these compute resources, such as Azure Databricks, Azure Kubernetes Service, Azure Synapse Analytics, Backup, and Data Factory.
Customers that still see failed or unhealthy resources should attempt to update or redeploy the resource.

What do we know so far?
Our investigation identified that the issue impacting resource provisioning in East US 2 was linked to a failure in the platform component responsible for managing resource placement. The system is designed to recover quickly from transient issues, but in this case, the prolonged performance degradation caused recovery mechanisms themselves to become a source of instability.
The incident was primarily driven by a combination of platform recovery behavior and sustained performance degradation. While customer-generated load remained within expected limits, internal platform services began retrying failed operations aggressively when performance issues emerged. These retries, intended to support resilience, instead created a surge in internal system activity.
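The retry amplification described above can be illustrated with a small simulation. This is a generic sketch, not Microsoft's implementation or the actual fix: it assumes hypothetical clients whose every attempt fails (the worst case during a degradation) and compares immediate retries with capped exponential backoff with full jitter, a widely used client-side technique for spreading retries over time. The helper names and the `base`, `cap`, and `window` parameters are illustrative assumptions.

```python
import random


def immediate_retry_schedule(max_attempts: int) -> list[float]:
    """Send times (seconds) when every retry fires immediately after a failure."""
    return [0.0] * max_attempts


def full_jitter_schedule(max_attempts: int, base: float, cap: float,
                         rng: random.Random) -> list[float]:
    """Send times using capped exponential backoff with full jitter.

    Before retry n (n >= 1), wait a random time in [0, min(cap, base * 2**n)].
    """
    times = [0.0]
    t = 0.0
    for attempt in range(1, max_attempts):
        t += rng.uniform(0, min(cap, base * 2 ** attempt))
        times.append(t)
    return times


def peak_load(schedules: list[list[float]], window: float) -> int:
    """Largest number of requests falling into any single time window."""
    counts: dict[int, int] = {}
    for schedule in schedules:
        for t in schedule:
            bucket = int(t // window)
            counts[bucket] = counts.get(bucket, 0) + 1
    return max(counts.values())


# Hypothetical scenario: 1,000 clients, each making up to 5 attempts.
rng = random.Random(42)
clients, attempts = 1000, 5
naive = [immediate_retry_schedule(attempts) for _ in range(clients)]
jittered = [full_jitter_schedule(attempts, base=1.0, cap=30.0, rng=rng)
            for _ in range(clients)]

print("peak requests/s, immediate retries:", peak_load(naive, window=1.0))
print("peak requests/s, backoff + jitter:", peak_load(jittered, window=1.0))
```

With immediate retries, all 5,000 attempts land in the first second; with backoff and jitter, the first-second peak drops to roughly the 1,000 initial attempts plus a fraction of the early retries. Backoff does not reduce the total number of attempts, only how tightly they cluster, which is why it is usually combined with a cap on retries.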

How did we respond?
• 09:12 UTC on 10 September 2025 – Customer impact began.
• 09:13 UTC on 10 September 2025 – Our monitoring systems observed a rise in failure rates, triggering an alert and prompting our team to initiate an investigation.
• 12:08 UTC on 10 September 2025 – We identified unhealthy dependencies in core infrastructure components as initial contributing factors.
• 13:34 UTC on 10 September 2025 – Began mitigation efforts, which included restarting critical service components to restore functionality, rerouting workloads away from affected infrastructure, initiating multiple recovery cycles for the impacted backend service, and executing capacity operations to free up resources. Once recovered, internal workloads processed their backlogs to reach the current healthy state.
• 18:50 UTC on 10 September 2025 – After a period of monitoring to validate the health of services, we were confident that the control plane service was restored, and no further impact was observed to downstream services for this issue.

IDENTIFIED 2 days ago - at 09/10/2025 07:00PM

Below is the summary provided by Microsoft regarding the ongoing issue in the East US 2 region:

"Current Status:
We detected the issue through automated monitoring following a spike in failure rates. We have identified a performance issue in a core infrastructure component responsible for managing resource placement. This is causing delays and failures in virtual machine provisioning. The issue stems from severe transaction delays and high system load in two zones of the region.

Active Recovery Efforts:
• Zone 2 (Az02) – Gradually re-enabling traffic to Zone 2 (currently ~50%) using controlled allocation strategies. This should introduce more capacity for the region and allow higher allocation success rates.
• Zone 3 (Az03) and Zone 1 (Az01) – These zones have recovered, but because only partial traffic is enabled for Zone 2, customers may still see allocation failures here. As Zone 2 traffic increases, allocation success should increase in all zones.
Note that the 'logical' zones used by each customer subscription may correspond to different physical zones. Customers can use the Locations API to understand this mapping and confirm which of their resources run in the affected physical AZs.
The next update will be provided within 60 minutes, or sooner if significant progress is made.
"
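The logical-to-physical zone mapping mentioned in Microsoft's note can be checked programmatically. The sketch below assumes the response shape of the Azure Resource Manager Subscriptions "List Locations" operation, in which each region entry carries an `availabilityZoneMappings` array of `logicalZone`/`physicalZone` pairs; verify this against the current API reference. The sample response, its zone values, and the assumption that Microsoft's "Az02"/"Az03" labels correspond to physical zone names ending in `-az2`/`-az3` are all hypothetical, and the helper functions are illustrative names, not part of any Azure SDK.

```python
import json

# Hypothetical sample shaped like a List Locations response; zone values are invented.
SAMPLE_RESPONSE = """
{
  "value": [
    {
      "name": "eastus2",
      "availabilityZoneMappings": [
        {"logicalZone": "1", "physicalZone": "eastus2-az3"},
        {"logicalZone": "2", "physicalZone": "eastus2-az1"},
        {"logicalZone": "3", "physicalZone": "eastus2-az2"}
      ]
    }
  ]
}
"""


def zone_mapping(response_json: str, region: str) -> dict[str, str]:
    """Return {logical zone: physical zone} for one region, or {} if absent."""
    data = json.loads(response_json)
    for location in data.get("value", []):
        if location.get("name") == region:
            return {m["logicalZone"]: m["physicalZone"]
                    for m in location.get("availabilityZoneMappings", [])}
    return {}


def logical_zones_at_risk(mapping: dict[str, str],
                          affected_physical: set[str]) -> list[str]:
    """Logical zones in this subscription that land on an affected physical zone."""
    return sorted(z for z, phys in mapping.items() if phys in affected_physical)


mapping = zone_mapping(SAMPLE_RESPONSE, "eastus2")
# Assumption: "Az02"/"Az03" in Microsoft's update map to eastus2-az2/eastus2-az3.
at_risk = logical_zones_at_risk(mapping, {"eastus2-az2", "eastus2-az3"})
print(at_risk)  # ['1', '3']
```

In practice the JSON would come from an authenticated call to the List Locations endpoint for the subscription in question rather than from a hard-coded sample; because the mapping differs per subscription, the same physical zone can appear under a different logical zone number in each one.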

IDENTIFIED 2 days ago - at 09/10/2025 04:50PM

Below is the summary provided by Microsoft regarding the ongoing issue in the East US 2 region:

"Current Status:
We detected the issue through automated monitoring following a spike in failure rates. The root cause has been traced to a backend service responsible for managing resource placement, which is experiencing performance degradation. This has led to delays and failures in resource creation and management.

Our engineering teams have attempted several recovery actions, including restarting key service components and shifting workloads away from affected infrastructure. However, these efforts have not yet fully resolved the issue due to system-level constraints.

Mitigation Actions Taken:
• Restarted critical service components to restore functionality.
• Attempted to reroute workloads from affected infrastructure.
• Initiated multiple recovery cycles for the impacted backend service.

Active Recovery Efforts:
• Zone 3 (Az03) is showing signs of improvement after targeted recovery actions. System performance has stabilized and is being closely monitored.
• Zone 2 (Az02) is undergoing similar recovery steps. While new resource deployments remain restricted, existing resources are beginning to recover.
• We are working to redistribute workloads to healthier zones (Az01 and Az03). However, limited capacity in these zones is causing throttling and delays.
• In parallel, we are exploring emergency capacity expansion to alleviate resource constraints and accelerate recovery.

Note that the 'logical' zones used by each customer subscription may correspond to different physical zones. Customers can use the Locations API to understand this mapping and confirm which of their resources run in the affected physical Availability Zones (AZs).

Next Steps:
We continue to monitor all zones and prioritize recovery in Az02 to restore capacity in the region."

IDENTIFIED 2 days ago - at 09/10/2025 03:40PM

As per the latest update from Microsoft, they have observed that two of the three impacted zones have returned to a healthy state, and failure rates are now trending downward. Customers may see signs of recovery over time.

This issue is limited to the East US 2 region.

IDENTIFIED 2 days ago - at 09/10/2025 02:43PM

We are continuing to work with Microsoft.

IDENTIFIED 2 days ago - at 09/10/2025 02:43PM

We are aware that some customers are experiencing issues with VM provisioning and resuming operations in the East US and East US 2 Azure regions.

Our team has already engaged Microsoft Azure Support with a Severity 1 ticket. According to Microsoft, the issue is related to unhealthy dependencies that are causing failures. They are actively investigating to determine the root cause and identify possible mitigation steps.

We will share further updates as soon as Microsoft provides more information.
