We identified a routing issue that is sending Copilot traffic to an unhealthy portion of infrastructure, resulting in access and timeout errors. We're rerouting requests to healthy infrastructure, and our telemetry is starting to show service health recovery.
The Copilot app and web service are down, showing 503 errors and blank screens. Users report being unable to connect and describe full outages.
We've identified high utilization on the underlying infrastructure that the Copilot LLM APIs use and are applying mitigations. Additionally, at the Copilot service level, we're reviewing options to change routing paths, throttling rules and retry logic to allow the underlying infrastructure to recover.
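For illustration only, one common way retry logic is tuned so that an overloaded backend can recover is capped exponential backoff with jitter. The sketch below is not the Copilot service's actual implementation; `TransientError`, `backoff_delays`, `call_with_retry`, and all default values are hypothetical names and numbers chosen for the example.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures such as a 503 or a timeout."""


def backoff_delays(max_attempts: int, base: float = 0.5, cap: float = 30.0) -> list[float]:
    """Upper bound of the wait before each retry: base * 2**n, capped at `cap`."""
    return [min(cap, base * (2 ** n)) for n in range(max_attempts - 1)]


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base: float = 0.5,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[float, float], float] = random.uniform,
) -> T:
    """Call `fn`, retrying on TransientError with capped, jittered backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delays = backoff_delays(max_attempts, base, cap)
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Full jitter: wait a random time in [0, bound] so that many
            # clients do not retry in lockstep against a recovering backend.
            sleep(rng(0.0, delays[attempt]))
    raise RuntimeError("unreachable")
```

Spreading retries out in time and capping their number reduces the load that failed requests add back onto infrastructure that is already highly utilized, which is the general motivation for adjusting retry behavior during an incident like this one.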
Users report that Copilot is slow or non-functional, stuck on the 'Lining Things Up' message. The likely cause is an AI loop or a service issue.