Resolved Slowdown in activity monitoring and background tasks such as file copies
AWS health checks began failing this morning for a fraction of our services that process activity monitoring and background file transfers (file copies). This slowed that processing for 1-2 hours before our alert thresholds were triggered; the issue has now been resolved. We are tightening our alert thresholds so that similar situations are detected and resolved more quickly in the future.
Resolved Google Drive Activity API v1 shut down
Google has now shut down the deprecated Drive Activity API v1. Most customers should not be affected. Customers whose applications still request v1 scopes (as documented at https://developers.google.com/drive/activity/v2/migrating#authorization) should update their OAuth configuration to request the Drive Activity API v2 scopes instead, and ensure the "Drive Activity v2" box is checked at https://developers.kloudless.com/applications/*/credentials (it is checked by default).
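For applications that build their scope list in code, the migration amounts to swapping the v1 scope for a v2 one. The sketch below assumes the v1 scope string is `https://www.googleapis.com/auth/activity` and maps it to the read-only v2 scope `https://www.googleapis.com/auth/drive.activity.readonly`; both strings should be confirmed against the migration guide linked above, and `migrate_scopes` is a hypothetical helper, not part of any Google or Kloudless SDK.

```python
# Assumed scope strings; confirm against Google's v1 -> v2 migration guide.
V1_ACTIVITY_SCOPE = "https://www.googleapis.com/auth/activity"
V2_ACTIVITY_SCOPE_READONLY = "https://www.googleapis.com/auth/drive.activity.readonly"


def migrate_scopes(scopes: list[str]) -> list[str]:
    """Hypothetical helper: replace the v1 Activity scope with the v2 read-only scope.

    Preserves the original order and drops duplicates (e.g. if the v2 scope
    was already requested alongside the v1 scope).
    """
    migrated: list[str] = []
    for scope in scopes:
        new_scope = V2_ACTIVITY_SCOPE_READONLY if scope == V1_ACTIVITY_SCOPE else scope
        if new_scope not in migrated:
            migrated.append(new_scope)
    return migrated


if __name__ == "__main__":
    requested = [
        "https://www.googleapis.com/auth/drive",
        "https://www.googleapis.com/auth/activity",
    ]
    print(migrate_scopes(requested))
```

The read-only v2 scope is chosen here on the assumption that the application only reads activity; Google also documents a broader `https://www.googleapis.com/auth/drive.activity` scope.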
Resolved Some Office 365 accounts may need to re-authenticate due to a recent Office 365 incident
Some customers' authentication tokens may have been invalidated as a side effect of the recent Office 365 downtime. API requests to these accounts may return 401 or 403 errors starting early on the morning of Sept 29 (UTC); affected accounts will need to re-authenticate. Please ask users to check their historical service health at https://portal.office.com/adminportal/home?#/servicehealth for further information, including the scope, impact, and timeframe of the downtime.
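One way to find the accounts to prompt for re-authentication is to scan recent request results for 401 and 403 responses. The sketch below is illustrative only: `AccountResult` and `accounts_needing_reauth` are hypothetical names, and it assumes you already record the last HTTP status per account. Because a 403 can also indicate a permissions or rate-limit problem in normal operation, treat the output as a list to investigate rather than proof that a token was invalidated.

```python
from dataclasses import dataclass

# Status codes that, during this incident, signalled a possibly invalidated token.
# A 403 can also mean a permission or rate-limit problem, so verify before acting.
AUTH_FAILURE_STATUSES = frozenset({401, 403})


@dataclass
class AccountResult:
    account_id: str   # hypothetical identifier for a connected account
    status_code: int  # HTTP status returned by a recent API request for it


def accounts_needing_reauth(results: list[AccountResult]) -> list[str]:
    """Return the sorted, de-duplicated IDs of accounts that got a 401 or 403."""
    return sorted({r.account_id for r in results if r.status_code in AUTH_FAILURE_STATUSES})
```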
Resolved G Suite (Admin) Activity Monitoring encountered intermittent errors accessing the Google API
The Google Discovery API unexpectedly began returning malformed identifiers in a portion of its responses, causing Google's SDKs to reject the data received. Responses from the impacted URL, https://www.googleapis.com/discovery/v1/apis/admin/reports_v1/rest, caused Google's SDK to cache an incorrect schema, so the cache had to be cleared manually on the affected workers (a minority of our cluster). This issue only affected retrieving changes and publishing webhooks for Admin Google Drive accounts that monitor organization-wide activity. The issue is now resolved, and we have escalated it to the G Suite team to flag our concern about errors of this nature.
Resolved Intermittent API request failures for some requests
Increased API request latency caused by network bottlenecks created a backlog of requests in our web tier, causing API requests to time out. The backlog compounded the slowdown, which escalated over 30-45 minutes until most API requests were affected, before our on-call staff resolved it. We are reviewing the incident and identifying mechanisms to remediate network bottlenecks more quickly in the future.
Resolved Google Calendar API rate-limiting most requests
The Google Calendar API is currently rate-limiting most requests, regardless of whether custom OAuth credentials are used. We are investigating the cause and escalating this issue to Google.
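When an upstream API rate-limits requests, clients are generally expected to retry with truncated exponential backoff rather than immediately. This will not fix an upstream problem like the one described above, but it avoids adding load while it lasts. The sketch below is a generic example, not Kloudless or Google client code: `RateLimitedError` is a hypothetical exception your client would raise on a rate-limit response, and the delay parameters are illustrative defaults.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitedError(Exception):
    """Hypothetical: raised by a client when the API responds with a rate-limit error."""


def call_with_backoff(
    request: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 32.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(), retrying on RateLimitedError with truncated exponential backoff.

    Before retry n (0-based) we wait min(max_delay, base_delay * 2**n) seconds plus
    up to 1 second of random jitter, so many clients do not retry in lockstep.
    After max_retries failed retries, the last RateLimitedError is re-raised.
    """
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimitedError:
            if attempt == max_retries:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt) + random.uniform(0, 1)
            sleep(delay)
    raise AssertionError("unreachable")  # loop always returns or raises
```

The `sleep` parameter is injectable so the retry schedule can be tested without actually waiting.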