Our Kafka cluster is still operational and we are continuing to work towards restoring full reliability. We will resume work during business hours to restore full Activity Log functionality. We will continue to update as work progresses.
The Kafka cluster is still operational, but it is not back to full capacity and resiliency. Work will continue tomorrow to ensure sufficient capacity, and during business hours to restore full Activity Log functionality. We will provide more updates tomorrow, and as needed if there are any changes to availability.
We have brought the Kafka cluster to a sufficiently stable state; however, it still requires remediation before the Activity Log is fully functional and before the cluster reaches a level of service that can be sustained until business hours. We will provide another update in 2 hours or as the situation changes.
Ingest has been fully re-enabled, and SLOs and Triggers are now up-to-date.
We have resumed ingest to api.eu1.honeycomb.io as we were able to restore partial service to the Kafka cluster.
We are continuing to investigate issues with the Kafka cluster. We will continue providing updates every 2 hours unless there are significant changes.
We are continuing to investigate issues with the Kafka cluster.
External API access is restored; event ingestion is still impacted.
We have temporarily disabled event ingestion for the EU Region in order to restore full functionality to our EU Kafka fleet. External API access will also be disabled. Additionally, during the ongoing outage, our Service Level Objectives (SLO) feature has been impacted and unavailable since 12:30 PM Pacific time on Friday, December 5th. SLO data will not be correct until our systems catch up and we rebuild the cache.
Automated systems are still working to catch up, and the Activity Log will remain offline until that completes. We will post another update on Saturday whether or not the Activity Log outage has been remediated.
Ingest, querying, SLOs, and Trigger alerting are back to normal. The Activity Log is still impacted; we have identified the cause and are working to resolve it. We are still evaluating the scope of the outage for ingest and expect to post a full answer during US business hours on Monday.
We will post at least one more update this evening.
We are continuing the investigation after business hours to stabilize ingest and determine what work will be needed to fully recover. Known impact will be updated here as the situation changes. We will post at least one more update this evening.
We have identified that 0.23% of datasets are fully affected, and a larger percentage are seeing intermittent ingestion and query failures (500s at the API level). We are also investigating a replication error in our ingestion pipeline.
A subset of customer environments may see higher than usual error rates when sending events to api.eu1.honeycomb.io, and notifications for SLOs and Triggers may be delayed for that subset. We are continuing to investigate the issue.
We are currently investigating this issue.