Outage in Netdata

Agent connectivity disruption

Status: Resolved - Impact: Minor
December 14, 2022 - Lasted 1 day


Outage Details

Components affected
Netdata Agent-Cloud Link (ACLK)
Latest Updates (most recent first)
RESOLVED - 12/15/2022 06:58 AM

As the number of connected Agents has returned to expected levels and the number of Agents still running the previous (affected) nightly keeps going down, we consider this incident resolved.

MONITORING - 12/14/2022 07:02 PM

The new build (1.37.0-55) has completed for most platforms. If you are on the affected version (1.37.0-48) and want to upgrade your Agents manually, follow the instructions at https://learn.netdata.cloud/docs/agent/packaging/installer/update. If you have automatic updates configured, you can instead wait for the update to run overnight (your local time).

We will monitor the Agents as they reconnect.

IDENTIFIED - 12/14/2022 05:17 PM

The new build (1.37.0-55) has been triggered, and we will post an update when it is ready. That update will include instructions for upgrading manually; alternatively, you can wait for the automatic update to run overnight (your local time).

Note:
* If you are running a nightly build older than 1.37.0-48, you are not affected and no action is required.
* If you are running a stable build, you are not affected and no action is required. However, we strongly recommend upgrading to 1.37.1 because older versions contain two security vulnerabilities.
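The notes above amount to a simple version check. The Python sketch below encodes them as a triage helper. The function name `triage` and the version-string format it parses (`v1.37.0-48-nightly` for nightlies, `v1.37.1` for stable releases) are illustrative assumptions, not part of Netdata's own tooling.

```python
import re

# Hypothetical triage helper that encodes the guidance from this incident.
# ASSUMPTION: nightly versions look like "v1.37.0-48-nightly" and stable
# versions like "v1.37.1"; the leading "v" is optional.
AFFECTED_NIGHTLY = ((1, 37, 0), 48)  # the only affected build
RECOMMENDED_STABLE = (1, 37, 1)      # fixes two security vulnerabilities

_VERSION_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)(?:-(\d+)-nightly)?$")


def triage(version: str) -> str:
    """Classify an Agent version according to the incident notes."""
    m = _VERSION_RE.match(version.strip())
    if m is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    base = tuple(int(g) for g in m.group(1, 2, 3))
    build = m.group(4)
    if build is not None:
        # Nightly build: only 1.37.0-48 is affected.
        if (base, int(build)) == AFFECTED_NIGHTLY:
            return "affected: upgrade to 1.37.0-55 or later"
        return "not affected"
    # Stable build: not affected by this incident, but versions older
    # than 1.37.1 carry two security vulnerabilities.
    if base < RECOMMENDED_STABLE:
        return "not affected: upgrade to 1.37.1 recommended"
    return "not affected"
```

For example, `triage("v1.37.0-48-nightly")` flags the build as affected, while `triage("v1.37.0")` reports a stable build that is unaffected but should still be upgraded.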

IDENTIFIED - 12/14/2022 03:56 PM

We have identified the offending change in the Agent.

Only the latest nightly build (1.37.0-48-nightly) of the Agent is affected. The problem occurs only when the Agent tries to reconnect after losing its first connection to Cloud. Restarting the Agent therefore avoids the problem until its connection to Cloud drops again.

We will issue a new nightly build that removes the offending change.

INVESTIGATING - 12/14/2022 02:38 PM

We are able to reproduce the issue and are attempting to pinpoint the cause.

INVESTIGATING - 12/14/2022 01:02 PM

We are seeing an increasing number of Agents that cannot (properly) connect to Cloud. We are investigating the cause, but initial indications are that it may be related to the latest nightly release of the Agent (version 1.37.0-48-nightly).
