Outage in Netdata

Startup issue in latest Agent nightly (1.40.0-6-nightly)

Resolved Minor
June 16, 2023 - Started over 1 year ago - Lasted about 12 hours


Outage Details

We are currently investigating an issue with agent connectivity to the cloud.
Latest Updates (most recent first)
RESOLVED over 1 year ago - at 06/16/2023 05:58PM

All packages have been published. If your nodes are still on 1.40.0-6, please refer to the instructions to upgrade: https://learn.netdata.cloud/docs/maintaining/update-netdata-agents#updates-for-most-systems. We are now closing this incident, but please let us know if things are still not working on your nodes.
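To decide which nodes still need the upgrade, the reported version string can be checked against the affected build. The following is a minimal sketch only: the helper name `is_affected` and the exact shape of the version string (something like `v1.40.0-6-nightly`) are assumptions, not taken from Netdata's documentation.

```python
import re

# The affected nightly named in this incident.
AFFECTED_RELEASE = "1.40.0"
AFFECTED_BUILD = 6

# Assumed version shape: "<major>.<minor>.<patch>-<build>", optionally
# surrounded by other text such as a "v" prefix or a "-nightly" suffix.
_VERSION_RE = re.compile(r"(\d+\.\d+\.\d+)-(\d+)")


def is_affected(version_string):
    """Return True if the string names the affected 1.40.0-6 nightly."""
    match = _VERSION_RE.search(version_string)
    if match is None:
        # No build number found, so this is not a recognizable nightly string.
        return False
    release, build = match.group(1), int(match.group(2))
    return release == AFFECTED_RELEASE and build == AFFECTED_BUILD
```

A node for which this returns `True` should be upgraded using the instructions linked above.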

MONITORING over 1 year ago - at 06/16/2023 02:57PM

The source tarballs with the fix for native builds are now available. Packages for ARM systems are still building but should be fully published and available by 17:00 UTC at the latest.

MONITORING over 1 year ago - at 06/16/2023 02:42PM

The native packages for x86-based distributions have been published. The ARM packages and the static builds are still building and should follow shortly. We're watching Netdata Cloud and our social channels to track the outcome of the new builds.

IDENTIFIED over 1 year ago - at 06/16/2023 01:24PM

The fix has been merged, and we've kicked off the build process for the packages. We will provide another update when the packages for the affected systems have been pushed.

IDENTIFIED over 1 year ago - at 06/16/2023 11:43AM

We have created a fix for this issue. It combines two changes: systemd no longer changes the ownership and permissions of the directories the Agent uses, and the Agent now changes permissions recursively to recover from the effects of the bad version. As soon as we've tested the fix and the packages have been built, we will trigger an explicit push to the nightlies repos.
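The recursive recovery step can be illustrated with a short sketch. This is not the Agent's actual code: the helper name `restore_ownership`, the `netdata` user, and the directory paths in the usage comment are all assumptions made for illustration.

```python
import os


def restore_ownership(root, uid, gid):
    """Set owner and group on root and on everything beneath it.

    Symbolic links are re-owned themselves and never followed.
    Returns the list of paths that were processed.
    """
    processed = []

    def fix(path):
        st = os.lstat(path)
        # Only call chown when something actually differs.
        if st.st_uid != uid or st.st_gid != gid:
            os.chown(path, uid, gid, follow_symlinks=False)
        processed.append(path)

    fix(root)
    # os.walk does not descend into symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            fix(os.path.join(dirpath, name))
    return processed


# Hypothetical usage, run as root (user name and paths are assumptions):
#   import pwd
#   user = pwd.getpwnam("netdata")
#   for directory in ("/var/lib/netdata", "/var/cache/netdata"):
#       restore_ownership(directory, user.pw_uid, user.pw_gid)
```

Checking each entry before calling `chown` keeps the operation safe to repeat, which matters for a recovery step that may run on every start.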

IDENTIFIED over 1 year ago - at 06/16/2023 08:11AM

While we are working on a fix, which requires a new package to be built, we have developed a workaround. It requires downgrading the Agent to 1.40.0-2-nightly and fixing the permissions. For Debian-based systems, this script should work when run as root: https://gist.github.com/ralphm/1326498c474aaacf0a12f9e569dac863

IDENTIFIED over 1 year ago - at 06/16/2023 06:53AM

Agents running the most recent nightly (1.40.0-6-nightly) fail to start on some platforms because of a permissions issue. We believe the culprit is this change: https://github.com/netdata/netdata/pull/14890, and we are working on a fix. Because the failure happens early in Agent startup, it affects Cloud and non-Cloud users alike.

INVESTIGATING over 1 year ago - at 06/16/2023 05:55AM

We are currently investigating an issue with agent connectivity to the cloud.
