Outage in Buttondown

General availability outage

Resolved Minor
November 13, 2020 - Started over 3 years ago - Lasted over 1 year
Official incident page


Outage Details

Latest Updates (sorted recent to last)
RESOLVED over 3 years ago - at 11/14/2020 01:02AM

For around five hours (meaning the early morning of 11/13, Pacific time) Buttondown's availability was heavily degraded.

## What happened?
Around 50-70% of requests timed out. It wasn't _quite_ a complete DDoS, but essentially so.

## Why did this happen?
This is... actually fairly silly, as far as these things go. An old third-party log handler that Buttondown was using shut off access at the logdrain I was using. (This is a totally reasonable thing to do!) _Unfortunately_, that clobbered a huge number of the requests being served, to the point where all the active dynos on my infrastructure were busy complaining and throwing errors because they couldn't emit logs. The irony of this does not escape me.

## Well, why did it take so long to fix?
I was asleep.

No, really! That's the reason. I've got two thresholds for Buttondown outages:

1. The server is down for a little, which texts me.
2. The server is hard down for all requests, which calls and pages me.

This was an exceptionally long bout of the former, which meant I woke up to like seventy outage texts but no outright pages.

## Why won't this happen again?
First: I've upped (or lowered, depending on how you look at it) the threshold for what constitutes an outage. I made a lot of these alerts two years ago when Buttondown was a fraction of a fraction of its current size; thankfully, things are generally stable, but it's still time to be more alert. Any non-trivial breakage of traffic pages little old me.
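
To make the threshold change concrete, here is a minimal sketch of a two-tier alert policy. It is not Buttondown's actual alerting configuration: the `AlertPolicy` and `classify` names and every numeric threshold are assumptions chosen only to illustrate how a partial outage can text under one policy and page under a stricter one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertPolicy:
    """Hypothetical policy: failure fractions at which each alert tier fires."""
    text_rate: float  # fraction of failed requests that triggers a text
    page_rate: float  # fraction of failed requests that triggers a call/page


def classify(failed: int, total: int, policy: AlertPolicy) -> str:
    """Return "ok", "text", or "page" for one monitoring window."""
    if total == 0:
        return "ok"  # no traffic observed, nothing to judge
    rate = failed / total
    if rate >= policy.page_rate:
        return "page"
    if rate >= policy.text_rate:
        return "text"
    return "ok"


# Assumed numbers, for illustration only.
OLD_POLICY = AlertPolicy(text_rate=0.05, page_rate=1.0)   # page only when hard down
NEW_POLICY = AlertPolicy(text_rate=0.01, page_rate=0.05)  # page on non-trivial breakage
```

Under these assumed numbers, a window in which 60 of 100 requests fail is classified as `"text"` by `OLD_POLICY` and as `"page"` by `NEW_POLICY`, which is the difference between sleeping through the incident and being woken up.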

Second: to fix the _actual_ issue, I'm spending some time this weekend messing around with the logging & error infrastructure Buttondown uses to more gracefully degrade.
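
One possible shape for that kind of graceful degradation is a log handler that swallows delivery failures and backs off for a while instead of letting them reach the request path. The sketch below uses Python's standard `logging` module; the `FailSafeHandler` class, the `send` callable, and the cooldown value are all hypothetical, and nothing here is taken from Buttondown's codebase.

```python
import logging
import time


class FailSafeHandler(logging.Handler):
    """Forward records to a remote sink, but never let a sink failure propagate.

    `send` is a hypothetical callable that ships one formatted log line to a
    remote log drain and raises on failure.
    """

    def __init__(self, send, cooldown_seconds=60.0, clock=time.monotonic):
        super().__init__()
        self._send = send
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._disabled_until = 0.0
        self.dropped = 0  # count of records that were not delivered

    def emit(self, record):
        # While in cooldown, drop the record without touching the network.
        if self._clock() < self._disabled_until:
            self.dropped += 1
            return
        try:
            self._send(self.format(record))
        except Exception:
            # Delivery failed: count it, then stop trying until the cooldown ends.
            self.dropped += 1
            self._disabled_until = self._clock() + self._cooldown
```

The design choice is to lose log lines rather than requests: when the sink is unreachable, the handler counts what it dropped and retries only after the cooldown, so a dead log drain costs one failed call per cooldown period instead of one per request.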

Have any questions? Email me: justin@buttondown.email

Latest Buttondown outages

Outgoing emails are backed up - about 1 month ago
Backend processing down - 3 months ago
API is down - 7 months ago
Webface down - about 1 year ago
