Outage in Buttondown

General availability outage

Resolved Minor
November 29, 2020 - Started over 3 years ago - Lasted over 1 year

Outage Details

Latest Updates (most recent first)
RESOLVED over 3 years ago - at 11/30/2020 01:22AM

For around an hour this morning, Buttondown had significantly degraded availability.

## What happened?
New hosts refused to spin up and were "correctly" throwing 500s for around 30% of requests. (This only affected hosts that were automatically cycling in and out, which is why it wasn't all requests.)

## Why did this happen?
I'm using an undocumented Notion API to power documentation search, and the token I was using for that API expired in a way that I was not defensively programming against. This meant that each time a server tried to restart, it would hit the API, fall over, and then pass that failure on to the client. As soon as this became widespread enough, I got an alert for it... but I was out on a run. As soon as I got back, I hit the circuit breaker for that codepath and things got back to normal.
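
To illustrate the failure mode, here is a minimal sketch of what guarding a startup-time fetch could look like. It is not Buttondown's actual code: the `fetch` callable, the `enabled` flag (standing in for the circuit breaker), and the function name are all hypothetical. The idea is that an expired token degrades search instead of taking the host down.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger(__name__)


def load_search_index(
    fetch: Callable[[], List[Dict[str, str]]],
    enabled: bool = True,
) -> List[Dict[str, str]]:
    """Return the docs search index, or an empty index if the fetch is
    switched off or fails. `fetch` is a hypothetical stand-in for the
    call to the third-party API."""
    if not enabled:
        # Circuit breaker is open: skip the external call entirely.
        return []
    try:
        return fetch()
    except Exception:
        # Log and continue so the host can still boot and serve requests.
        logger.exception("docs search fetch failed; starting without search")
        return []


def expired_token_fetch() -> List[Dict[str, str]]:
    # Simulates the upstream API rejecting an expired token.
    raise PermissionError("token expired")


index_after_failure = load_search_index(expired_token_fetch)
index_when_disabled = load_search_index(expired_token_fetch, enabled=False)
index_when_healthy = load_search_index(lambda: [{"title": "Getting started"}])
```

In this sketch a failed fetch leaves search empty rather than raising during startup, and flipping `enabled` to `False` avoids the external call altogether.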

## Why won't this happen again?
That circuit breaker is gonna stay off for a little while, but I plan on moving all of that compilation to a build-time step anyway, removing the Notion codepath from the critical path of the application!
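
A build-time step of this kind might look roughly like the sketch below. It is an illustration under assumptions, not the planned implementation: the function names, the JSON file format, and the `fetch` callable are all hypothetical. The build fetches the documents once and writes a static file, and the running application only reads that file, so a bad token fails the build rather than a live host.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def build_search_index(
    fetch: Callable[[], List[Dict[str, str]]],
    out_path: Path,
) -> None:
    """Build step: fetch the documents once and write them to a static file.
    An exception here fails the build, not a running server."""
    documents = fetch()
    out_path.write_text(json.dumps(documents))


def load_prebuilt_index(path: Path) -> List[Dict[str, str]]:
    """Runtime step: read the prebuilt file. No network call is made."""
    if not path.exists():
        # No index was built; serve with empty search rather than crash.
        return []
    return json.loads(path.read_text())


# Example run using a temporary directory as the build output location.
with tempfile.TemporaryDirectory() as tmp:
    index_path = Path(tmp) / "search_index.json"
    build_search_index(lambda: [{"title": "Getting started"}], index_path)
    loaded = load_prebuilt_index(index_path)
    missing = load_prebuilt_index(Path(tmp) / "absent.json")
```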

## Any questions?
Email me: justin@buttondown.email

Latest Buttondown outages

Backend processing down - about 1 month ago
API is down - 6 months ago
Webface down - 12 months ago
DNS is not resolving in US-EAST - over 3 years ago
