Never miss when your external dependencies are down. Instant notifications when there are outages.
Resolved Minor DNS is not resolving in US-EAST
This is a DNSimple issue, unfortunately: https://dnsimple.statuspage.io/incidents/bcnzxs0mgz21=
Resolved Minor General availability outage
For around an hour this morning, Buttondown had significantly degraded availability.

## What happened?

New hosts refused to spin up and were "correctly" throwing 500s for around 30% of requests. (This only impacted hosts that were automatically cycling in and out, which is why it wasn't all requests.)

## Why did this happen?

I'm using an undocumented Notion API to power documentation search, and the token that I was using for that API expired in a way that I was not defensively programming against. This meant that each time the server tried to restart, it would hit the API, fall over, and then pass that failure on to the client.

As soon as this became widespread enough, I got an alert for it... but I was out on a run. As soon as I got back, I hit the circuit breaker for that codepath and things got back to normal.

## Why won't this happen again?

That circuit breaker is gonna stay off for a little, but I plan on moving all of that compilation to a build-time step anyway, ...
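To make the failure mode above concrete, here is a minimal sketch of a circuit breaker that keeps a failing dependency from taking the caller down with it. This is an illustration only, not Buttondown's code: the write-up describes a breaker that was flipped by hand, whereas this variant trips automatically. `CircuitBreaker`, `load_search_index`, and all parameters are hypothetical names.

```python
import time


class CircuitBreaker:
    """Serve a fallback instead of calling a dependency that keeps failing."""

    def __init__(self, max_failures=3, reset_after=60.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # seconds to wait before retrying
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def is_open(self):
        if self.opened_at is None:
            return False
        if self.clock() - self.opened_at >= self.reset_after:
            # Cool-down elapsed: close the breaker, but leave it one
            # failure away from tripping again.
            self.opened_at = None
            self.failures = self.max_failures - 1
            return False
        return True

    def call(self, func, fallback):
        if self.is_open():
            return fallback  # skip the dependency entirely
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback
        self.failures = 0
        return result


def load_search_index():
    # Hypothetical stand-in for the third-party call. It always fails here,
    # the way a call with an expired token would.
    raise PermissionError("token expired")


breaker = CircuitBreaker(max_failures=2)
# Startup continues with an empty index instead of crashing the host.
search_index = breaker.call(load_search_index, fallback=[])
```

The design choice is that a broken optional feature (documentation search) degrades to an empty result rather than blocking the server from booting.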
Resolved Minor General availability outage
For around five hours (meaning the early morning of 11/13, Pacific time) Buttondown's availability was heavily degraded.

## What happened?

Around 50-70% of requests timed out. It wasn't _quite_ a complete DDOS, but essentially so.

## Why did this happen?

This is... actually fairly silly, as far as these things go. An old third-party log handler that Buttondown was using shut off access at the logdrain I was using. (This is a totally reasonable thing to do!) _Unfortunately_, that clobbered a huge amount of the requests being served, to the point where all the active dynos on my infrastructure were busy complaining and throwing errors because they couldn't emit logs. The irony of this does not escape me.

## Well, why did it take so long to fix?

I was asleep. No, really! That's the reason. I've got two thresholds for Buttondown outages:

1. The server is down for a little, which texts me.
2. The server is hard down for all requests, which calls and pages me.

This was an exceptiona...
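As an illustration of the logging failure described above, the sketch below wraps a fragile log handler so that a failed emit is counted and dropped rather than raised into the request path. It is a hedged example built on Python's standard `logging` module, not the handler Buttondown used. `FailSafeHandler` and `BrokenDrainHandler` are hypothetical names, and the sketch covers only handlers that raise. A drain that hangs would also need a timeout or a background queue.

```python
import logging


class FailSafeHandler(logging.Handler):
    """Delegate to another handler, swallowing any error it raises."""

    def __init__(self, inner):
        super().__init__()
        self.inner = inner
        self.dropped = 0  # number of records lost to handler errors

    def emit(self, record):
        try:
            self.inner.handle(record)
        except Exception:
            # Losing a log line is preferable to failing the request.
            self.dropped += 1


class BrokenDrainHandler(logging.Handler):
    """Hypothetical stand-in for a log drain that has shut off access."""

    def emit(self, record):
        raise ConnectionError("log drain refused the connection")


logger = logging.getLogger("failsafe-demo")
logger.propagate = False
logger.setLevel(logging.INFO)

safe_handler = FailSafeHandler(BrokenDrainHandler())
logger.addHandler(safe_handler)

# The drain is down, but this call returns normally.
logger.info("request served")
```

Tracking `dropped` keeps the failure visible, so it can feed an alert without affecting request handling.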
Resolved Minor Broken tracking links
I tried to switch over to SSL for our tracking links and it looks like something went awry. I need to follow up with CloudFlare as to what the issue was (it was likely a misconfiguration on my end, in all honesty, but hard to say), but for around two hours all tracked links were broken because they pointed to https and not http. I've reached out to the folks who sent outbound emails during that time, and am in the process of backfilling the transactional emails (such as subscriber confirmations) that were sent during that window!
Resolved Minor 500s on Frontend Application
GitHub was running into [issues](https://www.githubstatus.com/incidents/80d0cs6kpsps), and Buttondown's CI pipeline wasn't running properly. That led to a feature branch getting deployed to production, which caused breakages for a few minutes.
"I spend 2 hours trying to solve an issue and then realize it's due to an [EXTERNAL SERVICE] outage"
Every engineer at some point in time
No more frantic searching for the source of the problem.
All of your service statuses in one place
Check the status of all your services in one place. No more going to each of the status pages and managing them individually.
Notifications of incidents in real time
We monitor 24 hours a day, 7 days a week and will notify you if there is an incident. No more wasting time trying to figure out why something isn't working.
Notifications in your favorite channel
Get instant notifications in your email, Slack, or Discord.
Keep track of scheduled maintenance
Never again be caught off guard by maintenance on your services. A feed of upcoming scheduled maintenance is available.
Set the notification level for each service
Configure which notifications you want to receive from each service. You can choose to receive notifications for all incidents, only critical incidents, or just display them on the dashboard.
Integrate with your current workflows
Using Zapier or Webhooks, you can easily integrate notifications into your processes.
Receive a Weekly Digest
Every Monday, you'll receive a weekly summary of what happened the previous week as well as the maintenance schedule for the following week.
Create one profile for each of your teams. One Dashboard and specific Notifications settings for each team.
Only receive notifications for specific components
Filter notifications by service components. You can opt to receive notifications only when a specific component is affected.
Add another dimension to your monitoring data by tracking external systems alongside your own. Monitor the services your company relies on for development. Don't waste time looking elsewhere when the cause is an external outage.
Know before your clients tell you. Anticipate possible issues and make the necessary arrangements. Understand if your business is being impacted by outages in external services.
One of your competitors is down? Maybe a good time to spread the word about your service.
all your essential services
Start with a trial account that will allow you to try and monitor up to 30 services for 14 days.
There are 1622 services to choose from, and we're adding more every week.
You can get notifications by email, Slack, and Discord. You can also use Zapier or Webhooks to build your workflows.
Increase the productivity and efficiency of your team. Enable monitoring for your services, and start receiving real-time alerts when your services have outages.
Start today for FREE