
The Outage Communication Template Engineering Teams Actually Use

Published on May 16, 2025.

TL;DR: Outage communication is a skill most teams treat as an afterthought until 3 AM when they're writing their first status update under pressure. This post gives you the exact templates and decision framework to communicate confidently during any incident, from initial acknowledgment through post-mortem, without making the situation worse.

Why Most Outage Communication Fails

The problem isn't that teams don't care. It's that outage communication requires two things that are in direct conflict during an active incident: moving fast and getting the words exactly right.

Under pressure, most engineers default to one of three failure modes:

  • Radio silence — "We'll communicate once we know what's happening." Customers fill the silence with worst-case assumptions and flood your support queue.
  • Over-promising — "Should be resolved in 30 minutes." Thirty minutes later, you're explaining why you need another 30 minutes. Every missed ETA destroys trust.
  • Technical jargon — Updates written for engineers, not customers. "Database connection pool exhaustion affecting write operations" means nothing to a user trying to complete a checkout.

The fix isn't better writers. It's better templates that remove the cognitive load of finding the right words while your pager is screaming.

The Hard Truth: Your customers are more forgiving of outages than you think. What they don't forgive is silence, missed promises, and finding out about problems from Twitter instead of from you. The quality of your communication during an outage affects retention more than the outage itself.

The Three-Phase Communication Framework

Every outage has three distinct communication phases, each with a different goal:

Phase 1: Acknowledgment — "We know something is wrong."
Phase 2: Updates — "Here's what we know and what we're doing."
Phase 3: Resolution — "Here's what happened and what we're doing about it."

Most teams handle Phase 3 reasonably well. It's Phases 1 and 2 that kill them: specifically, starting Phase 1 too late and providing too few Phase 2 updates.

The Templates

Every outage communication template your team will ever need fits into one of three phases.

Phase 1: Initial Acknowledgment (T+0 to T+15 minutes)

Goal: Get something out fast. Silence is worse than uncertainty.

Template A — Known Impact, Unknown Cause:

We're currently investigating an issue affecting [affected service/feature]. Some users may experience [specific impact]. We're working to identify the cause and will provide an update within [30/60] minutes.

Current status: Investigating

Template B — Known Cause, Working on Fix:

We've identified an issue with [technical component] that is causing [customer-visible impact]. Our team is actively working on a fix. We expect to have an update by [specific time].

Current status: Identified

What to fill in:

  • Affected service/feature — Be specific. "Checkout" is better than "our platform." "API endpoints in us-east-1" is better than "some APIs."
  • Specific impact — What can customers actually not do? "Unable to complete purchases" or "Slow dashboard load times" — not "degraded performance."
  • Time commitment — Always give a next-update time. Never say "we'll update you when we know more." That's radio silence with extra steps.

Pro-Tip: The first update should go out within 15 minutes of incident declaration, even if all you can say is "we're aware and investigating." In our experience, customers who see an update within 15 minutes are far less likely to contact support. The update doesn't need to be useful. It just needs to prove you're watching.
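The fill-in rules above are easy to enforce in code. A minimal sketch of a Template A filler that rejects the vague phrasings called out in the list (the field names and the vague-phrase set are illustrative, not from this post):

```python
TEMPLATE_A = (
    "We're currently investigating an issue affecting {service}. "
    "Some users may experience {impact}. We're working to identify the "
    "cause and will provide an update within {window} minutes.\n\n"
    "Current status: Investigating"
)

def draft_acknowledgment(service: str, impact: str, window_minutes: int = 30) -> str:
    """Fill Template A, refusing the vague phrasings the guide warns against."""
    vague = {"our platform", "some apis", "degraded performance"}
    for field in (service, impact):
        if field.strip().lower() in vague:
            raise ValueError(f"Too vague for a status update: {field!r}")
    return TEMPLATE_A.format(service=service, impact=impact, window=window_minutes)

print(draft_acknowledgment("Checkout", "failed payment confirmations", 30))
```

Encoding the vague-phrase check means the specificity rule survives 3 AM incidents, when nobody is rereading style guides.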

Phase 2: Ongoing Updates (Every 30-60 minutes)

Goal: Keep customers from filling silence with worst-case assumptions.

Template — Progress Update:

Update [number] — [time]

We're continuing to work on [brief description of issue]. [What your team has done / found since last update]. [Current state: e.g., "We've deployed a fix to our staging environment and are running validation tests."]

We expect to provide our next update by [specific time] or sooner if the issue is resolved.

Current status: [Investigating / Identified / Monitoring]

Key rules for Phase 2 updates:

  • Never miss a scheduled update time — Even if there's nothing new to say. Post "We're still actively working on this, no change to report, next update at [time]." The update is about trust, not information.
  • Never give a resolution ETA unless you're confident — "Resolved in 30 minutes" that becomes "resolved in 2 hours" is worse than no ETA. Only commit to update times, not resolution times.
  • Show progress, not just status — "We've deployed a fix and are monitoring" is more reassuring than "Still investigating." Even partial progress should be stated.

Phase 3: Resolution

Template — Resolution Notice:

Resolved — [time]

The issue affecting [service/feature] has been resolved as of [time]. All services are operating normally.

What happened: [1-2 sentence plain-language explanation]

Impact duration: [Start time] to [End time] - [X hours/minutes]

What we're doing to prevent recurrence: [Brief action items]

We apologize for the disruption. If you're still experiencing issues, please contact [support link].

What to avoid in resolution notices:

  • Vague root causes — "An infrastructure issue" tells customers nothing. "A misconfigured load balancer rule" is honest and specific.
  • Empty apologies — "We're sorry for any inconvenience" is the corporate equivalent of "thoughts and prayers." Pair every apology with a concrete action.
  • Premature resolution — Don't post resolved until you've monitored for at least 15 minutes post-fix. Posting resolved and then having to reopen is a trust catastrophe.

Vendor Outage Communication: The Special Case

Here's the scenario most communication templates don't cover: the outage isn't your fault. Your vendor is down.

This happens more than most teams admit. AWS goes down and takes your payment processing with it. Stripe has an incident and your checkout breaks. Your CDN has a degradation and your app slows to a crawl.

The Hard Truth: Your customers don't care whose fault it is. They care that your product isn't working. "AWS is having an issue" is legitimate context, but only if you communicate it proactively, not as an excuse after they've already contacted support.

Template — Vendor-Caused Outage:

We're currently experiencing issues with [feature/service] due to an ongoing incident at one of our infrastructure providers. We're monitoring the situation closely and will update as it progresses.

You can track the provider's incident status here: [vendor status page link]

Current status: Monitoring

What this does:

  • Honest framing — Explains the situation without sounding defensive
  • Active presence — Shows you're watching even though the fix isn't in your hands
  • Customer autonomy — Gives customers a way to track progress independently
  • Accurate expectations — Sets realistic timelines (this resolves when the vendor resolves it)

The key to executing this well is knowing about vendor incidents before customers report them to you. IsDown monitors 6,000+ vendor status pages and sends alerts through its Slack and PagerDuty integrations the moment a vendor updates their status page, often faster than the vendor's own notification reaches you. When you know first, you communicate first.

Tone and Language Guide

Outage communication has a tone register that's distinct from normal product writing. It's not casual, not formal. It's operational.

Do:

  • Use present tense and active voice: "We're investigating" not "An investigation is being conducted"
  • Be specific about what is and isn't affected: "The issue is isolated to checkout; browsing and account access are unaffected"
  • Name your team: "Our engineering team" feels more accountable than "the team"
  • Give concrete times: "by 14:00 UTC" not "in about an hour"

Don't:

  • Use weasel words: "some users may be experiencing" when all users are affected
  • Promise what you can't control: "This will never happen again"
  • Explain technical root causes in customer-facing updates (save that for the postmortem)
  • Use passive voice to avoid accountability: "Errors were encountered" instead of "Our API is returning errors"

Communication Channels: Which to Use When

  • Status page — When to use: Always; this is the source of truth. Audience: all customers and automated monitoring tools.
  • Email — When to use: Major incidents (longer than 30 minutes) or any data impact. Audience: customers who don't check your status page.
  • In-app banner — When to use: When the affected feature is in active use. Audience: users currently in your product.
  • Twitter/social — When to use: When customers are already discussing the incident publicly. Audience: the public, plus customers who follow you.
  • Enterprise customer DMs — When to use: Any significant incident. Audience: your highest-value accounts.

The status page should always be updated first. It's the canonical record. Every other channel references it.
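The "status page first, everything else references it" rule maps directly to a publish order. A minimal sketch where post_to() is a stand-in for your real transports (email, banner, social); all names here are hypothetical:

```python
def post_to(channel: str, body: str) -> str:
    # Stand-in for real integrations (email, in-app banner, social).
    return f"[{channel}] {body.splitlines()[0]}"

def publish_update(message: str, channels: list[str], status_page_url: str) -> list[str]:
    """Publish in canonical order: status page first, then every other
    channel with a link back to the status page."""
    posted = [post_to("status_page", message)]
    for channel in channels:
        if channel == "status_page":
            continue  # already posted first
        posted.append(post_to(channel, f"{message}\nFull details: {status_page_url}"))
    return posted

order = publish_update("Investigating checkout errors.",
                       ["email", "in_app_banner", "status_page"],
                       "https://status.example.com/incidents/123")
print(order[0])  # status page entry always comes first
```

Hard-coding the ordering in one function keeps channels from drifting out of sync, which is exactly the failure mode the FAQ below warns about.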

Frequently Asked Questions

How quickly should I post the first incident update?

Within 15 minutes of declaring an incident. If your team doesn't have a formal incident declaration process, the trigger should be: "If this is still broken in 15 minutes, we post." You don't need to know the cause. You just need to prove you're watching.

What do I say when I don't know what's wrong yet?

Post the first template above: "We're investigating an issue affecting [X]. Some users may experience [Y]. We'll update by [time]." You don't need to know the cause to communicate that you know there's a problem and you're working on it. That's enough for Phase 1.

Should I explain the technical root cause in customer-facing updates?

In most cases, no. Not during the incident. Technical root causes belong in the post-incident review or postmortem, which you can publish 24-48 hours later for significant incidents. During the incident, customers need to know what they can and can't do, not why the database is misbehaving.

How do I handle an outage caused by a vendor we depend on?

Use the vendor-caused outage template above. Be transparent that the issue is with an external provider, link to their status page, and commit to update times. Don't hide behind "infrastructure issues." Customers appreciate honesty about third-party dependencies. The key is being the one to tell them, not making them discover it themselves.

What's the difference between a status page update and a postmortem?

A status page update is real-time operational communication: what's happening now, what you're doing about it. A postmortem is a retrospective analysis: what happened, why, what you're changing. Status updates go out during the incident, every 30-60 minutes. Postmortems go out 24-72 hours after resolution for significant incidents. Both matter, but they serve different audiences and different timings.

How do we manage communication across multiple channels without everything getting out of sync?

Designate one person as the incident communicator whose only job during the incident is communication: not debugging, not fixing. They own the status page updates, the Slack updates, the customer emails. Everyone else focuses on resolution. This separation prevents the two most common errors: communication that's technically accurate but incomprehensible to customers, and technical teams getting distracted by communication tasks when they should be fixing the problem.

Nuno Tomas, Founder of IsDown
