Calculate how fast your error budget is being consumed. Determine the burn rate and time to exhaustion, and set multi-window alert thresholds.
The error budget burn rate measures how quickly your service is consuming its allowed error budget. A burn rate of 1.0 means you're consuming the budget at exactly the expected pace — you'll exhaust it precisely at the end of the SLO window. A burn rate above 1.0 means you're consuming faster than sustainable, and below 1.0 means you have budget to spare.
This calculator takes your SLO, the total budget period, the elapsed time, and the budget consumed so far to compute the current burn rate and projected time to exhaustion. It also suggests multi-window alert thresholds following Google's recommended burn-rate alerting strategy.
Burn rate alerting is widely regarded as the most reliable approach to SLO-based monitoring. Rather than alerting on raw error rates (which tend to cause alert fatigue), burn rate alerts fire only when the consumption trajectory threatens to exhaust the budget before the window resets. This gives SRE teams timely, actionable signals without excessive noise.
Raw error rate alerts are noisy and don't account for budget context. A brief spike may look alarming but barely dent the monthly budget. Burn rate alerts connect incidents to their actual SLO impact, ensuring on-call engineers respond to meaningful threats. This calculator helps you configure those thresholds correctly. Precise quantification supports capacity planning and performance budgeting, ensuring infrastructure investments are right-sized for both current workloads and projected future growth.
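Burn Rate = (Budget Consumed / Elapsed Time) / (Total Budget / Total Period).
Time to Exhaustion = Remaining Budget / Current Consumption Rate, where Remaining Budget = Total Budget - Budget Consumed and Current Consumption Rate = Budget Consumed / Elapsed Time.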
Result: Burn rate: 1.39
With a 99.9% SLO over 30 days, the total budget is 0.1% of 43,200 minutes, or 43.2 minutes. After 10 days, 20 minutes consumed gives a burn rate of (20/10)/(43.2/30) = 2/1.44 = 1.39. The remaining 23.2 minutes, consumed at 2 minutes per day, will last another 11.6 days, so the budget runs out around day 21.6, well before the 30-day window ends.
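The arithmetic in this example can be sketched in a few lines of Python. This is a minimal illustration, assuming a time-based SLO where the budget is measured in minutes of downtime; the function names are hypothetical and not part of the calculator itself.

```python
def total_budget_minutes(slo_percent: float, period_days: float) -> float:
    """Allowed downtime in minutes for a time-based SLO over the period."""
    return (1 - slo_percent / 100) * period_days * 24 * 60


def burn_rate(consumed: float, elapsed_days: float,
              total_budget: float, period_days: float) -> float:
    """(Budget Consumed / Elapsed Time) / (Total Budget / Total Period)."""
    return (consumed / elapsed_days) / (total_budget / period_days)


def days_to_exhaustion(consumed: float, elapsed_days: float,
                       total_budget: float) -> float:
    """Remaining Budget / Current Consumption Rate, in days from now."""
    rate_per_day = consumed / elapsed_days
    if rate_per_day == 0:
        return float("inf")  # nothing consumed yet, so no projected exhaustion
    return (total_budget - consumed) / rate_per_day


# Inputs from the example: 99.9% SLO, 30-day period, 20 minutes used after 10 days.
budget = total_budget_minutes(99.9, 30)
rate = burn_rate(20, 10, budget, 30)
remaining_days = days_to_exhaustion(20, 10, budget)
print(round(budget, 1), round(rate, 2), round(remaining_days, 1))  # 43.2 1.39 11.6
```

The projection assumes the average consumption rate so far continues unchanged, which is also what the time-to-exhaustion formula assumes.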
Burn rate is a normalized measure of error budget consumption speed. It answers the question: at this pace, when will we run out of budget? A burn rate of 1.0 over the full window means you'll exactly deplete the budget. Any sustained rate above 1.0 means the budget will be exhausted early.
A common practice is to use multiple alert tiers with different windows and burn rates. A fast-burn alert (14.4x over 1 hour) catches acute incidents that consume 2% of a 30-day budget in a single hour. A slow-burn alert (1x over 3 days) catches chronic degradation that might otherwise go unnoticed.
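Many monitoring platforms now support burn rate alerting. Prometheus does not ship burn rate rules out of the box, but multi-window burn rate alerts can be written as recording and alerting rules, and generators such as Sloth and Pyrra can produce them from an SLO definition. Datadog, Google Cloud, and Grafana Cloud offer SLO-based alerting with configurable burn rate thresholds.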
Burn rate tells you the speed; remaining budget tells you the amount. Together they give a complete picture. A high burn rate with a full budget is less urgent than a moderate burn rate with almost no budget remaining.
A burn rate measures how fast the error budget is being consumed relative to the expected pace. A burn rate of 2.0 means the budget is being consumed twice as fast as sustainable; if that rate holds from the start of the window, the budget will run out halfway through the SLO window.
Google's SRE Workbook recommends multiple alert tiers: 14.4x burn rate over 1 hour (page) and 6x over 6 hours (page), plus 1x over 3 days (ticket). Many SLO tools add an intermediate tier of 3x over 1 day (ticket). Together these provide progressive escalation based on severity.
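To see why these particular pairs are used, it helps to compute how much of the budget each tier represents. The sketch below assumes a 30-day SLO window; the tier list mirrors the burn rates and windows mentioned above, and the function name is hypothetical.

```python
PERIOD_HOURS = 30 * 24  # assumed 30-day SLO window

# (burn rate, alert window in hours) for the tiers discussed above
TIERS = [(14.4, 1), (6, 6), (3, 24), (1, 72)]


def budget_fraction_consumed(burn: float, window_hours: float,
                             period_hours: float = PERIOD_HOURS) -> float:
    """Share of the total budget spent if `burn` is sustained for the window."""
    return burn * window_hours / period_hours


for burn, window in TIERS:
    share = budget_fraction_consumed(burn, window)
    print(f"{burn}x over {window}h -> {share:.0%} of budget")
```

Under this assumption the tiers correspond to roughly 2%, 5%, 10%, and 10% of the budget respectively, so faster tiers trip on smaller but more sudden losses.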
Error rate is the raw percentage of failing requests. Burn rate normalizes this against your SLO and time window. A 1% error rate might be harmless for a 99% SLO or catastrophic for a 99.99% SLO — burn rate captures this context.
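For a request-based SLO, the normalization is a single division: the observed error ratio over the error ratio the SLO allows. A minimal sketch, with a hypothetical function name and error rates expressed as fractions rather than percentages:

```python
def burn_rate_from_error_rate(error_rate: float, slo_percent: float) -> float:
    """Observed error ratio divided by the error ratio the SLO allows."""
    allowed = 1 - slo_percent / 100
    return error_rate / allowed


# The same 1% error rate evaluated against three different SLOs.
for slo in (99.0, 99.9, 99.99):
    print(f"SLO {slo}%: burn rate {burn_rate_from_error_rate(0.01, slo):.1f}")
```

The same 1% error rate comes out at a burn rate of about 1 for a 99% SLO, about 10 for 99.9%, and about 100 for 99.99%.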
Multi-window alerting checks burn rate over both a long window (for trend) and a short window (for recency). An alert fires only if both windows exceed the threshold, reducing false positives from brief spikes or historical noise.
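The both-windows condition is a simple logical AND. The sketch below is illustrative only: the function name is hypothetical, and the burn rates for each window are assumed to have been computed elsewhere (for example over a 1-hour long window and a 5-minute short window).

```python
def should_alert(long_window_burn: float, short_window_burn: float,
                 threshold: float) -> bool:
    """Fire only when both the long and the short window exceed the threshold."""
    return long_window_burn > threshold and short_window_burn > threshold


THRESHOLD = 14.4  # fast-burn tier

print(should_alert(16.0, 15.0, THRESHOLD))  # ongoing incident -> True
print(should_alert(16.0, 0.5, THRESHOLD))   # already recovered -> False
print(should_alert(2.0, 20.0, THRESHOLD))   # brief spike only -> False
```

The short window lets the alert clear quickly once errors stop, while the long window keeps a momentary spike from paging anyone.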
Burn rate cannot be negative. It is always zero or positive because errors only accumulate; they cannot be undone. The measured rate can still fall once the service returns to normal operation, because the rolling-window average drops as older errors age out of the window.
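The rolling-window effect can be illustrated with a small simulation. This sketch assumes equal traffic in every interval, so the window's error ratio is a plain average of per-interval ratios; the function name and sample data are hypothetical.

```python
from collections import deque


def rolling_burn_rates(error_ratios, window, slo_percent):
    """Burn rate after each interval, averaged over the last `window` intervals."""
    allowed = 1 - slo_percent / 100
    recent = deque(maxlen=window)  # older intervals drop out automatically
    rates = []
    for ratio in error_ratios:
        recent.append(ratio)
        rates.append(sum(recent) / len(recent) / allowed)
    return rates


# Two bad hours (1% errors) followed by four clean hours, 99.9% SLO, 3-hour window.
hourly = [0.01, 0.01, 0.0, 0.0, 0.0, 0.0]
rates = rolling_burn_rates(hourly, window=3, slo_percent=99.9)
print([round(r, 2) for r in rates])
```

The rate falls step by step after recovery and reaches zero once the bad hours leave the window, but it never goes below zero.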
When the burn rate is too high, identify the source of errors (deployment, infrastructure issue, external dependency) and remediate it. Roll back recent changes, scale resources, or enable fallbacks. Once the error source is resolved, the burn rate will decrease over the rolling window.