Architecture 8 min read

Webhook Buffering: Never Lose an Event During Downtime

Downtime happens. Learn how HookWatch's buffer service captures every webhook during outages and replays them automatically when your services recover.


HookWatch Team

February 13, 2026

Your server is down for a deploy. A database migration takes longer than expected. A cloud region has an outage. During those minutes (or hours), webhook providers keep sending events, and if nobody is listening by the time their retries run out, those events are gone for good.

HookWatch's buffer service solves this by sitting between the internet and your webhook receiver, capturing every request during downtime and replaying them when your services come back online.

The Problem with Webhook Downtime

Webhook providers typically retry failed deliveries, but only within limits. The figures below are approximate; retry policies change, so check each provider's current documentation:

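| Provider | Retry Window | Max Retries | Behavior After      |
|----------|--------------|-------------|---------------------|
| Stripe   | 72 hours     | 16          | Event marked failed |
| GitHub   | 24 hours     | 3           | Webhook disabled    |
| Shopify  | 48 hours     | 19          | Webhook removed     |
| Twilio   | 24 hours     | 4           | Event dropped       |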

Even with retries, you face problems:

  • Out-of-order delivery: Retried events arrive mixed with new ones
  • Webhook disabling: Some providers disable your endpoint after repeated failures
  • Gap in data: Events are dropped for good once the retry window or retry limit runs out
  • Thundering herd: When you come back online, all retries hit at once

How HookWatch Buffering Works

The buffer service operates as a transparent proxy with three modes:

Code
                    ┌──────────────┐
  Incoming          │              │         ┌──────────┐
  Webhooks  ──────→ │   Buffer     │ ──────→ │ Webhook  │
                    │   Service    │         │ Service  │
                    │              │         └──────────┘
                    └──────┬───────┘
                           │
                    ┌──────▼───────┐
                    │    Redis     │
                    │   Stream     │
                    └──────────────┘

Mode 1: Proxy (Normal Operation)

When everything is healthy, the buffer acts as a transparent pass-through:

Code
Request → Buffer → Webhook Service → Response
            │
            └→ (passes through unchanged)

In proxy mode, requests flow directly to the webhook service and responses are returned to the caller, so the buffer adds only the overhead of one extra proxy hop. Your webhook providers see the same responses your service returns, normally a 200.
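A pass-through of this kind can be sketched with Go's standard reverse proxy. This is a minimal illustration, not HookWatch's actual implementation: the names newPassThrough and demo are hypothetical, and the upstream here is a local test server standing in for the webhook service.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
	"strings"
)

// newPassThrough returns a handler that forwards each request to target and
// relays the upstream response unchanged. The name is hypothetical.
func newPassThrough(target string) (http.Handler, error) {
	u, err := url.Parse(target)
	if err != nil {
		return nil, err
	}
	return httputil.NewSingleHostReverseProxy(u), nil
}

// demo puts the pass-through in front of a stand-in webhook service, sends
// one request, and returns the status and body the caller saw.
func demo() (int, string, error) {
	upstream := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		body, _ := io.ReadAll(r.Body)
		fmt.Fprintf(w, "received %s on %s", body, r.URL.Path)
	}))
	defer upstream.Close()

	proxy, err := newPassThrough(upstream.URL)
	if err != nil {
		return 0, "", err
	}
	front := httptest.NewServer(proxy)
	defer front.Close()

	resp, err := http.Post(front.URL+"/hooks/stripe", "application/json",
		strings.NewReader(`{"id":"evt_1"}`))
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	out, err := io.ReadAll(resp.Body)
	return resp.StatusCode, string(out), err
}

func main() {
	status, body, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(status, body)
}
```

The caller talks only to the front server, yet the method, path, and body reach the upstream untouched, which is the property proxy mode relies on.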

Mode 2: Buffer (During Downtime)

When the webhook service is unhealthy, the buffer switches to capture mode:

Code
Request → Buffer → Redis Stream
            │
            └→ Returns 200 (accepted)

Every incoming request is stored in a Redis Stream with full fidelity:

  • Complete HTTP headers
  • Request body
  • HTTP method and path
  • Timestamp
  • Source IP

The caller receives a 200 response immediately, so webhook providers never see a failure. No retries are triggered, no endpoints get disabled, no events are lost.
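The capture step might look roughly like the sketch below. It is an assumption-laden illustration: CapturedRequest, Store, captureHandler, and demo are hypothetical names, and an in-memory slice stands in for the Redis Stream (with Redis, the append would be an XADD).

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
	"time"
)

// CapturedRequest holds what is needed to replay a request later.
type CapturedRequest struct {
	Method     string
	Path       string // path plus query string
	Header     http.Header
	Body       []byte
	ReceivedAt time.Time
	SourceIP   string
}

// Store is a hypothetical append-only log. A slice stands in for the
// Redis Stream in this sketch.
type Store struct {
	mu      sync.Mutex
	entries []CapturedRequest
}

// Append adds one captured request at the end of the log.
func (s *Store) Append(c CapturedRequest) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.entries = append(s.entries, c)
}

// Snapshot returns a copy of the log in arrival order.
func (s *Store) Snapshot() []CapturedRequest {
	s.mu.Lock()
	defer s.mu.Unlock()
	return append([]CapturedRequest(nil), s.entries...)
}

// captureHandler stores each request and acknowledges it with 200.
func captureHandler(s *Store) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(r.Body)
		if err != nil {
			// Do not acknowledge a request that could not be read in full.
			http.Error(w, "could not read body", http.StatusBadRequest)
			return
		}
		ip, _, err := net.SplitHostPort(r.RemoteAddr)
		if err != nil {
			ip = r.RemoteAddr // fall back to the raw address
		}
		s.Append(CapturedRequest{
			Method:     r.Method,
			Path:       r.URL.RequestURI(),
			Header:     r.Header.Clone(),
			Body:       body,
			ReceivedAt: time.Now().UTC(),
			SourceIP:   ip,
		})
		w.WriteHeader(http.StatusOK)
	})
}

// demo captures one synthetic request and returns the response code
// together with the stored entries.
func demo() (int, []CapturedRequest) {
	store := &Store{}
	req := httptest.NewRequest(http.MethodPost, "/hooks/stripe?v=1",
		strings.NewReader(`{"id":"evt_1"}`))
	req.Header.Set("Stripe-Signature", "t=1,v1=abc")
	rec := httptest.NewRecorder()
	captureHandler(store).ServeHTTP(rec, req)
	return rec.Code, store.Snapshot()
}

func main() {
	code, stored := demo()
	fmt.Println(code, len(stored), stored[0].Method, stored[0].Path)
}
```

Keeping the headers intact matters for replay: signature headers such as Stripe-Signature are computed over the original body, so the receiver can still verify them after the drain.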

Mode 3: Drain (Recovery)

When services come back online, the buffer replays stored requests:

Code
Redis Stream → Buffer → Webhook Service
                 │
                 └→ In original order, at controlled rate

Key characteristics of the drain process:

  • Ordered replay: Events are replayed in the exact order they were received
  • Rate limiting: Requests are replayed at a controlled rate to avoid overwhelming the recovering service
  • Delivery confirmation: Each replayed request is verified as delivered before moving to the next
  • Seamless transition: New incoming requests continue to flow through normally during drain
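The first three characteristics above can be sketched as a small replay loop. This is only an illustration under stated assumptions: Entry, drain, and errUnavailable are hypothetical names, the backlog is a plain slice rather than a Redis Stream, and the loop is paced by a ticker at roughly one delivery per interval.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Entry is a hypothetical buffered request; only an ID is kept here.
type Entry struct{ ID string }

// errUnavailable simulates the webhook service rejecting a delivery.
var errUnavailable = errors.New("upstream unavailable")

// drain replays entries oldest first, paced by a ticker. It stops at the
// first failed delivery and returns the entries still waiting, so the next
// pass resumes at the same position and order is preserved.
func drain(entries []Entry, interval time.Duration, deliver func(Entry) error) ([]Entry, error) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for i, e := range entries {
		if i > 0 {
			<-ticker.C // pace every delivery after the first
		}
		if err := deliver(e); err != nil {
			return entries[i:], fmt.Errorf("replay stopped at %s: %w", e.ID, err)
		}
	}
	return nil, nil
}

func main() {
	backlog := []Entry{{ID: "evt_1"}, {ID: "evt_2"}, {ID: "evt_3"}}

	failOnce := true
	var delivered []string
	deliver := func(e Entry) error {
		if e.ID == "evt_2" && failOnce {
			failOnce = false
			return errUnavailable
		}
		delivered = append(delivered, e.ID)
		return nil
	}

	// First pass stops at evt_2; second pass resumes from it.
	remaining, err := drain(backlog, 10*time.Millisecond, deliver)
	fmt.Println(delivered, len(remaining), err)
	remaining, err = drain(remaining, 10*time.Millisecond, deliver)
	fmt.Println(delivered, len(remaining), err)
}
```

Stopping at the first failure, rather than skipping ahead, is the design choice that keeps replay order intact: a later event is never delivered before an earlier one has been confirmed.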

Health Checking

The buffer continuously monitors the webhook service health:

Go
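// Simplified health check logic
func (b *Buffer) healthCheck() {
    // b.client is assumed to be an *http.Client with a short Timeout set,
    // so a hung service cannot stall the health check.
    resp, err := b.client.Get(b.webhookServiceURL + "/health")
    if err == nil {
        defer resp.Body.Close() // release the connection
    }

    switch {
    case err != nil || resp.StatusCode != http.StatusOK:
        b.setMode(ModeBuffer) // unhealthy: capture incoming requests
    case b.mode == ModeBuffer:
        b.setMode(ModeDrain) // just recovered: replay what was stored
    case b.mode == ModeDrain:
        // Stay in drain mode; the drain loop switches to proxy
        // once the backlog is empty.
    default:
        b.setMode(ModeProxy)
    }
}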

Health checks run every few seconds. The transition between modes is automatic—no manual intervention required.
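The mode transitions can also be isolated as a pure function, which makes them easy to test without a network. A minimal sketch, assuming the buffer can tell whether its backlog is empty; nextMode and the backlogEmpty flag are hypothetical, not documented HookWatch settings.

```go
package main

import "fmt"

// Mode is the buffer's operating mode.
type Mode int

const (
	ModeProxy Mode = iota
	ModeBuffer
	ModeDrain
)

// String returns a readable name for the mode.
func (m Mode) String() string {
	return [...]string{"proxy", "buffer", "drain"}[m]
}

// nextMode decides the mode after one health check.
//   - healthy: the last health check succeeded
//   - backlogEmpty: nothing is left in the buffer to replay
func nextMode(healthy, backlogEmpty bool) Mode {
	switch {
	case !healthy:
		return ModeBuffer // capture while the service is down
	case !backlogEmpty:
		return ModeDrain // healthy again, but stored requests remain
	default:
		return ModeProxy // healthy and fully caught up
	}
}

func main() {
	// A three-minute deploy, one observation per row.
	steps := []struct {
		at                    string
		healthy, backlogEmpty bool
	}{
		{"14:00", false, true},
		{"14:03", true, false},
		{"14:04", true, true},
	}
	for _, s := range steps {
		fmt.Println(s.at, nextMode(s.healthy, s.backlogEmpty))
	}
}
```

Tying the return to proxy mode to an empty backlog, rather than to the next successful health check, keeps the buffer in drain mode until every stored request has been replayed.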

Architecture Benefits

Zero-Loss Guarantee

Because the buffer returns 200 to callers during downtime, webhook providers treat the delivery as successful. There is no retry storm and no disabled endpoint, and as long as the buffer and its Redis store stay available, no events are lost.

Deployment Without Fear

Deploy your services confidently knowing that webhooks received during the deploy window will be captured and replayed:

Code
Timeline:
  14:00  Start deployment
  14:00  Buffer detects unhealthy → switches to buffer mode
  14:00  Incoming webhooks stored in Redis
  14:03  Deployment completes
  14:03  Buffer detects healthy → switches to drain mode
  14:03  Buffered webhooks replayed in order
  14:04  All caught up → switches to proxy mode

Ordered Replay

Unlike provider retries that can arrive out of order, the buffer replays events in the exact sequence they were received. This is critical for workflows where event order matters—like payment processing or state machine transitions.

Real-World Scenarios

Planned Maintenance

Schedule maintenance without worrying about webhook gaps:

Bash
# Your deploy script doesn't need to change
./deploy.sh

# The buffer handles everything automatically:
# 1. Detects your service going down
# 2. Captures incoming webhooks
# 3. Detects your service coming back
# 4. Replays captured webhooks

Unexpected Outages

When things go wrong unexpectedly, the buffer has your back:

Code
Scenario: Database connection pool exhausted
  - Webhook service starts returning 503
  - Buffer detects failures, switches to buffer mode
  - All incoming webhooks captured safely
  - You fix the connection pool issue
  - Buffer replays 847 captured webhooks
  - Zero events lost

Multi-Region Failover

In a multi-region setup, the buffer can capture traffic while DNS failover propagates:

Code
Region A goes down
  → Buffer captures webhooks (0-60 seconds)
  → DNS failover completes
  → Buffer drains to recovered service
  → No events lost during failover window

Comparison: With vs Without Buffering

| Scenario        | Without Buffer                         | With Buffer     |
|-----------------|----------------------------------------|-----------------|
| 5-min deploy    | 12 events lost                         | 0 events lost   |
| 30-min outage   | 150+ events lost, endpoints disabled   | 0 events lost   |
| Database issue  | Cascading failures, retry storms       | Smooth recovery |
| Region failover | Events lost during DNS propagation     | Full capture    |

Getting Started

HookWatch buffering works out of the box. When you route webhooks through HookWatch, the buffer service sits in front of the webhook receiver automatically.

There's nothing to configure—just point your webhook providers to your HookWatch URL and deploy with confidence. Every event is captured, every event is delivered.

Tags: reliability, buffering, architecture, downtime, webhooks

