Webhook Error Handling Best Practices

Handling webhook errors correctly is the difference between a reliable integration and a fragile one. When your webhook endpoint mishandles errors — returning wrong status codes, processing too slowly, or failing silently — you lose events, trigger unnecessary retries, process duplicates, and miss critical business data. This guide covers every error handling pattern you need to build webhook consumers that are resilient, observable, and production-ready.
The Foundation: Respond First, Process Later
The single most important principle of webhook error handling is: acknowledge receipt immediately, then process asynchronously. Every other pattern in this guide builds on this foundation.
Why Synchronous Processing Fails
When you process a webhook synchronously (handling all business logic before returning a response), several things can go wrong:
// BAD: Synchronous processing — many failure modes
app.post('/webhook', async (req, res) => {
try {
const event = req.body;
// Each of these steps can fail or be slow:
await verifySignature(req); // 10ms
await lookupCustomer(event.data); // 200ms
await updateDatabase(event.data); // 500ms
await sendConfirmationEmail(event); // 2000ms
await notifySlackChannel(event); // 1000ms
await updateAnalytics(event); // 300ms
// Total: 4+ seconds — dangerously close to timeout
res.status(200).json({ received: true });
} catch (error) {
// ANY failure returns 500, triggering a full retry
res.status(500).json({ error: 'Processing failed' });
}
});
Problems with this approach:
- If the email service is slow, the entire webhook times out
- If analytics fails, a payment webhook gets retried, and the retry re-runs steps that already succeeded (a second database update, a second confirmation email)
- The provider's delivery worker is blocked for 4+ seconds
- Any single failure causes the entire processing chain to fail
The Async Alternative
// GOOD: Acknowledge immediately, process asynchronously
app.post('/webhook',
express.raw({ type: 'application/json' }),
async (req, res) => {
try {
// Step 1: Verify signature (fast — must be synchronous)
const event = verifyAndParse(req);
// Step 2: Queue for processing (fast — just a write)
await eventQueue.add('webhook-processing', {
event,
receivedAt: Date.now()
});
// Step 3: Respond immediately
res.status(200).json({ received: true });
} catch (error) {
if (error.type === 'signature_invalid') {
res.status(401).json({ error: 'Invalid signature' });
} else {
res.status(500).json({ error: 'Failed to queue event' });
}
}
}
);
The endpoint does only two things: verify the signature and write the event to a queue. Everything else happens in a background worker.
The Queue-Then-Acknowledge Pattern
The queue-then-ack pattern is the gold standard for webhook processing. Here is a complete implementation:
Step 1: Receive and Queue
const { Queue, Worker } = require('bullmq');
const webhookQueue = new Queue('webhooks', {
  connection: { host: 'localhost', port: 6379 }
});
app.post('/webhook',
  express.raw({ type: 'application/json' }),
  async (req, res) => {
    // Verify signature against the raw body (a Buffer, because of express.raw)
    const signature = req.headers['x-webhook-signature'];
    if (!verifySignature(req.body, signature, process.env.WEBHOOK_SECRET)) {
      return res.status(401).json({ error: 'Invalid signature' });
    }
    let event;
    try {
      event = JSON.parse(req.body);
    } catch (parseError) {
      // Malformed JSON will never succeed on retry, so reject it as a client error
      return res.status(400).json({ error: 'Invalid JSON payload' });
    }
    try {
      // Write to durable queue
      await webhookQueue.add(event.type, {
        id: event.id,
        type: event.type,
        data: event.data,
        headers: {
          signature,
          timestamp: req.headers['x-webhook-timestamp']
        },
        receivedAt: new Date().toISOString()
      }, {
        // BullMQ job options
        attempts: 5,
        backoff: { type: 'exponential', delay: 5000 },
        removeOnComplete: { age: 86400 }, // Keep completed jobs for 24 hours
        removeOnFail: false // Keep failed jobs for inspection
      });
    } catch (queueError) {
      // The event was not persisted: ask the provider to retry later
      return res.status(503).json({ error: 'Queue unavailable' });
    }
    res.status(200).json({ received: true });
  }
);
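The handler above calls `verifySignature` without defining it. A minimal sketch follows, assuming the provider sends a hex-encoded HMAC-SHA256 of the raw request body in the `x-webhook-signature` header. That scheme is an assumption for illustration; real providers differ in algorithm, encoding, and what they sign, so check your provider's documentation.

```javascript
const crypto = require('node:crypto');

// Assumed scheme (hypothetical): signature = hex(HMAC-SHA256(secret, rawBody)).
function verifySignature(rawBody, signature, secret) {
  if (typeof signature !== 'string' || !secret) return false;
  const expected = crypto.createHmac('sha256', secret).update(rawBody).digest();
  const received = Buffer.from(signature, 'hex');
  // timingSafeEqual throws when lengths differ, so compare lengths first
  if (received.length !== expected.length) return false;
  // Constant-time comparison avoids leaking how many bytes matched
  return crypto.timingSafeEqual(received, expected);
}
```

The comparison runs on the raw bytes, which is why the route uses `express.raw` rather than a JSON body parser: re-serializing parsed JSON rarely reproduces the exact bytes that were signed.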
Step 2: Process in Background Worker
const worker = new Worker('webhooks', async (job) => {
const { id, type, data } = job.data;
console.log(`Processing webhook ${id} (${type}), attempt ${job.attemptsMade + 1}`);
switch (type) {
case 'payment_intent.succeeded':
await handlePaymentSuccess(data);
break;
case 'customer.subscription.deleted':
await handleSubscriptionCancellation(data);
break;
case 'invoice.payment_failed':
await handlePaymentFailure(data);
break;
default:
console.log(`Unhandled event type: ${type}`);
}
}, {
connection: { host: 'localhost', port: 6379 },
concurrency: 10 // Process up to 10 events simultaneously
});
// Handle worker-level events
worker.on('completed', (job) => {
console.log(`Webhook ${job.data.id} processed successfully`);
});
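worker.on('failed', async (job, err) => {
  if (!job) return; // BullMQ can emit 'failed' without a job reference
  console.error(`Webhook ${job.data.id} failed:`, err.message);
  if (job.attemptsMade >= (job.opts.attempts ?? 1)) {
    // All retries exhausted: send to dead letter queue
    try {
      await moveToDeadLetterQueue({ ...job.data, attempts: job.attemptsMade }, err.message);
    } catch (dlqError) {
      // Never let a DLQ write failure become an unhandled rejection
      console.error(`Could not move webhook ${job.data.id} to dead letter queue:`, dlqError.message);
    }
  }
});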
Step 3: Dead Letter Queue for Final Failures
async function moveToDeadLetterQueue(eventData, errorMessage) {
  // Fall back to the configured maximum (5) when the caller did not record an attempt count
  const attempts = eventData.attempts ?? 5;
  await db.query(
    `INSERT INTO webhook_dead_letter_queue
     (event_id, event_type, payload, error_message, failed_at, attempts)
     VALUES ($1, $2, $3, $4, NOW(), $5)`,
    [
      eventData.id,
      eventData.type,
      JSON.stringify(eventData),
      errorMessage,
      attempts
    ]
  );
  // Alert the team
  await sendAlert({
    channel: 'webhook-failures',
    message: `Webhook event ${eventData.id} (${eventData.type}) moved to dead letter queue after ${attempts} failed attempts. Error: ${errorMessage}`
  });
}
The queue-then-ack pattern handles the tension between two competing requirements: the provider wants a fast response (within seconds), and your business logic might need to do slow operations (database queries, API calls, emails). By separating receipt from processing, you satisfy both requirements without compromise.
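The job options in Step 1 (`attempts: 5`, exponential backoff with a 5000 ms base delay) imply a concrete retry timeline. The sketch below computes it, assuming the delay doubles on every retry (base × 2^(retry − 1)), which is how BullMQ's built-in `exponential` strategy is generally described; verify against the BullMQ version you run. `retrySchedule` is a hypothetical helper, not a BullMQ API.

```javascript
// Delays (in ms) before each retry, assuming delay = base * 2^(retry - 1).
// `attempts` counts the first try, so there are attempts - 1 retries.
function retrySchedule(attempts, baseDelayMs) {
  const delays = [];
  for (let retry = 1; retry < attempts; retry++) {
    delays.push(baseDelayMs * 2 ** (retry - 1));
  }
  return delays;
}

const delays = retrySchedule(5, 5000);
console.log(delays); // [ 5000, 10000, 20000, 40000 ]
console.log(delays.reduce((sum, d) => sum + d, 0) / 1000); // 75 (seconds of total backoff)
```

Under these assumptions a job that keeps failing reaches the dead letter queue a little over a minute after its first attempt, which is worth knowing when deciding how quickly an outage must be noticed.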
Returning the Right HTTP Status Codes
Your HTTP response directly controls the provider's behavior. Return the wrong code, and you either lose events or create unnecessary retries.
Status Codes and Their Effects
// Reference only: a real handler sends exactly one of these responses per request.
app.post('/webhook', async (req, res) => {
// 200: Success — event received and accepted
// Provider will NOT retry
res.status(200).json({ received: true });
// 202: Accepted — event received, processing deferred
// Provider will NOT retry (2xx = success)
res.status(202).json({ accepted: true, processing: 'queued' });
// 400: Bad Request — invalid payload (your fault or theirs)
// Most providers will NOT retry (client error)
res.status(400).json({ error: 'Invalid payload format' });
// 401: Unauthorized — signature verification failed
// Behavior varies: some retry, some disable the webhook
res.status(401).json({ error: 'Invalid signature' });
// 410: Gone — endpoint permanently removed
// Provider will DISABLE the webhook subscription
res.status(410).json({ error: 'Endpoint no longer exists' });
// 429: Too Many Requests — rate limited
// Provider will retry with backoff
res.status(429).json({ error: 'Rate limit exceeded' });
// 500: Internal Server Error — temporary failure
// Provider WILL retry
res.status(500).json({ error: 'Internal error, please retry' });
// 503: Service Unavailable — temporarily down
// Provider WILL retry
res.status(503).json({ error: 'Service temporarily unavailable' });
});
Strategic Status Code Usage
Use status codes strategically to control retry behavior:
app.post('/webhook', async (req, res) => {
try {
// Signature verification — return 401 if invalid
if (!verifySignature(req)) {
return res.status(401).json({ error: 'Invalid signature' });
}
// Payload validation — return 400 for permanently invalid payloads
if (!isValidPayload(req.body)) {
return res.status(400).json({ error: 'Invalid payload' });
}
// Queue the event
await queue.add(req.body);
// Return 200 — event is safely queued
return res.status(200).json({ received: true });
} catch (error) {
if (error.code === 'QUEUE_UNAVAILABLE') {
// Queue is down — return 503 to trigger retry
return res.status(503).json({ error: 'Queue unavailable' });
}
// Unknown error — return 500 to trigger retry
return res.status(500).json({ error: 'Internal error' });
}
});
Be very careful with 410 (Gone). Returning this status code tells the provider to permanently disable your webhook subscription. Only use it when you intentionally want to stop receiving webhooks. An accidental 410 response during a deployment can require manual re-registration of the webhook.
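One way to keep these decisions in a single place is a small mapping from error kinds to status codes. The sketch below is illustrative: `statusForError` and the `payload_invalid` type are hypothetical names, while `signature_invalid` and `QUEUE_UNAVAILABLE` follow the examples in this guide.

```javascript
// Map an error raised in the webhook endpoint to the HTTP status to return.
// Deliberately never returns 410, so a bug cannot disable the subscription.
function statusForError(error) {
  if (error?.type === 'signature_invalid') return 401; // do not process
  if (error?.type === 'payload_invalid') return 400;   // permanent, retry is pointless
  if (error?.code === 'QUEUE_UNAVAILABLE') return 503; // temporary, ask for a retry
  return 500;                                          // unknown, ask for a retry
}

console.log(statusForError({ code: 'QUEUE_UNAVAILABLE' })); // 503
console.log(statusForError(new Error('unexpected')));       // 500
```

Defaulting to 500 errs on the side of a retry: an unnecessary redelivery is usually cheaper than a silently dropped event.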
Circuit Breaker Pattern
When your webhook handler depends on external services (databases, APIs, email providers), a failure in one service can cause all webhook processing to fail. The circuit breaker pattern prevents cascading failures.
class CircuitBreaker {
constructor(options = {}) {
this.failureThreshold = options.failureThreshold || 5;
this.resetTimeout = options.resetTimeout || 60000; // 1 minute
this.state = 'CLOSED'; // CLOSED = normal, OPEN = failing, HALF_OPEN = testing
this.failureCount = 0;
this.lastFailureTime = null;
}
async execute(fn) {
if (this.state === 'OPEN') {
// Check if enough time has passed to try again
if (Date.now() - this.lastFailureTime >= this.resetTimeout) {
this.state = 'HALF_OPEN';
} else {
throw new Error('Circuit breaker is OPEN — service unavailable');
}
}
try {
const result = await fn();
this.onSuccess();
return result;
} catch (error) {
this.onFailure();
throw error;
}
}
onSuccess() {
this.failureCount = 0;
this.state = 'CLOSED';
}
onFailure() {
this.failureCount++;
this.lastFailureTime = Date.now();
if (this.failureCount >= this.failureThreshold) {
this.state = 'OPEN';
console.error(`Circuit breaker OPENED after ${this.failureCount} failures`);
}
}
}
// Usage in webhook processing
const dbCircuitBreaker = new CircuitBreaker({ failureThreshold: 3, resetTimeout: 30000 });
const emailCircuitBreaker = new CircuitBreaker({ failureThreshold: 5, resetTimeout: 60000 });
async function processPaymentWebhook(event) {
// Database update — critical, fail the job if circuit is open
await dbCircuitBreaker.execute(async () => {
await db.query('UPDATE orders SET status = $1 WHERE id = $2',
['paid', event.data.order_id]);
});
// Email notification — non-critical, skip gracefully if circuit is open
try {
await emailCircuitBreaker.execute(async () => {
await sendReceiptEmail(event.data.customer_email, event.data);
});
} catch (error) {
console.warn('Email send failed or circuit is open; queueing email for later:', error.message);
await queueEmailForLater(event.data);
}
}
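To trace the CLOSED, OPEN, and HALF_OPEN transitions without waiting on real timers, the sketch below restates the breaker in compact form and drives it with a fake clock. `TinyBreaker`, its `now` option, and `demo` are illustrative additions rather than part of the class above; the transition rules are intended to mirror it.

```javascript
class TinyBreaker {
  constructor({ failureThreshold = 5, resetTimeout = 60000, now = Date.now } = {}) {
    Object.assign(this, { failureThreshold, resetTimeout, now });
    this.state = 'CLOSED';
    this.failureCount = 0;
    this.lastFailureTime = null;
  }

  async execute(fn) {
    if (this.state === 'OPEN') {
      if (this.now() - this.lastFailureTime < this.resetTimeout) {
        throw new Error('Circuit breaker is OPEN'); // fail fast, fn is not called
      }
      this.state = 'HALF_OPEN'; // allow one trial call through
    }
    try {
      const result = await fn();
      this.failureCount = 0;
      this.state = 'CLOSED';
      return result;
    } catch (error) {
      this.failureCount++;
      this.lastFailureTime = this.now();
      if (this.failureCount >= this.failureThreshold) this.state = 'OPEN';
      throw error;
    }
  }
}

async function demo() {
  let clock = 0; // fake time in ms
  const breaker = new TinyBreaker({ failureThreshold: 2, resetTimeout: 1000, now: () => clock });
  const failing = async () => { throw new Error('boom'); };
  const states = [];

  for (let i = 0; i < 2; i++) {
    await breaker.execute(failing).catch(() => {});
    states.push(breaker.state); // CLOSED after 1 failure, OPEN after 2
  }

  let called = false;
  await breaker.execute(async () => { called = true; }).catch(() => {});
  states.push(breaker.state); // still OPEN, and fn never ran

  clock = 1000; // reset timeout elapsed
  await breaker.execute(async () => 'ok');
  states.push(breaker.state); // trial call succeeded, so CLOSED again

  return { states, called };
}

demo().then(({ states }) => console.log(states)); // [ 'CLOSED', 'OPEN', 'OPEN', 'CLOSED' ]
```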
Graceful Degradation
Not all processing steps are equally important. Design your webhook handler to degrade gracefully when non-critical operations fail.
async function handleOrderWebhook(event) {
const results = {
critical: [],
nonCritical: []
};
// CRITICAL: These MUST succeed or the job should retry
try {
await updateOrderDatabase(event.data);
results.critical.push({ step: 'database', status: 'success' });
} catch (error) {
results.critical.push({ step: 'database', status: 'failed', error: error.message });
throw error; // Re-throw to trigger job retry
}
// NON-CRITICAL: These SHOULD succeed but failure is acceptable
const nonCriticalTasks = [
{ name: 'email', fn: () => sendConfirmationEmail(event.data) },
{ name: 'analytics', fn: () => trackAnalyticsEvent(event.data) },
{ name: 'slack', fn: () => notifySlackChannel(event.data) },
{ name: 'crm', fn: () => updateCRMRecord(event.data) }
];
for (const task of nonCriticalTasks) {
try {
await task.fn();
results.nonCritical.push({ step: task.name, status: 'success' });
} catch (error) {
console.warn(`Non-critical task '${task.name}' failed:`, error.message);
results.nonCritical.push({ step: task.name, status: 'failed', error: error.message });
// Queue failed non-critical tasks for later retry
await retryQueue.add('retry-task', {
task: task.name,
eventId: event.id,
data: event.data,
failedAt: new Date().toISOString()
});
}
}
return results;
}
This approach ensures that a failure in your Slack notification does not prevent a payment from being recorded. Critical operations cause retries; non-critical operations fail gracefully and are queued for later processing.
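The loop above runs non-critical tasks one after another, so a slow email provider also delays the Slack and CRM steps. If the tasks are independent, a variant using `Promise.allSettled` can run them concurrently while still collecting per-task results. This is a sketch under that independence assumption; `runNonCritical` is a hypothetical helper.

```javascript
// Run independent non-critical tasks concurrently and report each outcome.
// Wrapping the call in an async arrow turns synchronous throws into rejections.
async function runNonCritical(tasks) {
  const settled = await Promise.allSettled(tasks.map(async (task) => task.fn()));
  return settled.map((outcome, i) =>
    outcome.status === 'fulfilled'
      ? { step: tasks[i].name, status: 'success' }
      : {
          step: tasks[i].name,
          status: 'failed',
          error: String(outcome.reason?.message ?? outcome.reason)
        }
  );
}
```

Entries with `status: 'failed'` can then be pushed to the retry queue exactly as in the sequential version.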
Alerting on Webhook Failures
You cannot fix what you do not know about. Set up comprehensive alerting for webhook failures:
Alert Levels
const alertConfig = {
// Level 1: Single failure — log it
singleFailure: (event, error) => {
console.error(`Webhook processing failed: ${event.id}`, error.message);
},
// Level 2: Repeated failures — alert the team
repeatedFailures: async (event, error, attemptCount) => {
if (attemptCount >= 3) {
await sendSlackAlert({
channel: '#webhook-alerts',
text: `Webhook ${event.id} (${event.type}) has failed ${attemptCount} times. Latest error: ${error.message}`
});
}
},
// Level 3: Dead letter queue — page someone
deadLetterQueue: async (event, error) => {
await sendPagerDutyAlert({
severity: 'high',
summary: `Webhook ${event.id} moved to dead letter queue after all retries exhausted`,
details: {
eventType: event.type,
eventId: event.id,
error: error.message
}
});
},
// Level 4: High failure rate — incident
highFailureRate: async (failureRate, window) => {
if (failureRate > 0.1) { // More than 10% failure rate
await sendPagerDutyAlert({
severity: 'critical',
summary: `Webhook failure rate is ${(failureRate * 100).toFixed(1)}% over the last ${window} minutes`
});
}
}
};
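The `highFailureRate` level needs a failure rate to act on. A minimal in-memory sketch of a sliding-window tracker follows; `FailureRateWindow` is a hypothetical class, and a multi-process deployment would more likely derive this number from shared metrics than from per-process memory.

```javascript
// Tracks success/failure samples and reports the failure rate over a time window.
class FailureRateWindow {
  constructor(windowMs) {
    this.windowMs = windowMs;
    this.samples = []; // { at: timestamp in ms, ok: boolean }
  }

  record(ok, at = Date.now()) {
    this.samples.push({ at, ok });
  }

  rate(now = Date.now()) {
    // Drop samples that have aged out of the window
    this.samples = this.samples.filter((s) => now - s.at < this.windowMs);
    if (this.samples.length === 0) return 0;
    const failed = this.samples.filter((s) => !s.ok).length;
    return failed / this.samples.length;
  }
}

const lastFiveMinutes = new FailureRateWindow(5 * 60 * 1000);
lastFiveMinutes.record(true);
lastFiveMinutes.record(false);
console.log(lastFiveMinutes.rate()); // 0.5
```

The result can be passed straight to `alertConfig.highFailureRate(rate, 5)` after each processed job or on a timer.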
Monitoring with Webhookify
Rather than building custom alerting infrastructure, Webhookify provides real-time alerts for webhook delivery and processing issues. When webhooks to your Webhookify endpoints fail or show unusual patterns, you receive immediate notifications via Telegram, Discord, Slack, email, or push notifications. The mobile app plays a distinctive cash register sound for payment events, giving you audible confirmation that revenue is flowing.
Set up a two-tier alerting system: use Webhookify for real-time delivery monitoring (catching issues at the network level), and use your own application-level alerts for processing failures (catching issues in your business logic). This gives you complete visibility across both layers of the webhook pipeline.
Handling Specific Error Scenarios
Database Unavailable
async function handleWithDatabaseFallback(event) {
try {
await db.query('INSERT INTO events (id, data) VALUES ($1, $2)',
[event.id, JSON.stringify(event.data)]);
} catch (error) {
if (error.code === 'ECONNREFUSED' || error.code === 'ETIMEDOUT') {
// Database is down — write to local fallback
await writeToLocalFallback(event);
console.error('Database unavailable, event written to local fallback');
return; // Do not throw — event is safely persisted
}
throw error; // Re-throw other database errors
}
}
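async function writeToLocalFallback(event) {
  // Assumes `fs` is require('node:fs/promises').
  // /tmp is typically cleared on reboot; use a persistent volume in production.
  const fallbackDir = '/tmp/webhook-fallback';
  await fs.mkdir(fallbackDir, { recursive: true }); // writeFile fails if the directory is missing
  const fallbackPath = `${fallbackDir}/${event.id}.json`;
  await fs.writeFile(fallbackPath, JSON.stringify(event));
  // A recovery job periodically reads from this directory and writes to the database
}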
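The recovery job mentioned in the comment could look roughly like this. It is a sketch: `replayFallbackEvents` and the `insertEvent` callback are hypothetical names, and the callback is expected to perform the same database insert as the main path.

```javascript
const fs = require('node:fs/promises');
const path = require('node:path');

// Replay every *.json file in `dir` through `insertEvent`, deleting each file
// only after its insert succeeded. Failed files stay in place for the next run.
async function replayFallbackEvents(dir, insertEvent) {
  let names;
  try {
    names = await fs.readdir(dir);
  } catch (error) {
    if (error.code === 'ENOENT') return { replayed: 0, failed: 0 }; // nothing was ever written
    throw error;
  }

  let replayed = 0;
  let failed = 0;
  for (const name of names.filter((n) => n.endsWith('.json')).sort()) {
    const file = path.join(dir, name);
    try {
      const event = JSON.parse(await fs.readFile(file, 'utf8'));
      await insertEvent(event);
      await fs.unlink(file);
      replayed++;
    } catch (error) {
      failed++;
      console.error(`Replay of ${name} failed:`, error.message);
    }
  }
  return { replayed, failed };
}
```

Because a crash between the insert and the `unlink` would replay the same event on the next run, the insert should be idempotent, for example by treating a duplicate event id as success.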
External API Timeout
async function callExternalAPIWithTimeout(url, data, timeoutMs = 5000) {
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), timeoutMs);
try {
const response = await fetch(url, {
method: 'POST',
body: JSON.stringify(data),
headers: { 'Content-Type': 'application/json' },
signal: controller.signal
});
if (!response.ok) {
throw new Error(`API returned ${response.status}`);
}
return await response.json();
} catch (error) {
if (error.name === 'AbortError') {
throw new Error(`External API timed out after ${timeoutMs}ms`);
}
throw error;
} finally {
clearTimeout(timeout);
}
}
Payload Schema Changes
async function handleWithSchemaValidation(event) {
// Validate against expected schema
const validation = validateSchema(event);
if (!validation.valid) {
// Log the schema violation for investigation
console.warn('Webhook payload schema violation:', {
eventId: event.id,
eventType: event.type,
errors: validation.errors,
payload: JSON.stringify(event.data).slice(0, 500)
});
// Try to process with available fields (forward compatibility)
try {
await processWithFlexibleSchema(event);
} catch (processingError) {
// Cannot process — alert team about potential breaking change
await sendAlert(`Webhook schema change detected for ${event.type}. Review required.`);
throw processingError;
}
} else {
await processEvent(event);
}
}
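`validateSchema` is left undefined above. A minimal hand-rolled sketch follows, assuming every event carries `id`, `type`, and `data` as in the earlier examples. The `REQUIRED_FIELDS` table is an illustrative assumption; a production system would more likely use a schema library.

```javascript
// Required top-level fields and their expected typeof results (assumed shape).
const REQUIRED_FIELDS = { id: 'string', type: 'string', data: 'object' };

// Returns { valid, errors }. Unknown extra fields are tolerated on purpose,
// so providers can add fields without breaking the consumer.
function validateSchema(event) {
  if (event === null || typeof event !== 'object') {
    return { valid: false, errors: ['event must be an object'] };
  }
  const errors = [];
  for (const [field, expected] of Object.entries(REQUIRED_FIELDS)) {
    const value = event[field];
    if (value === undefined || value === null) {
      errors.push(`missing field: ${field}`);
    } else if (typeof value !== expected) {
      errors.push(`field ${field} should be ${expected}, got ${typeof value}`);
    }
  }
  return { valid: errors.length === 0, errors };
}

console.log(validateSchema({ id: 'evt_1', type: 'invoice.paid', data: {}, extra: 1 }).valid); // true
console.log(validateSchema({ id: 42, type: 'invoice.paid' }).errors);
// [ 'field id should be string, got number', 'missing field: data' ]
```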
Error Handling Architecture Summary
Here is the complete error handling architecture in one view:
Webhook Received
│
├── Signature Invalid? ──> Return 401
│
├── Payload Invalid? ──> Return 400
│
├── Queue Write Fails? ──> Return 503 (trigger provider retry)
│
└── Queue Write Succeeds ──> Return 200
    │
    ▼
    Background Worker
    │
    ├── Processing Succeeds ──> Mark Complete
    │
    ├── Critical Step Fails ──> Retry with Backoff
    │   │
    │   ├── Retry Succeeds ──> Mark Complete
    │   │
    │   └── All Retries Fail ──> Dead Letter Queue + Alert
    │
    └── Non-Critical Step Fails ──> Log + Queue for Later
Further Reading
- Webhook Retry Logic and Idempotency — handle retries and prevent duplicates
- Webhook Debugging Guide — troubleshoot specific failures
- Webhook Security Best Practices — prevent authentication errors
- How Webhooks Work — understand delivery guarantees
- The Complete Guide to Webhooks — webhook fundamentals