Bitnob Payout Incident Postmortem Template

1. Incident Overview

| Field | Details |
| --- | --- |
| Incident Title | Short descriptive title (e.g., "NIP Corridor Payout Delays - 24 April 2025") |
| Date and Time of Incident | When the incident started and ended (UTC). |
| Reported By | Who detected or reported the issue. |
| Severity Level | Critical / High / Medium / Low. |
| Affected Systems | APIs, Treasury, Webhooks, Internal Dashboards, Specific Corridors. |
| Affected Customers | Number of customers impacted, specific partners if known. |
| Initial User Impact | Delayed payouts, failed payouts, incorrect statuses, user escalations. |

2. Incident Timeline

| Time (UTC) | Event |
| --- | --- |
| 13:00 | First alert triggered by liquidity buffer monitor for NGN payouts. |
| 13:15 | Webhook delivery failures reported from multiple partners. |
| 13:30 | On-call engineering team initiated investigation. |
| 14:00 | Root cause identified: upstream banking API outage. |
| 14:15 | Communication sent to impacted users. |
| 15:30 | Payout retries initiated manually. |
| 16:00 | Full service restoration confirmed. |

3. Root Cause Analysis

| Field | Details |
| --- | --- |
| Primary Root Cause | NIP upstream banking rail outage with no available redundancy. |
| Contributing Factors | Treasury buffer was sufficient, but no alternate payout route activated automatically. |
| Detection Weaknesses | Alerting on webhook failure was delayed by 15 minutes. |
| Process Weaknesses | Manual escalation to Treasury delayed liquidity redeployment. |

4. Impact Assessment

| Field | Details |
| --- | --- |
| Number of Failed or Delayed Payouts | Total and breakdown per corridor. |
| User Impact | Number of user support tickets, social escalations, refunds required. |
| Financial Impact | Any losses, refunds, fees paid to users. |
| Regulatory Impact | Any reportable incidents to regulators or partners. |

5. Immediate Actions Taken

| Action | Timestamp | Owner |
| --- | --- | --- |
| Liquidity buffer topped up manually for NGN corridor. | 14:00 UTC | Treasury Ops |
| Partner bank outage escalated to account manager. | 14:05 UTC | BD Team |
| User communication emails sent explaining delay. | 14:15 UTC | Customer Support |
| Webhook replay queued for delayed payouts. | 15:00 UTC | Engineering |

6. Lessons Learned

Detection must happen within 5 minutes, not 15 minutes.

Redundant liquidity routes for high-risk corridors (e.g., backup NIP provider) must be operational.

User communication templates for payout delays must be pre-approved and ready for immediate use.


7. Preventative and Remediation Actions

| Action Item | Owner | Deadline |
| --- | --- | --- |
| Integrate second NIP banking partner for automatic failover. | Treasury and Engineering | 15 May 2025 |
| Implement faster webhook failure detection and alerting. | Engineering Ops | 1 May 2025 |
| Review SLA thresholds and escalation policies for payouts. | PM + Ops | 30 April 2025 |
| Create live liquidity monitoring dashboard visible to PMs and Operations. | Product Analytics | 5 May 2025 |

8. Communication Summary

| Audience | Channel | Message Sent |
| --- | --- | --- |
| Affected Users | Email / In-App Notifications | Service interruption notice and next steps. |
| Internal Teams | Slack / Incident Response Channel | Real-time updates and postmortem sharing. |
| Strategic Partners (if needed) | Direct Email | Professional incident report if SLAs breached. |

Incident Severity: (Confirm based on real financial and user impact.)

Full Postmortem Distribution: (Confirm who will receive final write-up — e.g., leadership, compliance, strategic partners.)

A strong postmortem culture is not about assigning blame. It is about building payout products that detect failures earlier, recover faster, communicate better, and protect user trust under stress.

Serious payout platforms are not judged by whether incidents happen. They are judged by how systematically and transparently they respond and improve.


Sample Filled Payout Incident Postmortem Report

1. Incident Overview

| Field | Details |
| --- | --- |
| Incident Title | NIP Corridor Payout Delays - 24 April 2025 |
| Date and Time of Incident | 24 April 2025, 12:30 UTC – 15:45 UTC |
| Reported By | Treasury Ops Monitor |
| Severity Level | High |
| Affected Systems | NIP (NIBSS Instant Payment) payouts, webhook delivery |
| Affected Customers | 143 users, 2 enterprise payout partners |
| Initial User Impact | Payouts delayed beyond 30-minute SLA; increased support ticket volume |

2. Incident Timeline

| Time (UTC) | Event |
| --- | --- |
| 12:30 | Liquidity buffer monitor flagged NGN payout delays. |
| 12:45 | Webhook delivery failures started appearing for NIP payouts. |
| 13:00 | On-call engineering initiated investigation. |
| 13:20 | Root cause identified: upstream bank API partial outage (Bank Partner A). |
| 13:30 | Treasury switched to backup bank partner manually. |
| 13:45 | User support escalation triggered communications. |
| 14:30 | Manual payout retries began. |
| 15:45 | All delayed payouts completed successfully. |
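A timeline like the one above can be turned into response metrics (time to investigate, identify, mitigate, and resolve) with a few lines of code. The following is a minimal sketch, not part of any Bitnob tooling: the milestone names and the choice of the 13:30 manual switch as the "mitigated" point are assumptions made for illustration, and it assumes all timestamps fall on the same UTC day.

```python
from datetime import datetime

# Timeline milestones copied from the sample report (all UTC, same day).
# The milestone names are illustrative labels, not Bitnob field names.
TIMELINE = {
    "detected": "12:30",               # liquidity buffer monitor flagged delays
    "investigation_started": "13:00",
    "root_cause_identified": "13:20",
    "mitigated": "13:30",              # manual switch to backup bank partner
    "resolved": "15:45",               # all delayed payouts completed
}


def minutes_between(start: str, end: str) -> int:
    """Return whole minutes from start to end, both 'HH:MM' on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def response_metrics(timeline: dict) -> dict:
    """Measure each milestone as minutes elapsed since first detection."""
    detected = timeline["detected"]
    return {
        name: minutes_between(detected, stamp)
        for name, stamp in timeline.items()
        if name != "detected"
    }


if __name__ == "__main__":
    for name, minutes in response_metrics(TIMELINE).items():
        print(f"{name}: {minutes} min after detection")
```

For the sample incident this yields 30, 50, 60, and 195 minutes respectively, which matches the 12:30 to 15:45 UTC window stated in the overview.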

3. Root Cause Analysis

| Field | Details |
| --- | --- |
| Primary Root Cause | Upstream partner bank's NIP API degraded without automated failover. |
| Contributing Factors | Liquidity buffer was sufficient, but platform did not auto-switch payout route. |
| Detection Weaknesses | Webhook failure threshold was too high, delaying internal alert. |
| Process Weaknesses | Manual Treasury intervention required; failover was not automated. |
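The gap described here is that a healthy alternative route had to be selected by hand. One way to picture automated route selection is a priority-ordered list of routes, each with a health flag and available liquidity. The sketch below is purely illustrative: `PayoutRoute`, `select_route`, and the provider labels are hypothetical names, and a real implementation would also need health checks, idempotent retries, and reconciliation, none of which are shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PayoutRoute:
    name: str             # hypothetical provider label
    healthy: bool         # result of the most recent health check
    liquidity_ngn: float  # funds currently available on this route


def select_route(routes: list, amount_ngn: float) -> Optional[PayoutRoute]:
    """Return the first route, in priority order, that is healthy and funded."""
    for route in routes:
        if route.healthy and route.liquidity_ngn >= amount_ngn:
            return route
    return None  # no usable route: hold the payout and escalate


# Primary partner is degraded, so the payout falls through to the backup.
routes = [
    PayoutRoute("bank_partner_a", healthy=False, liquidity_ngn=5_000_000),
    PayoutRoute("bank_partner_b", healthy=True, liquidity_ngn=2_000_000),
]
chosen = select_route(routes, amount_ngn=1_000_000)
print(chosen.name if chosen else "no route available")
```

Returning `None` instead of raising keeps the "no route available" case explicit, so the caller can queue the payout and page Treasury rather than fail silently.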

4. Impact Assessment

| Field | Details |
| --- | --- |
| Number of Failed or Delayed Payouts | 157 payouts delayed; 0 permanently failed. |
| User Impact | 27 user support tickets; 3 escalations to account managers. |
| Financial Impact | No direct financial loss; goodwill refunds ($50 total) to key accounts. |
| Regulatory Impact | None; internal thresholds for SLA breaches not exceeded materially. |

5. Immediate Actions Taken

| Action | Timestamp (UTC) | Owner |
| --- | --- | --- |
| Manual liquidity reallocation to backup bank. | 13:30 | Treasury Ops |
| User notification emails sent. | 13:45 | Customer Support |
| Triggered manual webhook replays for delayed payouts. | 14:30 | Engineering Ops |

6. Lessons Learned

Relying on a single payout rail per corridor is operationally fragile.

Webhook delivery failures should trigger alerts faster (current threshold too lenient).

User communication templates should be ready to deploy immediately, not drafted during incident.


7. Preventative and Remediation Actions

| Action Item | Owner | Deadline |
| --- | --- | --- |
| Integrate auto-failover to multiple NIP bank partners. | Engineering + Treasury | 15 May 2025 |
| Lower webhook failure alert threshold from 5% to 2%. | Engineering Ops | 2 May 2025 |
| Pre-approve standard payout delay user notification templates. | Customer Support | 30 April 2025 |
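To make the threshold change concrete, the sketch below compares the same webhook failure rate against the old 5% and the new 2% thresholds. It is a simplified illustration under stated assumptions: the function names are hypothetical, the rate is computed over a single window of delivery attempts, and treating the threshold as inclusive is a choice made here rather than something the report specifies.

```python
def failure_rate(failed: int, total: int) -> float:
    """Fraction of webhook deliveries that failed in the window (0.0 if empty)."""
    return failed / total if total else 0.0


def should_alert(failed: int, total: int, threshold: float = 0.02) -> bool:
    """Alert when the failure rate meets or exceeds the threshold (assumed inclusive)."""
    return failure_rate(failed, total) >= threshold


# 3 failures out of 100 deliveries is a 3% failure rate.
print(should_alert(3, 100, threshold=0.05))  # old 5% threshold
print(should_alert(3, 100, threshold=0.02))  # new 2% threshold
```

With 3 failures in 100 deliveries, the old threshold stays silent while the new one fires, which is the earlier detection the action item is aiming for.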

8. Communication Summary

| Audience | Channel | Message Sent |
| --- | --- | --- |
| Affected Users | Email + In-App Notifications | 14:00 UTC |
| Internal Teams | Slack Incident Channel | Real-time updates throughout |
| Enterprise Partners | Direct Email Reports | 17:00 UTC after incident closure |

Final Notes

Incident Severity: High (user SLA breach but no financial or regulatory damage)

Postmortem distributed to Product, Treasury, Engineering, and Leadership.

Notes on Formatting

Both the template and the sample use plain Markdown, so they convert easily to a .md file or export cleanly to PDF.
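If the postmortem data lives in a script or ticketing export, the Markdown tables can be generated rather than typed by hand. The helper below is a small sketch with a hypothetical name, `to_markdown_table`, and it assumes every row has the same number of cells as the header.

```python
def to_markdown_table(headers, rows):
    """Render headers and rows as a pipe-delimited Markdown table string."""

    def cell(value):
        # Escape literal pipes so they do not split the cell.
        return str(value).replace("|", "\\|")

    def line(cells):
        return "| " + " | ".join(cell(c) for c in cells) + " |"

    out = [line(headers), "| " + " | ".join("---" for _ in headers) + " |"]
    out.extend(line(row) for row in rows)
    return "\n".join(out)


print(to_markdown_table(
    ["Field", "Details"],
    [["Severity Level", "High"], ["Reported By", "Treasury Ops Monitor"]],
))
```

The returned string can be written straight into a .md file alongside the section headings.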


Use this template when investigating payout incidents. Copy the structure below and fill in the details for each section.

Bitnob Payout Incident Postmortem Template

1. Incident Overview

| Field | Details |
| --- | --- |
| Incident Title | Short descriptive title (e.g., GHS Mobile Money Payout Delays) |
| Date and Time of Incident | Start and end time in UTC |
| Reported By | Person or system that first detected the issue |
| Severity Level | Critical / High / Medium / Low |
| Affected Systems | e.g., Payouts API, webhook delivery, settlement engine |
| Affected Customers | Number of users and/or partners impacted |
| Initial User Impact | Describe what users experienced (e.g., delayed payouts, failed webhooks) |

2. Incident Timeline

| Time (UTC) | Event |
| --- | --- |
| HH:MM | Issue first detected by [monitoring/user report] |
| HH:MM | Investigation started |
| HH:MM | Root cause identified |
| HH:MM | Mitigation applied |
| HH:MM | Full resolution confirmed |

3. Root Cause Analysis

| Field | Details |
| --- | --- |
| Primary Root Cause | What directly caused the incident |
| Contributing Factors | What made the impact worse or delayed detection |

4. Impact Assessment

| Field | Details |
| --- | --- |
| Failed or Delayed Payouts | Total count and status breakdown |
| User Impact | Support tickets, complaints, escalations |
| Financial Impact | Refunds, credits, or lost revenue |
| Regulatory Impact | Any compliance implications |

5. Immediate Actions Taken

| Action | Timestamp | Owner |
| --- | --- | --- |
| Describe the action taken | HH:MM UTC | Team or person responsible |

6. Lessons Learned

What went well during the incident response?

What could have been detected earlier?

What processes or tooling gaps were exposed?

7. Preventative and Remediation Actions

| Action Item | Owner | Deadline |
| --- | --- | --- |
| Describe the follow-up action | Team or person | Target date |

8. Communication Summary

| Audience | Channel | Message Sent |
| --- | --- | --- |
| Affected Users | Email / In-App / SMS | When and what was communicated |
| Internal Teams | Slack / PagerDuty | When and what was communicated |
| Partners | Direct Email / API status page | When and what was communicated |

Sample Filled Payout Postmortem

1. Incident Overview

| Field | Details |
| --- | --- |
| Incident Title | NIP Corridor Payout Delays - 24 April 2025 |
| Date and Time of Incident | 24 April 2025, 12:30 UTC – 15:45 UTC |
| Reported By | Treasury Ops Monitor |
| Severity Level | High |
| Affected Systems | NIP payouts, Webhook delivery delays |
| Affected Customers | 143 users, 2 enterprise partners |
| Initial User Impact | Payouts delayed beyond SLA, increased support tickets |

2. Incident Timeline

| Time (UTC) | Event |
| --- | --- |
| 12:30 | Liquidity monitor flagged NGN payout delays. |
| 12:45 | Webhook delivery failures appeared. |
| 13:00 | Investigation initiated. |
| 13:20 | Root cause found: upstream bank API degradation. |
| 13:30 | Manual switch to backup bank. |
| 13:45 | User communication triggered. |
| 14:30 | Manual webhook replays started. |
| 15:45 | Full payout restoration confirmed. |

3. Root Cause Analysis

| Field | Details |
| --- | --- |
| Primary Root Cause | Upstream banking partner outage. |
| Contributing Factors | No automatic failover, webhook alert delay. |

4. Impact Assessment

| Field | Details |
| --- | --- |
| Failed or Delayed Payouts | 157 delayed |
| User Impact | 27 support tickets |
| Financial Impact | $50 goodwill refunds |
| Regulatory Impact | None |

5. Immediate Actions Taken

| Action | Timestamp (UTC) | Owner |
| --- | --- | --- |
| Liquidity switch | 13:30 | Treasury Ops |
| User notifications | 13:45 | Support |
| Webhook replays | 14:30 | Engineering Ops |

6. Lessons Learned

Auto-failover is critical for major corridors.

Faster webhook failure alerts needed.

7. Preventative and Remediation Actions

| Action Item | Owner | Deadline |
| --- | --- | --- |
| Add second bank partner | Engineering + Treasury | 15 May 2025 |
| Tighten webhook alert thresholds | Engineering Ops | 2 May 2025 |
| Approve user delay communication templates | Support | 30 April 2025 |

8. Communication Summary

| Audience | Channel | Message Sent |
| --- | --- | --- |
| Affected Users | Email, In-App | During incident |
| Internal | Slack | Continuous |
| Partners | Direct Email | Post-resolution |
