
Chapter 8 – B1 Keeps You Alive

8.1 Recognising B1 in Practice

B1 work represents the lifeline of an organisation — the work that ensures its very ability to operate safely and reliably. These are the actions that preserve revenue, compliance, customer trust, and operational stability. Recognising a B1 task quickly and objectively prevents chaos from spreading. In moments of uncertainty, clarity must come before speed.

B1 recognition is not about panic or who shouts loudest. It’s about using facts to decide whether something truly demands immediate attention. The question to ask is simple but powerful:

“If we do nothing for X hours/days, will this cause financial loss, legal breach, or customer harm?”

If the answer is yes, it’s B1. If the answer is uncertain, label it as Rising Urgency and verify. If the answer is no, it probably belongs in B2 or B3. This single check helps teams separate true emergencies from emotional urgency.

To keep decisions consistent, ODUI teams rely on a short B1 checklist displayed on their intake board:

  • Does it threaten customer trust or safety?

  • Does it breach any law, regulation, or contract?

  • Does it stop a critical business process or revenue stream?

  • Does it cause reputational risk or operational paralysis?

If the answer to one or more of these questions is yes, it’s B1. Everything else can wait.
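The yes / uncertain / no rule and the checklist can be combined into one small decision function. This is a minimal sketch, assuming each checklist question is answered Yes, No, or Unknown. The shortened question keys, the `Answer` enum, and treating an unanswered question as Unknown are illustrative choices, not part of ODUI itself.

```python
from enum import Enum


class Answer(Enum):
    YES = "yes"
    NO = "no"
    UNKNOWN = "unknown"


# The four intake-board questions, abbreviated as dictionary keys (hypothetical labels).
B1_CHECKLIST = (
    "customer trust or safety",
    "law, regulation, or contract",
    "critical process or revenue stream",
    "reputation or operational paralysis",
)


def classify(answers: dict[str, Answer]) -> str:
    """Apply the B1 checklist: any YES means B1; any open question means Rising Urgency."""
    # Assumption of this sketch: an unanswered question counts as UNKNOWN.
    values = [answers.get(question, Answer.UNKNOWN) for question in B1_CHECKLIST]
    if any(value is Answer.YES for value in values):
        return "B1"
    if any(value is Answer.UNKNOWN for value in values):
        return "Rising Urgency"  # uncertain: label it and verify
    return "B2/B3 candidate"


print(classify({"critical process or revenue stream": Answer.YES}))  # B1
print(classify({question: Answer.NO for question in B1_CHECKLIST}))  # B2/B3 candidate
```

A single Yes is enough to declare B1, while a question nobody can answer yet keeps the item visible as Rising Urgency instead of letting it drift into "can wait".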

The key is to make this checklist part of the daily rhythm: visible, habitual, and accepted by all. Over time, this shared reference builds collective intuition, allowing teams to classify faster without conflict or debate.

When B1 recognition becomes instinctive, the organisation moves with calm precision even in crisis. Instead of panic, people act with purpose. The clarity of what belongs in B1 creates confidence everywhere else.

Precision beats panic. Recognising B1 is not about emotion — it’s about protecting what keeps the organisation alive.

8.2 The B1 Response in Practice: Tools, Timing, and Roles

Chapter 3 introduced the logic of the B1 response — Detect, Assess, Act, Communicate, Learn. Here, we move from principle to practice. This section explains how those five steps happen in real time, who is involved, and what tools make the rhythm consistent.

1. Detect — Tools and Triggers

Detection starts with automation. Systems should surface incidents before customers do. Effective teams rely on:

  • Monitoring dashboards for uptime, latency, and error rates.

  • Automated alerts with severity labels (Critical / Major / Minor).

  • Designated intake channels where employees can flag risks quickly.

  • An escalation matrix showing who to contact for each type of alert.
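An escalation matrix is essentially a lookup table keyed by alert type and severity. The sketch below assumes hypothetical alert types (`payments`, `database`) and contact names, and routes anything the matrix does not cover to the intake channel; your own matrix and fallback will differ.

```python
SEVERITIES = ("Critical", "Major", "Minor")

# Hypothetical escalation matrix: (alert type, severity) -> contacts, in call order.
ESCALATION_MATRIX = {
    ("payments", "Critical"): ["payments on-call engineer", "Incident Owner"],
    ("payments", "Major"): ["payments on-call engineer"],
    ("database", "Critical"): ["database on-call engineer", "Incident Owner"],
}
# Fallback for combinations the matrix does not cover.
DEFAULT_CONTACTS = ["intake channel"]


def route(alert_type: str, severity: str) -> list[str]:
    """Return who to contact for an alert, rejecting unknown severity labels."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity label: {severity!r}")
    # Return a copy so callers cannot accidentally edit the shared matrix.
    return list(ESCALATION_MATRIX.get((alert_type, severity), DEFAULT_CONTACTS))


print(route("payments", "Critical"))  # ['payments on-call engineer', 'Incident Owner']
print(route("search", "Minor"))       # ['intake channel']
```

Rejecting unknown severity labels keeps alerts aligned with the agreed Critical / Major / Minor vocabulary instead of silently inventing new levels.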

Service-level guidance: confirm detection promptly according to your severity matrix and regulatory/SLA commitments.

2. Assess — Verify and Prioritise

Assessment is a short triage conversation between the intake lead, delivery lead, and technical lead. They answer three questions:

  • What’s actually broken?

  • Who is affected, and how soon?

  • Does it meet B1 criteria?

If yes, the DM declares a formal B1 event and opens a record in the incident log.

Tools: shared impact checklist, quick scoring template (importance × urgency).
Service-level guidance: complete initial assessment promptly in line with your incident policy and severity matrix.
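The quick scoring template multiplies importance by urgency to decide which issue the triage group looks at first. This sketch assumes a 1 to 5 scale for both factors (so scores run from 1 to 25); the scale, the `Issue` class, and the example issues are hypothetical, and the score orders the queue rather than replacing the B1 criteria.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    name: str
    importance: int  # assumed scale: 1 (low) to 5 (high)
    urgency: int     # assumed scale: 1 (can wait) to 5 (immediate)

    def score(self) -> int:
        """Quick triage score: importance x urgency (1 to 25 on these scales)."""
        for value in (self.importance, self.urgency):
            if not 1 <= value <= 5:
                raise ValueError("importance and urgency must be between 1 and 5")
        return self.importance * self.urgency


def rank(issues: list[Issue]) -> list[Issue]:
    """Order issues so the highest-scoring one is assessed first."""
    return sorted(issues, key=lambda issue: issue.score(), reverse=True)


queue = [Issue("report typo", 1, 2), Issue("checkout errors", 5, 5), Issue("slow search", 3, 3)]
print([issue.name for issue in rank(queue)])  # ['checkout errors', 'slow search', 'report typo']
```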

3. Act — Contain and Restore

Execution follows a simple rule: contain first, perfect later. The Incident Owner assigns one Fix Lead, one Recorder, and any needed support. Document key actions and timestamps in a live channel or issue board.

Tools:

  • Standardised “Incident Card” template with fields: Impact, Mitigation, ETA, Owner.

  • Predefined playbooks for common failures (e.g., service outage, API delay, data corruption).

Service-level guidance: initiate first mitigation as quickly as feasible; track restoration continuously and visibly.
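An Incident Card can be modelled as a small record holding the four template fields, plus a timestamped log the Recorder appends to. This is an illustrative sketch: the `IncidentCard` class, its `record` and `render` methods, and the example values are hypothetical, not a prescribed ODUI format.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class IncidentCard:
    """Live card for a B1 event: Impact, Mitigation, ETA, Owner, plus links and a log."""
    impact: str
    mitigation: str
    eta: str
    owner: str
    links: list[str] = field(default_factory=list)
    log: list[tuple[datetime, str]] = field(default_factory=list)

    def record(self, action: str, when: Optional[datetime] = None) -> None:
        """Recorder's entry: a key action with its timestamp (defaults to now)."""
        self.log.append((when if when is not None else datetime.now(), action))

    def render(self) -> str:
        """Plain-text view suitable for pasting into the live incident channel."""
        lines = [
            f"Impact: {self.impact}",
            f"Mitigation: {self.mitigation}",
            f"ETA: {self.eta}",
            f"Owner: {self.owner}",
        ]
        lines += [f"Link: {url}" for url in self.links]
        lines += [f"{stamp:%H:%M} {action}" for stamp, action in self.log]
        return "\n".join(lines)


card = IncidentCard(
    impact="20% of payment transactions failing",
    mitigation="Roll back the latest release",
    eta="10 minutes",
    owner="Asha (hypothetical name)",
)
card.record("Rollback initiated", datetime(2024, 5, 1, 0, 7))
print(card.render())
```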

4. Communicate — Clarity Over Noise

Communication uses one official channel, never multiple threads. All updates follow a set pattern:

Impact → Action → ETA → Next Update.

Executives and stakeholders can subscribe to updates but should not interfere with the fix team. Use dashboards to display live status for transparency.

Service-level guidance: provide a first update promptly, then continue at a cadence appropriate to severity (as defined in your incident communications policy).

5. Learn — From Fix to Prevention

Within an agreed timeframe (often 24–48 hours), the PM leads a Root Cause Analysis (RCA) session. The team captures what happened, why, and what prevention is needed. Each RCA should create at least one B2 prevention task with clear ownership and due date.

Tools: RCA form, linked B2 backlog item, learning tracker.

Example: Payment Failure Scenario

  • 00:00: Alert — 20% transaction errors detected.

  • 00:03: Intake lead confirms critical impact → declares B1.

  • 00:07: Fix team assembled, rollback initiated.

  • 00:15: Update: “Rollback in progress, ETA 10 mins.”

  • 00:25: Service restored. Final status update posted.

  • Next day: RCA reveals outdated patch → B2 automation added.
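Reading the scenario’s stamps as elapsed hours and minutes since the alert (an assumption, consistent with a 10-minute ETA at 00:15 and restoration at 00:25), the key intervals are simple subtractions. Recording them for every incident gives you figures to compare against your own severity matrix; the `to_minutes` helper is hypothetical.

```python
def to_minutes(stamp: str) -> int:
    """Convert an elapsed 'HH:MM' stamp into whole minutes."""
    hours, minutes = stamp.split(":")
    return int(hours) * 60 + int(minutes)


# Timestamps from the payment failure scenario.
alert = to_minutes("00:00")
declared = to_minutes("00:03")
restored = to_minutes("00:25")

print(f"Alert to B1 declaration: {declared - alert} min")         # 3 min
print(f"Declaration to restoration: {restored - declared} min")   # 22 min
print(f"Alert to restoration: {restored - alert} min")            # 25 min
```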

Important: In ODUI, the case is not fully closed until the linked B2 prevention action — in this case, the automation — is implemented. Closure requires both resolution and prevention. This ensures every incident strengthens the system instead of recurring.
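The closure rule can be expressed as a single check: the case closes only when the service is resolved and every linked B2 prevention task is implemented, with at least one such task present (matching the expectation that each RCA creates one). The class names, owner, and due date below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PreventionTask:
    """A B2 backlog item created by the RCA, with an owner and due date."""
    title: str
    owner: str
    due: str
    done: bool = False


@dataclass
class B1Case:
    title: str
    resolved: bool = False
    prevention: list[PreventionTask] = field(default_factory=list)

    def can_close(self) -> bool:
        """Closure needs resolution AND at least one linked B2 task, all implemented."""
        return self.resolved and bool(self.prevention) and all(t.done for t in self.prevention)


case = B1Case("Payment failure")
case.resolved = True
print(case.can_close())  # False: no prevention task linked yet
case.prevention.append(PreventionTask("Automate patch checks", "Ben", "2024-05-15"))
print(case.can_close())  # False: prevention not yet implemented
case.prevention[0].done = True
print(case.can_close())  # True
```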

Operational excellence is repetition, not reaction. The tools and timing make calm performance possible under pressure.

Note on timing: ODUI is cross-industry. Replace any illustrative timings with your own SLA/severity matrix so expectations match regulatory and operational realities.

This example is intentionally simple to illustrate rhythm and structure. In reality, some incidents may take hours or even days to resolve, depending on their complexity, dependencies, and regulatory implications. What matters is not speed alone but maintaining calm discipline, transparent communication, and continuous coordination throughout the entire duration.

8.3 Roles and Coordination During B1 Events

During a B1 event, clarity of roles and coordination determines the difference between rapid recovery and prolonged chaos. Every participant must know their responsibility, authority, and communication boundary. ODUI insists on explicit ownership, not shared assumption.

| Role | Core Responsibilities During B1 |
| --- | --- |
| Incident Owner (usually the Delivery Lead) | Declares the B1 status, activates the response team, maintains the update rhythm, and ensures the incident channel stays factual. Owns the overall timeline and closure. |
| Outcome Lead | Confirms customer and business impact, tracks decisions, and ensures that post-incident learning translates into B2 prevention work. The Outcome Lead acts as the bridge between the fixers and stakeholders. |
| Technical Team (Ops / Dev) | Executes mitigation and recovery actions, confirms when systems are stable, and documents key steps for RCA. Works under the direction of the Incident Owner to avoid duplication or conflicting changes. |
| Executives / Leaders | Stay informed but hands-off. Their role is to remove blockers, approve emergency resource use, and model calm behaviour. They protect the team from noise and keep external stakeholders reassured. |

Coordination Rules

  1. Single Source of Truth: One live incident board or chat thread tracks all actions and updates. Nobody works from private messages or side discussions.

  2. Clear Command Chain: The Incident Owner is the single voice of authority. Others contribute expertise, not extra direction.

  3. Visual Ownership: Names and roles should be visible on the incident board — who leads, who supports, who communicates.

  4. Recovery First, Reporting Later: All energy goes to containment until the system stabilises. Reporting and RCA come after the immediate threat is neutralised.

  5. Tone and Tempo: Executives and senior leaders set the tone. Calm, factual communication cascades downward. Anxiety from the top multiplies across teams.
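Explicit ownership is easy to check mechanically before the response gets underway. This sketch, assuming the board is a list of (person, role) pairs, flags any of the single-holder roles named in section 8.2 (Incident Owner, Fix Lead, Recorder) that is missing or shared; the function name and the people are hypothetical.

```python
# Roles that must each be held by exactly one named person (roles named in section 8.2).
SINGLE_HOLDER_ROLES = ("Incident Owner", "Fix Lead", "Recorder")


def check_board(assignments: list[tuple[str, str]]) -> list[str]:
    """Return ownership problems for (person, role) pairs; an empty list means clear ownership."""
    problems = []
    for role in SINGLE_HOLDER_ROLES:
        holders = [person for person, assigned in assignments if assigned == role]
        if not holders:
            problems.append(f"no {role} assigned")
        elif len(holders) > 1:
            problems.append(f"{role} held by {len(holders)} people: {', '.join(holders)}")
    return problems


board = [("Asha", "Incident Owner"), ("Ben", "Fix Lead"), ("Chen", "Incident Owner")]
print(check_board(board))
# ['Incident Owner held by 2 people: Asha, Chen', 'no Recorder assigned']
```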

In B1 situations, leadership is not about hierarchy — it’s about precision. The team succeeds when every person knows their lane, respects handoffs, and works through structured coordination.

During B1, clarity is leadership. Every minute of confusion costs progress.

8.4 B1 Communication Protocols

In a B1 situation, communication is not decoration — it is control. The right information, delivered clearly and on time, can calm an organisation faster than any technical fix. Poor communication, by contrast, multiplies anxiety and confusion. ODUI standardises communication to protect focus and trust.

Core Communication Principles

  1. One Channel per Incident
    All updates and questions occur in a single thread, typically a dedicated Slack or Teams channel. This avoids duplication, misinformation, and side conversations that steal focus.

  2. Structured Messaging
    Every update follows the same format to keep stakeholders oriented:

    Impact → Action → ETA → Next Update

    This ensures that every message answers the questions people care about: what’s broken, what’s being done, and when to expect progress.

  3. Status Tiers for Visibility
    Clear states keep everyone aligned on progress:

    • Investigating – issue confirmed, cause unknown.

    • Fixing – root cause identified, work underway.

    • Resolved – services restored, stability confirmed.

    • Reviewed – RCA completed and B2 prevention logged.

  4. Cadence of Updates
    Update frequency depends on severity and stakeholder expectations. Critical issues may need updates every 30–60 minutes, while major but contained incidents can follow a slower rhythm. The key is predictability — people should never wonder when they’ll hear next.

  5. Template for Updates
    Example:

    Impact: Customer checkout unavailable for 20% of users.
    Action: Rollback in progress.
    ETA: 15 minutes.
    Next update: 14:45.

  6. After-Action Communication
    Once resolved, post an official closure message:

    Status: ✅ Resolved. Root cause identified as expired SSL certificate. RCA scheduled tomorrow 10:00. Prevention task logged as B2 item #204.

    Within an agreed timeframe (often 24–48 hours), the Outcome Lead or Incident Owner publishes a concise post-mortem summary. This transparency turns a stressful event into a learning moment.
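The status tiers, the fixed update pattern, and a predictable cadence fit together in a few lines of code. This is a minimal sketch under stated assumptions: tiers may only advance one step in the listed order, critical incidents use a 30-minute cadence (inside the 30 to 60 minute range above), and the 120-minute interval for major incidents is a hypothetical placeholder for your own policy.

```python
from datetime import datetime, timedelta
from enum import Enum


class Status(Enum):
    INVESTIGATING = "Investigating"  # issue confirmed, cause unknown
    FIXING = "Fixing"                # root cause identified, work underway
    RESOLVED = "Resolved"            # services restored, stability confirmed
    REVIEWED = "Reviewed"            # RCA completed and B2 prevention logged


# Assumption: tiers advance one step at a time, in the listed order.
NEXT_STATUS = {
    Status.INVESTIGATING: Status.FIXING,
    Status.FIXING: Status.RESOLVED,
    Status.RESOLVED: Status.REVIEWED,
}

# Illustrative cadences; the Major interval is a hypothetical placeholder.
UPDATE_INTERVAL = {
    "Critical": timedelta(minutes=30),
    "Major": timedelta(minutes=120),
}


def advance(current: Status, target: Status) -> Status:
    """Move to the next status tier, rejecting skipped or backward moves."""
    if NEXT_STATUS.get(current) is not target:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


def format_update(impact: str, action: str, eta: str,
                  sent_at: datetime, severity: str) -> str:
    """Render an update in Impact -> Action -> ETA -> Next Update order."""
    next_at = sent_at + UPDATE_INTERVAL[severity]
    return "\n".join([
        f"Impact: {impact}",
        f"Action: {action}",
        f"ETA: {eta}",
        f"Next update: {next_at:%H:%M}",
    ])


status = advance(Status.INVESTIGATING, Status.FIXING)
print(status.value)  # Fixing
print(format_update("Customer checkout unavailable for 20% of users.",
                    "Rollback in progress.", "15 minutes",
                    datetime(2024, 5, 1, 14, 15), "Critical"))
```

Because the next update time is computed from the message’s own send time and the agreed cadence, every message carries a promise stakeholders can rely on.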

Why It Matters

Effective communication keeps teams synchronised and stakeholders informed without micromanagement. It demonstrates control and professionalism even under stress. A predictable communication rhythm builds trust far faster than reassurance ever could.

Facts calm fear. Clear, consistent updates transform crisis into confidence.

8.5 Preventing B1 Overload

When B1 incidents pile up, the problem is not luck — it’s design. Constant emergencies are a symptom of weak systems, unclear ownership, or missing preventive action. B1 overload exhausts teams, lowers morale, and steals capacity from strategic work. The goal of ODUI is not just to handle B1s well, but to make them increasingly rare.

How to Keep the System Healthy

  1. Track Repeat Incidents
    Repetition is the clearest signal of a systemic gap. If the same cause reappears, the root issue was never solved. Maintain a log of repeat B1s and flag any pattern that repeats more than twice in a quarter.

  2. Monitor Mean Time Between Incidents (MTBI)
    MTBI measures the average time between major B1 events. A rising MTBI trend means your stability is improving. A falling one means prevention has stalled. Discuss MTBI openly in Outcome Reviews to keep reliability visible at the leadership level.

  3. Maintain a B1 Buffer
    Reserve 10–15% of team capacity for unexpected incidents. This buffer absorbs shocks without derailing ongoing B2 and B3 work. Without a buffer, every B1 cascades into panic and overtime.

  4. Run Prevention Sprints
    Every quarter, dedicate a short sprint or focused block of time to eliminate the top recurring B1 causes. These targeted improvements — automation, monitoring, process tightening — convert chaos into calm. Prevention sprints pay back in resilience and morale.

  5. Make Prevention Visible
    Track preventive B2 work derived from B1 RCAs on the ODUI board. When teams see progress in prevention, motivation rises and fear decreases. Celebrate every avoided incident as a quiet victory.

  6. Watch for Cultural Fatigue
    If every week feels like survival mode, the issue is cultural as much as technical. Leadership must step in to question priorities, capacity, and systemic design. Chronic urgency is a leadership failure, not team weakness.
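Three of the health signals above reduce to simple calculations over an incident log. The sketch assumes calendar quarters, consistently labelled causes, and MTBI measured between incident start times; the function names and sample data are hypothetical, and the range check on the buffer simply mirrors the suggested 10 to 15%.

```python
from collections import Counter
from datetime import date, datetime


def quarter(day: date) -> tuple[int, int]:
    """Calendar quarter as (year, quarter number)."""
    return day.year, (day.month - 1) // 3 + 1


def flag_repeats(log: list[tuple[date, str]], limit: int = 2) -> list[tuple[str, tuple[int, int], int]]:
    """Causes that recur more than `limit` times within a single quarter."""
    counts = Counter((cause, quarter(day)) for day, cause in log)
    return [(cause, q, n) for (cause, q), n in counts.items() if n > limit]


def mtbi_hours(incident_starts: list[datetime]) -> float:
    """Mean time between incidents: average gap between consecutive B1 start times."""
    if len(incident_starts) < 2:
        raise ValueError("need at least two incidents to measure a gap")
    ordered = sorted(incident_starts)
    gaps = [(later - earlier).total_seconds() / 3600
            for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


def b1_buffer_hours(team_size: int, hours_per_person: float, share: float = 0.10) -> float:
    """Capacity held back for unplanned B1 work in one planning period."""
    if not 0.10 <= share <= 0.15:
        raise ValueError("share is outside the suggested 10-15% range")
    return team_size * hours_per_person * share


log = [
    (date(2024, 1, 5), "expired certificate"),
    (date(2024, 2, 9), "expired certificate"),
    (date(2024, 3, 20), "expired certificate"),
    (date(2024, 2, 1), "disk full"),
]
print(flag_repeats(log))  # [('expired certificate', (2024, 1), 3)]
print(mtbi_hours([datetime(2024, 1, 1), datetime(2024, 1, 11), datetime(2024, 1, 31)]))  # 360.0
print(b1_buffer_hours(team_size=6, hours_per_person=80))  # 48.0
```

Tracked quarter by quarter, a rising `mtbi_hours` value and an empty `flag_repeats` list are the quantitative signs that prevention work is paying off.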

A healthy ODUI system treats B1s as exceptions, not a way of life. When calm becomes normal, productivity and trust naturally rise.

A calm system is a productive system. Prevention is not optional — it’s operational strength.

8.6 ODUI Language Glossary

Here are the new ODUI terms introduced or used heavily in this chapter.

New ODUI terms (Chapter 8)

| Term | Meaning |
| --- | --- |
| B1 checklist | A short, visible list of tests that confirm whether something truly belongs in B1. |
| B1 event | A formally declared B1 situation (so everyone knows the rules, roles, and communication rhythm). |
| Incident Owner | The single owner who leads the B1 response (often Flow Lead / on-call incident commander). |
| Fix Lead | The person driving the technical mitigation and recovery work during the incident. |
| Recorder | The person capturing timestamps, decisions, and updates in real time (prevents memory gaps). |
| Incident log | The official record of a B1 event (what happened, when, and what was decided). |
| Incident Card | A simple template for the live incident: Impact, Mitigation, ETA, Owner (plus links). |
| Escalation matrix | A simple guide showing who to contact based on incident type and severity. |
| Severity matrix | Your organisation’s rulebook for classifying incident severity and required response times (often tied to SLAs). |
| Contain first, perfect later | The B1 action rule: stabilise and reduce harm first, then improve and harden afterwards. |
| Status tiers | A small set of shared status labels (Investigating → Fixing → Resolved → Reviewed). |
| Cadence of updates | The agreed rhythm of communication during a B1 event (predictable beats, not random messages). |
| After-action communication | The closure and summary message after recovery (what happened, what changed, what’s next). |
| Post-mortem | A short written summary of the incident and the lessons learned (linked to prevention work). |
| Prevention sprint | A focused time block dedicated to removing recurring B1 causes. |
| Cultural fatigue | The human cost of living in permanent urgency: low morale, burnout, and fragile systems. |

Core ODUI questions (Chapter 8)

  • B1 test: If we do nothing until tomorrow, will this cause real harm?
  • Bucket purity: Is this truly B1, or is it B3 pressure in disguise?
  • Ownership: Who is the Incident Owner?
  • Communication: Where is the single source of truth, and when is the next update?
  • Learning: What will we change so this does not repeat?