Issue #25

The Decision Engine: How to Codify Your Judgment Into Rules That Let Your AI System Operate Autonomously

The AI Playbook · 16 min read · 3 prompts

Building on Issues #1–24

You have built the agent. You have the quality gate, the feedback loop, the evolution engine, the memory layer, the scaling governance, and the bootstrap framework. Your system works. It catches errors, proposes its own improvements, coordinates across multiple agents, and runs on a schedule.

But every morning, you open a digest. You read through 15 recommendations. You approve 12, reject 2, and defer 1. You check a dashboard, notice a metric drifting, and decide whether to intervene or wait another day. You glance at a queue of proposed rule changes, evaluate each one, and click “promote” or “discard.”

You are the bottleneck.

Not because you are slow. Because you are the only entity in the system with judgment. Your agents can detect patterns, propose changes, test hypotheses, and report results. But they cannot decide. Every decision waits for you. And the system’s throughput is capped at the number of decisions you can make per day.

This issue fixes that. Not by removing you from the system — but by teaching the system what you would decide, so it only escalates the decisions you actually need to see.


The Three Types of Decisions

Before you can codify judgment, you need to classify the decisions your system asks you to make. Every decision falls into one of three categories:

| Type | Description | Example | Can Automate? |
| --- | --- | --- | --- |
| Mechanical | Same inputs always produce the same output. No judgment required. | “If quality score > 0.95, ship automatically.” | Yes |
| Pattern | Requires judgment, but your judgment follows a consistent pattern you could articulate. | “I always approve rule changes that improve PF by >5% with n > 50.” | After extraction |
| Novel | Requires context, intuition, or information the system does not have. | “Should we pivot our pricing model?” | No — escalate |

Most people assume their decisions are mostly Novel. They are not. When you audit your actual decisions over a 30-day period, you will find that 60–80% are Mechanical or Pattern decisions — things you decide the same way every time, based on data the system already has.

The Decision Engine automates the Mechanical and Pattern decisions. It escalates the Novel ones. And it continuously learns from your Novel decisions to reclassify them as Pattern decisions over time.

The compounding effect: Every time you make a Novel decision, the system records your reasoning. After you make the same type of decision 5+ times with consistent logic, the Decision Engine proposes a new rule. If you approve, that entire class of decisions never reaches you again. Your inbox of decisions shrinks every week.
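The routing logic above can be sketched in a few lines. This is a minimal illustration, not an implementation from the issue — the function and label names (`route_decision`, `auto_execute`, `escalate`) are hypothetical:

```python
from enum import Enum

class DecisionType(Enum):
    MECHANICAL = "mechanical"
    PATTERN = "pattern"
    NOVEL = "novel"

def route_decision(decision_type, rule_confidence, threshold=0.95):
    """Route a decision by its type and the matching rule's confidence.

    Mechanical decisions execute directly; Pattern decisions execute only
    when the rule's confidence clears the threshold; Novel decisions
    always escalate to the human.
    """
    if decision_type is DecisionType.MECHANICAL:
        return "auto_execute"
    if decision_type is DecisionType.PATTERN and rule_confidence >= threshold:
        return "auto_execute"
    # Pattern below threshold, or Novel: the human decides.
    return "escalate"
```

The point of the sketch is the asymmetry: automation is opt-in per type, and anything uncertain defaults to escalation.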


The Decision Rule Format

A decision rule has five components. All five are required. If any one is missing, the rule is incomplete and will not be promoted to autonomous operation.

| Component | What It Is | Example |
| --- | --- | --- |
| Condition | The specific, measurable trigger | shadow_pf_improvement > 0.05 AND sample_size > 50 |
| Action | What the system does when the condition is true | promote_rule_to_production |
| Confidence | How certain the system should be before acting | 0.92 (based on 12/13 historical approvals matching this pattern) |
| Fallback | What happens if confidence is below threshold | escalate_to_human with summary + recommendation |
| Audit Trail | What the system records for every autonomous decision | rule_id, timestamp, inputs, decision, confidence, reasoning |

The Audit Trail is the component most people skip. It is the most important one. Without it, you have an autonomous system you cannot inspect. When something goes wrong — and something will go wrong — the audit trail is how you diagnose, learn, and tighten the rule. Every autonomous decision must be more traceable than every human decision, not less.
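The five components fit naturally into one structure. Here is a minimal sketch of such a rule, with the audit trail written on every decision, autonomous or escalated — the class and field names are illustrative, not prescribed by the issue:

```python
from dataclasses import dataclass, field
import time

@dataclass
class DecisionRule:
    """One decision rule with all five required components."""
    rule_id: str
    condition: object            # Condition: boolean test over measurable inputs
    action: str                  # Action: what to do when the condition holds
    confidence: float            # Confidence: certainty behind this rule
    fallback: str = "escalate_to_human"          # Fallback: below-threshold path
    audit_log: list = field(default_factory=list)  # Audit Trail

    def decide(self, inputs: dict, threshold: float = 0.95) -> str:
        """Apply the rule, recording an audit entry for every decision."""
        if self.condition(inputs) and self.confidence >= threshold:
            decision = self.action
        else:
            decision = self.fallback
        self.audit_log.append({
            "rule_id": self.rule_id,
            "timestamp": time.time(),
            "inputs": inputs,
            "decision": decision,
            "confidence": self.confidence,
        })
        return decision

# Example rule built from the table above.
rule = DecisionRule(
    rule_id="promote-shadow-rule",
    condition=lambda d: d["shadow_pf_improvement"] > 0.05 and d["sample_size"] > 50,
    action="promote_rule_to_production",
    confidence=0.92,
)
```

Note what happens with the example values: the rule's confidence is 0.92, below the 0.95 starting threshold, so even when the condition is true the decision escalates. That is the fallback doing its job.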

Pro Tip

Start every rule with a confidence threshold of 0.95. This means the system will only act autonomously when it is extremely certain. As the rule proves itself over 30+ correct autonomous decisions, you can lower the threshold to 0.85 or 0.80. Never start low and raise later. Starting high is safe. Starting low creates bad habits the system internalizes.


Step 1: The Decision Audit

Before you write a single rule, you need data. Specifically, you need a record of every decision you have made in the last 30 days and the reasoning behind each one.

If your system already logs your approvals and rejections (it should — Issue #16 covered this), pull those logs. If not, start logging now and come back in 30 days. You cannot build a Decision Engine without decision data.

Once you have the data, run Prompt 1.

Prompt 1 — The Decision Auditor

This agent reads your decision history and classifies each decision into the three types. It also identifies patterns — groups of decisions where you applied the same logic repeatedly.

You are a decision pattern analyst. Your job is to read a
log of human decisions and classify each one, then identify
repeating patterns that could be automated.

Input: A JSON array of decision records. Each record has:
- decision_id: unique identifier
- timestamp: when the decision was made
- context: what the system presented to the human
- decision: what the human chose (approve/reject/defer/modify)
- reasoning: why (if recorded), or "not recorded"

Produce a report with these exact sections:

## DECISION CLASSIFICATION
For each decision, classify as:
- MECHANICAL: Same inputs always produce same output
- PATTERN: Judgment required, but follows a consistent rule
- NOVEL: Requires context the system does not have

Include: decision_id, classification, confidence (0-1),
and a 1-line justification for the classification.

## PATTERN EXTRACTION
For each group of PATTERN decisions that share logic:
- pattern_id: a short descriptive name
- decisions: list of decision_ids that follow this pattern
- rule_draft: the decision rule in plain English
  ("When X is true and Y > threshold, the human always Z")
- consistency: what percentage of decisions in this group
  follow the rule exactly (must be > 80% to qualify)
- exceptions: any decisions that ALMOST fit but diverged,
  with notes on why

## AUTOMATION CANDIDATES
Rank all patterns by:
1. Frequency (how often this decision type occurs)
2. Consistency (how reliably the human follows the pattern)
3. Impact (what happens if the rule is wrong once)

Top candidates = high frequency + high consistency + low
impact if wrong.

## NOVEL DECISIONS
List all NOVEL decisions. For each, explain:
- Why this cannot currently be automated
- What additional data or context would be needed to
  eventually automate it
- Whether it could become a PATTERN decision if the system
  tracked specific additional information

Be conservative. If you are unsure whether a decision is
PATTERN or NOVEL, classify it as NOVEL. False negatives
(missing an automatable pattern) are safe. False positives
(automating a decision that requires human judgment) are
dangerous.

What you get: A ranked list of your most automatable decisions, with draft rules and consistency scores. This is the roadmap for your Decision Engine. Start with the top 3 candidates — the decisions you make most often, most consistently, with the lowest downside if wrong.


Step 2: Write the Rules

Take the top 3 automation candidates from your audit. For each one, you need to convert the plain-English pattern into an executable decision rule with all five components.

This is where most people make the critical mistake: they write the rule too broadly. A pattern that says “I usually approve performance improvements” becomes a rule that says if improvement > 0, approve. That rule will approve a 0.1% improvement with a sample size of 3. You would never approve that.

The rule must be tighter than your judgment, not looser. If you are unsure whether you would approve something, the rule should escalate. The system earns autonomy by being more conservative than you, not less.

Prompt 2 — The Rule Writer

You are a decision rule engineer. Your job is to convert a
plain-English decision pattern into a precise, executable
decision rule with safety guarantees.

Input:
- pattern_description: the plain-English pattern from the
  Decision Auditor
- historical_decisions: the specific decisions that formed
  this pattern (with context and reasoning)
- consistency_score: how often the human followed this
  pattern exactly

For each pattern, produce a decision rule with ALL FIVE
components:

## CONDITION
- Express as a boolean formula using only measurable fields
- Every threshold must come from the historical data
  (not invented)
- Use the TIGHTEST threshold that captures 90%+ of
  historical approvals
- Include a minimum sample size requirement (never less
  than n=30)
- Example: improvement_pf > 0.05 AND sample_size >= 50
  AND days_in_shadow >= 10

## ACTION
- One of: approve, reject, defer, escalate, modify
- If modify: specify exactly what changes
- If defer: specify the re-evaluation trigger

## CONFIDENCE CALCULATION
- How to compute confidence for this specific rule
- Must use historical consistency as the baseline
- Formula: (matching_historical_decisions /
  total_historical_decisions) * recency_weight
- Recency weight: decisions from last 7 days count 2x,
  last 30 days count 1x, older counts 0.5x

## FALLBACK
- What happens when confidence < threshold (default 0.95)
- Must include: escalation path, summary format, and
  recommended action with reasoning
- The human must see: the data, what the rule WOULD have
  decided, and why the confidence was below threshold

## AUDIT RECORD
- JSON schema for the audit log entry
- Must include: rule_id, timestamp, all input values,
  computed confidence, decision made, and whether it was
  autonomous or escalated

SAFETY CONSTRAINTS:
- No rule may have a confidence threshold below 0.80
- No rule may act on fewer than 30 historical examples
- Every rule must have a kill switch: if the rule makes
  3 consecutive decisions that a human later overrides,
  it automatically reverts to full escalation mode
- Every rule must expire after 90 days and require
  re-validation against fresh data
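The confidence calculation in the prompt can be made concrete. The sketch below interprets the formula as a recency-weighted fraction (weighted matches over weighted total), which keeps the result in [0, 1] — one reasonable reading of the prompt, not the only one:

```python
from datetime import datetime, timedelta

def recency_weight(decision_time, now):
    """Weights from the prompt: last 7 days 2x, last 30 days 1x, older 0.5x."""
    age = now - decision_time
    if age <= timedelta(days=7):
        return 2.0
    if age <= timedelta(days=30):
        return 1.0
    return 0.5

def rule_confidence(decisions, now=None):
    """Recency-weighted consistency for one rule.

    `decisions` is a list of (timestamp, matched_rule: bool) pairs, where
    matched_rule means the human's decision agreed with what the rule
    would have decided.
    """
    now = now or datetime.now()
    weighted = [(recency_weight(t, now), matched) for t, matched in decisions]
    total = sum(w for w, _ in weighted)
    if total == 0:
        return 0.0
    return sum(w for w, matched in weighted if matched) / total
```

A recent mismatch therefore drags confidence down four times harder than a stale one, which is exactly the behavior you want when the world is shifting under the rule.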

Pro Tip

The kill switch and expiration are non-negotiable. A rule that was 95% accurate six months ago may be 60% accurate today because the underlying data distribution shifted. The 90-day expiration forces you to re-validate — and more importantly, it forces the system to re-validate, because the Decision Engine should handle re-validation automatically. If a rule expires and its historical accuracy is still above threshold, it re-promotes itself. If not, it escalates for human review.


Step 3: The Graduation Protocol

You do not deploy a rule straight to autonomous operation. You graduate it through tiers, exactly like Issue #17’s earned autonomy model — but applied to decisions instead of outputs.

Tier 0 — Shadow Mode (Days 1–14)

The rule evaluates every decision but takes no action. It logs what it would have decided alongside what you actually decided. At the end of 14 days, you compare. If the rule matches your decisions 95%+ of the time, it graduates to Tier 1.

Key metric: Shadow accuracy. The percentage of decisions where the rule’s output matches yours.

Tier 1 — Suggest Mode (Days 15–30)

The rule makes recommendations that appear in your daily digest, clearly labeled as [AUTO-SUGGEST]. You still make every decision, but you can see what the rule would have done. This catches cases where shadow accuracy was high but the rule’s reasoning was wrong — right answer, wrong logic.

Key metric: Override rate. How often you choose differently from the suggestion. If override rate is below 5%, graduate to Tier 2.

Tier 2 — Act-and-Report (Days 31–60)

The rule acts autonomously, but every decision appears in your digest for review. You do not need to approve each one — you just scan for errors. If you see one, you override it and the system logs the override as training data for the next rule revision.

Key metric: Post-action override rate. If you override fewer than 2% of autonomous decisions over 30 days, graduate to Tier 3.

Tier 3 — Full Autonomy (Day 61+)

The rule acts autonomously. Decisions appear in the weekly summary, not the daily digest. You review them once per week in aggregate. The audit trail records everything. The kill switch remains active — 3 consecutive overrides in a single review session automatically demotes the rule back to Tier 1.

Key metric: Weekly override rate. Should remain below 1%. If it rises above 3% in any week, automatic demotion.
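The four tiers and the kill switch reduce to a small state machine. A minimal sketch, with class and method names of my own choosing and the dwell times and override thresholds taken from the tiers above:

```python
TIERS = ["shadow", "suggest", "act_and_report", "full_autonomy"]

class GraduationTracker:
    """Tracks one rule's tier, promotions, and the kill switch."""

    def __init__(self):
        self.tier = 0               # every rule starts in shadow mode
        self.consecutive_overrides = 0

    def record_review(self, overridden: bool):
        """Kill switch: 3 consecutive overrides demotes the rule to Tier 1."""
        if overridden:
            self.consecutive_overrides += 1
            if self.consecutive_overrides >= 3:
                self.tier = min(self.tier, 1)
                self.consecutive_overrides = 0
        else:
            self.consecutive_overrides = 0

    def try_graduate(self, override_rate: float, days_at_tier: int):
        """Promote one tier when the tier's key metric and dwell time are met."""
        thresholds = [  # (min days at tier, max override/mismatch rate)
            (14, 0.05),  # shadow -> suggest: 95%+ shadow accuracy
            (16, 0.05),  # suggest -> act-and-report: override rate < 5%
            (30, 0.02),  # act-and-report -> full autonomy: overrides < 2%
        ]
        if self.tier < 3:
            min_days, max_override = thresholds[self.tier]
            if days_at_tier >= min_days and override_rate < max_override:
                self.tier += 1
        return TIERS[self.tier]
```

One design note: demotion only ever goes through the kill switch here; in practice you would also demote on the weekly override-rate trigger from Tier 3.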

61 days: from first shadow to full autonomy. Each decision type you graduate removes 5–15 minutes from your daily review. By the time you have graduated 5 rules, you have reclaimed over an hour every day — permanently.

Step 4: The Learning Loop

The Decision Engine is not a one-time build. It is a living system that gets smarter as you use it.

Every time you make a Novel decision, the system records it. After 5 Novel decisions of the same type with consistent logic, the Decision Auditor proposes a new pattern. After 10, the Rule Writer drafts a rule. After 30, the rule enters shadow mode automatically.

This is the flywheel:

  1. You make decisions. The system watches and records.
  2. Patterns emerge. The auditor identifies them.
  3. Rules are drafted. The rule writer formalizes them.
  4. Rules graduate. Shadow → Suggest → Act-and-Report → Full Autonomy.
  5. Your decision load shrinks. You make fewer decisions, each one more important.
  6. The remaining decisions are harder. Which means they are more valuable for the system to learn from.
  7. Repeat.

After 6 months, the decisions that reach you are genuinely novel — the ones that require your unique context, intuition, or values. Everything else runs on rules you validated and the system maintains.
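The 5/10/30 milestones in the flywheel are simple to track. A sketch, assuming novel decisions are tagged with a type label (the tracker and milestone names are mine):

```python
from collections import defaultdict

# Milestones from the text: 5 similar novel decisions -> propose a pattern,
# 10 -> draft a rule, 30 -> enter shadow mode automatically.
MILESTONES = {5: "propose_pattern", 10: "draft_rule", 30: "enter_shadow_mode"}

class NovelDecisionTracker:
    def __init__(self):
        self.counts = defaultdict(int)

    def record(self, decision_type: str):
        """Count a novel decision; return the milestone it triggers, if any."""
        self.counts[decision_type] += 1
        return MILESTONES.get(self.counts[decision_type])
```

In a real system the "same type" grouping is the hard part — that is what the Decision Auditor's pattern extraction is for; the counter only fires the milestones.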

Prompt 3 — The Graduation Monitor

This agent runs weekly. It checks every active rule’s performance, proposes graduations and demotions, and identifies new pattern candidates from your Novel decisions.

You are a decision rule governance agent. You run weekly to
maintain the health of all active decision rules.

Input:
- active_rules: JSON array of all rules with their current
  tier, accuracy metrics, and audit logs
- human_decisions: all human decisions from the past 7 days
- overrides: any cases where a human overrode an autonomous
  decision

Produce a weekly governance report with these sections:

## RULE HEALTH
For each active rule:
- rule_id, current_tier, days_at_tier
- accuracy_this_week, accuracy_all_time
- decisions_made_this_week (autonomous vs escalated)
- override_count_this_week
- status: HEALTHY / WARNING / DEMOTE / EXPIRE

## GRADUATION CANDIDATES
Rules ready to move up a tier:
- rule_id, current_tier, proposed_tier
- evidence: accuracy %, override rate, sample size
- recommendation: GRADUATE or HOLD (with reasoning)

## DEMOTION TRIGGERS
Rules that should move down a tier:
- rule_id, current_tier, proposed_tier
- trigger: what went wrong (override spike, accuracy drop,
  distribution shift)
- recommendation: DEMOTE or INVESTIGATE

## EXPIRING RULES
Rules within 14 days of their 90-day expiration:
- rule_id, expiration_date
- current_accuracy vs original_accuracy
- recommendation: RENEW (accuracy held) or RETIRE (decayed)
- If RENEW: updated confidence thresholds based on recent
  data

## NEW PATTERN CANDIDATES
Novel decisions from the past 7 days that match existing
Novel decisions:
- proposed_pattern_name
- matching_decision_ids (must be >= 5)
- draft_rule in plain English
- consistency_score
- recommendation: DRAFT RULE or NEEDS MORE DATA

Format the entire report as structured JSON so it can be
consumed programmatically by the Decision Engine.

The Safety Architecture

Autonomous decisions require stronger safety guarantees than human decisions. When a human makes a bad decision, they notice immediately and correct it. When a rule makes a bad decision, it may not be caught until the weekly review — and by then, the damage may have compounded.

Your Decision Engine needs four safety layers:

| Layer | What It Does | When It Fires |
| --- | --- | --- |
| Kill Switch | Demotes rule to Tier 1 after 3 consecutive overrides | Real-time, on every override |
| Drift Detector | Monitors input distribution. If inputs look different from training data, escalates. | On every decision |
| Impact Cap | Limits the blast radius of any single autonomous decision | Before action execution |
| Expiration | Forces re-validation every 90 days | On schedule |

The Drift Detector deserves special attention. A rule trained on bull-market decisions will make bad decisions in a bear market. Not because the rule is wrong — because the world changed. The drift detector compares every new decision’s input features against the historical distribution. If any feature is more than 2 standard deviations from the training mean, the decision escalates regardless of confidence score.
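The 2-standard-deviation check is a per-feature z-score against the training distribution. A minimal sketch (function name and the stats format are assumptions):

```python
def drift_check(inputs, training_stats, z_threshold=2.0):
    """Flag any input feature more than `z_threshold` standard deviations
    from the training mean. A non-empty result means: escalate this
    decision regardless of the rule's confidence.

    `training_stats` maps feature name -> (mean, std) from the rule's
    training window.
    """
    drifted = []
    for name, value in inputs.items():
        mean, std = training_stats[name]
        if std == 0:
            continue  # constant feature: no distribution to drift from
        z = abs(value - mean) / std
        if z > z_threshold:
            drifted.append((name, round(z, 2)))
    return drifted
```

A per-feature z-score is deliberately crude — it misses joint shifts where each feature looks normal in isolation — but it is cheap enough to run on every decision, which is where this check belongs.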

Pro Tip

The Impact Cap is context-dependent. A rule that auto-approves a content edit has a small blast radius — the worst case is a bad paragraph that gets fixed next day. A rule that auto-approves a deployment has a large blast radius. Set impact caps proportional to the cost of being wrong, not the frequency of the decision.


What This Looks Like in Practice

After 90 days with the Decision Engine, a typical system looks like this:

83% of decisions handled autonomously. Of all decisions the system needs to make, 83% are handled by graduated rules. The human reviews the remaining 17% — the genuinely novel ones that require judgment the system has not yet learned.

Here is what your daily workflow looks like before and after:

| Activity | Before | After |
| --- | --- | --- |
| Morning digest review | 45 min (read everything, decide everything) | 8 min (scan autonomous decisions, decide on 3–5 escalations) |
| Rule change approvals | 20 min (evaluate each proposal) | 0 min (graduated rules handle standard promotions) |
| Quality gate overrides | 15 min (review flagged outputs) | 5 min (only novel edge cases escalate) |
| System monitoring | 10 min (check dashboards manually) | 0 min (monitoring rules act autonomously, alert on anomaly) |
| Total daily time | 90 min | 13 min |

That is 77 minutes per day. 9 hours per week. 38 hours per month. Almost a full work week, every month, permanently freed.


Common Mistakes

1. Automating Novel decisions

If you cannot articulate why you made a decision — if it was intuition, gut feeling, or “I just knew” — it is a Novel decision. Do not write a rule for it. Let the system collect more examples. Intuition is often a pattern you have not consciously identified yet. After 20+ examples, the pattern may become clear. Or it may not, and it stays human-only. Both are fine.

2. Starting at Tier 2

The graduation protocol exists for a reason. Shadow mode catches logic errors. Suggest mode catches reasoning errors. Skipping tiers is how you end up with a rule that makes 50 bad decisions before you notice. Two weeks of shadow is cheap insurance.

3. Ignoring the drift detector

Your rules were trained on historical conditions. Markets shift. Customer behavior changes. Team priorities evolve. A rule that was 98% accurate last quarter may be 70% accurate this quarter because the world it was trained on no longer exists. The drift detector and the 90-day expiration are not bureaucracy — they are the immune system.

4. Too many rules at once

Start with 3. Graduate them fully. Learn from the process. Then add 3 more. If you write 20 rules on day one, you will spend all your time managing rules instead of making decisions. The goal is less human effort, not more.

5. No audit trail

If you cannot explain why the system made a decision, you do not have an autonomous system. You have a black box. When something goes wrong — and it will — the audit trail is the difference between a 5-minute fix and a week of debugging.


The Decision Engine Checklist

Print this. Check each box as you complete it.

Phase 1: Audit (Week 1)

☐ Pull (or start logging) 30 days of decision data
☐ Run Prompt 1, the Decision Auditor
☐ Identify your top 3 automation candidates

Phase 2: Rule Writing (Week 2)

☐ Run Prompt 2 to draft each rule with all five components
☐ Set every starting confidence threshold to 0.95
☐ Confirm every rule has a kill switch and a 90-day expiration

Phase 3: Graduation (Weeks 3–10)

☐ Run each rule in shadow mode for 14 days
☐ Graduate to Suggest, then Act-and-Report, as the key metrics allow
☐ Promote to Full Autonomy only after 30 days below a 2% override rate

Phase 4: Governance (Ongoing)

☐ Run Prompt 3, the Graduation Monitor, every week
☐ Review escalations and record your reasoning for each
☐ Renew or retire every rule at its 90-day expiration

Try It This Week

Pull your decision log from the last 30 days. If you do not have one, start today — every time you approve, reject, or modify something your AI system suggests, write one line: what you decided and why. In 30 days, you will have enough data to run the Decision Auditor.

If you already have the log, run Prompt 1 now. Identify your top 3 automation candidates. Write the rules. Put them in shadow mode. In 61 days, those decisions will never reach you again.

The goal is not to remove yourself from the system. The goal is to ensure that every minute you spend on the system is spent on decisions that actually require you. Not the decisions you make on autopilot — the decisions that make a difference.

Next Issue

Issue #26: The Observatory

Your system makes decisions autonomously. Your rules graduate themselves. Your agents coordinate across systems. But how do you know the whole thing is actually working? Issue #26 covers the observatory — a meta-monitoring layer that watches the watchers, catches systemic drift before it reaches any single rule, and gives you a single number that answers: “Is my AI system healthy today?”
