Deep Review


Part IV: Quality · Chapter 12

5 min read

Your team is two weeks from launch. The feature works in staging, passes all tests, and the product manager is satisfied. Then someone asks: "What happens if the database connection drops mid-transaction?" Silence. Nobody knows. A standard code review checked that the code is clean, follows conventions, and handles the expected cases. But nobody audited whether the system survives the unexpected ones. That is the gap /draft:deep-review fills — a production-readiness audit that goes beyond correctness into resilience, durability, and operational maturity.

Deep review evaluates modules across multiple production-readiness dimensions: ACID compliance, security, resilience, SLOs, database patterns, observability, idempotency, and error handling.

Beyond Code Review

Standard code review asks: "Is this code correct?" Deep review asks: "Will this code survive production?" These are fundamentally different questions. Correct code can still lose data during crashes, deadlock under concurrent load, or silently corrupt state when a downstream service goes down.

/draft:deep-review operates at the module level, not the pull-request level. It audits an entire service, component, or module against production-grade criteria. The command auto-selects the next unreviewed module from your project, or you can target one explicitly:

$ /draft:deep-review
  Module selected: src/auth/ (first unreviewed module)
  Reason: Not found in draft/deep-review-history.json

$ /draft:deep-review src/payments/
  Module selected: src/payments/ (explicitly requested)
  Re-reviewing: last reviewed 2026-02-14

Each completed review is logged to draft/deep-review-history.json with the module name, timestamp, issue count, and summary. Subsequent runs automatically pick the next unreviewed module, or the one with the oldest review date if all have been covered.
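The history file's exact schema is not documented here, but a plausible sketch of the record shape and the selection rule just described might look like this in TypeScript (field names are assumptions, not the command's real schema):

```typescript
// Hypothetical shape of a draft/deep-review-history.json entry.
interface DeepReviewRecord {
  module: string;      // e.g. "src/auth/"
  reviewedAt: string;  // ISO-8601 timestamp
  issueCount: number;
  summary: string;
}

// Selection rule: first any module never reviewed, otherwise the
// module with the oldest review date.
function nextModule(all: string[], history: DeepReviewRecord[]): string {
  const reviewed = new Map(history.map(r => [r.module, r.reviewedAt]));
  const unreviewed = all.find(m => !reviewed.has(m));
  if (unreviewed) return unreviewed;
  return [...all].sort(
    (a, b) => reviewed.get(a)!.localeCompare(reviewed.get(b)!)
  )[0];
}
```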

The ACID Compliance Audit

Deep review borrows from database theory. ACID properties — Atomicity, Consistency, Isolation, Durability — are not just database concerns. They apply to any system that manages state. Every service that writes data, updates records, or coordinates multi-step operations must answer these questions.

ACID Beyond Databases

ACID compliance is evaluated for every state-changing operation in the module, not just database transactions. An HTTP handler that updates a cache, sends a notification, and writes to a database is a multi-step operation that can fail partway through. Deep review audits each such operation for atomicity, consistency, isolation, and durability guarantees.

For each ACID property, deep review asks a specific set of questions:

  • Atomicity: Are multi-step operations wrapped in transactions? If step 3 of 5 fails, do steps 1-2 roll back? Are there missing rollback paths that leave corrupt state?
  • Consistency: Are invariants enforced before and after every state transition? Are schema constraints, data type validations, and boundary conditions enforced at every layer?
  • Isolation: Can concurrent operations interfere? Is there shared mutable state without locking? Are transaction isolation levels appropriate for the use case?
  • Durability: Does committed data survive crashes? Are there fire-and-forget patterns, missing flush/sync calls, or inadequate error handling around persistence operations?
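The atomicity check above can be sketched as a compensation-based runner: each step registers an action that undoes it, and a failure part-way through rolls back completed steps in reverse order. The `Step` and `runAtomically` names are illustrative, not a real API:

```typescript
// Each step pairs its action with a compensating action that undoes it.
interface Step {
  run: () => void;
  compensate: () => void;
}

// Run steps in order; on failure, compensate completed steps in
// reverse order, then rethrow so the caller sees the original error.
function runAtomically(steps: Step[]): void {
  const done: Step[] = [];
  for (const step of steps) {
    try {
      step.run();
      done.push(step);
    } catch (err) {
      for (const s of done.reverse()) s.compensate();
      throw err;
    }
  }
}
```

This is the shape deep review expects wherever a real database transaction is unavailable, for example when a step calls an external service.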

The audit extends beyond classical ACID into distributed system patterns. If the module uses event sourcing, deep review checks whether events are immutable and replay is idempotent. If it uses CQRS, it evaluates whether the consistency lag between read and write models is acceptable. If sagas coordinate multi-service operations, it verifies that compensating transactions are defined for every step.

Production Robustness Patterns

Passing tests does not mean surviving production. Deep review evaluates a set of production robustness patterns that separate "works on my machine" from "works at 3 AM when the cache cluster is down."

Resilience

Does the module degrade gracefully when dependencies fail? Deep review looks for circuit breakers on external calls, timeout handling on every outbound request, and backpressure mechanisms that prevent cascade failures. It checks what happens at 10x and 100x current traffic, identifies bottlenecks in connection pools and thread pools, and evaluates whether the module can shed excess load without crashing.
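A minimal sketch of the circuit-breaker pattern deep review looks for on external calls, with a clock injected for testability (the class name and thresholds are illustrative):

```typescript
// Opens after maxFailures consecutive failures; after resetMs it
// half-opens, letting one trial call through.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  constructor(
    private readonly maxFailures: number,
    private readonly resetMs: number,
    private readonly now: () => number = Date.now,
  ) {}

  isOpen(): boolean {
    if (this.failures < this.maxFailures) return false;
    if (this.now() - this.openedAt >= this.resetMs) {
      // Half-open: decrement so exactly one trial call is allowed.
      this.failures = this.maxFailures - 1;
      return false;
    }
    return true;
  }

  recordSuccess(): void { this.failures = 0; }

  recordFailure(): void {
    this.failures++;
    if (this.failures === this.maxFailures) this.openedAt = this.now();
  }
}
```

Callers check `isOpen()` before each outbound request and fall back (cached data, default response) while the breaker is open, which is what prevents a slow dependency from cascading.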

Idempotency and Retry Safety

Can operations be safely retried without side effects? If a payment charge is retried after a network timeout, does the user get double-charged? Deep review traces retry logic to ensure exponential backoff with jitter is used and that retry budgets exist to prevent retry storms.
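The backoff policy deep review looks for can be sketched in a few lines; the base and cap values here are illustrative defaults, not the command's own:

```typescript
// Exponential backoff with full jitter: the delay grows as
// base * 2^attempt, is capped, and is then drawn uniformly from
// [0, cap) so synchronized clients do not retry in lockstep.
function backoffDelayMs(attempt: number, baseMs = 100, capMs = 30_000): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * exp;
}
```

A retry budget then limits total attempts (or total time spent retrying) per request, so that a widespread outage does not multiply traffic into a retry storm.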

Fail-Closed Behavior

When something unexpected happens, does the system fail into a safe state or an unsafe one? An authorization check that defaults to "allow" on error is fail-open — a security vulnerability. Deep review identifies every error path and evaluates whether the default behavior is safe.
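The fail-closed pattern is small enough to show directly; `check` here stands in for a hypothetical policy lookup:

```typescript
// Fail closed: if the authorization check itself fails (policy store
// down, malformed response), deny access rather than allow it.
function authorize(check: () => boolean): boolean {
  try {
    return check();
  } catch {
    return false;
  }
}
```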

Observability Assessment

Production systems that cannot be observed cannot be debugged. Deep review evaluates observability across multiple dimensions:

  • Structured logging — Are logs structured (JSON/key-value) or free-form strings? Are log levels used correctly (ERROR for actual errors, not expected conditions)?
  • Correlation IDs — Can a request be traced across service boundaries? Are tracing spans created at service boundaries with relevant attributes?
  • Metric cardinality — Are metric labels bounded? Unbounded labels like user_id cause metric explosion that crashes monitoring infrastructure.
  • PII leakage — Do logs or error messages expose personally identifiable information, tokens, or credentials?
  • Alerting coverage — Are critical failure modes covered by alerts? Are there runbooks linked to those alerts?
Metric Cardinality Explosion

A common production failure: a developer adds user_id as a metric label. With 100,000 users, every metric now has 100,000 time series. Prometheus runs out of memory. Grafana dashboards time out. The monitoring system itself becomes the outage. Deep review catches unbounded label patterns before they reach production.
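One defensive pattern against this failure is an allowlist of label names applied before metrics are emitted. The label names here are illustrative:

```typescript
// Only bounded labels survive; high-cardinality identifiers such as
// user_id or request_id are dropped before they become time series.
const ALLOWED_LABELS = new Set(["route", "method", "status_class"]);

function sanitizeLabels(labels: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [k, v] of Object.entries(labels)) {
    if (ALLOWED_LABELS.has(k)) out[k] = v;
  }
  return out;
}
```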

SLO Evaluation

Deep review checks whether the module has defined service-level objectives and whether the architecture can actually deliver them. A module claiming 99.99% availability (52 minutes of downtime per year) while depending on a single database instance without failover is making a promise its architecture cannot keep.

The evaluation covers latency profiles (p50, p95, p99 targets), error budgets, and the gap between stated availability targets and actual architectural capabilities. If no SLOs are defined, the review recommends establishing them — because a system without SLOs has no definition of "good enough."
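The availability arithmetic behind this check is simple enough to work through: an availability target directly implies a yearly downtime budget.

```typescript
// Converts an availability target (e.g. 99.99) into the downtime
// budget it implies, in minutes per year (365.25-day year).
function downtimeBudgetMinutesPerYear(availabilityPct: number): number {
  const minutesPerYear = 365.25 * 24 * 60;
  return minutesPerYear * (1 - availabilityPct / 100);
}
```

For 99.99% this yields roughly 52.6 minutes per year, the figure quoted above; 99.9% allows about ten times that.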

Document Your SLOs Before You Need Them

Deep review audits all dimensions for every module — ACID compliance, observability, resilience, SLOs — regardless of service type. The review is most valuable when it has concrete targets to check against. SRE and platform engineers should document actual SLO targets (availability, latency percentiles, error budgets), observability requirements (what must be logged, required metrics, alerting thresholds), and security posture expectations in workflow.md or your team's runbook. Without defined targets, the review can only flag the absence of SLOs — it cannot tell you whether your architecture delivers on the promises your team has made.

Database-Specific Analysis

For modules that interact with databases, deep review performs targeted analysis that generic code review misses:

Each check targets a specific failure class:

  • Missing indexes: Queries filtering or joining on unindexed columns, causing full table scans
  • N+1 queries: ORM relationships that trigger one query per row instead of a single join
  • Wide table scans: SELECT * or queries without WHERE clauses on large tables
  • Schema constraints: Missing NOT NULL, UNIQUE, or FOREIGN KEY constraints that allow invalid data
  • Migration safety: Migrations that require downtime or are not backward-compatible with the running application
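The standard fix for the N+1 pattern is to replace one query per row with a single batched lookup. This sketch builds such a query by hand; `buildBatchedQuery` is an illustrative helper, not a real ORM API, and values are passed as placeholders rather than interpolated:

```typescript
// Builds a single IN (...) query for a list of ids, with numbered
// placeholders so values are bound as parameters, not concatenated.
function buildBatchedQuery(
  table: string,
  ids: number[],
): { sql: string; params: number[] } {
  const placeholders = ids.map((_, i) => `$${i + 1}`).join(", ");
  return {
    sql: `SELECT * FROM ${table} WHERE id IN (${placeholders})`,
    params: ids,
  };
}
```

In an ORM the same effect usually comes from eager loading (a join or a batched second query) instead of lazy per-row access.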

The Report

Deep review produces a structured report saved to draft/deep-review-reports/<module-name>.md. Every finding is classified by severity and formatted as an actionable specification — not a vague suggestion, but a concrete description of what to fix and how:

### [Critical] Missing Rollback on Payment Failure
**File:** src/payments/processor.ts:142
**Description:** Transaction lacks rollback when gateway returns
  a timeout error. Steps 1-3 (debit, ledger entry, notification)
  complete, but if step 4 (gateway confirmation) times out, the
  debit and ledger entry persist without a corresponding charge.
**Proposed Fix Specification:**
- Wrap steps 1-4 in a database transaction
- Add explicit rollback on gateway timeout
- Emit structured log with correlation ID for reconciliation
- Add a compensating transaction for the notification

The report ends with a verdict: PASS (only minor issues), CONDITIONAL PASS (no critical issues, but important ones exist), or FAIL (at least one critical issue found). A FAIL verdict means the module is not production-ready.

The verdict scale: FAIL indicates critical issues that block production deployment, CONDITIONAL PASS means no critical issues but important ones require attention before launch, and PASS confirms the module is production-ready with only minor findings.

API Contract Drift Detection

A subtle class of production bugs comes from drift between documented API contracts and actual implementations. Deep review compares the module's code interfaces against OpenAPI/Swagger specs, Protobuf definitions, GraphQL schemas, or TypeScript type exports. It flags endpoints that exist in code but not in the spec, types that differ between spec and implementation, and undocumented endpoints that external consumers may be relying on.
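At its core, drift detection is a set difference in both directions between the endpoints found in code and those declared in the spec. A minimal sketch, with endpoint strings like "GET /users" as an assumed normalization:

```typescript
// Diffs implemented endpoints against declared ones. "undocumented"
// endpoints exist only in code; "unimplemented" only in the spec.
function contractDrift(inCode: string[], inSpec: string[]) {
  const code = new Set(inCode);
  const spec = new Set(inSpec);
  return {
    undocumented: inCode.filter(e => !spec.has(e)),
    unimplemented: inSpec.filter(e => !code.has(e)),
  };
}
```

The type-level comparison (does the response schema in the spec match the implementation's return type?) requires deeper analysis, but the endpoint-level diff already catches the most common drift.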

When to Use Deep Review

Deep review is expensive in terms of time and token usage. It is not meant for every commit or every pull request. Use it for:

  • Pre-launch audits — Before shipping a new service or major feature to production
  • Critical path changes — When modifying payment processing, authentication, or data integrity code
  • Compliance requirements — When security or regulatory review requires documented evidence of production-readiness assessment
  • Post-incident review — After a production incident, to audit the affected module for related issues
Deep Review vs. Regular Review

Regular /draft:review checks whether code matches the spec, follows conventions, and passes quality gates. Deep review assumes the code is correct and asks whether it will survive the conditions that production imposes: failures, load, concurrency, crashes, and the slow degradation that happens over months of operation. They complement each other — review for development quality, deep review for operational readiness.

After completing the audit, deep review runs the pattern learning phase, updating draft/guardrails.md with architecture and concurrency conventions discovered during the module analysis. These patterns feed back into future reviews and implementations, making each subsequent deep review more precise.