
Review Pipeline

Part III: Track Lifecycle · Chapter 10

6 min read

The code is written. Tests pass. The plan shows every task marked [x]. You could merge now and move on. But "tests pass" is a low bar — it tells you the code does what the tests check, not that it does what the specification requires, not that it's secure, not that it follows your architecture, and not that it handles the edge cases nobody thought to test. Draft's review pipeline exists to close these gaps systematically.

[Figure: Stage 1 Automated Validation (security, structure, perf) → GATE → Stage 2 Spec Compliance (requirements, acceptance) → GATE → Stage 3 Code Quality (patterns, errors, testing)]
The three-stage review pipeline: each stage acts as a gate. Failures stop the pipeline early — no point checking code quality on structurally broken code.

The Three Stages

/draft:review runs a sequential, three-stage review process. Each stage acts as a gate — if it fails, the review stops and sends you back to fix the structural problem before wasting effort on higher-order concerns. You don't check code quality on structurally broken code, and you don't check structure on code that doesn't meet the spec.

$ /draft:review track add-oauth2

Auto-detected track: add-oauth2 — Add OAuth2 Support [~]
Commit range: a1b2c3d^..f4e5d6c (11 commits, 15 files)
Running three-stage review...
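The gate behavior can be sketched as a sequential runner that stops at the first failing stage. The `StageResult` shape and stage closures below are illustrative, not Draft's internals:

```typescript
// Hypothetical sketch of the sequential gate: each stage runs only if
// every earlier stage passed.
type StageResult = { stage: string; passed: boolean; findings: string[] };

function runPipeline(
  stages: Array<() => StageResult>
): { results: StageResult[]; stopped: boolean } {
  const results: StageResult[] = [];
  for (const stage of stages) {
    const result = stage();
    results.push(result);
    if (!result.passed) {
      // Gate closed: later stages never run on structurally broken code.
      return { results, stopped: true };
    }
  }
  return { results, stopped: false };
}
```

A failing Stage 1 here means Stages 2 and 3 simply never execute, which is exactly the "no point checking code quality on structurally broken code" rule.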

Stage 1: Automated Validation

The first stage is fast, objective, and mechanical. It answers one question: is the code structurally sound and secure?

The reviewer scans the diff for five categories of issues:

  1. Architecture Conformance — Checks for pattern violations documented in your .ai-context.md. A database import in a React component, a direct service-to-service call bypassing the message bus, a controller containing business logic — these are structural violations caught before anyone reads the code.
  2. Dead Code Detection — Searches for newly exported functions or classes in the diff that have zero references anywhere in the codebase. Code that nothing calls is either incomplete work or a maintenance burden.
  3. Dependency Cycle Detection — Traces import chains for new imports to ensure no circular dependencies (A imports B, B imports C, C imports A) are introduced. Circular dependencies indicate poor module boundaries.
  4. Security Scanning (OWASP) — Scans the diff for hardcoded secrets and API keys, SQL injection risks (string concatenation in queries), and XSS vulnerabilities (innerHTML or raw DOM insertion).
  5. Performance Anti-Patterns — Detects N+1 database queries (loops containing queries), blocking synchronous I/O within async functions, and unbounded queries lacking pagination.
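A rough sketch of the kind of mechanical pattern matching Stage 1 relies on. The two rules below are simplified illustrations, not Draft's actual rule set:

```typescript
// Illustrative Stage 1 checks: flag hardcoded secrets and string-built SQL.
// Real scanners use far richer rule sets; these patterns are examples only.
const checks: Array<{ category: string; pattern: RegExp }> = [
  { category: "hardcoded-secret", pattern: /(api[_-]?key|secret|password)\s*[:=]\s*["'][^"']+["']/i },
  { category: "sql-injection-risk", pattern: /(SELECT|INSERT|UPDATE|DELETE)[^;]*["'`]\s*\+/i },
];

function scanDiffLine(line: string): string[] {
  return checks.filter((c) => c.pattern.test(line)).map((c) => c.category);
}
```

Because these checks are objective and cheap, they run first; anything they flag is reported with the exact diff line that triggered the rule.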

Context-Specific Checks

Stage 1 also identifies the primary domain of changed files and applies targeted checks. If the diff touches authentication files, the reviewer checks for timing-safe comparisons, constant-time operations, and secure random generation. Database migrations get checked for backward compatibility, index coverage, and zero-downtime safety. API endpoints get checked for input validation, rate limiting, and authentication guards. Configuration changes are scanned for exposed secrets and missing startup validation.
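One way to picture this domain detection is a path-fragment lookup. The path patterns and check names below are hypothetical examples:

```typescript
// Hypothetical mapping from file-path fragments to targeted check sets.
const domainChecks: Array<{ match: RegExp; checks: string[] }> = [
  { match: /auth|login|session/i, checks: ["timing-safe-compare", "secure-random"] },
  { match: /migrations?\//i, checks: ["backward-compat", "index-coverage", "zero-downtime"] },
  { match: /api\/|routes?\//i, checks: ["input-validation", "rate-limiting", "auth-guard"] },
  { match: /config|\.env/i, checks: ["no-exposed-secrets", "startup-validation"] },
];

function checksForFile(path: string): string[] {
  // First matching domain wins; unmatched files get only the generic checks.
  const domain = domainChecks.find((d) => d.match.test(path));
  return domain ? domain.checks : [];
}
```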

Breaking Change Detection

Stage 1 checks for changes to public API surfaces: modified function signatures, removed or renamed exports, changed error types, and altered serialization formats. A breaking change with no deprecation period or version bump is classified as Critical — it blocks the review immediately.
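The simplest breaking change to detect mechanically is a removed or renamed export. A sketch comparing exported names before and after the diff (a hypothetical helper, not Draft's implementation):

```typescript
// Sketch: compare the exported names before and after a change.
// Any name present before but missing after is a breaking removal,
// classified Critical unless paired with a deprecation or version bump.
function breakingRemovals(before: string[], after: string[]): string[] {
  const kept = new Set(after);
  return before.filter((name) => !kept.has(name));
}
```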

STRIDE Threat Modeling

For new endpoints or data mutations, Stage 1 applies the STRIDE framework — six threat categories evaluated systematically:

  • Spoofing — Can the caller's identity be faked? (authentication check)
  • Tampering — Can request data be modified in transit? (integrity check)
  • Repudiation — Are actions logged for audit? (logging check)
  • Information Disclosure — Does the response leak internal details? (error message check)
  • Denial of Service — Can the endpoint be abused? (rate limiting, resource limits)
  • Elevation of Privilege — Are authorization checks in place? (RBAC/ABAC check)
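These six categories can be modeled as a checklist that must be fully answered before an endpoint passes. The shape below is an illustration, not Draft's schema:

```typescript
// Illustrative STRIDE checklist: an endpoint passes only when every
// category has been evaluated and none of the evaluations found a threat.
type StrideCategory =
  | "spoofing" | "tampering" | "repudiation"
  | "information-disclosure" | "denial-of-service" | "elevation-of-privilege";

const allCategories: StrideCategory[] = [
  "spoofing", "tampering", "repudiation",
  "information-disclosure", "denial-of-service", "elevation-of-privilege",
];

function stridePasses(evaluations: Partial<Record<StrideCategory, boolean>>): boolean {
  // true in the map means "evaluated, no threat found" for that category.
  return allCategories.every((c) => evaluations[c] === true);
}
```

An unevaluated category fails the check just like a found threat: skipping a question is not the same as answering it.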

If Stage 1 fails — any critical issue found — the review stops. The report lists the structural failures, and you fix them before proceeding. There is no point checking spec compliance on code with a SQL injection vulnerability.

SAST Tool Recommendations

After completing Stage 1, the reviewer recommends appropriate static analysis tools based on your tech-stack.md: ESLint with eslint-plugin-security for JavaScript, Bandit for Python, gosec for Go, cargo clippy and cargo audit for Rust, and Semgrep or CodeQL for multi-language projects. If these tools are not already in your CI pipeline, the recommendation appears in the report.

Stage 2: Spec Compliance

Stage 2 answers the most important review question: did they build what was specified? This stage only runs for track-level reviews where a spec.md exists.

The reviewer loads the specification's acceptance criteria and systematically verifies each one against the diff:

  1. Requirements Coverage — For every functional requirement in spec.md, the reviewer finds evidence in the diff that it was implemented. Each requirement maps to specific files and line numbers.
  2. Acceptance Criteria Verification — Each criterion is checked against the diff. If TDD is enabled, the reviewer verifies that test coverage exists for each criterion.
  3. Scope Adherence — The reviewer checks for missing features (spec items not implemented) and for scope creep (extra work not in the spec). Both directions are flagged.
A passing Stage 2 section of the review report looks like this:

## Stage 2: Spec Compliance

**Status:** PASS

### Requirements Coverage
- [x] Google OAuth2 login flow — Implemented in src/auth/google.ts:15-89
- [x] GitHub OAuth2 login flow — Implemented in src/auth/github.ts:12-76
- [x] Linked accounts in profile — Implemented in src/components/LinkedAccounts.tsx

### Acceptance Criteria
- [x] Google OAuth2 end-to-end — Verified by tests/auth/google.test.ts
- [x] GitHub OAuth2 end-to-end — Verified by tests/auth/github.test.ts
- [x] Existing login unaffected — Verified by tests/auth/password.test.ts
- [x] Token revocation — Verified by tests/auth/revocation.test.ts
- [x] Rate limiting on callback — Verified by tests/auth/ratelimit.test.ts

If Stage 2 fails — any requirement missing or acceptance criterion not met — the review stops. Stage 3 does not run. You go back to implementation and close the gaps before the review continues. Checking code quality on code that doesn't meet the spec is a waste of time.

Stage 3: Code Quality

Stage 3 is the semantic review — the part that requires judgment, not just pattern matching. It evaluates four dimensions:

  1. Architecture — Does the code follow project patterns from tech-stack.md? Appropriate separation of concerns? Critical invariants from .ai-context.md honored?
  2. Error Handling — Errors handled at the appropriate level? User-facing errors helpful? No silent failures?
  3. Testing — Tests verify real logic, not implementation details? Edge cases have coverage?
  4. Maintainability — Code readable without excessive comments? Consistent naming and style? No functions exceeding cognitive complexity thresholds? No deeply nested control flow beyond three levels?

For each flagged function, the report includes file path, function name, estimated complexity, and a recommended action: split, extract, or simplify.
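Nesting depth, one input to the maintainability checks, can be estimated from brace balance alone. A rough sketch that ignores strings and comments (a real analyzer would walk the AST):

```typescript
// Rough nesting-depth estimate: track the deepest brace nesting reached.
// Strings and comments are ignored here, so this is an approximation.
function maxNestingDepth(source: string): number {
  let depth = 0;
  let max = 0;
  for (const ch of source) {
    if (ch === "{") {
      depth += 1;
      if (depth > max) max = depth;
    } else if (ch === "}") {
      depth -= 1;
    }
  }
  return max;
}
```

Against the "no deeper than three levels" rule above, a function body whose estimate exceeds 3 would be flagged with a recommended action of split, extract, or simplify.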

The Adversarial Pass

When Stage 3 produces zero findings across all four dimensions, the reviewer does not accept "clean" at face value. Instead, it runs an adversarial pass — explicitly asking probing questions designed to find what a lazy review would miss:

  1. Error paths — Is every error or exception handled? Are any failure modes silently swallowed?
  2. Edge cases — Are there boundary conditions (empty input, max values, concurrent access) not covered by tests?
  3. Implicit assumptions — Does the code assume inputs are always valid, services always up, or state always consistent?
  4. Future brittleness — Is anything hardcoded that will break on scale or configuration change?
  5. Missing coverage — Is there behavior that should be tested but isn't?
  6. Anti-patterns — Does the change repeat any known violation documented for the project?
  7. Invariant violations — Does anything break a documented contract or critical invariant?

If the adversarial pass still finds nothing, the report documents it explicitly with a one-sentence explanation per question. This prevents lazy LGTM verdicts — when the reviewer claims "nothing to find," it has to prove the claim.
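This "prove the claim" requirement can be modeled as a verdict that only becomes LGTM once every probe has a written answer. The probe names here are illustrative shorthand:

```typescript
// Illustrative zero-finding verification: an LGTM verdict is only valid
// when every adversarial probe has an explicit non-empty answer.
const probes = [
  "error-paths", "edge-cases", "implicit-assumptions",
  "future-brittleness", "missing-coverage",
  "anti-patterns", "invariant-violations",
] as const;

function verdict(answers: Record<string, string>): "LGTM" | "INCOMPLETE" {
  const allAnswered = probes.every(
    (p) => typeof answers[p] === "string" && answers[p].trim().length > 0
  );
  return allAnswered ? "LGTM" : "INCOMPLETE";
}
```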

[Figure: seven probe questions arranged around a zero-finding verification core: error paths handled? boundary conditions tested? inputs assumed valid? hardcoded values risking future brittleness? missing coverage of untested behavior? anti-patterns and known violations? invariant violations breaking contracts? Every "LGTM" must prove all seven probes were examined.]
The adversarial pass: seven probe questions arranged around a zero-finding verification core. When Stage 3 finds nothing, the reviewer must explicitly address each probe before declaring the review clean.
Why Sequential Stages Matter

The three-stage sequence is not arbitrary. Running all checks simultaneously produces a noisy report where security vulnerabilities sit next to naming suggestions. The sequential design ensures you fix structural problems first, then verify spec compliance, then polish quality. Each stage builds on the confidence that the previous stage provides.

Issue Classification

Every finding is classified by severity, and severity determines the required action:

  • Critical — Blocks release. Breaks functionality or creates a security issue. Must fix before proceeding.
  • Important — Degrades quality or introduces technical debt. Should fix before the phase is complete.
  • Minor — Style, optimization, nice-to-have. Noted for later. Does not block.

Each issue includes the file and line number, a description of the impact, and a suggested fix. The reviewer never auto-fixes — it reports, and the developer decides. In the report, findings look like this:

### Critical Issues
- [ ] [src/auth/oauth.ts:67] SQL injection risk in provider lookup
  - **Impact:** Attacker can inject SQL via provider parameter
  - **Suggested fix:** Use parameterized query instead of string interpolation

### Important Issues
- [ ] [src/auth/session.ts:34] No circuit breaker on token refresh call
  - **Impact:** Provider outage cascades to all authentication attempts
  - **Suggested fix:** Add circuit breaker with 5-failure threshold
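The severity gate itself is simple to express. A sketch consistent with the classification above:

```typescript
// Sketch of the severity gate from the classification above: only
// Critical findings block release; Important should be fixed before the
// phase completes; Minor is noted for later.
type Severity = "Critical" | "Important" | "Minor";

function blocksRelease(findings: Array<{ severity: Severity }>): boolean {
  return findings.some((f) => f.severity === "Critical");
}
```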

Review Scopes

While track-level review is the primary workflow, /draft:review supports several scopes:

  • /draft:review — Auto-detects the active track and reviews it (all three stages)
  • /draft:review track add-oauth2 — Reviews a specific track by ID
  • /draft:review project — Reviews uncommitted changes (Stages 1 and 3 only, no spec)
  • /draft:review files "src/**/*.ts" — Reviews specific file patterns
  • /draft:review commits main...HEAD — Reviews a commit range

Project-level reviews skip Stage 2 (spec compliance) since there is no specification to check against. They run automated validation and code quality only.

The with-bughunt Modifier

For comprehensive reviews, adding with-bughunt or full combines the three-stage review with Draft's deep bug hunting capability. The bug hunter looks for logic errors, race conditions, and edge cases that static analysis misses, producing regression tests for each finding. Results are aggregated into the review report with deduplication — if both the reviewer and bug hunter flag the same file and line, the finding with the highest severity is kept and attributed to both tools.
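The deduplication rule described here (same file and line: keep the highest severity, credit both tools) can be sketched as:

```typescript
// Sketch of review/bug-hunt deduplication: findings sharing a file:line
// key are merged, keeping the highest severity and both attributions.
// Higher number = more severe (e.g. 3 = Critical, 2 = Important, 1 = Minor).
type Finding = { file: string; line: number; severity: number; tool: string };

function dedupe(findings: Finding[]): Array<Finding & { tools: string[] }> {
  const byLocation = new Map<string, Finding & { tools: string[] }>();
  for (const f of findings) {
    const key = `${f.file}:${f.line}`;
    const existing = byLocation.get(key);
    if (!existing) {
      byLocation.set(key, { ...f, tools: [f.tool] });
    } else {
      existing.tools.push(f.tool);
      if (f.severity > existing.severity) existing.severity = f.severity;
    }
  }
  return [...byLocation.values()];
}
```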

$ /draft:review track add-oauth2 with-bughunt

Running three-stage review...
Stage 1 (Automated Validation): PASS
Stage 2 (Spec Compliance): PASS
Stage 3 (Code Quality): PASS WITH NOTES
Running bug hunt...

Total issues: 7 (0 Critical, 2 Important, 5 Minor)
Report: draft/tracks/add-oauth2/review-report-2026-03-30T1445.md

Review Reports Are Versioned

Each review creates a timestamped report file. A review-report-latest.md symlink always points to the most recent one. Previous reports are preserved, so you can compare findings across review iterations and track how issues were resolved.

The review report updates metadata.json with the review verdict, timestamp, and a running review count. This metadata feeds into /draft:status, giving you a project-wide view of which tracks have been reviewed and which still need attention.
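The timestamped naming scheme visible in the transcript above (review-report-2026-03-30T1445.md) can be derived from a date. A sketch, since Draft's exact format string is not documented here:

```typescript
// Sketch: build a timestamped report filename matching the pattern
// seen in the transcript (review-report-YYYY-MM-DDTHHMM.md).
function reportFilename(d: Date): string {
  const pad = (n: number) => String(n).padStart(2, "0");
  const stamp =
    `${d.getFullYear()}-${pad(d.getMonth() + 1)}-${pad(d.getDate())}` +
    `T${pad(d.getHours())}${pad(d.getMinutes())}`;
  return `review-report-${stamp}.md`;
}
```

After each run, the review-report-latest.md symlink is repointed at the newly written file, so tooling can always read the most recent verdict from a stable path.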

With code reviewed and issues resolved, the track nears completion. The next chapter covers the operational commands that manage tracks throughout their lifecycle — monitoring progress, handling mid-stream requirement changes, and safely rolling back work when needed.